As it becomes ever easier to collect data about individuals, a diverse range of professionals who were never trained for such work must grapple with inadequate analytic and data management skills, as well as with the ethical risks that arise from holding personal data and using opaque algorithmic tools.
Yet the fundamental skills required to lead algorithmic decision-making are not universally taught, and are often obscured by unnecessary complexity.
Unlocking data reuse, and the new economic and social development opportunities these data offer, relies on both data producers and data users having the technical insight needed to manage those who work with data, and a conscious, motivated understanding of the new algorithmic tools available to us.
Our solution
At Whythawk, we redeveloped our longer Data as a Science programme into an intensive, focused, research-based five-day bootcamp, stripping out its coding, mathematical and technical aspects. Data Science for Non-Data Scientists guides learners to confidence in the curation, ethics, analysis and presentation of data. Each day of the course is a self-contained lesson structured around the following four topics:
- Ethics: determine the social and behavioural challenges posed by a research question.
- Curation: establish the research requirements for data collection and management.
- Analysis: investigate, explore and analyse research data.
- Presentation: prepare and present the results of analysis to promote a response.
The five days are structured as follows:
- Lesson 1: Ethical reasoning, data curation and evidence-led decision-making
  - Identify concepts in ethical reasoning that may influence our analysis of, and results from, data.
  - Understand the process of data curation, and the custodial duty of data science.
  - Investigate and review data to learn its metadata, shape and robustness.
  - Identify an appropriate chart and present data to illustrate its core characteristics.
- Lesson 2: Research and experiments with data, and finding meaning in complexity
  - Recognise the importance of privacy and anonymity, and the process for applying them.
  - Integrate methods for metadata and archiving into data management.
  - Investigate data distribution and confidence.
  - Illustrate core analysis with histograms and box plots.
- Lesson 3: Probability, randomness, and the predictive value of synthetic data
  - Determine the implications of collecting, mining and recombining open and digital data.
  - Employ methods for presenting data for synthesis and use, and for maintaining data.
  - Assess techniques in randomness and probability to understand distribution and likelihood.
  - Investigate histograms, line charts and scatter plots to illustrate probability.
- Lesson 4: Pragmatic reasoning and investigating questions without a known causal mechanism
  - Acknowledge the privacy and confidentiality issues in storing and securing personal data.
  - Recognise responsibilities and mechanisms for securing data-at-rest and data-in-motion.
  - Consider linear and continuous sampling methods to assess normal distributions.
  - Present distributions as normal histograms and continuous curves.
- Lesson 5: Persuasion and the art of causal, probabilistic thought
  - Integrate the lessons learned in a live simulation to persuade others to action.
Each lesson guides participants through a question that requires a time-constrained response and involves multiple competing ethical, technical and management considerations. Each day concludes with teams competing to persuade the class of the conclusions they have reached.
Outcomes
A pilot course was run with 30 doctoral students at UCL in July as part of their Summer School, and was very well received. Recordings of the live sessions are available in the online textbook. We are now exploring ways to offer the course as a continuing part of graduate and postgraduate education.