UCL Data-As-A-Science Graduate Bootcamp

11 July 2025

London, Camden | UCL

Data has become the most important language of our era, informing everything from intelligence in automated machines to predictive analytics in medical diagnostics. The plunging cost and easy accessibility of the raw requirements for such systems – data, software, distributed computing, and sensors – are driving the adoption and growth of data-driven decision-making.

As it becomes ever easier to collect data about individuals, a diverse range of professionals who have never been trained for such requirements grapple with inadequate analytic and data management skills, as well as the ethical risks arising from possessing personal data and relying on opaque algorithmic tools.

However, the fundamental skills required to lead algorithmic decision-making are not universally taught, and are often surrounded by unnecessary complexity.

The key to unlocking data reuse, and the new economic and social development opportunities it brings, is for both data producers and data users to have the technical insight necessary to manage those who work with data, and a conscious and motivated understanding of the new algorithmic tools available to us.

Our solution

Whythawk redeveloped our lengthier Data as a Science programme - stripping out the coding, mathematical and technical aspects - into an intense, focused research-based five-day bootcamp. Data Science for Non-Data Scientists guides learners to confidence in the curation, ethics, analysis and presentation of data. Each day of the five-day course is an individual lesson guided by the following four topics:

  • Ethics: determine the social and behavioural challenges posed by a research question.
  • Curation: establish the research requirements for data collection and management.
  • Analysis: investigate, explore and analyse research data.
  • Presentation: prepare and present the results of analysis to promote a response.

The five days are structured as follows:

  • Lesson 1: Ethical reasoning, data curation and evidence-led decision-making
    • Identify concepts in ethical reasoning which may influence our analysis and results from data.
    • Understand the process of data curation, and the custodial duty of data science.
    • Investigate and review data to learn its metadata, shape and robustness.
    • Identify an appropriate chart and present data to illustrate its core characteristics.
  • Lesson 2: Research and experiments with data, and finding meaning in complexity
    • Recognise the importance of, and the process for, applying concepts of privacy and anonymity.
    • Integrate methods for metadata and archival into data management.
    • Investigate data distribution and confidence.
    • Illustrate core analysis with histograms and box plots.
  • Lesson 3: Probability, randomness, and the predictive value of synthetic data
    • Determine the implications of the collection, mining and recombination of open and digital data.
    • Employ methods for presenting data for synthesis and usage, and for maintaining data.
    • Assess techniques in randomness and probability to understand distribution and likelihood.
    • Investigate histograms, line charts and scatter plots to illustrate probability.
  • Lesson 4: Pragmatic reasoning and investigating questions without a known causal mechanism
    • Acknowledge the privacy and confidentiality issues in the storage and security of personal data.
    • Recognise responsibilities and mechanisms for securing data-at-rest and data-in-motion.
    • Consider linear and continuous sampling methods to assess normal distributions.
    • Present distributions as normal histograms and continuous curves.
  • Lesson 5: Persuasion and the art of causal, probabilistic thought
    • Integrate the lessons learned in a live simulation to persuade others to action.

Each lesson will guide participants through the review of a question that requires a time-constrained response and involves multiple competing ethical, technical and management considerations. Each day will conclude with teams competing to persuade the class of the conclusions they have reached.

Outcomes

A pilot course was run with 30 postgraduate doctoral students at UCL in July as part of their Summer School, and was very well received. Recordings from the live sessions are available in the online textbook. We are currently exploring ways to offer this as a continuing part of graduate and postgraduate education.

Related projects

La Marine Nationale Française CKAN Upgrade & Deployment
17 October 2025

Marine Nationale has an existing CKAN data management portal deployed on their internally accessible network. This is a secure environment, and upgrades and extensions to the software are performed by Marine Nationale directly.

openLocal Commercial Location Data for England & Wales Research Integration
30 June 2025

openLocal.uk is a quarterly-updated commercial location database, aggregating open data on vacancies, rental valuations, rates & ratepayers, into an integrated time-series database of individual retail, industrial, office and leisure business units.

Hop Sauna - Interledger Foundation Ambassadorship 2025 for Developing a Social Marketplace Web Stack
1 February 2025

Hop Sauna is a core technical stack aimed at developers to support implementing a federated, community-moderated web shop offering custom digital objects.
