Our solution
We had several different challenges. The most important was choosing a good case study. We had two options:
- Pandemic response in East Africa, where a range of emerging and established diseases were being monitored,
- Extremely multi-drug resistant tuberculosis pandemic response in Eastern Europe.
We chose the TB project in Eastern Europe as the more straightforward option: one disease vector and one process. Even so, we then ran into the additional challenge that many of the participating countries were either threatened or invaded by Russia. However, conflict was also a contributor to the pandemic, as well as to the low-trust environment for data sharing.
We arranged a series of workshops with public health data managers and directors, including in Spain and Turkey, and compiled a list of the challenges they experienced. Much of what we discovered related to data interoperability, as well as limited trust and significant skills shortages for using data gathered during the course of public health interventions to generate research insight.
Whythawk itself could play no role in the legislative or political challenges, but we were invited by the Gates Foundation to develop a graduate program in Data Science for public health that could be delivered as a one-year taught Master's program.
Outcomes
Whythawk developed the Data as a Science program, including developing four of the 20 taught modules.
The course is based on the Sloyd model of technical training. Each lesson is discrete, builds on the previous lesson, and provides a functional and holistic understanding of the scientific method as it applies to data. It is not about learning an algorithm and applying it to abstract, arbitrary data. The course aims to train complete data scientists: students learn how research works and apply tools to a specific case study.
Each lesson starts with a research question and progresses by teaching a complete and practical set of skills, allowing students to learn at their own pace and in an order which suits their current understanding. Case studies and tutorials are drawn from public health, economics and social issues, and the course is accessible to anyone with an interest in data. Course materials, case studies and guided tutorials are presented in Jupyter Notebooks, permitting learners to test running code and gain hands-on understanding of the techniques discussed.
Each lesson is guided by the following four topics:
- Ethics: determine the social and behavioural challenges posed by a research question,
- Curation: establish the research requirements for data collection and management,
- Analysis: investigate, explore and analyse research data,
- Presentation: prepare and present the results of analysis to promote a response.
Unfortunately, following the US defunding of the WHO, priorities had to rapidly shift, and this course development remains incomplete.