The FAIRification of Genomic Annotations Working Group (WG) focuses on the challenges of harmonising metadata and software solutions to improve the discovery and reuse of publicly available “genomic annotation” data. Gavin Chait of Whythawk and Sveinung Gundersen of the WG set out to integrate their two data wrangling software projects, whyqd and Omnipy respectively.
We developed a research proposal for submission to the annual BioHackathon Europe and were selected as participants for the November 2024 event.
Our solution
BioHackathon Europe is an annual event that brings together life scientists from around the world. It is organised by ELIXIR Europe, and offers an intense week of hacking, with over 160 participants working on diverse and exciting projects. The goal is to create code that addresses challenges in bioinformatics research.
Our team consisted of genomics researchers from around the world, including teams onsite in Barcelona and remotely in the UK and Australia. During the week-long event, we worked collaboratively to develop methods that would form part of a tutorial, and additional policy and technical outputs for the WG.
Our aims for the week were to:
- Assess research workflows and systems to decide on appropriate strategies for mapping from complex source data to a defined hierarchical destination schema,
- Develop techniques for defining minimal metadata to support genome annotations as FAIR objects,
- Derive a convenience schema from the hierarchical FAIRtracks model and use this as a model for adapting to other formally-defined schemas,
- Develop interoperable executable mappings from a bioinformatics case-study to the convenience schema.
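To illustrate what deriving a convenience schema from a hierarchical model can mean in practice, here is a minimal Python sketch. It is not the whyqd or Omnipy API, and the field names are hypothetical placeholders rather than actual FAIRtracks terms. It assumes that a convenience schema can be approximated by flattening nested records into dotted column names, and that the mapping should be reversible so that the flat form can be restored to the hierarchical one.

```python
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Collapse a nested record into a single level with dotted keys."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Rebuild the nested record from dotted keys (inverse of flatten)."""
    nested: dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


# Hypothetical hierarchical record; these field names are illustrative only.
hierarchical = {
    "experiment": {"technique": {"term_label": "ChIP-seq"}},
    "sample": {"species": "Homo sapiens"},
    "track": {"file_format": "BED"},
}

# The flat "convenience" form is easier to target from tabular source data.
convenience = flatten(hierarchical)
```

A flat form like this is easier to map to from tabular source data, while the reverse function shows one way the result could be restored to the hierarchical structure. Real schemas also contain arrays and cross-references between objects, which this sketch does not handle.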
Outcomes
We successfully developed methods for creating convenience schemas, and produced recommendations for refinements to the FAIRtracks model. A tutorial and general guidelines were delivered and are now formally part of both whyqd’s documentation and the WG’s deliverables.