RDA FairTracks schema interoperability

30 November 2024

Oslo | RDA-EOSC, in collaboration with the FAIRification of Genomic Annotations Working Group

Omnipy and whyqd (/wɪkɪd/) are independently developed Python libraries offering general functionality for auditable and executable metadata mappings. In this project, we will integrate Omnipy and whyqd to develop executable mappings that transform existing metadata from biodiversity projects, such as ERGA, to conform to the FGA-WG metadata model, kickstarting the process of FAIRifying genome annotation GFF3 files.

The FAIRification of Genomic Annotations Working Group (WG) will focus on the challenges of harmonising metadata and software solutions to improve the discovery and reuse of publicly available “genomic annotation” data. Gavin Chait of Whythawk and Sveinung Gundersen of the WG wanted to begin a process of integrating their two data wrangling software projects, whyqd and Omnipy respectively.

We developed a research proposal for submission to the annual BioHackathon Europe and were selected as participants for the November 2024 event.

Our solution

BioHackathon Europe is an annual event that brings together life scientists from around the world. It is organised by ELIXIR Europe, and offers an intense week of hacking, with over 160 participants working on diverse and exciting projects. The goal is to create code that addresses challenges in bioinformatics research.

Our team consisted of genomics researchers from around the world, with members onsite in Barcelona and others joining remotely from the UK and Australia. During the week-long event, we worked collaboratively to develop methods that would form part of a tutorial, along with additional policy and technical outputs for the WG.

These included:

  • Assessing research workflows and systems to decide on appropriate strategies for mapping from complex source data to a defined hierarchical destination schema,
  • Developing techniques for defining minimal metadata to support genome annotations as FAIR objects,
  • Deriving a convenience schema from the hierarchical FAIRtracks model and using this as a model for adapting to other formally defined schemas,
  • Developing interoperable executable mappings from a bioinformatics case study to the convenience schema.
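
As a rough illustration of the idea behind a convenience schema, the sketch below flattens a nested structure into dotted column names, maps a flat source row onto those columns, and rebuilds the nested form. It uses only the Python standard library and does not use the whyqd or Omnipy APIs. All field names, the mapping, and the helper functions are hypothetical and are not taken from the FAIRtracks model.

```python
from __future__ import annotations

from typing import Any

# Hypothetical nested template. Field names are illustrative only and are
# NOT taken from the FAIRtracks model.
HIERARCHICAL_TEMPLATE: dict[str, Any] = {
    "study": {"name": "", "contact": {"email": ""}},
    "sample": {"species": "", "tissue": ""},
    "track": {"file_url": "", "file_format": ""},
}


def flatten_keys(node: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths for every leaf field of a nested dict."""
    paths: list[str] = []
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(flatten_keys(value, path))
        else:
            paths.append(path)
    return paths


# The "convenience schema" in this sketch: one flat column per leaf field.
FLAT_COLUMNS = flatten_keys(HIERARCHICAL_TEMPLATE)

# Hypothetical mapping from source column names to convenience columns.
SOURCE_TO_FLAT = {
    "project_title": "study.name",
    "contact_email": "study.contact.email",
    "organism": "sample.species",
    "gff3_url": "track.file_url",
}


def apply_mapping(
    row: dict[str, Any], mapping: dict[str, str], flat_columns: list[str]
) -> dict[str, Any]:
    """Copy mapped source values into flat columns; unmapped columns stay None."""
    flat: dict[str, Any] = {column: None for column in flat_columns}
    for source, target in mapping.items():
        if target not in flat:
            raise KeyError(f"Mapping target {target!r} is not in the convenience schema")
        if source in row:
            flat[target] = row[source]
    return flat


def unflatten(flat_row: dict[str, Any]) -> dict[str, Any]:
    """Rebuild the nested structure from dotted column names."""
    nested: dict[str, Any] = {}
    for path, value in flat_row.items():
        parts = path.split(".")
        cursor = nested
        for part in parts[:-1]:
            cursor = cursor.setdefault(part, {})
        cursor[parts[-1]] = value
    return nested


# Usage with a made-up source row; the "ignored" column has no mapping.
source_row = {
    "project_title": "Demo",
    "organism": "Homo sapiens",
    "gff3_url": "https://example.org/a.gff3",
    "ignored": 1,
}
flat_row = apply_mapping(source_row, SOURCE_TO_FLAT, FLAT_COLUMNS)
nested_row = unflatten(flat_row)
```

Working against flat columns keeps the mapping itself a simple, auditable lookup table, while the hierarchical form can still be recovered mechanically at the end.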

Outcomes

We successfully developed methods for creating convenience schemas and produced recommendations for refinements to the FAIRtracks model. A tutorial and general guidelines were delivered and are now formally part of both whyqd’s documentation and the WG’s deliverables.
