When an emerging pathogen causes an outbreak, finding therapeutics to treat and contain the disease in a timely manner is of utmost importance. The SARS-CoV-2 pandemic has shed light on translational bottlenecks that the scientific community will need to address to prevent a possible future pandemic or to mitigate its consequences more effectively.
While we gained invaluable insight into the genetic make-up of the virus in a matter of days, bottlenecks were uncovered when it came to generating hypotheses for potential therapeutic interventions with the help of predictive computational methods. One of these bottlenecks is the challenge of integrating various biologically relevant databases, owing to their heterogeneous data structures and the lack of workflows to automate the process. Although a custom solution can always be implemented to orchestrate a data integration process, it requires a significant amount of time and effort that must be taken away from critical research.
The acute need for COVID-19 therapeutics has inspired the ASPIRE Team at the National Center for Advancing Translational Sciences and our collaborators to conceptualize a data integration workflow that can significantly speed up a similar critical drug discovery process in the future. We implemented this workflow to compile a COVID-19-focused multimodal network in a state-of-the-art graph database, Neo4j. Our hope is that the Neo4COVID19 graph database will catalyze network-driven pharmacological research aimed at the discovery of COVID-19 therapeutics. Moreover, the workflow developed in the course of this exercise can serve as an example in other translational research settings, reducing the time needed to develop a novel data integration process.
Access to Neo4COVID-19 is provided by various means in order to make it available to as broad a research community as possible.
The Neo4j Data Browser provides a basic interface that supports both WYSIWYG point-and-click exploration and programmatic access. The latter uses the Cypher query language.
When prompted for login credentials, select "No authentication" in the "Authentication type" drop-down list, then select "Connect". A high-level view of the database contents is available by clicking on the database icon in the upper-left corner of the interface.
Go to Neo4j Data Browser
A complete Neo4COVID-19 data archive can be easily imported into Cytoscape. Detailed instructions are provided in our manuscript.
With the help of the Neo4j Bolt interface, users can access the data via various APIs. Details on how to access the database via the API are provided in our manuscript.
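As a rough illustration of Bolt-based access, the following Python sketch uses the official `neo4j` driver to count nodes per label, which approximates the high-level view shown by the database icon in the Data Browser. It is a minimal sketch under stated assumptions, not the access method described in the manuscript: the Bolt URI must be supplied by the user, the helper names `summarize` and `fetch_label_counts` are hypothetical, and the query is deliberately schema-agnostic because no node labels or property names are assumed here.

```python
from typing import Any, Dict, List

# Schema-agnostic Cypher: counts nodes per label combination, so it does not
# assume any particular Neo4COVID-19 node label or property name.
LABEL_COUNT_QUERY = (
    "MATCH (n) "
    "RETURN labels(n) AS labels, count(*) AS node_count "
    "ORDER BY node_count DESC "
    "LIMIT $limit"
)


def summarize(records: List[Dict[str, Any]]) -> Dict[str, int]:
    """Collapse query rows into a {"LabelA:LabelB": count} dictionary."""
    summary: Dict[str, int] = {}
    for row in records:
        # Sort labels so that ["B", "A"] and ["A", "B"] share one key.
        key = ":".join(sorted(row["labels"])) or "(unlabeled)"
        summary[key] = summary.get(key, 0) + row["node_count"]
    return summary


def fetch_label_counts(bolt_uri: str, limit: int = 25) -> Dict[str, int]:
    """Run the label-count query over Bolt (hypothetical helper).

    bolt_uri is an assumption supplied by the caller, e.g. "bolt://<host>:7687".
    """
    # Imported lazily so the helpers above work without the driver installed.
    from neo4j import GraphDatabase

    # auth=None mirrors the "No authentication" option of the Data Browser.
    driver = GraphDatabase.driver(bolt_uri, auth=None)
    try:
        with driver.session() as session:
            result = session.run(LABEL_COUNT_QUERY, limit=limit)
            rows = [record.data() for record in result]
    finally:
        driver.close()
    return summarize(rows)
```

Calling `fetch_label_counts` with the Bolt address of a running instance returns a dictionary mapping each label combination to its node count. Passing the limit as a query parameter rather than formatting it into the string keeps the Cypher text constant and avoids injection issues.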
The data integration workflow used to compile the Neo4COVID-19 database is available in our source code repository.
The Neo4COVID-19 database is distributed under the Attribution 4.0 International (CC BY 4.0) license, while respecting the original licenses of input sources, some of which may fall under a different license (see our GitHub repository).
Please take a look at our paper describing the Neo4COVID-19 database and the data integration workflow. DOI (Preprint): 10.1101/2020.11.04.369041