Pharma goes FAIR
Herman van Vlijmen
Janssen Pharmaceutica, Beerse, Belgium
What is FAIR?
https://www.dtls.nl/fair-data/fair-data/
23/11/2017 DTL meeting Utrecht
Sci Data. 2016, 3:160018
Why is FAIR important?
• Many data sets and databases are still sitting in silos, with
  – Poor accessibility and/or findability of data
  – Absent or incomplete use of nomenclature standards
• The amount and diversity of scientific data is growing fast
• The most valuable analyses involve data from different domains/technologies
• Machine learning and data mining require unambiguous, computer-readable data
From molecule to medicine
Basic Research/Discovery
- Select disease
- Select drug target
- Identify bioassay
- Find compound hits
- Find lead compound(s)
- Select clinical candidate
Development
- Drug metabolism and pharmacokinetics
- Safety evaluation
- Chemical production
- Pharmaceutical formulation
Clinical Trials
- Safety (Phase I)
- Efficacy & dose (Phase II)
- Efficacy (Phase III)
- Postmarketing (Phase IV)
[Timeline figure: basic research, development, clinical trials, registration, commercialization; axis marks at 0, 4, 6, 12.5 and 20 years; spans: from discovery of a drug to submission of registration file; from registration to launch; from launch to loss of patent]
Why is FAIR important to Janssen?
• If the data is there but nobody (re)uses it…
• Scientists at Janssen rarely use all the data they have access to
  – Difficult to access multiple databases
  – Lack of awareness of databases
  – Little experience with defining cross-domain analyses
• Data from multiple domains and sources (private, public, commercial) is needed for the best possible analysis in
  – Target identification and validation
  – Hit finding, hit-to-lead (H2L), lead optimization
  – Phenotypic screens, omics experiments
  – Mechanistic analysis: tox, side effects, drug repurposing
  – Translational analysis (cell phenotype <-> animal -> human)
  – Clinical and real-world evidence analysis
Answering more complex questions
Open PHACTS data sources
Phenotypic Drug Discovery Workflows: “Knowing the knowns”
Digles et al., MedChemComm, 7: 1237 (2016)
Open PHACTS developments: Patent Info
• There is a huge amount of knowledge in the patent corpus, most of which will never be published elsewhere but is potentially of great value to drug discovery
• The SureChEMBL system (EBI) already automatically extracts compounds from these documents
• The Open PHACTS consortium funded a project to also extract gene/disease information (EMBL-EBI and SciBite)
• ~4 million patents in total, 260 million annotations (patent-compound, patent-gene or patent-disease associations)
• Example use case:
  – For a given target or disease, give me all the compounds that are linked to it through patents
• New extraction tools are needed to continue this annotation and to make it available at EBI
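The example use case above (all compounds linked to a target or disease through patents) is essentially a join over co-occurrence annotations. A minimal Python sketch, assuming annotations arrive as hypothetical (patent_id, entity_type, entity_id) tuples; the real SureChEMBL/Open PHACTS data model is richer and is not reproduced here:

```python
from collections import defaultdict


def compounds_linked_via_patents(annotations, entity_id):
    """Map each compound to the patents it shares with a target or disease.

    `annotations` is a list of (patent_id, entity_type, entity_id) tuples,
    a hypothetical simplified format where entity_type is "compound",
    "gene" or "disease".
    """
    # Patents in which the query gene/disease is annotated
    query_patents = {
        patent
        for patent, etype, ent in annotations
        if ent == entity_id and etype in ("gene", "disease")
    }
    linked = defaultdict(set)
    # Any compound annotated in one of those patents is linked to the query
    for patent, etype, ent in annotations:
        if etype == "compound" and patent in query_patents:
            linked[ent].add(patent)
    return dict(linked)


# Toy annotations with hypothetical identifiers
annotations = [
    ("PAT-1", "gene", "CDK7"),
    ("PAT-1", "compound", "CPD-A"),
    ("PAT-2", "disease", "schizophrenia"),
    ("PAT-2", "compound", "CPD-B"),
    ("PAT-3", "gene", "CDK7"),
    ("PAT-3", "compound", "CPD-A"),
]
# CPD-A, linked through PAT-1 and PAT-3
print(compounds_linked_via_patents(annotations, "CDK7"))
```

Keeping the supporting patent IDs, rather than only the compound IDs, preserves provenance, so a user can inspect the patents behind each link.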
A broad set of use cases can be addressed using a linked data system
Some examples:
Target identification and validation
• Give me all direct and indirect supporting evidence linking a gene and a disease
• Are there examples of compounds targeting any member of this target family?
• What are the relevant indirect links between a gene and a phenotypic assay?
Lead identification and optimization
• What compounds bind to this target or related targets (family, 3D similarity)?
• What bioactivities and pathways are associated with a compound?
• Show the activity of these compounds on all kinases involved in this pathway
• What are the potential side effects of hitting binding sites similar to that of our target?
• What side effects do similar compounds have?
Biomarker discovery
• What secreted proteins in a particular tissue are associated with this cellular pathway and might be biomarkers?
• New biomarkers: for which indirect biomarker-disease links is there no directly reported association, and which of these have the strongest data support?
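The target-family question above requires expanding the query to every member of a family before searching the activity data. A sketch in Python, assuming a hypothetical parent-to-children classification tree and (compound, target) activity pairs; the family and target names are illustrative only:

```python
from collections import deque


def family_members(hierarchy, family):
    """Return all leaf targets under `family` in a parent -> children tree."""
    members = set()
    queue = deque([family])
    while queue:
        node = queue.popleft()
        children = hierarchy.get(node, [])
        if children:
            queue.extend(children)   # internal node: descend further
        else:
            members.add(node)        # leaf: an individual target
    return members


def compounds_for_family(hierarchy, activities, family):
    """Compounds with a recorded activity on any member of `family`."""
    targets = family_members(hierarchy, family)
    return {compound for compound, target in activities if target in targets}


# Hypothetical classification tree and activity records
hierarchy = {
    "Kinase": ["Family-1", "Family-2"],
    "Family-1": ["CDK7", "CDK9"],
    "Family-2": ["EGFR"],
}
activities = [("CPD-A", "CDK7"), ("CPD-B", "EGFR"), ("CPD-C", "DRD2")]
print(compounds_for_family(hierarchy, activities, "Family-1"))  # {'CPD-A'}
```

The breadth-first traversal handles trees of any depth; a classification graph that could contain cycles would also need a visited set.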
Querying higher-level research questions
A comparison of the queries that are done today versus what will be possible in the future:
• Today: What are the Janssen compounds active in this Janssen assay?
  Future: Give me all internal/commercial/public data on compounds that are active on my target and other closely related targets.
• Today: What is the difference in gene expression profile between tumor and normal tissue?
  Future: Given the differences in expression profiles between these tissues, give me the compounds with biochemical activity profiles that most resemble the difference profile.
• Today: Search PubMed for a potential target-disease association: “bcl2 schizophrenia”
  Future: Show me all possible direct and indirect links between bcl2 and schizophrenia, ranked by level of scientific data support.
• Today: I have a CDK7 lead compound. Is there anything known in PubMed on toxicity of CDK7 inhibitors?
  Future: Given my CDK7 lead compound, what are the most likely mechanisms by which this compound class could cause toxicity?
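The bcl2/schizophrenia question in the comparison above can be read as a path search over a knowledge graph, with each path scored by its evidence. A minimal Python sketch, assuming a hypothetical undirected graph whose edge weights are evidence counts, and scoring each path by its weakest edge; the scoring rule is an illustrative choice, not the method of any particular platform:

```python
def ranked_links(edges, source, target):
    """Rank direct (1-hop) and indirect (2-hop) links from source to target.

    `edges` maps an (a, b) node pair to an evidence count and is treated as
    undirected. A path scores as its weakest edge, so an indirect link is
    only as well supported as its least-supported step.
    """
    adj = {}
    for (a, b), count in edges.items():
        adj.setdefault(a, {})[b] = count
        adj.setdefault(b, {})[a] = count

    neighbours = adj.get(source, {})
    paths = []
    if target in neighbours:
        paths.append(([source, target], neighbours[target]))
    for mid, first in neighbours.items():
        if mid in (source, target):
            continue
        second = adj.get(mid, {}).get(target)
        if second is not None:
            paths.append(([source, mid, target], min(first, second)))

    # Strongest support first; on ties, prefer the shorter path
    paths.sort(key=lambda p: (-p[1], len(p[0])))
    return paths


# Toy graph: node names from the slide, evidence counts invented
edges = {
    ("bcl2", "schizophrenia"): 2,
    ("bcl2", "apoptosis"): 40,
    ("apoptosis", "schizophrenia"): 5,
    ("bcl2", "BAX"): 30,
    ("BAX", "schizophrenia"): 1,
}
for path, score in ranked_links(edges, "bcl2", "schizophrenia"):
    print(" -> ".join(path), score)
# bcl2 -> apoptosis -> schizophrenia 5
# bcl2 -> schizophrenia 2
# bcl2 -> BAX -> schizophrenia 1
```

Scoring by the weakest edge means an indirect path can outrank a thin direct association, which is the kind of result the "Future" query is after; other scoring rules (products of confidences, path-length penalties) are equally plausible.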
Internal efforts in Discovery: Chem 3
• Semantic graph database of internal and external data linked to chemistry
  – Compound activity:
    • ABCD, Athena, PIRlab, CAPE (all internal Janssen)
    • ChEMBL, PubChem, GOSTAR, Clarivate
    • Pending: SureChEMBL; others based on user needs
  – Platform: Virtuoso
  – Fast chemical cartridge (internal)
• Interface in 3DX, Pipeline Pilot, R
Examples of Linked Data challenges in Pharma
• Data types and units for pharmacological activity in ChEMBL
• Tautomerism
• Stereochemistry
Lee and Gobbi, J. Chem. Inf. Model. 2012, 52, 285−292
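One way to tame the units challenge listed above is to convert concentration-based activities to a single negative-log molar scale, in the spirit of ChEMBL's pChEMBL value. A minimal Python sketch; the unit table is an assumption and deliberately small, and the sketch does not address tautomerism or stereochemistry, which need cheminformatics tooling:

```python
import math

# Assumed conversion table to molar; extend for other units as needed
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "µM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def p_activity(value, unit):
    """Convert a concentration activity (e.g. IC50, Ki) to -log10(molar)."""
    if unit not in TO_MOLAR:
        # Refuse rather than guess: unconvertible units should be flagged
        raise ValueError(f"unsupported unit: {unit!r}")
    if value <= 0:
        raise ValueError("activity value must be positive")
    return round(-math.log10(value * TO_MOLAR[unit]), 2)


print(p_activity(100, "nM"))  # 7.0
print(p_activity(1, "uM"))    # 6.0
```

Raising on unknown units, instead of silently dropping or guessing, keeps mis-annotated records visible, which matters once data from several sources is linked.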
Internal efforts in Discovery: Exploring use of the Euretos Knowledge Platform for TI/TV (target identification and validation)
[Screenshots: Step 1, Step 2]
www.euretos.com
Internal efforts in Discovery: Exploring use of the Euretos Knowledge Platform for TI/TV (continued)
[Screenshots: Step 3, Step 4]
www.euretos.com
- Call launched in July 2017
- Candidate consortia are currently being evaluated
- Project likely to start in Q3 2018, duration 3 years
- Budget: the financial contribution from IMI2 is a maximum of EUR 4M
- Pharma partners: Janssen, AZ, Bayer, Boehringer Ingelheim, Eli Lilly, GSK, Novartis
Summary of FAIRification Proposal
• Select data sets and databases from finished and ongoing IMI projects, based on:
  – Scientific value of making this data accessible and interoperable
  – Complexity of making the data available
• Select databases at individual EFPIA companies
  – Selection based on value to the companies
  – Consolidation to a limited set of data domains
• FAIRify these data sets to enable their sustainable use in answering research questions
  – Work sessions with data owners and FAIRification experts, including data domain experts (vocabularies, ontologies, use cases) and IT experts (data conversion, database implementation)
  – Implementation of a sustainable solution for storage and maintenance of FAIRified IMI databases
  – Identification of a sustainable solution for storage and maintenance of FAIRified EFPIA databases
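The data conversion step above typically means re-expressing records as linked data with persistent identifiers. A minimal Python sketch that turns one tabular record into N-Triples lines; the base IRI, vocabulary terms and field names are hypothetical placeholders, whereas a real FAIRification would reuse community identifiers and ontologies agreed with the domain experts:

```python
# Hypothetical namespace; not a real IMI or EFPIA vocabulary
BASE = "https://example.org/imi/"
VOCAB = BASE + "vocab#"


def literal(value):
    """Quote a value as an N-Triples string literal (basic escaping only)."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def record_to_ntriples(record, id_field, fields):
    """Emit one N-Triples line per non-empty field of a tabular record."""
    # Assumes record IDs are already IRI-safe (no spaces or reserved characters)
    subject = f"<{BASE}{record[id_field]}>"
    lines = []
    for field in fields:
        value = record.get(field)
        if value is None or value == "":
            continue  # missing values are simply not asserted
        lines.append(f"{subject} <{VOCAB}{field}> {literal(value)} .")
    return lines


row = {"assay_id": "A1", "compound": "CPD-A", "target": "", "unit": "nM"}
for line in record_to_ntriples(row, "assay_id", ["compound", "target", "unit"]):
    print(line)
```

Using field names directly as predicates keeps the sketch short; mapping them to shared ontology terms is exactly what the work sessions with domain experts are meant to settle.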