Data-driven drug discovery for a variety of diseases by machine learning (PowerPoint presentation)

Data-driven drug discovery for a variety of diseases by machine learning
Yoshihiro Yamanishi, Medical Institute of Bioregulation, Kyushu University

Drug discovery is difficult, slow, and expensive
• Time-consuming: 10–15 years
• High cost: about $1 billion
• High risk: projects may fail because of
  – insufficient efficacy
  – difficult production
  – unexpected toxicity
Sources: http://www.fda.gov, http://www.phrma.org

Eco-Pharma (drug repositioning)
• Identification of new therapeutic effects (i.e., new applicable diseases) of existing drugs, so that they can be developed as treatments for other diseases.
• Rich information on existing drugs is already available (e.g., safety in humans, manufacturing process).
• Fast development and low risk.

The cost can be greatly reduced in terms of time, risk, and expenditure.
Step | Traditional process (10–17 years) | Eco-Pharma approach in this study (3–9 years)
1. Screen compounds | ○ | ○
2. Optimize chemical structures | ○ | skipped
3. Confirm safety in animals | ○ | skipped
4. Confirm efficacy in animals | ○ | skipped
5. Confirm safety in humans | ○ | skipped
6. Confirm efficacy in humans | ○ | ○
7. Approval | ○ | ○

Examples
• Sildenafil (Viagra): angina → erectile dysfunction → pulmonary hypertension
• Minoxidil (Riup, Rogaine): hypertension → alopecia (hair loss)
Previously, such discoveries depended largely on serendipity.

Goal of this study
• Automatic prediction of new drug effects from various biomedical big data.
Object | Data
Drugs/compounds | chemical structures, side effects, clinical reports, drug-induced gene expression profiles, compound-protein interactions
Proteins/genes | amino acid sequences, pathways, functional motifs, domains, structures, physiological roles, pathological roles
Diseases | disease-causing genes, disease pathways, environmental factors, biomarkers, gene expression profiles of patients, disease complications

AI-based drug discovery
Machine learning methods to predict new associations between drugs and diseases.
[Figure: Drugs 1–4 and Diseases 1–3 linked by known effects, plus new effects to be predicted]
Each drug-disease pair (x, y) is scored by a linear model $f(x, y) = w^\top \phi(x, y)$, where $\phi(x, y)$ is a feature vector of the pair and $w$ is a weight vector.
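
As a minimal sketch of this scoring function, the Python snippet below assumes that $\phi(x, y)$ is the tensor (outer) product of a drug feature vector and a disease feature vector, flattened into one list. The slide does not specify $\phi$, so this choice, the names `tensor_feature` and `score`, and the numbers in the demo are hypothetical. With this choice, $w^\top \phi(x, y)$ equals $x^\top W y$ for the matrix $W$ obtained by reshaping $w$ row by row.

```python
from typing import List

Vector = List[float]


def tensor_feature(x: Vector, y: Vector) -> Vector:
    """phi(x, y): flattened outer (tensor) product of drug vector x and disease vector y.

    Entry a * len(y) + b equals x[a] * y[b] (an assumed choice of phi).
    """
    return [xa * yb for xa in x for yb in y]


def score(w: Vector, x: Vector, y: Vector) -> float:
    """f(x, y) = w^T phi(x, y); w must have len(x) * len(y) entries."""
    phi = tensor_feature(x, y)
    if len(w) != len(phi):
        raise ValueError("w must have len(x) * len(y) entries")
    return sum(wi * pi for wi, pi in zip(w, phi))


if __name__ == "__main__":
    drug = [1.0, 0.0, 1.0]      # hypothetical binary drug features
    disease = [0.0, 1.0]        # hypothetical binary disease features
    w = [0.5, 2.0, -1.0, 0.3, 0.1, -0.5]  # hypothetical learned weights
    print(score(w, drug, disease))  # → 1.5
```

Because only pairs of active features contribute, the learned weight on each (drug feature, disease feature) combination can be read as how strongly that combination suggests a therapeutic effect.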

Molecular understanding of a variety of diseases has advanced, including:
• disease-causing genes
• disordered pathways
• environmental factors
• abnormal gene expression
[Figure: a biological system (genes 1–3) compared between a patient and a normal individual]

Characteristic molecular features are often shared among different diseases.
[Figure: disease A and disease B sharing common features]

A representation of the drug mechanism: a drug interacts with proteins and thereby exerts its effect on diseases.
[Figure: network of drugs (8,000; x1, x2, x3), target proteins (20,000; z1, z2, z3), and diseases (1,500; y1, y2, y3), showing known interactions and unknown interactions to be predicted in this study]
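
To make this representation concrete, the sketch below stores known drug-protein interactions and protein-disease associations as Python dictionaries and counts drug → protein → disease paths for drug-disease pairs that are not yet known. This path counting is a naive illustrative heuristic, not the machine learning method of this study; the identifiers follow the figure (x for drugs, z for proteins, y for diseases), while the function name and all links in the demo are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def candidate_drug_disease(
    drug_targets: Dict[str, Set[str]],
    protein_diseases: Dict[str, Set[str]],
    known_effects: Set[Tuple[str, str]],
) -> Dict[Tuple[str, str], int]:
    """Count drug -> protein -> disease paths for drug-disease pairs not already known."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for drug, targets in drug_targets.items():
        for protein in targets:
            for disease in protein_diseases.get(protein, set()):
                if (drug, disease) not in known_effects:
                    counts[(drug, disease)] += 1
    return dict(counts)


if __name__ == "__main__":
    drug_targets = {"x1": {"z1"}, "x2": {"z1", "z2"}, "x3": {"z3"}}   # hypothetical
    protein_diseases = {"z1": {"y1"}, "z2": {"y1", "y2"}, "z3": {"y3"}}  # hypothetical
    known = {("x1", "y1"), ("x3", "y3")}
    print(candidate_drug_disease(drug_targets, protein_diseases, known))
```

Path counting only works when the network is already complete; the point of the proposed method is to predict the missing drug-protein and protein-disease links themselves.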

Proposed method: prediction of the drug-protein-disease network with machine learning, i.e., predicting which proteins a drug interacts with and which diseases it is effective against.
[Figure: network of drugs (8,000; x1, x2, x3), target proteins (20,000; z1, z2, z3), and diseases (1,500; y1, y2, y3), showing known interactions and unknown interactions to be predicted in this study]

Drug-protein interaction prediction
A pairwise model for any drug-protein pair $(x', z')$:
$$f(x', z') = \sum_{i=1}^{n_x} \sum_{j=1}^{n_z} a_{ij}\, k\big((x_i, z_j), (x', z')\big) = \sum_{i=1}^{n_x} \sum_{j=1}^{n_z} a_{ij}\, k_x(x_i, x')\, k_z(z_j, z')$$
where $k_x$ is the drug similarity and $k_z$ is the protein similarity.
Step 1: Pairwise learning
[Figure: the drug space and the protein space are combined into a feature space of drug-protein pairs, where a model is learned from interacting and non-interacting pairs]
(Yamanishi et al., Bioinformatics, 2008; Takarabe et al., Bioinformatics, 2012; Yamanishi et al., Nucleic Acids Res., 2014)
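
A minimal sketch of Step 1 in plain Python, under stated assumptions: the drug and protein similarity matrices `Kx` and `Kz` are given, a coefficient $a_{ij}$ is learned only for each labeled training pair (all other $a_{ij}$ are taken as 0), and the learner is kernel ridge regression, which solves $(K + \lambda I)a = y$ with the pairwise kernel $K_{(i,j),(k,l)} = k_x(x_i, x_k)\,k_z(z_j, z_l)$. Kernel ridge regression is swapped in for illustration only; the cited papers may use a different learning algorithm. `fit_pairwise`, `solve`, and `lam` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A a = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (M[r][n] - sum(M[r][c] * a[c] for c in range(r + 1, n))) / M[r][r]
    return a


def fit_pairwise(Kx: Matrix, Kz: Matrix, pairs: Sequence[Tuple[int, int]],
                 labels: Sequence[float], lam: float = 0.1) -> List[float]:
    """Step 1: learn one coefficient a_ij per training pair (drug i, protein j).

    Pairwise kernel: k((x_i, z_j), (x_k, z_l)) = Kx[i][k] * Kz[j][l].
    Kernel ridge regression: (K + lam * I) a = labels.
    """
    K = [[Kx[i][k] * Kz[j][l] for (k, l) in pairs] for (i, j) in pairs]
    for p in range(len(pairs)):
        K[p][p] += lam  # ridge term keeps the system well conditioned
    return solve(K, labels)


if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]  # hypothetical: every drug/protein dissimilar to the others
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    labels = [1, 0, 0, 1]        # 1 = interacting, 0 = non-interacting
    print(fit_pairwise(identity, identity, pairs, labels, lam=1.0))  # → [0.5, 0.0, 0.0, 0.5]
```

Because $k_x$ and $k_z$ enter only as a product, any valid drug kernel and protein kernel can be plugged in independently, which is what lets chemical-structure and gene-expression similarities be swapped without changing the learner.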

Drug-protein interaction prediction
A pairwise model for any drug-protein pair $(x', z')$:
$$f(x', z') = \sum_{i=1}^{n_x} \sum_{j=1}^{n_z} a_{ij}\, k\big((x_i, z_j), (x', z')\big) = \sum_{i=1}^{n_x} \sum_{j=1}^{n_z} a_{ij}\, k_x(x_i, x')\, k_z(z_j, z')$$
where $k_x$ is the drug similarity and $k_z$ is the protein similarity.
Step 2: Predicting new interactions
[Figure: new pairs are placed in the feature space and predicted as interacting or non-interacting pairs]
(Yamanishi et al., Bioinformatics, 2008; Takarabe et al., Bioinformatics, 2012; Yamanishi et al., Nucleic Acids Res., 2014)
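
Step 2 amounts to evaluating the sum above for a new pair. The sketch below assumes coefficients that were already learned for training pairs (i, j), for example by kernel ridge regression, and precomputed similarities `kx_new[i]` $= k_x(x_i, x')$ and `kz_new[j]` $= k_z(z_j, z')$. The function names, the coefficients, and the candidate pairs in the demo are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict_pair(coef: Sequence[float], pairs: Sequence[Tuple[int, int]],
                 kx_new: Sequence[float], kz_new: Sequence[float]) -> float:
    """f(x', z') = sum over training pairs (i, j) of a_ij * k_x(x_i, x') * k_z(z_j, z')."""
    return sum(a * kx_new[i] * kz_new[j] for a, (i, j) in zip(coef, pairs))


def rank_candidates(
    coef: Sequence[float],
    pairs: Sequence[Tuple[int, int]],
    candidates: Sequence[Tuple[str, Sequence[float], Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Score candidate pairs (name, kx_new, kz_new) and sort by decreasing score."""
    scored = [(name, predict_pair(coef, pairs, kx, kz)) for name, kx, kz in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    coef = [0.5, 0.0, 0.0, 0.5]                 # hypothetical Step 1 output
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    candidates = [
        ("pair A", [0.9, 0.1], [0.8, 0.3]),     # new drug close to drug 0
        ("pair B", [0.1, 0.9], [0.8, 0.3]),     # new drug close to drug 1
    ]
    for name, s in rank_candidates(coef, pairs, candidates):
        print(name, round(s, 3))
```

Pairs with the highest scores would be the ones proposed for experimental validation; the score itself is a ranking criterion, not a calibrated probability.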

Chemical structure-based approach
Strategy: chemically similar drugs are predicted to interact with similar target proteins (drugs are described by their chemical structures, etc.).
• Drug similarity $k_x(x_i, x_j)$ for $i, j = 1, 2, \ldots, n_x$: Jaccard coefficient over 475,692 possible KCF-S chemical substructures (Kotera et al., BMC Syst. Biol., 2013)
• Protein similarity $k_z(z_i, z_j)$ for $i, j = 1, 2, \ldots, n_z$: local sequence alignment kernel (Saigo et al., Bioinformatics, 2004), illustrated by a local alignment such as
-------TGKG--------
        |||
-------AGKG--------
[Figure: lecture material on sequence alignment (homologs, orthologs, paralogs, and xenologs; global vs. local alignment; matches, mismatches, and gaps), from a 2012 course on theoretical molecular biology, http://goto.kuicr.kyoto-u.ac.jp/lecture/bioinfo.html]
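
The Jaccard drug similarity can be sketched as follows, assuming each drug is given as a set of substructure identifiers. On the slide these would be KCF-S substructures; the identifiers below are placeholders, and the convention that two empty sets have similarity 0.0 is an assumption of this sketch. The protein-side local alignment kernel of Saigo et al. is not reproduced here.

```python
from typing import FrozenSet, List, Sequence


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """|a intersect b| / |a union b|; assumed to be 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drug_similarity_matrix(fingerprints: Sequence[FrozenSet[str]]) -> List[List[float]]:
    """k_x(x_i, x_j) for all pairs of drugs, each given as a set of substructures."""
    n = len(fingerprints)
    return [[jaccard(fingerprints[i], fingerprints[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    drugs = [
        frozenset({"s1", "s2", "s3"}),   # placeholder substructure IDs
        frozenset({"s2", "s3", "s4"}),
        frozenset({"s5"}),
    ]
    for row in drug_similarity_matrix(drugs):
        print(row)
```

Sparse sets are a natural fit here: a drug typically contains only a small fraction of the 475,692 possible substructures, so storing the present ones is far cheaper than a dense binary vector.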

Gene expression-based approach
Strategy: transcriptionally similar drugs are predicted to interact with similar target proteins (drugs are described by drug-induced gene expression, etc.).
• Each drug is represented by a gene expression profile $x = (x_1, x_2, \ldots, x_{22276})^\top$ measured in a cell line, in which each element is the ratio of expression under drug treatment to that of the control, based on queries to LINCS (a public database).
• Drug similarity: correlation coefficient between profiles.
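
A minimal sketch of the correlation-based drug similarity, assuming each profile is a plain list of numbers of equal length (22,276 genes on the slide; short made-up vectors in the demo). The Pearson coefficient is assumed here, since the slide says only "correlation coefficient". A matrix of Pearson correlations is a Gram matrix of centered, normalized profiles, so it is positive semidefinite and could serve as $k_x$ in the pairwise model.

```python
import math
from typing import List, Sequence


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient between two expression profiles of equal length."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    if den == 0.0:
        raise ValueError("constant profile: correlation is undefined")
    return num / den


def correlation_matrix(profiles: Sequence[Sequence[float]]) -> List[List[float]]:
    """k_x(x_i, x_j) = Pearson correlation of the drug-induced profiles of drugs i and j."""
    return [[pearson(p, q) for q in profiles] for p in profiles]


if __name__ == "__main__":
    profiles = [
        [1.2, 0.8, 2.5, 0.4],   # made-up treatment/control ratios, drug A
        [1.1, 0.9, 2.2, 0.5],   # drug B: similar transcriptional response
        [0.7, 1.6, 0.5, 1.9],   # drug C: roughly opposite response
    ]
    for row in correlation_matrix(profiles):
        print([round(r, 2) for r in row])
```

Because this similarity never looks at chemical structure, two structurally unrelated drugs can still come out as similar if they perturb the cell in the same way.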

Performance evaluation on several benchmark datasets of different chemical diversities
• 6,769 interactions involving 1,874 drugs and 436 proteins (KEGG, DrugBank, Matador)
• The gene expression-based approach does not depend on chemical structures.
[Plot legend: ◯ phenotype; ◯ gene expression; △ chemical structure; + gene expression & chemical structure. Low threshold: only structurally diverse drugs; high threshold: many structurally similar drugs]
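
The slide does not describe how the benchmark subsets were built. One plausible construction, assumed here purely for illustration, is a greedy filter on a precomputed chemical similarity matrix: a drug is kept only if its similarity to every drug already kept is at most the threshold, so a low threshold yields only structurally diverse drugs and a high threshold admits many structurally similar ones. `diverse_subset` and `filter_interactions` are hypothetical names.

```python
from typing import Hashable, List, Sequence, Tuple


def diverse_subset(sim: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Greedily keep drug indices whose similarity to every kept drug is <= threshold."""
    kept: List[int] = []
    for i in range(len(sim)):
        if all(sim[i][j] <= threshold for j in kept):
            kept.append(i)
    return kept


def filter_interactions(interactions: Sequence[Tuple[int, Hashable]],
                        kept_drugs: Sequence[int]) -> List[Tuple[int, Hashable]]:
    """Keep only (drug index, protein) interactions whose drug survived the filter."""
    keep = set(kept_drugs)
    return [(d, p) for d, p in interactions if d in keep]


if __name__ == "__main__":
    sim = [[1.0, 0.9, 0.2],
           [0.9, 1.0, 0.3],
           [0.2, 0.3, 1.0]]   # hypothetical Jaccard similarities
    for t in (0.5, 0.95):
        print(t, diverse_subset(sim, t))
```

The greedy result depends on drug order, so a real benchmark built this way would need a fixed, documented ordering to be reproducible.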
