Computational Genomics Francisco García García BIER fgardos@gmail.com Máster en Biotecnología Biomédica. UPV
Why are we interested in Computational Genomics? The overall goal: apply computational methods to biomedical and biotechnological problems. Research interests: the development and application of novel bioinformatics methods aimed at discovering new drugs; identification of genes or proteins that may be considered therapeutic targets; personalized medicine: tools for discovery and diagnosis. Introduction Why Computational Genomics?
Computational Genomics Genomics Transcriptomics Metabolomics Lipidomics Proteomics Epigenomics Introduction Omics sciences
Computational Genomics How do these technologies work ? Introduction High throughput technologies: microarrays
Computational Genomics How do these technologies work ? Reference genome Introduction High throughput technologies: Next Generation Sequencing
Computational Genomics
Biological knowledge: KEGG pathways; Gene Ontology; InterPro motifs; Biocarta pathways; regulatory elements (miRNA, CisRed, Transcription Factor Binding Sites); gene expression in tissues; bioentities from literature.
Clinical knowledge: ClinVar, HUMSAVAR, HGMD, COSMIC.
Introduction Clinical and biological databases
Computational Genomics Introduction Personalized Medicine
Session description
3 sessions (7 hours) on the use of web tools for the analysis and interpretation of sequencing data.
All the documentation (presentations + exercises) that we will need during these days is available at this link: http://bioinfo.cipf.es/mbb/. It is also on Poliformat.
Instructors: Marta Hidalgo and Paco García.
The sessions take a practical approach, and we will only introduce the concepts needed for the exercises.
Introduction Máster en Biotecnología Biomédica. UPV.
Program
Session 1
• Introduction to NGS technologies.
• Genomic variation detection studies. Genomic data analysis pipeline.
• How do we detect mutations of interest in whole-exome studies? Exercises with the web tool BiERapp.
Session 2
• Genomic variation studies: targeted genomic sequencing. How do we design a gene panel? How do we analyze and interpret gene panel data? Exercises with TEAM.
• Spanish genetic variability. The CSVS database.
• Transcriptomic studies with NGS data. Expression data analysis pipeline. How do we analyze RNA-Seq data with the Babelomics suite?
Session 3
• Analysis of transcriptomic data in the context of signaling pathways. Exercises with the web tools hipathia and PathAct.
Introduction Máster en Biotecnología Biomédica. UPV.
Web tools to analyze omic data
NGS Data Analysis Pipeline
Common steps: Fastq → Sequence preprocessing → Fastq → Alignment → BAM → Visualization
Resequencing data analysis: BAM → Variant calling → VCF → Variant annotation → Prioritization
RNA-Seq data analysis: BAM → RNA-Seq processing → Count matrix → RNA-Seq data analysis → Functional analysis
Introduction NGS data analysis: pipelines
Fastq format
We could say “it is a fasta with qualities”:
1. Header (like the fasta but starting with “@”)
2. Sequence (string of nt)
3. “+” and sequence ID (optional)
4. Encoded quality of the sequence

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

Introduction NGS data analysis: file formats
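The four-line layout described above can be read with a few lines of Python. This is a minimal sketch, assuming the common Phred+33 quality encoding and exactly four lines per read (no wrapped sequences); `FastqRecord` and `parse_fastq` are names made up for this illustration, not part of any tool used in the course.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    seq_id: str
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def parse_fastq(lines: List[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text laid out as four lines per read.

    Assumption: qualities use the Phred+33 encoding (offset=33).
    An incomplete trailing record (fewer than four lines) is ignored.
    """
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, sequence, plus, quality = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes: Phred score = ord(char) - offset.
        scores = [ord(ch) - offset for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


# The example record shown on the slide.
record_text = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]
records = list(parse_fastq(record_text))
mean_quality = sum(records[0].qualities) / len(records[0].qualities)
```

With this encoding the first character of the quality string, `!`, decodes to a Phred score of 0, the lowest possible value. Older Illumina files may use a different offset, which is why it is exposed as a parameter here.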
BAM/SAM format

@PG ID:HPG-Aligner VN:1.0
@SQ SN:20 LN:63025520
HWI-ST700660_138:2:2105:7292:79900#2@0/1 16 20 76703 254 76= * 0 0 GTTTAGATACTGAAAGGTACATACTTCTTTGTAGGAACAAGCTATCATGCTGCATTTCTATAATATCACATGAATA GIJGJLGGFLILGGIEIFEKEDELIGLJIHJFIKKFELFIKLFFGLGHKKGJLFIIGKFFEFFEFGKCKFHHCCCF AS:i:254 NH:i:1 NM:i:0
HWI-ST700660_138:2:2208:6911:12246#2@0/1 16 20 76703 254 76= * 0 0 GTTTAGATACTGAAAGGTACATACTTCTTTGTAGGAACAAGCTATCATGCTGCATTTCTATAATATCACATGAATA HHJFHLGFFLILEGIKIEEMGEDLIGLHIHJFIKKFELFIKLEFGKGHEKHJLFHIGKFFDFFEFGKDKFHHCCCF AS:i:254 NH:i:1 NM:i:0
HWI-ST700660_138:2:1201:2973:62218#2@0/1 0 20 76655 254 76M * 0 0 AACCCCAAAAATGTTGGAAGAATAATGTAGGACATTGCAGAAGACGATGTTTAGATACTGAAAGGGACATACTTCT FEFFGHHHGGHFKCCJKFHIGIFFIFLDEJKGJGGFKIHLFIJGIEGFLDEDFLFGEIIMHHIKL$BBGFFJIEHE AS:i:254 NH:i:1 NM:i:1
HWI-ST700660_138:2:1203:21395:164917#2@0/1 256 20 68253 254 4M1D72M * 0 0 NCACCCATGATAGACCAGTAAAGGTGACCACTTAAATTCCTTGCTGTGCAGTGTTCTGTATTCCTCAGGACACAGA #4@ADEHFJFFEJDHJGKEFIHGHBGFHHFIICEIIFFKKIFHEGJEHHGLELEGKJMFGGGLEIKHLFGKIKHDG AS:i:254 NH:i:3 NM:i:1
HWI-ST700660_138:2:1105:16101:50526#6@0/1 16 20 126103 246 53M4D23M * 0 0 AAGAAGTGCAAACCTGAAGAGATGCATGTAAAGAATGGTTGGGCAATGTGCGGCAAAGGGACTGCTGTGTTCCAGC FEHIGGHIGIGJI6FCFHJIFFLJJCJGJHGFKKKKGIJKHFFKIFFFKHFLKHGKJLJGKILLEFFLIHJIEIIB AS:i:368 NH:i:1 NM:i:4

SAM Specification: http://samtools.sourceforge.net/SAM1.pdf
Introduction NGS data analysis: file formats
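To make the columns of an alignment line easier to read, the sketch below splits one record into its eleven mandatory SAM fields, decodes a few flag bits, and uses the CIGAR string to compute where the alignment ends on the reference. It assumes tab-separated input, as in a real SAM file (the slide shows the fields separated by spaces). `parse_cigar`, `parse_sam_line` and `FLAG_BITS` are illustrative names, and only a subset of the flag bits from the SAM specification is listed.

```python
import re
from typing import Dict, List, Tuple

# Subset of the flag bits defined in the SAM specification.
FLAG_BITS = {
    0x4: "unmapped",
    0x10: "reverse_strand",
    0x100: "secondary_alignment",
}


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '4M1D72M' into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def parse_sam_line(line: str) -> Dict[str, object]:
    """Parse one tab-separated alignment line (not a '@' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    flag = int(fields[1])
    pos = int(fields[3])
    cigar = parse_cigar(fields[5])
    # Operations M, D, N, = and X consume reference positions.
    ref_len = sum(n for n, op in cigar if op in "MDN=X")
    return {
        "qname": fields[0],
        "flag": flag,
        "flags": [name for bit, name in FLAG_BITS.items() if flag & bit],
        "rname": fields[2],
        "pos": pos,                  # 1-based leftmost position
        "end": pos + ref_len - 1,    # last reference position covered
        "mapq": int(fields[4]),
        "cigar": cigar,
        "seq": fields[9],
        "qual": fields[10],
        "tags": fields[11:],
    }


# Fourth alignment from the slide, rebuilt with tab separators.
line = "\t".join([
    "HWI-ST700660_138:2:1203:21395:164917#2@0/1", "256", "20", "68253",
    "254", "4M1D72M", "*", "0", "0",
    "NCACCCATGATAGACCAGTAAAGGTGACCACTTAAATTCCTTGCTGTGCAGTGTTCTGTATTCCTCAGGACACAGA",
    "#4@ADEHFJFFEJDHJGKEFIHGHBGFHHFIICEIIFFKKIFHEGJEHHGLELEGKJMFGGGLEIKHLFGKIKHDG",
    "AS:i:254", "NH:i:3", "NM:i:1",
])
aln = parse_sam_line(line)
```

For this record the flag 256 marks a secondary alignment, which is consistent with its `NH:i:3` tag (three reported hits), and the CIGAR `4M1D72M` spans 77 reference positions for a 76-base read because of the one-base deletion. For real analyses a dedicated library is preferable to hand-written parsing.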
VCF format http://www.1000genomes.org/ Introduction NGS data analysis: file formats
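As a rough illustration of what a VCF data line contains, the sketch below splits one line into the eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) plus the optional FORMAT and per-sample columns. The record and the sample names are illustrative and do not come from the slides, and `parse_vcf_line` is a hypothetical helper that ignores header lines and many details of the format.

```python
from typing import Dict, List


def parse_vcf_line(line: str, sample_names: List[str]) -> Dict[str, object]:
    """Parse one tab-separated VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has 8 fixed columns")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_dict: Dict[str, object] = {}
    for entry in info.split(";"):
        if entry == ".":  # "." means the field is missing
            continue
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True

    # FORMAT names the ':'-separated keys used in every sample column.
    genotypes: Dict[str, Dict[str, str]] = {}
    if len(fields) > 9:
        keys = fields[8].split(":")
        for name, sample in zip(sample_names, fields[9:]):
            genotypes[name] = dict(zip(keys, sample.split(":")))

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),  # several ALT alleles are comma-separated
        "qual": qual,
        "filter": filt,
        "info": info_dict,
        "genotypes": genotypes,
    }


# Illustrative multisample record (assumption: not taken from the course data).
samples = ["NA00001", "NA00002", "NA00003"]
line = "\t".join([
    "20", "14370", "rs6054257", "G", "A", "29", "PASS",
    "NS=3;DP=14;AF=0.5;DB;H2", "GT:GQ:DP",
    "0|0:48:1", "1|0:48:8", "1/1:43:5",
])
variant = parse_vcf_line(line, samples)
```

In the genotype strings, `0` refers to the REF allele and `1` to the first ALT allele, so `1/1` is a homozygous alternative call. A multisample line like this one is the kind of input that prioritization tools filter on.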
Counts (matrix of read counts per gene and sample) Introduction NGS data analysis: file formats
Transcriptomic Studies
RNA-Seq Data Analysis Pipeline 1. Sequence preprocessing Primary 2. Mapping 3. Quantification 4. Normalization Secondary 5. Differential expression 6. Functional Profiling Babelomics 5 RNA-Seq Data Analysis
Babelomics 5 http://babelomics.bioinfo.cipf.es/ Babelomics 5 Analyzing omics data + functional profiling
Differential Expression UPLOAD DATA → EDIT DATA → NORMALIZATION + DIFFERENTIAL EXPRESSION → FUNCTIONAL PROFILING Babelomics 5 Analyzing omics data + functional profiling
Supervised and Unsupervised Classification UPLOAD DATA → NORMALIZE DATA (RPKM, TMM) → EDIT DATA → CLUSTERING / PREDICTORS Babelomics 5 Analyzing omics data + functional profiling
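Of the two normalization options named on this slide, RPKM (reads per kilobase of transcript per million mapped reads) is simple enough to write out: RPKM = 10^9 × count / (total reads in the sample × gene length in bp). The sketch below applies that formula to a tiny count matrix. The gene names, counts and lengths are made up for illustration, the function `rpkm` is hypothetical, and this is not necessarily how Babelomics computes it internally. TMM is a different, more involved method and is not shown.

```python
from typing import Dict, List


def rpkm(counts: Dict[str, List[int]],
         lengths_bp: Dict[str, int]) -> Dict[str, List[float]]:
    """Convert a gene x sample count matrix into RPKM values.

    counts maps each gene to its read counts, one value per sample.
    lengths_bp maps each gene to its length in base pairs.
    Assumption: every sample has at least one mapped read.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size: total reads counted in each sample (column sums).
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [
            1e9 * row[j] / (totals[j] * lengths_bp[gene])
            for j in range(n_samples)
        ]
        for gene, row in counts.items()
    }


# Made-up example: two genes, two samples.
counts = {"geneA": [500, 1000], "geneB": [1500, 1000]}
lengths_bp = {"geneA": 1000, "geneB": 2000}
normalized = rpkm(counts, lengths_bp)
```

Dividing by gene length makes genes of different sizes comparable within a sample, and dividing by the library size makes samples of different sequencing depth comparable with each other.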
Signaling Pathways Analysis http://hipathia.babelomics.org/ hipathia Signaling Pathways Analysis
Genomic Variation Studies
Genomics Data Analysis Pipeline 1. Sequence preprocessing Primary Analysis 2. Mapping 3. Variant calling Secondary 4. Variant prioritization Pipeline Resequencing Data Analysis
How do we prioritize variants in whole exome studies? http://courses.babelomics.org/bierapp/ BIER BiERapp Discovering variants
Introduction
Whole-exome sequencing has become a fundamental tool for discovering the genes behind familial diseases, but it is difficult to find the causal mutation among the enormous background of variants.
There are different scenarios, so we need different prioritization strategies that can be applied immediately.
A vast amount of biological knowledge is available in many databases.
We need a tool that integrates this information and filters it immediately to select candidate variants related to the disease.
BiERapp Discovering variants
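The kind of filtering described here can be sketched in a few lines. The example below keeps variants that are rare in the population, have a potentially damaging consequence, and are carried by all affected family members but by no unaffected one. It is only an illustration of the idea under simplified assumptions: the `Variant` fields, the thresholds, the family and the variants are all hypothetical, and this is not BiERapp's actual implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    consequence: str      # Sequence Ontology term, e.g. "missense_variant"
    population_af: float  # alternative allele frequency in a reference population
    carriers: Set[str]    # ids of the samples carrying the alternative allele


DAMAGING: FrozenSet[str] = frozenset(
    {"missense_variant", "stop_gained", "frameshift_variant"}
)


def prioritize(variants: List[Variant],
               affected: Set[str],
               unaffected: Set[str],
               max_af: float = 0.01,
               consequences: FrozenSet[str] = DAMAGING) -> List[Variant]:
    """Return the variants that pass every filter (hypothetical criteria)."""
    kept = []
    for v in variants:
        if v.population_af > max_af:
            continue  # too common to explain a rare disease
        if v.consequence not in consequences:
            continue  # unlikely to alter the protein
        if not affected <= v.carriers:
            continue  # some affected member lacks the variant
        if v.carriers & unaffected:
            continue  # a healthy member carries the variant
        kept.append(v)
    return kept


# Made-up family and variants, for illustration only.
affected = {"child", "mother"}
unaffected = {"father"}
variants = [
    Variant("1", 1000, "A", "G", "GENE1", "missense_variant", 0.001,
            {"child", "mother"}),
    Variant("2", 2000, "C", "T", "GENE2", "synonymous_variant", 0.0005,
            {"child", "mother"}),
    Variant("3", 3000, "G", "A", "GENE3", "stop_gained", 0.2,
            {"child", "mother"}),
    Variant("4", 4000, "T", "C", "GENE4", "frameshift_variant", 0.002,
            {"child", "mother", "father"}),
]
candidates = prioritize(variants, affected, unaffected)
```

Different scenarios (recessive inheritance, de novo mutations, unrelated cases) would need different carrier rules, which is why a tool that lets the user change the filters interactively is useful.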
How does BiERapp work? Multisample VCF file → BiERapp (VARIANT, CellBase) → Filters BiERapp Discovering variants
Input: VCF file 1. Sequence preprocessing Primary Analysis 2. Mapping 3. Variant calling (VCF files) Secondary 4. Variant prioritization (BiERapp) BiERapp Discovering variants
Can I interpret sequencing data for diagnosis? http://courses.babelomics.org/team/ BIER TEAM Targeted Enrichment Analysis and Management
Gene panel + Sequencing data + Biological knowledge (ClinVar, HGMD, HUMSAVAR, COSMIC) → TEAM → Diagnostic TEAM Targeted Enrichment Analysis and Management
Gene panel Inputs to TEAM: 1. VCF files 2. Gene panel. Databases: ClinVar, HGMD, HUMSAVAR, COSMIC. TEAM Targeted Enrichment Analysis and Management
CSVS: CIBERER Spanish Variant Server Repository of variant frequencies in the Spanish population http://csvs.babelomics.org/ CSVS CIBERER Spanish Variant Server
CIBERER Spanish Variant Server CSVS Local genetic variability
Tool interface http://csvs.babelomics.org/ CSVS CIBERER Spanish Variant Server
Genome Maps Genome viewer that interacts with functional databases http://genomemaps.org/ Genome Maps A next-generation web-based genome browser
Tool interface Genome Maps A next-generation web-based genome browser
Cell Maps Tool for modeling and visualizing biological networks http://cellmaps.babelomics.org/ Cell Maps Visualizing and integrating biological networks