
machine vision and computation to describe genome function at the - PowerPoint PPT Presentation

How do genomes function? Using machine vision and computation to describe genome function at the organismal level. Tessa Durham Brooks, Ph.D. Doane College Department of Biology Anticipation at the dawn of the Genomic Era Within the next


  1. How do genomes function? Using machine vision and computation to describe genome function at the organismal level. Tessa Durham Brooks, Ph.D. Doane College Department of Biology

  2. Anticipation at the dawn of the Genomic Era: “Within the next few years, technologies developed for the Human Genome Project and similar sequencing efforts will revolutionize medicine, agriculture, crimefighting, and other fields.” – Gwynne and Page, Science, 2000

  3. The genomic powerhouses. For reference: humans have about 25,000 genes and 3.2 billion base pairs.
      • ~25,000 genes, 100 million base pairs, sequence finished 2000
      • ~20,000 genes, 97 million base pairs, sequence finished 1998
      • ~22,000 genes, 137 million base pairs, sequence finished 2000

  4. The problem: at best, functional roles have been assigned for 15% of the predicted genes of a genome.

  5. Why has determining gene function in multicellular organisms been difficult?

  6. Why has determining gene function in multicellular organisms been difficult?

  7. “At a conference this month …, biologists tried to explore how the study of genomes might develop over the next 20 years and what tools might be needed. Central to their vision of the future is a thorough computerization of biology, made necessary by the vast computing power of the genome itself.” – NYT 2001. “The task seems likely to change the nature of biological research, requiring teams of engineers, mathematicians, nanotechnologists and computer programmers, and farms of computers if not a national computer grid.” – NYT 2001 – Collins, 2001

  8. Our goal: Describe how genomes function at the organismal scale. Requirements:
      • Observations should be made at sufficiently high spatial and temporal resolution
      • Methods should be relatively high-throughput to allow genomic survey
      • Observations should be able to be made over time and in many environmental contexts
      [Image label: ~25,000 genes]

  9. Root gravitropism: a model for image analysis approaches in functional genomics

  10. Root gravitropism: a model for image analysis approaches in functional genomics

  11. Root gravitropism: a model for image analysis approaches in functional genomics. [Figure: tip angle (deg) versus time (h) for glr3.3-1, glr3.3-2, and SalkCol, with first-order (swing rate) and second-order (acceleration) panels comparing glr3.3-1 vs wt and glr3.3-2 vs wt.] Miller, Durham Brooks, and Spalding 2010, Genetics
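
The first-order (swing rate) and second-order (acceleration) quantities named on this slide are time derivatives of the tip angle. The following is a minimal sketch, assuming a tip-angle series sampled at known times. The function name and the sample values are hypothetical, and plain finite differences stand in for whatever estimator the cited paper actually used.

```python
import numpy as np


def swing_rate_and_acceleration(time_h, tip_angle_deg):
    """Estimate first and second time derivatives of a tip-angle series.

    time_h        : sample times in hours (assumed strictly increasing)
    tip_angle_deg : tip angle in degrees at each sample time
    Returns (swing rate in deg/h, acceleration in deg/h^2).
    """
    t = np.asarray(time_h, dtype=float)
    angle = np.asarray(tip_angle_deg, dtype=float)
    swing_rate = np.gradient(angle, t)         # first order
    acceleration = np.gradient(swing_rate, t)  # second order
    return swing_rate, acceleration


# Made-up example: a root bending at a constant 9 deg/h for 10 h.
time_h = np.linspace(0.0, 10.0, 21)
tip_angle = 9.0 * time_h
rate, accel = swing_rate_and_acceleration(time_h, tip_angle)
```

On real image-derived angles the series would usually be smoothed first, because differentiation amplifies measurement noise.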

  12. Developing tools to detect genome function at the organismal scale. Requirements:
      ✓ Observations should be made at sufficiently high spatial and temporal resolution
      • Methods should be relatively high-throughput to allow genomic survey
      • Observations should be able to be made over time and in many environmental contexts
      [Image label: ~25,000 genes]

  13. Automation and High Throughput ver. 1

  14. Doane Phytomorph Automation and High Throughput ver. 2

  15. High-throughput genetic stocks
      • Recombinant inbred lines (RILs); QTL analysis
      • Ecotypes (e.g. 1001 Genomes Project)
      Matthieu Reymond, Max-Planck Institute

  16. Developing tools to detect genome function at the organismal scale. Requirements:
      ✓ Observations should be made at sufficiently high spatial and temporal resolution
      ✓ Method should be relatively high-throughput to allow genomic survey
      • Observations should be able to be made over time and in many environmental contexts
      [Image label: ~25,000 genes]

  17. Phenotypes are plastic. [Figure label: one ecotype.] Durham Brooks, Miller, and Spalding 2010, Plant Physiology

  18. Genomics analysis in a multi-dimensional condition space. [Figure: LOD versus position (cM) and time (minutes), in panels by seed size (small, large) and seedling age (2 d, 3 d, 4 d); 164 lines × 15 individuals.] Moore et al., unpublished result

  19. Developing tools to detect genome function at the organismal scale. Requirements:
      ✓ Observations should be made at sufficiently high resolution
      ✓ Method should be relatively high-throughput to allow genomic survey
      ✓ Observations should be able to be made over time and in many environmental contexts
      • Cyberinfrastructure must facilitate the above
      [Image label: ~25,000 genes]

  20. Workflow. [Diagram labels: Data Capture; 0.5 TB/day; Data Compression; Data Storage (X TB); Data Storage (30 TB); Feature Extraction; QTL analysis; Data Compression and Analysis; Schorr Center and OSG.]

  21. Data Compression. [Diagram labels: Scan Root Response; Auto Crop & Time Stamp (OSG); Image Compression (OSG); Video Compression (Local Grid); Ship out to UW.]
      • Time series of 220 uncompressed TIFs, ~225 MB each
      • Equalize dimensions
      • Insert the time stamp into the least significant bits of the first 14 pixels of each image
      • Time series of lossless-compressed PNGs, ~195 MB each
      • Compress to video using the FFV1 codec
      • Uses a lossless intraframe codec
      • ~160 MB/frame
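
The time-stamping step above can be illustrated with a short sketch. The slide does not say how the stamp is encoded, so everything specific here is an assumption: an 8-bit grayscale image, one stamp bit per pixel (so a 14-bit integer stamp, for example elapsed minutes), and the hypothetical function names `embed_stamp` and `read_stamp`.

```python
import numpy as np

N_PIXELS = 14  # from the slide: the first 14 pixels carry the stamp


def embed_stamp(image, stamp):
    """Return a copy of `image` with `stamp` in the LSBs of its first 14 pixels.

    Assumption: `image` is uint8 grayscale and `stamp` is an integer that
    fits in 14 bits (one bit per pixel, least significant bit first).
    """
    if image.dtype != np.uint8:
        raise TypeError("this sketch assumes an 8-bit image")
    if not 0 <= stamp < 2 ** N_PIXELS:
        raise ValueError("stamp must fit in 14 bits")
    out = image.copy()      # C-contiguous copy, so reshape gives a view
    flat = out.reshape(-1)
    bits = np.array([(stamp >> i) & 1 for i in range(N_PIXELS)], dtype=np.uint8)
    # Clear each pixel's least significant bit, then write the stamp bit.
    flat[:N_PIXELS] = (flat[:N_PIXELS] & 0xFE) | bits
    return out


def read_stamp(image):
    """Recover the 14-bit stamp written by `embed_stamp`."""
    bits = image.reshape(-1)[:N_PIXELS] & 1
    return sum(int(b) << i for i, b in enumerate(bits))


# Made-up 8x8 test image; a real frame would be far larger.
img = np.arange(64, dtype=np.uint8).reshape(8, 8)
stamped = embed_stamp(img, 1234)
```

Each stamped pixel changes by at most one gray level, and because the later PNG and FFV1 steps are lossless, a stamp written this way would survive compression.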

  22. Feature Extraction. [Diagram labels: Compressed Data; Machine Learning; Root tip Identification; Tip angle Measurement; Ship out to Doane.]
      • Decompress data (currently using the FFV1 codec)
      • Isolate the plant’s aerial and ground tissue from the image background (currently Bayesian and SVM methods work well)
      • Isolate and track the root tip (currently image curvature features are used)
      • Linear regression on the root’s meristematic tissue
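
The last step, a linear regression near the root tip to obtain the tip angle, might look like the sketch below. It assumes the root midline is already available as ordered (x, y) points from base to tip. The function name, the `n_tip` window, and the angle convention (degrees from the +x axis toward +y) are illustrative choices, not taken from the slides.

```python
import numpy as np


def tip_angle_deg(midline_xy, n_tip=10):
    """Estimate the tip direction from the last `n_tip` midline points.

    x and y are each regressed linearly on the point index, so a
    vertical root does not produce an infinite slope.
    """
    pts = np.asarray(midline_xy, dtype=float)[-n_tip:]
    s = np.arange(len(pts), dtype=float)    # position along the midline
    dx_ds = np.polyfit(s, pts[:, 0], 1)[0]  # slope of x against index
    dy_ds = np.polyfit(s, pts[:, 1], 1)[0]  # slope of y against index
    return float(np.degrees(np.arctan2(dy_ds, dx_ds)))


# Made-up midlines: one growing diagonally, one growing straight along +y.
diagonal = np.column_stack([np.arange(20), np.arange(20)])
vertical = np.column_stack([np.zeros(20), np.arange(20)])
```

In image coordinates y usually increases downward, so the sign convention would need to be fixed against the imaging setup before comparing with published angles.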

  23. QTL Analysis: association of a phenotypic value (e.g. root tip angle) with a genetic element. [Diagram labels: Compile Tip Angle Data; Determine Significance Threshold (OSG); Ship out to Doane; QTL Detection (Local Grid); Additional analysis; Ship out to Doane.]
      • Find mean tip angle at each time point for each genetic line
      • Permutation testing
      • Launch analysis locally
      • Haley-Knott or multiple imputation (0.24 vs 63 CPU years per time point)
      • Bleed out to OSG
      • Determine max likelihood of randomized data
      • Choose detection method
      • Optimize maximum likelihoods of trait data
      • Determine significance (LOD)
      • Interactions, additive effects
      • Plasticity QTL
      • Conditional QTL (GxE effects)
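
The permutation-testing idea above (shuffle the phenotype, record the maximum LOD across the genome, take a high quantile as the significance threshold) can be sketched as follows. This is a simplified single-marker regression on fully genotyped 0/1 markers, not the Haley-Knott or multiple-imputation analysis named on the slide. The function names and the simulated data are hypothetical.

```python
import numpy as np


def lod_scan(genotypes, phenotype):
    """LOD score per marker from single-marker regression.

    genotypes : (n_lines, n_markers) array coded 0/1, no monomorphic markers
    phenotype : (n_lines,) array, e.g. mean tip angle at one time point
    """
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    rss0 = np.sum(y ** 2)                                  # null model
    rss1 = rss0 - (g.T @ y) ** 2 / np.sum(g ** 2, axis=0)  # one-marker model
    return (y.size / 2) * np.log10(rss0 / rss1)


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide LOD threshold from phenotype permutations."""
    rng = np.random.default_rng(seed)
    max_lods = [lod_scan(genotypes, rng.permutation(phenotype)).max()
                for _ in range(n_perm)]
    return float(np.quantile(max_lods, 1 - alpha))


# Simulated data: 164 lines, 50 markers, one true QTL at marker 10.
rng = np.random.default_rng(1)
geno = rng.integers(0, 2, size=(164, 50))
pheno = 2.0 * geno[:, 10] + rng.normal(size=164)
lod = lod_scan(geno, pheno)
threshold = permutation_threshold(geno, pheno)
```

Every permutation repeats the full genome scan, which is why the slide contrasts CPU cost between detection methods and sends the threshold step out to a grid.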

  24. Progress and Future Directions
      • One small college has collected over 14,500 individual root gravitropic responses in six conditions (32 TB) from an RIL population in 6 months.
      • We will finish collection from NILs (near-isogenic lines): an additional 8,700 individuals, 19 TB in 3 months.
      • Begin image analysis and QTL analysis; the dataset opens new doors in visualizing genomes.
      • Expand participation to additional institutions (huge potential in scaling of data collection).

  25. Acknowledgments
      • Doane College: Mike Carpenter (CIO), David Andersen, Dan Schneider; Dr. Chris Wentworth (Physics); Students: Amy Craig and Brad Higgins (Physics), Autumn Longo and Grant Dewey (Biochemistry), Tracy Guy, Miles Mayer, Halie Smith, Anthony Bieck, Sarah Merithew, Devon Niewohner, Muijj Ghani, Julie Wurdeman (Biology)
      • University of Wisconsin: Dr. Edgar Spalding’s Laboratory (Dr. Nathan Miller, Candace Moore, Logan Johnson); Dr. Miron Livny (CHTC)
      • UNL – Schorr Center and HCC: Brian Bockelman and Dr. David Swanson
      • University of Florida: Dr. Mark Settles
      • Computing Resources [column heading; its entries are not recoverable from the slide text]
      • Funding: PGRP - 1031416; NE-INBRE; EPSCoR - URE
