REMBRANDT: Building a robust translational research framework for brain tumor studies. REpository of Molecular BRAin Neoplasia DaTa. Himanso Sahni, Center for Bioinformatics, NCI / SAIC
Challenges: Few therapeutic advances in the last three decades. Histopathological classifications for the heterogeneous group of tumors known as gliomas are broad and do not predict therapeutic outcome or prognosis. Standard therapies generally have minimal effect on long-term survival.
Rembrandt Knowledgebase (diagram): a data warehouse integrating expression array data, SNP array data, proteomics data, and clinical data; concept creation; better understanding; better treatments.
NCI’s GMDI Study (sample collection diagram): blood (plasma) and tumor (core punch) samples; tumor DNA, RNA, and proteins.
Typical Rembrandt Usage Scenario: In brain tissue from patients diagnosed with the glioblastoma multiforme (GBM) subtype of astrocytoma, which genes in the EGF signaling pathway are over- or under-expressed in cancerous versus normal tissue? Is there a correlation between the expression and genomic (copy number) data collected from these patients? How did EGFR up-regulation affect the survival of patients within this study? Of these groups of samples, which were obtained from male patients diagnosed between the ages of 25 and 40?
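The second question asks whether expression and copy-number data move together. The slides do not say which statistic Rembrandt applies. As an assumption, this sketch computes a Pearson correlation across samples for a single gene. The class name `ExpressionCopyNumberCorrelation` and the sample values are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: Pearson correlation between two equal-length series,
 * e.g. one gene's expression values and the copy number at its locus, per sample.
 */
final class ExpressionCopyNumberCorrelation {

    private ExpressionCopyNumberCorrelation() {}

    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0.0 || varY == 0.0) {
            return Double.NaN; // correlation is undefined when a series is constant
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Made-up illustrative values, NOT Rembrandt data.
        double[] expression = {2.1, 3.4, 5.0, 7.9};
        double[] copyNumber = {2.0, 3.0, 4.0, 6.0};
        System.out.printf("r = %.3f%n", pearson(expression, copyNumber));
    }
}
```

In a real analysis a rank-based measure such as Spearman's might suit copy-number data better. The choice is left open here.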
Rembrandt’s Objectives: Must support translational research use cases by building an infrastructure that lets users create complex translational queries. For example: the ability to AND/OR a Gene Expression query with a Copy Number query and then nest this within a Clinical Results query; the ability to further refine the results by applying criteria to the subset of samples grouped by higher-order analysis; the ability to apply filters to the result set for user-friendly analysis.
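Nesting an AND/OR query inside a clinical query maps naturally onto a composite structure. A compound query holds two sub-queries, and either of them may itself be compound. This is a minimal sketch that assumes each query resolves to a set of sample IDs. The types `Query`, `FixedQuery`, and `CompoundQuery` are hypothetical and are not Rembrandt's classes. The combination is evaluated in memory purely to show the nesting semantics.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of nested AND/OR queries; each query yields matching sample IDs. */
final class NestedQueries {

    interface Query {
        Set<String> execute();
    }

    enum Operator { AND, OR }

    /** Leaf query: a fixed ID set stands in for a real expression, copy number, or clinical query. */
    record FixedQuery(Set<String> sampleIds) implements Query {
        public Set<String> execute() { return new HashSet<>(sampleIds); }
    }

    /** Combines two sub-queries, which may themselves be compound. */
    record CompoundQuery(Query left, Operator op, Query right) implements Query {
        public Set<String> execute() {
            Set<String> result = new HashSet<>(left.execute());
            Set<String> other = right.execute();
            if (op == Operator.AND) {
                result.retainAll(other); // intersection
            } else {
                result.addAll(other);    // union
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Query expression = new FixedQuery(Set.of("S1", "S2", "S3"));
        Query copyNumber = new FixedQuery(Set.of("S2", "S3", "S4"));
        Query clinical = new FixedQuery(Set.of("S3", "S4", "S5"));
        // (expression OR copyNumber) AND clinical
        Query nested = new CompoundQuery(
                new CompoundQuery(expression, Operator.OR, copyNumber), Operator.AND, clinical);
        System.out.println(new TreeSet<>(nested.execute())); // prints [S3, S4]
    }
}
```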
Rembrandt’s Objectives (cont’d): Allow users to view the results by easily pivoting between the various dimensions: grouped by disease; grouped by patient / sample; grouped by genes (for gene expression) or by cytogenetic location (for copy number); view associated annotations; time course view (future).
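Pivoting the same result rows by disease, by sample, or by gene amounts to grouping on a different key. This minimal sketch assumes a flat, hypothetical `Row` record. Only the key-extractor function changes between views.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch: one result set, several pivoted views. */
final class ResultPivot {

    /** One hypothetical result row: a measured value for a sample, gene, and disease. */
    record Row(String disease, String sampleId, String gene, double value) {}

    /** Groups rows by whichever dimension the key extractor selects (keys sorted). */
    static Map<String, List<Row>> pivot(List<Row> rows, Function<Row, String> dimension) {
        return rows.stream()
                .collect(Collectors.groupingBy(dimension, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only.
        List<Row> rows = List.of(
                new Row("GBM", "S1", "EGFR", 8.2),
                new Row("GBM", "S2", "EGFR", 6.9),
                new Row("Oligodendroglioma", "S3", "PDGFRA", 4.1));
        System.out.println(pivot(rows, Row::disease).keySet());  // by disease
        System.out.println(pivot(rows, Row::sampleId).keySet()); // by patient / sample
        System.out.println(pivot(rows, Row::gene).keySet());     // by gene
    }
}
```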
Gene Expression Search Use Cases (UML use case diagram): the RBT_USER actor searches the RBT Affy Gene Expression Dataset. Search use cases: search differential gene expression by gene name, by fold change, by chromosomal region, by probeset ID, by GO terms, and by pathway name. Supporting use cases, linked by <<Uses>> and <<Extends>> relationships: calculate fold change; obtain gene information from cytoband location; obtain cytoband location from gene name; get genes.
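The slide does not define the "calculate fold change" use case any further. This sketch assumes a common definition: fold change is the ratio of mean tumor expression to mean normal expression on a linear scale. Under that assumption, a gene counts as differentially expressed when the ratio is at least a threshold N or at most 1/N. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical fold-change helper; the definition used here is an assumption. */
final class FoldChange {

    static double mean(double[] values) {
        return Arrays.stream(values).average()
                .orElseThrow(() -> new IllegalArgumentException("empty group"));
    }

    /** Ratio of group means on a linear (non-log) intensity scale. */
    static double foldChange(double[] tumor, double[] normal) {
        double normalMean = mean(normal);
        if (normalMean <= 0.0) {
            throw new IllegalArgumentException("normal mean must be positive");
        }
        return mean(tumor) / normalMean;
    }

    /** True if up-regulated by at least `threshold` or down-regulated by at least `threshold`. */
    static boolean isDifferential(double fold, double threshold) {
        return fold >= threshold || fold <= 1.0 / threshold;
    }

    public static void main(String[] args) {
        double[] tumor = {400.0, 600.0};  // made-up intensities
        double[] normal = {100.0, 100.0};
        double fc = foldChange(tumor, normal);
        System.out.println(fc + " differential at 2x? " + isDifferential(fc, 2.0));
    }
}
```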
Rembrandt’s caBIG objectives: Aligns with the principles of NCI’s caBIG (cancer Biomedical Informatics Grid): open source; open access; syntactic and semantic interoperability; federated access. Leverages NCICB and caBIG infrastructure components: caCORE infrastructure (caBIO, EVS, caDSR); caARRAY gene expression data repositories and analysis tools; C3D clinical informatics system; caBIG infrastructure being delivered by caBIG workspaces. See https://cabig.nci.nih.gov/
Rembrandt Technical Objectives: Build a scalable, high-performance application: tiered architecture; abstraction / Model View Controller; support for strong type checking and validation; “fast” queries; user-friendly interface; groundwork for a robust translational research framework.
Rembrandt Current Architecture (diagram). User interface: Complex Query Builder, Graphical Plots, Tabular Reports. Middle tier: Query Builder, Report Builder, Cache Manager, Query Processing, Object Relational Mapping, Run Time Analysis Components (future). Data sources: MicroArray, SNPArray, Clinical, Other, and caBIO Annotations, loaded by Extract, Transform, Load (ETL) processes. Framework: caIntegrator.
Another Architecture Perspective (layered diagram): JSPs, Servlets, and Struts; Result Set, Query, Criteria, Look Up, and Domain Elements objects (XML/XSLT); Query Processor; Result Set Processor; Cache Manager (Ehcache); Apache’s Object Relational Bridge (OJB); Rembrandt Study Data Warehouse (star schema).
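The cache manager keeps processed result sets so that a repeated query does not have to hit the warehouse again. The slide lists Ehcache for this layer. This sketch uses only the JDK instead, so that it runs standalone: an access-ordered `LinkedHashMap` with a size bound gives least-recently-used eviction. It illustrates the caching idea only. It does not show Ehcache's API or Rembrandt's actual cache keys, and `ResultSetCache` and `getOrCompute` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical JDK-only stand-in for a result-set cache with LRU eviction. */
final class ResultSetCache<K, V> {
    private final Map<K, V> entries;

    ResultSetCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    /** Returns the cached result, running the query only on a cache miss. */
    synchronized V getOrCompute(K key, Function<K, V> query) {
        V cached = entries.get(key);
        if (cached == null) {
            cached = query.apply(key);
            entries.put(key, cached);
        }
        return cached;
    }

    /** Membership check; does not change the access order. */
    synchronized boolean contains(K key) {
        return entries.containsKey(key);
    }

    public static void main(String[] args) {
        ResultSetCache<String, String> cache = new ResultSetCache<>(100);
        String first = cache.getOrCompute("EGFR AND chr7", q -> "rows for " + q); // miss: runs query
        String again = cache.getOrCompute("EGFR AND chr7", q -> "never called");  // hit
        System.out.println(first.equals(again));
    }
}
```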
Query & Retrieval Objects: Support Strong Type Checking & Validations. The Query, View, Criteria, and Domain Element objects abstract presentation logic from the query helper objects, provide the ability to nest cross-domain queries (AND/OR), are strongly typed, and can validate themselves.
Example: Criteria Objects (UML class diagram). Criteria objects consist of DomainElements, provide generic cross-domain filters, and can each validate themselves. For example, RegionCriteria consists of ChromosomeNumberDE, CytobandDE, and BasePairPositionDEs for the start and end positions, and is used in both Gene Expression and Comparative Genomic domain queries.
RegionCriteria (extends Criteria): - cytoband: CytobandDE; - chromNumber: ChromosomeNumberDE; - start: BasePairPositionDE.StartPosition; - end: BasePairPositionDE.EndPosition; - empty: boolean = true; + isValid(): boolean; + getCytoband() / setCytoband(CytobandDE); + getStart() / setStart(BasePairPositionDE.StartPosition); + getEnd() / setEnd(BasePairPositionDE.EndPosition); + getChromNumber() / setChromNumber(ChromosomeNumberDE).
de::CytobandDE (a DomainElement): + CytobandDE(String); + setValue(Object): void; + getValueObject(): String; + setValueObject(String): void.
de::ChromosomeNumberDE (a DomainElement): + ChromosomeNumberDE(String); + setValue(Object): void; + getValueObject(): String; + setValueObject(String): void.
de::BasePairPositionDE (a DomainElement): - positionType: String; + START_POSITION: String = "StartPosition"; + END_POSITION: String = "EndPosition"; - BasePairPositionDE(String, Integer); + getPositionType(): String; + setValue(Object): void; + getValueObject(): Integer; + setValueObject(Integer): void; leaf inner classes StartPosition(Integer) and EndPosition(Integer).
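The class diagram can be read as the following Java sketch. It is a hypothetical reconstruction, not Rembrandt's source. Several details are assumptions:

- The untyped `setValue(Object)` methods and the `empty` flag are omitted.
- `END_POSITION` is given the value "EndPosition".
- The validation rules in `isValid()` require a chromosome, and start and end must either both be absent or both be present in order.

```java
/** Hypothetical reconstruction of the criteria classes in the UML diagram. */
final class CriteriaSketch {

    /** Base class: a strongly typed, non-null value. */
    abstract static class DomainElement<T> {
        private T value;

        protected DomainElement(T value) { setValueObject(value); }

        final T getValueObject() { return value; }

        final void setValueObject(T value) {
            if (value == null) throw new IllegalArgumentException("value must not be null");
            this.value = value;
        }
    }

    static final class ChromosomeNumberDE extends DomainElement<String> {
        ChromosomeNumberDE(String chromosome) { super(chromosome); }
    }

    static final class CytobandDE extends DomainElement<String> {
        CytobandDE(String cytoband) { super(cytoband); }
    }

    /** Private constructor: only the nested StartPosition/EndPosition leaves can be created. */
    static class BasePairPositionDE extends DomainElement<Integer> {
        static final String START_POSITION = "StartPosition";
        static final String END_POSITION = "EndPosition";
        private final String positionType;

        private BasePairPositionDE(String positionType, Integer position) {
            super(position);
            this.positionType = positionType;
        }

        String getPositionType() { return positionType; }

        static final class StartPosition extends BasePairPositionDE {
            StartPosition(Integer position) { super(START_POSITION, position); }
        }

        static final class EndPosition extends BasePairPositionDE {
            EndPosition(Integer position) { super(END_POSITION, position); }
        }
    }

    static final class RegionCriteria {
        private ChromosomeNumberDE chromNumber;
        private CytobandDE cytoband;
        private BasePairPositionDE.StartPosition start;
        private BasePairPositionDE.EndPosition end;

        void setChromNumber(ChromosomeNumberDE chromNumber) { this.chromNumber = chromNumber; }
        void setCytoband(CytobandDE cytoband) { this.cytoband = cytoband; }
        void setStart(BasePairPositionDE.StartPosition start) { this.start = start; }
        void setEnd(BasePairPositionDE.EndPosition end) { this.end = end; }

        /** Assumed rules: chromosome required; positions absent, or present as an ordered pair. */
        boolean isValid() {
            if (chromNumber == null) return false;
            if (start == null && end == null) return true;
            if (start == null || end == null) return false;
            return start.getValueObject() <= end.getValueObject();
        }
    }

    public static void main(String[] args) {
        RegionCriteria region = new RegionCriteria();
        region.setChromNumber(new ChromosomeNumberDE("7"));
        region.setStart(new BasePairPositionDE.StartPosition(1_000_000)); // made-up coordinates
        region.setEnd(new BasePairPositionDE.EndPosition(2_000_000));
        System.out.println("valid region? " + region.isValid());
    }
}
```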
Agnostication can result in Obfuscation… Challenge: making the Rembrandt database layer vendor-agnostic through a standard Object Relational Mapping (ORM) layer while still generating high-performance queries. Rembrandt currently uses Apache’s Object Relational Bridge (OJB) as the ORM layer (http://db.apache.org/ojb/). ORMs provide useful abstraction but do not necessarily generate the most efficient SQL, and custom implementations or framework extensions can become a maintenance nightmare.
High Performance Query Processing. Multi-threaded query processing: all queries are constructed and executed in parallel on separate threads on the Java server side. Dimensional result set processing: all result set dimensions are reconstituted on the Java server side. For example, for the entire chromosome 7 (between 1 and 15854551 bp), Rembrandt retrieves about 51,000 fact records plus all associated annotations and displays results for all 51 samples in 20 seconds.
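Running the sub-queries of a compound query on separate server-side threads can be sketched with a fixed thread pool from `java.util.concurrent`. This is a minimal illustration under three assumptions: each sub-query is a `Callable` that returns its rows, results are merged in submission order, and `ParallelQueryRunner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: run independent sub-queries on a thread pool and merge their rows. */
final class ParallelQueryRunner {

    static <R> List<R> runAll(List<Callable<List<R>>> subQueries, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<R> merged = new ArrayList<>();
            // invokeAll blocks until every sub-query has completed; futures keep task order.
            for (Future<List<R>> future : pool.invokeAll(subQueries)) {
                merged.addAll(future.get()); // rethrows a failed sub-query's exception
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for gene expression, copy number, and clinical queries.
        List<Callable<List<String>>> subQueries = List.of(
                () -> List.of("expression rows"),
                () -> List.of("copy number rows"),
                () -> List.of("clinical rows"));
        System.out.println(runAll(subQueries, 3));
    }
}
```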
Multi-threaded Query Processing in Java (UML sequence diagram; the extracted labels are mostly illegible). Legible fragments include a gene expression query handler, reporter ID criteria derived from gene IDs, probe IDs and clone IDs split into multiple sub-queries, sub-query execution on separate threads, a select handler, a fact handler, and a result set.
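The sequence diagram suggests that long lists of probe and clone IDs are split into several sub-queries. This sketch shows that partitioning step. The batch size, the `probeset_id` column, and the class name are hypothetical. One practical reason for batching is that Oracle rejects IN lists longer than 1,000 expressions. The slides do not say whether that motivated Rembrandt's design. Each batch could then be wrapped in its own `Callable` and submitted to a thread pool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: split a long reporter-ID list into batches, one sub-query per batch. */
final class SubQueryPartitioner {

    /** Splits ids into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(from, to))); // copy: batches don't share the source
        }
        return batches;
    }

    /** Builds a parameterized IN clause with one "?" placeholder per ID. */
    static String inClause(String column, int placeholders) {
        return column + " IN (" + String.join(", ", Collections.nCopies(placeholders, "?")) + ")";
    }

    public static void main(String[] args) {
        List<String> probeIds = List.of("p1", "p2", "p3", "p4", "p5"); // made-up IDs
        for (List<String> batch : partition(probeIds, 2)) {
            // "probeset_id" is a hypothetical column name, not Rembrandt's schema.
            System.out.println(inClause("probeset_id", batch.size()) + " bound to " + batch);
        }
    }
}
```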