Complex Aggregates over Subsets of Elements

Celine Vens <Celine.Vens@irc.vib-ugent.be>
Sofie Van Gassen <Sofie.VanGassen@intec.ugent.be>
Tom Dhaene <Tom.Dhaene@intec.ugent.be>
Yvan Saeys <Yvan.Saeys@irc.vib-ugent.be>

September 16, 2014

Flow Cytometry

Measurement of cell properties in a fluidic system.

Hundreds of thousands of cells are measured for each patient ⇒ relational data: a primary table that has a one-to-many relationship with a secondary table.

• Primary table (patients): Patient ID, Class, Clinical information
• Secondary table (cells): Cell ID, Cell measurements

How to diagnose patients?

Cell measurements table: Cell ID, CD3

• Based on individual cells. E.g. does the patient have a cell with a CD3 value larger than x? Too specific.
• Based on aggregates over cells. E.g. is the mean CD3 value of all cells from the patient larger than x? Too general.
• Based on complex aggregates of cells. E.g. is the number of cells with a CD3 value larger than x larger than y? Good compromise, but ...
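
The three kinds of test above can be written down in a few lines. This is only an illustrative sketch: the CD3 values, the thresholds, and the function names are hypothetical and do not come from the talk.

```python
from statistics import mean

# Hypothetical toy data: CD3 values of the cells measured for one patient.
cd3_values = [0.2, 0.4, 3.1, 3.5, 0.3, 2.9]

def exists_cell_above(values, x):
    """Individual-cell test: is there a cell with a value larger than x?"""
    return any(v > x for v in values)

def mean_above(values, x):
    """Simple aggregate: is the mean over all cells larger than x?"""
    return mean(values) > x

def count_above_exceeds(values, x, y):
    """Complex aggregate: is the number of cells with a value > x larger than y?"""
    return sum(1 for v in values if v > x) > y

print(exists_cell_above(cd3_values, 3.0))       # one large cell is enough
print(mean_above(cd3_values, 3.0))              # averaged over every cell
print(count_above_exceeds(cd3_values, 2.5, 2))  # size of the selected subset
```

The complex aggregate first selects a subset with a condition on one attribute and then aggregates over that subset, which is why it sits between the other two tests.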

Problems with traditional complex aggregates

[Figure: 4 × 5 grid of scatter-plot panels; both axes run from −5 to 10.]

Complex aggregates over subsets

Complex aggregate over clusters: we present a new type of complex aggregate, where a subset is defined as a cluster of the set.

Advantages
• Clusters can define more specific subsets than conditions on individual attribute values.
• Clusters can be aggregated in more advanced ways, capturing information about the shape of the cluster.

Partitional (flat) clustering

• Cluster the data with a clustering algorithm of choice, using either
  • one cluster structure for all patients, or
  • an independent clustering for each patient.
• Pre-compute aggregate functions on the obtained clusters.
• Add this new information to the relational database.
• Possibility to remove the original data, to speed up the relational learning.
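
The pipeline above might look roughly as follows. This is a minimal sketch under stated assumptions: a tiny one-dimensional k-means stands in for the "clustering algorithm of choice", the per-patient CD3 values are made up, and the chosen aggregates (count, mean, standard deviation) are only examples of what could be pre-computed.

```python
from statistics import mean, pstdev

def kmeans_1d(values, k, iterations=20):
    """Very small 1-D k-means (an assumed stand-in); returns a label per value."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the centres over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = mean(members)
    return labels

def cluster_aggregates(values, labels, k):
    """Pre-compute per-cluster aggregates to store next to the patient record."""
    rows = []
    for c in range(k):
        members = [v for v, lab in zip(values, labels) if lab == c]
        rows.append({
            "cluster": c,
            "count": len(members),
            "mean": mean(members) if members else None,
            "std": pstdev(members) if members else None,
        })
    return rows

# Hypothetical CD3 values; here each patient is clustered independently.
patients = {
    "p1": [0.1, 0.3, 0.2, 5.0, 5.2],
    "p2": [0.2, 0.1, 4.9, 5.1, 5.3, 5.0],
}
cluster_table = {
    pid: cluster_aggregates(vals, kmeans_1d(vals, k=2), k=2)
    for pid, vals in patients.items()
}
print(cluster_table["p1"])
```

Once `cluster_table` is stored, the learner can query the cluster rows instead of the raw cells, which is what makes dropping the original data possible.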

Online hierarchical clustering

• Cluster the data during the learning algorithm.
• Clustering takes place "on demand" to refine or generalize a rule.
• E.g. "Is the mean CD3 value of all cells from the patient larger than x?" ⇒
  • Perform a hierarchical clustering step.
  • Is the mean CD3 value of one of the resulting clusters larger than x?
• Many opportunities to split or merge will result in a larger search space.
• Original data needs to be stored.
• In this work we focus on the partitional clustering approach.
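
The refinement step in the example can be illustrated with a small sketch. It assumes a single divisive step on one-dimensional values that picks the cut with the lowest total squared error; the talk does not specify which hierarchical method is used, and the data and names here are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def split_once(values):
    """One divisive step: cut the sorted values where the total SSE is lowest.

    Requires at least two values.
    """
    ordered = sorted(values)
    best_cut = min(range(1, len(ordered)),
                   key=lambda i: sse(ordered[:i]) + sse(ordered[i:]))
    return ordered[:best_cut], ordered[best_cut:]

def some_cluster_mean_above(clusters, x):
    """Refined test: is the mean of at least one cluster larger than x?"""
    return any(mean(c) > x for c in clusters)

cells = [0.2, 0.4, 0.3, 3.1, 3.5, 2.9]  # hypothetical CD3 values for one patient
x = 2.0

general = mean(cells) > x                                 # rule over all cells
refined = some_cluster_mean_above(split_once(cells), x)   # rule after one split
print(general, refined)
```

Because the split has to be recomputed from the cells whenever a rule is refined, this variant needs the original data at learning time.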

Synthetic dataset

Mean accuracy

[Figure: 4 × 5 grid of panels; both axes run from −5 to 10.]

Note: Tilde decision tree, 5-fold cross-validation.
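
For readers unfamiliar with the evaluation protocol, the sketch below shows how a mean accuracy over 5 folds is typically computed. It is not the authors' setup: the threshold learner is a stand-in for a Tilde tree, and the data and function names are hypothetical.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(examples, labels, train_fn, k=5):
    """Mean test accuracy over k folds; train_fn returns a predict function."""
    accuracies = []
    for test_idx in k_fold_indices(len(examples), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(examples)) if i not in held_out]
        predict = train_fn([examples[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(examples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return mean(accuracies)

def train_threshold(xs, ys):
    """Stand-in learner (not Tilde): threshold halfway between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == "pos"]
    neg = [x for x, y in zip(xs, ys) if y == "neg"]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: "pos" if x > cut else "neg"

# Hypothetical one-feature dataset with alternating classes.
features = [0.1, 5.0, 0.3, 4.8, 0.2, 5.2, 0.4, 4.9, 0.0, 5.1]
classes = ["neg", "pos"] * 5
print(cross_val_accuracy(features, classes, train_threshold, k=5))
```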

HIV Vaccine Trials Network

• Dataset from the FlowCAP II challenge
• 48 patients (27 for training, 21 for testing)
• Two samples from each patient, challenged with different antigens
• Goal: automatically detect which antigen has been used
• Clustering with the FlowMeans algorithm
• Bagging ensemble of 100 Tilde trees
• We obtain a test accuracy of 0.95
• In line with other FlowCAP results, but several state-of-the-art algorithms obtain perfect accuracy
• Further optimization might improve our results
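
A bagging ensemble of 100 models can be sketched as below. This is an illustration of the general technique only: the one-threshold base learner is a stand-in for a Tilde tree, and the data, seed, and function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """Stand-in base learner (not a Tilde tree): one threshold on one feature.

    Assumes the "pos" class has the larger feature values.
    """
    pos = [x for x, y in zip(xs, ys) if y == "pos"]
    neg = [x for x, y in zip(xs, ys) if y == "neg"]
    if not pos or not neg:  # bootstrap sample that contains a single class
        only = "pos" if pos else "neg"
        return lambda x: only
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: "pos" if x > cut else "neg"

def bagging_fit(examples, labels, train_fn, n_models=100, seed=0):
    """Train n_models base learners, each on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(examples)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_fn([examples[i] for i in idx],
                               [labels[i] for i in idx]))
    return models

def bagging_predict(models, example):
    """Majority vote over the ensemble."""
    votes = Counter(model(example) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical one-feature training set.
xs = [0.1, 0.3, 0.2, 0.4, 5.0, 4.8, 5.2, 4.9]
ys = ["neg"] * 4 + ["pos"] * 4
ensemble = bagging_fit(xs, ys, train_stump, n_models=100)
print(bagging_predict(ensemble, 4.5), bagging_predict(ensemble, 0.5))
```

Each model sees a slightly different resampling of the training set, so the vote averages out the instability of individual trees.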

Conclusion

Our contributions are twofold:
• We introduced flow cytometry data to the ILP community.
• We introduced a new kind of complex aggregate: cluster aggregates.

Thank You! Any questions?