Machine Learning for Pairwise Data
Adriana Bîrluțiu
A few words about myself
◮ 2012–present: Scientific programmer, S.A.I.A. & OncoPredict: working on applications of machine learning and computational intelligence in medical oncology.
◮ 2006–2011: Research assistant and Ph.D. student, Institute for Computing and Information Sciences, Radboud University Nijmegen, Netherlands. Thesis: Machine learning for pairwise data.
◮ 2000–2005: B.Sc. and M.Sc. from the Faculty of Mathematics and Computer Science, Babeș-Bolyai University Cluj-Napoca.
Machine learning
◮ Branch of AI focused on the design and development of methods that allow machines to learn from observations.
◮ Spam filtering, speech and handwriting recognition, medical diagnosis, credit card fraud detection, stock market analysis.
◮ Availability of empirical data and computational power.
Supervised learning
Learning a latent function from observations
◮ Data $D = \{(x_i, y_i),\ i = 1, \dots, n\}$
◮ Input space $\mathcal{X} \subseteq \mathbb{R}^d$, output space $\mathcal{Y} \subseteq \mathbb{R}$
◮ Goal: predict the functional relation $f : \mathcal{X} \to \mathcal{Y}$
Machine learning In many cases obtaining labeled data to train the algorithms is expensive!
Machine learning ↔ human learning
Characteristics of learning:
◮ based on prior experience → multi-task/transfer learning
◮ selects the most useful information → active learning
Applied to: preference learning and supervised network inference.
Connection between the two: pairwise data.
Preference learning
◮ Learning from observations that reveal information about the preferences of an individual or a class of individuals.
◮ Used in decision support systems and recommender systems.
◮ Application areas: e-commerce, marketing, health care, computer games.
Personalization of hearing aids
Goal: tune the parameters to maximize user satisfaction
Problems:
◮ Large dimensionality of the parameter space
◮ Determinants of hearing-impaired user satisfaction are unknown
◮ Listening tests are costly and unreliable
⇒ Personalized fitting based on a probabilistic framework
Personalization and decision making for hearing-aids Finding the hearing-aid parameters that are optimal for a patient
Bayesian updating
◮ Suppose $\theta$ is unknown
◮ Start from a prior distribution $P(\theta)$
◮ Update this prior based on observations $D$ using Bayes' rule:
$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$$
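The update can be sketched numerically on a discrete grid of candidate values for θ. This is only an illustration under assumptions that are not on the slide: θ is taken to be the success probability of binary (Bernoulli) observations, the prior is uniform, and the names `grid_posterior`, `grid` and `observations` are hypothetical.

```python
def grid_posterior(grid, prior, observations):
    """Return P(theta | D) on a grid, given P(theta) and binary observations D."""
    unnormalized = []
    for theta, p_theta in zip(grid, prior):
        # Likelihood P(D | theta) under the assumed Bernoulli model.
        likelihood = 1.0
        for y in observations:
            likelihood *= theta if y == 1 else (1.0 - theta)
        unnormalized.append(likelihood * p_theta)
    evidence = sum(unnormalized)  # P(D), the normalizing constant
    return [u / evidence for u in unnormalized]


grid = [i / 10 for i in range(11)]       # candidate values 0.0, 0.1, ..., 1.0
prior = [1 / len(grid)] * len(grid)      # uniform prior (assumption)
observations = [1, 1, 0, 1]              # made-up data D
post = grid_posterior(grid, prior, observations)
best = grid[post.index(max(post))]       # grid value with highest posterior mass
```

The posterior is again a distribution over θ, so it can serve as the prior for the next batch of observations.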
Bayes rule
◮ $P(\text{rain}) = 20\%$
◮ $P(\text{umbrella} \mid \text{rain}) = 70\%$ and $P(\text{umbrella} \mid \text{no rain}) = 10\%$. These two need not sum to 100%, unlike $P(\text{umbrella} \mid \text{rain}) + P(\text{no umbrella} \mid \text{rain})$.
◮ Bayes' rule:
$$P(\text{rain} \mid \text{umbrella}) = \frac{P(\text{rain})\, P(\text{umbrella} \mid \text{rain})}{P(\text{rain})\, P(\text{umbrella} \mid \text{rain}) + P(\text{no rain})\, P(\text{umbrella} \mid \text{no rain})} = \frac{0.2 \times 0.7}{0.2 \times 0.7 + 0.8 \times 0.1} \approx 64\%$$
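The arithmetic of the rain/umbrella example can be checked with a few lines of code. The function name `posterior` and its parameter names are illustrative and not part of the original slides.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes' rule."""
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# H = rain, E = umbrella, with the numbers from the slide.
p_rain_given_umbrella = posterior(prior=0.2,
                                  likelihood_if_true=0.7,
                                  likelihood_if_false=0.1)
print(f"P(rain | umbrella) = {p_rain_given_umbrella:.2f}")  # prints 0.64
```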
Multi-task learning Learning multiple functions → multi-task/transfer learning
Supervised inference of biological networks
◮ Infer missing edges in a graph (dotted edges) where a few edges are already known (solid edges).
◮ Use attributes available about individual vertices, such as vectors of expression levels across different experiments if the vertices are genes.
Supervised edge inference or link prediction
◮ $o$, $o'$: two proteins
◮ $x(o)$ and $x(o')$: input feature vectors encoding some properties of $o$ and $o'$
Learn a function $f : (x(o), x(o')) \to \{0, 1\}$ from training data
$$D = \{x(o_i),\ i = 1, \dots, p;\ A_{ij},\ i, j = 1, \dots, p\}.$$
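One way to make this setup concrete is to turn each labeled pair of vertices into a single training example and apply an ordinary classifier. The sketch below rests on assumptions not stated on the slide: $A_{ij}$ is read as a 0/1 adjacency indicator, pairs are encoded symmetrically (sums and absolute differences, so the order of $o$ and $o'$ does not matter), and a 1-nearest-neighbour rule stands in for the learned function $f$. All function names and the toy data are hypothetical.

```python
def pair_features(x_a, x_b):
    """Symmetric pair encoding: element-wise sums, then absolute differences."""
    sums = [a + b for a, b in zip(x_a, x_b)]
    diffs = [abs(a - b) for a, b in zip(x_a, x_b)]
    return sums + diffs


def build_pairwise_dataset(X, A, known_pairs):
    """Turn vertex features X and labels A[i][j] into (pair features, label) examples."""
    return [(pair_features(X[i], X[j]), A[i][j]) for i, j in known_pairs]


def predict_edge(train, x_a, x_b):
    """1-nearest-neighbour prediction in pair-feature space (squared Euclidean)."""
    query = pair_features(x_a, x_b)

    def distance(example):
        return sum((u - v) ** 2 for u, v in zip(example[0], query))

    return min(train, key=distance)[1]


# Toy data (made up): 4 vertices with 2 features each.
X = [[1.0, 0.9], [0.9, 1.0], [0.1, 0.0], [0.0, 0.2]]
A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
known_pairs = [(0, 1), (0, 2), (1, 3), (2, 3)]  # pairs whose labels are observed
train = build_pairwise_dataset(X, A, known_pairs)
prediction = predict_edge(train, X[1], X[2])     # held-out pair (1, 2)
```

The symmetric encoding is a design choice: concatenating $x(o)$ and $x(o')$ directly would make the prediction depend on the order of the two proteins.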
Network topology
◮ Scale-free architecture
◮ Clustering coefficient, network diameter, average shortest path
◮ Network motifs: small subgraphs which appear in the network significantly more frequently than in a randomized network
How can this information be used?
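As an illustration of one of these statistics, the local clustering coefficient of a vertex is the fraction of pairs of its neighbours that are themselves connected. The sketch below assumes an undirected graph stored as a dictionary of neighbour sets; the graph and the function names are made up for the example.

```python
def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are linked to each other."""
    neighbours = list(adj[v])
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(1
                for i in range(k)
                for j in range(i + 1, k)
                if neighbours[j] in adj[neighbours[i]])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Toy undirected graph: a triangle a-b-c plus a pendant vertex d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
avg = average_clustering(adj)
```

Such per-vertex or per-pair statistics could, for instance, be appended to the input features of an edge classifier, though the slides leave open how the topology should be used.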
Personalized cancer medicine
◮ Microarray data
◮ Diagnosis, predicting recurrence, predicting progression
◮ Problem: large number of features ⇒ large dimensionality
◮ Feature selection: maximum relevance, minimum redundancy
◮ Misclassification costs
◮ Cancer pathways
◮ Personalize cancer treatment
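The maximum relevance, minimum redundancy idea can be sketched as a greedy loop: repeatedly pick the feature that is most related to the class label and least related to the features already chosen. The sketch below is a simplification under stated assumptions: absolute Pearson correlation is used as a stand-in for the mutual information typically used in this method, and the feature names and values are invented.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (std_u * std_v) if std_u > 0 and std_v > 0 else 0.0


def mrmr(features, labels, k):
    """Greedily select k features by relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy

        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Made-up expression values for 6 samples (3 normal, 3 cancer).
labels = [0, 0, 0, 1, 1, 1]
features = {
    "gene_1": [1, 2, 1, 5, 6, 5],
    "gene_1_copy": [2, 4, 2, 10, 12, 10],  # rescaled duplicate of gene_1
    "gene_2": [1, 3, 2, 3, 2, 4],
}
selected = mrmr(features, labels, k=2)
```

In this toy data the rescaled duplicate is as relevant as `gene_1` but fully redundant with it, so the second pick goes to `gene_2` despite its lower relevance.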
Decision tree classifier i-Biomarker represented by a decision tree. The samples are classified in the terminal nodes of the tree: cancer (red rectangles) or normal (blue rectangles). For a new sample, we observe the values of the three genes and compare them with the threshold values identified at each node.
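The classification procedure described here, comparing gene values with node thresholds until a terminal node is reached, can be sketched as follows. The gene names, thresholds and tree shape are hypothetical placeholders, not the ones from the figure.

```python
# Each internal node tests one gene against a threshold; leaves carry the class.
TREE = {
    "gene": "gene_A", "threshold": 2.5,
    "low": {"label": "normal"},
    "high": {
        "gene": "gene_B", "threshold": 1.0,
        "low": {"label": "cancer"},
        "high": {
            "gene": "gene_C", "threshold": 4.0,
            "low": {"label": "normal"},
            "high": {"label": "cancer"},
        },
    },
}


def classify(tree, sample):
    """Walk from the root to a terminal node and return its class label."""
    node = tree
    while "label" not in node:
        value = sample[node["gene"]]
        node = node["low"] if value <= node["threshold"] else node["high"]
    return node["label"]


# A new sample with made-up expression values for the three genes.
sample = {"gene_A": 3.1, "gene_B": 1.7, "gene_C": 5.2}
diagnosis = classify(TREE, sample)
print(diagnosis)  # prints "cancer" for this sample and this hypothetical tree
```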
Ensemble methods Ensemble method classification flow chart. The approach is to compose an ensemble with n i-Biomarkers (decision trees), each i-Biomarker trained on a data set derived from the original data set. Each i-Biomarker is used for making predictions on the samples from the test data set. The votes of individual i-Biomarkers are integrated in a final decision (diagnosis).
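The flow described above can be sketched in a few lines. Two assumptions go beyond the slide: the derived data sets are taken to be bootstrap resamples (bagging), and one-split trees (decision stumps) stand in for the full i-Biomarker trees to keep the example short. All names and data are illustrative.

```python
import random
from collections import Counter


def train_stump(data):
    """Fit a one-split tree: the (feature, threshold, side) with fewest training errors."""
    best = None
    for f in range(len(data[0][0])):
        for x, _ in data:
            t = x[f]
            for high_label in (0, 1):
                errors = sum(1 for xi, yi in data
                             if (high_label if xi[f] > t else 1 - high_label) != yi)
                if best is None or errors < best[0]:
                    best = (errors, f, t, high_label)
    _, f, t, high_label = best
    return lambda x: high_label if x[f] > t else 1 - high_label


def train_ensemble(data, n_models, seed=0):
    """Train each model on a bootstrap resample of the original data set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        resample = [rng.choice(data) for _ in data]
        models.append(train_stump(resample))
    return models


def predict(models, x):
    """Integrate the individual votes into a final decision by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Made-up data: one expression value per sample, label 1 = cancer, 0 = normal.
data = [([0.2], 0), ([0.4], 0), ([0.5], 0), ([1.5], 1), ([1.8], 1), ([2.1], 1)]
models = train_ensemble(data, n_models=11, seed=0)
decision = predict(models, [1.9])
```

Using an odd number of models avoids ties in a two-class vote.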