Melanie F. Pradier (UC3M), Bayesian Nonparametric Models for Data Exploration, 2017-09-15

Contributions
Goal: build useful BNP models for specific data exploration tasks.

Atom-dependent DP mixture model
• estimates density in stratified data
• suitable for fairness requirements
• link to mixture of experts
• Application: marathon

Case-control IBP feature model
• infers latent features in heterogeneous structured data
• suitable to separate global and group-specific effects
• combined with statistical testing
• Application: clinical trial

Poisson factor analysis (PFA) models
→ flexible feature models for count data
1 Hierarchical PFA:
  • deals with stratified data
2 Three-parameter Restricted PFA:
  • imposes structured sparsity in latent space
3 Dynamic PFA:
  • allows for time-varying activation of latent factors
• Application: international trade
Outline
1 Introduction
2 Bayesian nonparametrics
3 ADDP mixture model for marathon modeling
4 C-IBP feature model for clinical trials
5 PFA models for international trade
6 Conclusions
Bayesian nonparametrics (BNPs)
• Bayesian framework for model selection
• Nonparametric: the number of parameters grows with the amount of data
  • Prior over an infinite-dimensional parameter space
  • Only a finite subset of parameters is used for any finite dataset
• Rely on stochastic processes:
  • Dirichlet process
  • Beta process
  • Gaussian process
  • ...
Dirichlet process (DP)
• central block for infinite mixture models
$G \sim \mathrm{DP}(\alpha, H)$, with $G = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$
Stick-breaking representation (Ishwaran et al., 2001): $\pi \sim \mathrm{GEM}(\alpha)$
For $k = 1, \dots, \infty$:
$v_k \sim \mathrm{Beta}(1, \alpha)$, $\pi_k = v_k \prod_{\ell=1}^{k-1} (1 - v_\ell)$, $\phi_k \sim H$
[Figure: a unit-length stick broken successively into pieces $\pi_1, \pi_2, \pi_3, \dots$]
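The stick-breaking recursion can be sketched in a few lines. This is a minimal illustration using only the Python standard library; it assumes the standard $\mathrm{Beta}(1, \alpha)$ stick proportions, a truncation at a finite number of sticks, and a base measure $H = \mathcal{N}(0, 1)$ chosen purely for the example. The function name `stick_breaking` and all numeric values are hypothetical.

```python
import random


def stick_breaking(alpha, num_sticks, rng):
    """Return the first num_sticks weights of pi ~ GEM(alpha) and the leftover mass."""
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)  # v_k ~ Beta(1, alpha)
        weights.append(v * remaining)    # pi_k = v_k * prod_{l<k} (1 - v_l)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)
weights, leftover = stick_breaking(alpha=2.0, num_sticks=50, rng=rng)
# Atoms phi_k ~ H; H = N(0, 1) is an assumption made only for this example.
atoms = [rng.gauss(0.0, 1.0) for _ in weights]
```

The truncated weights plus the leftover mass sum to one, so the leftover indicates how much of $G$ the truncation ignores.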
Indian buffet process (IBP)
• central block for infinite latent feature models
• hierarchy of a Beta process (BP) with multiple Bernoulli processes (BeP)
$G = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k} \sim \mathrm{BP}(c, \alpha, H)$, with $c = 1$
For $n = 1, \dots, \infty$:
$\zeta_n = \sum_{k=1}^{\infty} z_{nk} \delta_{\phi_k} \sim \mathrm{BeP}(G)$
$Z \sim \mathrm{IBP}(\alpha)$
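A draw $Z \sim \mathrm{IBP}(\alpha)$ can be sketched with the usual sequential "buffet" construction: observation $n$ keeps an existing feature $k$ with probability $m_k / n$ (where $m_k$ counts earlier observations holding that feature) and then adds $\mathrm{Poisson}(\alpha / n)$ new features. The code below is an illustrative sketch using only the standard library; the helper names and the Knuth-style Poisson sampler are not from the slides.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(alpha, num_rows, rng):
    """Draw a binary matrix Z ~ IBP(alpha) via the sequential buffet construction."""
    rows = []
    feature_counts = []  # m_k: number of earlier rows that hold feature k
    for n in range(1, num_rows + 1):
        # Existing feature k is kept with probability m_k / n.
        row = [1 if rng.random() < m / n else 0 for m in feature_counts]
        num_new = sample_poisson(alpha / n, rng)  # brand-new features for row n
        row.extend([1] * num_new)
        feature_counts = [m + z for m, z in zip(feature_counts, row)] + [1] * num_new
        rows.append(row)
    num_features = len(feature_counts)
    # Pad earlier rows with zeros so that Z is rectangular.
    return [row + [0] * (num_features - len(row)) for row in rows]


Z = sample_ibp(alpha=2.0, num_rows=20, rng=random.Random(1))
```

Every column of the result contains at least one active entry, since a feature only exists once some row has created it.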
3 ADDP mixture model for marathon modeling
Motivation
1 What is the impact of age and gender on runners' performance?
2 Can we compare different runners in a fair manner?
  • entry requirements
  • rewards
Our Approach
• dependent density estimation model
• delivers scientific knowledge in sport sciences
• constitutes a fair age-gender grading system
• relies on the dependent Dirichlet process
Dependent Dirichlet process (DDP) (MacEachern, 2000)
$J$: number of groups
$G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\phi_{jk}}$
• hierarchical DP (Teh et al., 2005): $G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\phi_k}$
  $G_0 \sim \mathrm{DP}(\alpha, H)$, $G_j \sim \mathrm{DP}(\gamma, G_0)$
• single-p DDP (MacEachern, 2000): $G_j = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_{jk}}$
  $G_0 \sim \mathrm{DP}(\alpha, H)$, $G_j = T_j[G_0]$
Atom-dependent DP mixture model
Generative model, step 1 (no covariates)
$x_i \equiv$ marathon finishing time for runner $i$
$\pi \mid \alpha \sim \mathrm{GEM}(\alpha)$
$c_i \mid \pi \sim \mathrm{Cat}(\pi)$
$\mu_k \sim \mathcal{N}(\mu_0, \sigma_0^2)$
$\sigma_x^2 \sim \mathrm{IG}(a, b)$
$x_i \mid \text{other vars} \sim \mathcal{N}(x_i \mid \mu_{c_i}, \sigma_x^2)$

Generative model, step 2 (age)
$x_{ji} \equiv$ marathon finishing time for runner $i$ in age group $j$
$c_{ji} \mid \pi \sim \mathrm{Cat}(\pi)$
$\theta \sim \mathcal{N}(0, \Sigma_\theta)$
$x_{ji} \mid \text{other vars} \sim \mathcal{N}(x_{ji} \mid \mu_{c_{ji}} + \theta_j, \sigma_x^2)$
$(\Sigma_\theta)_{\ell q} = \sigma_\theta^2 \exp\left(-\frac{(\ell - q)^2}{2\nu^2}\right) + \kappa\,\delta(\ell - q)$

Generative model, step 3 (age and gender)
$g_{ji} \equiv$ gender
$\delta \sim \mathcal{N}(0, \sigma_\omega^2)$
$\omega \sim \mathcal{N}(0, \Sigma_\omega)$
$x_{ji} \mid \text{other vars} \sim \mathcal{N}(x_{ji} \mid \mu_{c_{ji}} + \theta_j + \mathbb{1}[g_{ji} = 1](\delta + \omega_j), \sigma_x^2)$
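To make the generative model concrete, the following sketch simulates finishing times from it with the standard library only. Several choices are assumptions for illustration and are not stated on the slides: the stick-breaking prior is truncated at 20 clusters with $\mathrm{Beta}(1, \alpha)$ sticks, $\Sigma_\omega$ is given the same squared-exponential form as $\Sigma_\theta$, gender is drawn uniformly, and every numeric hyperparameter is hypothetical.

```python
import math
import random


def se_covariance(size, sigma2, nu, kappa):
    """(Sigma)_{lq} = sigma2 * exp(-(l - q)^2 / (2 nu^2)) + kappa * [l == q]."""
    return [[sigma2 * math.exp(-((l - q) ** 2) / (2.0 * nu ** 2))
             + (kappa if l == q else 0.0)
             for q in range(size)]
            for l in range(size)]


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (matrix must be positive definite)."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_gaussian_vector(cov, rng):
    """Draw from N(0, cov) as L z with z standard normal."""
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in cov]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(cov))]


def simulate_addp(num_groups, runners_per_group, rng, truncation=20):
    # All numeric values below are hypothetical (times in hours).
    alpha, mu_0, sigma_0 = 1.0, 4.0, 0.5
    pi, remaining = [], 1.0
    for _ in range(truncation):  # truncated pi ~ GEM(alpha)
        v = rng.betavariate(1.0, alpha)
        pi.append(v * remaining)
        remaining *= 1.0 - v
    mu = [rng.gauss(mu_0, sigma_0) for _ in range(truncation)]   # mu_k ~ N(mu_0, sigma_0^2)
    sigma_x = math.sqrt(1.0 / rng.gammavariate(3.0, 1.0 / 0.1))  # sigma_x^2 ~ IG(3, 0.1)
    theta = sample_gaussian_vector(se_covariance(num_groups, 0.05, 3.0, 0.01), rng)
    omega = sample_gaussian_vector(se_covariance(num_groups, 0.01, 3.0, 0.01), rng)
    delta = rng.gauss(0.0, 0.3)
    runners = []
    for j in range(num_groups):
        for _ in range(runners_per_group):
            c = rng.choices(range(truncation), weights=pi)[0]  # c_ji ~ Cat(pi)
            g = rng.randint(0, 1)                              # gender indicator
            mean = mu[c] + theta[j] + (delta + omega[j] if g == 1 else 0.0)
            runners.append((j, g, rng.gauss(mean, sigma_x)))
    return runners


runners = simulate_addp(num_groups=6, runners_per_group=5, rng=random.Random(0))
```

Because the cluster means $\mu_k$ are shared across groups and only the offsets $\theta_j$ and $\omega_j$ vary, neighbouring age groups receive similar shifts through the smooth covariance.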
Results: impact of age
• MCMC approach
• conditional conjugacy
• block Gibbs sampler
• 1/4 M runners
[Figure: histogram of finishing times (hours) with the pdf estimated by the ADDP and the individual clusters.]
[Figure: finishing time (hours) versus age (20 to 70) for New York City, Boston, London, and WMA, together with the curves $\mu_1 + \theta_j$ and $\mu_2 + \theta_j$.]
Results: impact of gender
[Figure: $\delta + \omega_j$ (mins, range 26 to 34) versus age (years, 10 to 70).]
Other Results
• Speed-dependent cluster means
• Link to mixture of experts
• Analysis of running patterns
• Prediction of finishing time
[Figure: speed (km/h) and elevation (m) along the course (km 5 to 42.2) for each cluster.]
Cluster legend (share of runners, finishing time T):
  Cluster 0     7.2%   T=3.80h
  Cluster 1    24.4%   T=3.93h
  Cluster 1-   14.9%   T=4.03h
  Cluster 1--   3.6%   T=4.16h
  Cluster 2A   13.4%   T=4.17h
  Cluster 2A-  11.3%   T=4.27h
  Cluster 2A--  3.2%   T=4.43h
  Cluster 2B    1.1%   T=4.32h
  Cluster 2B-   1.6%   T=4.47h
  Cluster 3     3.4%   T=4.56h
  Cluster 3-    4.4%   T=4.59h
  Cluster 3--   1.4%   T=4.88h
4 C-IBP feature model for clinical trials
Motivation: biomarker discovery in clinical trials
Def: "any variable that can be used as an indicator of a particular disease state".
We want to discover:
1 Indicators of disease progression: prognostic biomarkers
2 Indicators of (positive) drug response: predictive biomarkers
General latent feature model (GLFM) (Valera et al., 2017)
Latent feature model for heterogeneous datasets
• Link functions $T_d$ depend on the type of data for each dimension $d$
$x_{nd} = T_d(y_{nd}; \phi_d)$
$y_{nd} \mid Z, B \sim \mathcal{N}(Z_{n\bullet} B_{\bullet d}, \sigma_y^2)$
$B_{kd} \sim \mathcal{N}(0, \sigma_B^2)$
$Z \sim \mathrm{IBP}(\alpha)$
[Figure: graphical model over $\alpha$, $Z$, $\sigma_B^2$, $B_{\bullet d}$, $\phi_d$, $Y_{\bullet d}$, and $X$, with a plate over $d = 1, \dots, D$.]
Our contribution to the GLFM project
• Open-source Python code
• Simulations for data exploration
https://github.com/ivaleraM/GLFM
Our contribution: Case-control IBP (C-IBP)
$R_n$: drug indicator for patient $n$
$x_{nd} = T_d(y_{nd}; \phi_d)$
$y_{nd} \mid Z, W, B, C, R \sim \mathcal{N}(Z_{n\bullet} B_{\bullet d} + \mathbb{1}[R_n = 1]\, W_{n\bullet} C_{\bullet d}, \sigma_y^2)$
$B_{kd} \sim \mathcal{N}(0, \sigma_B^2)$
$C_{kd} \sim \mathcal{N}(0, \sigma_C^2)$
$Z \sim \mathrm{IBP}(\alpha)$
$W \sim \mathrm{IBP}(\alpha)$
• Inference: MCMC approach with accelerated Gibbs sampling
• Biomarker discovery: statistical multiple hypothesis testing
[Figure: graphical model over $\alpha$, $Z$, $W$, $R$, $\sigma_B^2$, $B_{\bullet d}$, $\sigma_C^2$, $C_{\bullet d}$, $\phi_d$, $Y_{\bullet d}$, and $X$, with a plate over $d = 1, \dots, D$.]
Results: subpopulations
GPC3 Antibody Treatment against Liver Cancer (Abou-Alfa et al., J. Hepatology, Apr 2016)
• 180 patients: 60 took a placebo, 120 took the drug
• PFS: Progression Free Survival

Sub-population  Drug  F1  F2  F3  Size (number   Mean PFS  Median PFS
identifier                        of patients)   (months)  (months)
 1.             0     0   0   0   33.37           3.06      1.65
 2.             0     0   1   0    4.07           2.29      2.24
 3.             0     1   0   0   17.84           2.72      1.81
 4.             0     1   1   0    4.72           7.05      7.18
 5.             1     0   0   0   51.52           3.22      2.55
 6.             1     0   0   1   16.77           4.17      3.65
 7.             1     0   1   0    8.38           1.74      1.33
 8.             1     0   1   1    2.07           2.69      2.65
 9.             1     1   0   0   29.88           3.36      2.03
10.             1     1   0   1    4.90           4.44      4.34
11.             1     1   1   0    4.53           6.31      5.31
12.             1     1   1   1    1.94          10.04     10.01

[Figure: PFS (months, 0 to 15) for sub-populations 1 to 12.]
Results: biomarker discovery
Treatment-specific feature F3
[Figure: $\Delta_d$ for each biomarker $d$ (vertical axis from -4 to 2).]
5 PFA models for international trade
Motivation: wealth of nations
What makes some countries wealthier than others?
Classical view
• Division of labor (A. Smith, 1776; Ricardo, 1817)
• Specialization leads to economic efficiency
• Export portfolios → block-structure
Motivation: wealth of nations
The reality:
[Figure: binary export matrix, countries (0 to 120) by products (0 to 600).]
$\mathrm{RCA}_{nd} = \dfrac{E_{nd} / \sum_{d'} E_{nd'}}{\sum_{n'} E_{n'd} / \sum_{n',d'} E_{n'd'}}$
$x_{nd} = 1$ if $\mathrm{RCA}_{nd} \geq 1$, and $x_{nd} = 0$ otherwise
Properties:
1 Triangularity
2 $D \gg N$
Our Approach
1 Develop an infinite Poisson factor analysis model ...
  • flexible prior
  • feature sparsity
2 Design a time-varying extension
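The RCA binarization can be illustrated on a toy example. The sketch below assumes that `exports[n][d]` holds the export value $E_{nd}$ of product $d$ for country $n$; the function name `rca_binarize` and the toy numbers are hypothetical, and entries whose country or product total is zero are mapped to 0 as a convention of this example.

```python
def rca_binarize(exports):
    """Return x[n][d] = 1 if RCA_nd >= 1 and 0 otherwise."""
    country_totals = [sum(row) for row in exports]        # sum over products d'
    product_totals = [sum(col) for col in zip(*exports)]  # sum over countries n'
    grand_total = sum(country_totals)                     # sum over n', d'
    binary = []
    for row, country_total in zip(exports, country_totals):
        out = []
        for value, product_total in zip(row, product_totals):
            if country_total == 0 or product_total == 0:
                out.append(0)  # convention of this sketch: undefined RCA maps to 0
                continue
            rca = (value / country_total) / (product_total / grand_total)
            out.append(1 if rca >= 1.0 else 0)
        binary.append(out)
    return binary


# Toy data: 2 countries, 3 products (made-up values).
exports = [[10.0, 0.0, 5.0],
           [2.0, 8.0, 0.0]]
x = rca_binarize(exports)  # [[1, 0, 1], [0, 1, 0]]
```

In the toy data, country 0 exports product 0 with a share of 10/15 while the world share of that product is 12/25, so its RCA is about 1.39 and the entry is set to 1.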
Bernoulli process Poisson factor analysis (BeP-PFA)
Generative Model
$x_{nd} \sim \mathrm{Poisson}(Z_{n\bullet} B_{\bullet d})$
$B_{kd} \sim \mathrm{Gamma}\left(\alpha_B, \frac{\mu_B}{\alpha_B}\right)$
$Z \sim \mathrm{IBP}(\alpha)$
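The likelihood part of this model can be simulated as follows. This is a sketch under stated assumptions: $\mathrm{Gamma}(\alpha_B, \mu_B / \alpha_B)$ is read as shape $\alpha_B$ and scale $\mu_B / \alpha_B$ (so that the mean of $B_{kd}$ is $\mu_B$), and $Z$ is fixed by hand instead of being drawn from the IBP. Function names and numeric values are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_pfa(Z, num_dims, alpha_b, mu_b, rng):
    """Draw B_kd ~ Gamma(shape=alpha_b, scale=mu_b/alpha_b) and x_nd ~ Poisson(Z_n. B_.d)."""
    num_features = len(Z[0])
    B = [[rng.gammavariate(alpha_b, mu_b / alpha_b) for _ in range(num_dims)]
         for _ in range(num_features)]
    X = []
    for z_row in Z:
        # The rate for entry (n, d) sums the loadings of the features active in row n.
        rates = [sum(z * B[k][d] for k, z in enumerate(z_row)) for d in range(num_dims)]
        X.append([sample_poisson(rate, rng) for rate in rates])
    return B, X


# Hand-picked feature assignments: 3 observations, 2 latent features.
Z = [[1, 0],
     [1, 1],
     [0, 0]]
B, X = simulate_pfa(Z, num_dims=4, alpha_b=1.0, mu_b=1.0, rng=random.Random(0))
```

An observation with no active feature has rate zero in every dimension and therefore produces only zero counts, which is how sparsity in $Z$ carries over to the data.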
Limitation of the IBP
• Number of ones per row: $J_n \sim \mathrm{Poisson}(\alpha)$
• Number of non-empty features: $K \sim \mathrm{Poisson}\left(\alpha \sum_{j=1}^{N} \frac{1}{j}\right)$
• Mass parameter $\alpha$ couples both $J_n$ and $K$
[Figure: sample matrices from the IBP for $\alpha = 1$ (nz = 97) and $\alpha = 10$ (nz = 339).]
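The coupling can be checked numerically from the two Poisson means: both $\mathbb{E}[J_n] = \alpha$ and $\mathbb{E}[K] = \alpha \sum_{j=1}^{N} 1/j$ scale linearly with $\alpha$, so one cannot be tuned without moving the other. The helper names below are hypothetical.

```python
def harmonic(n):
    """H_n = sum_{j=1}^{n} 1/j."""
    return sum(1.0 / j for j in range(1, n + 1))


def ibp_expectations(alpha, num_rows):
    """Prior means of the number of ones per row and of non-empty features."""
    return {
        "ones_per_row": alpha,                      # E[J_n]
        "num_features": alpha * harmonic(num_rows)  # E[K]
    }


low = ibp_expectations(alpha=1.0, num_rows=50)
high = ibp_expectations(alpha=10.0, num_rows=50)
# Multiplying alpha by 10 multiplies both prior means by 10.
ratio_rows = high["ones_per_row"] / low["ones_per_row"]
ratio_features = high["num_features"] / low["num_features"]
```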
Beyond the standard IBP
Three-parameter IBP (Teh et al., 2007)
• More flexible distribution for feature weights
$Z_{n\bullet} \sim \mathrm{BeP}(\mu)$ (5.1)
$\mu \sim \mathrm{SBP}(1, \alpha, H, c, \sigma)$ (5.2)
$J_{\text{new}} \sim \mathrm{Poisson}\left(\alpha\, \dfrac{\Gamma(1 + c)\,\Gamma(n + c + \sigma - 1)}{\Gamma(n + c)\,\Gamma(c + \sigma)}\right)$
Restricted IBP (Doshi-Velez et al., 2015)
• Arbitrary prior $f$ over $J_n$
$Z_{n\bullet} \sim \mathrm{R\text{-}BeP}(\mu, f)$ (5.3)
$\mu \sim \mathrm{BP}(1, \alpha, H)$ (5.4)
• Combination of both
• Flexible prior
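The Poisson rate for new features in the three-parameter IBP is easy to evaluate in log space. The sketch below uses `math.lgamma` from the standard library; the function name is hypothetical, and the parameter ranges in the docstring are the usual ones for this process rather than something stated on the slide. Two sanity checks follow from the formula itself: the rate equals $\alpha$ for $n = 1$, and it reduces to the standard IBP rate $\alpha / n$ when $c = 1$ and $\sigma = 0$.

```python
import math


def new_feature_rate(n, alpha, c, sigma):
    """Poisson rate of new features for observation n = 1, 2, ...

    Assumes alpha > 0, 0 <= sigma < 1, and c > -sigma.
    """
    log_rate = (math.log(alpha)
                + math.lgamma(1.0 + c)
                + math.lgamma(n + c + sigma - 1.0)
                - math.lgamma(n + c)
                - math.lgamma(c + sigma))
    return math.exp(log_rate)


standard = new_feature_rate(n=4, alpha=2.0, c=1.0, sigma=0.0)   # alpha / n = 0.5
power_law = new_feature_rate(n=4, alpha=2.0, c=1.0, sigma=0.5)  # decays more slowly in n
```

A positive $\sigma$ slows the decay of the rate with $n$, so later observations keep introducing new features more often than under the standard IBP.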
Our contributions
[Figure: sample binary matrices for $\alpha = 1$ (nz = 97) and $\alpha = 10$ (nz = 339).]
3RBeP-PFA for static scenario
$x_{nd} \sim \mathrm{Poisson}(Z_{n\bullet} B_{\bullet d})$
$B_{kd} \sim \mathrm{Gamma}\left(\alpha_B, \frac{\mu_B}{\alpha_B}\right)$
$Z \sim \mathrm{3R\text{-}IBP}(\alpha, c, \sigma, f)$
• Inference: aux. vars + dynamic programming (Doshi-Velez et al., 2015)
dBeP-PFA for dynamic scenario
$x^{(t)}_{nd} \sim \mathrm{Poisson}(Z^{(t)}_{n\bullet} B_{\bullet d})$
$B_{kd} \sim \mathrm{Gamma}\left(\alpha_B, \frac{\mu_B}{\alpha_B}\right)$
$Z^{(\bullet)}_{n\bullet} \sim \mathrm{mIBP}(\alpha, \gamma, \delta)$
• Inference: forward-filtering backward-sampling (Van Gael et al., 2009)
Results in static scenario
Quantitative analysis: accuracy vs. interpretability

(a) 2010 SITC database (N = 126, D = 744)
Metric          PMF             NNMF            BeP-PFA          sBeP-PFA         3RBeP-PFA
Log Perplexity  1.68 ± 0.01     1.61 ± 0.01     1.59 ± 0.04      3.26 ± 0.17      1.62 ± 0.01
Coherence       -264.60 ± 4.74  -263.27 ± 7.45  -149.36 ± 7.56   -178.44 ± 4.50   -140.51 ± 2.73

(b) 2010 HS database (N = 123, D = 4890)
Metric          PMF             NNMF            BeP-PFA          sBeP-PFA         3RBeP-PFA
Log Perplexity  1.48 ± 0.01     1.47 ± 0.01     1.58 ± 0.01      2.56 ± 0.12      1.57 ± 0.02
Coherence       -264.73 ± 3.11  -264.67 ± 6.22  -148.91 ± 10.57  -168.39 ± 13.16  -134.51 ± 4.43

• PMF: Probabilistic matrix factorization (Mnih et al., 2008)
• NNMF: Non-negative matrix factorization (Schmidt et al., 2009)
• BeP-PFA: Bernoulli process Poisson factor analysis
• sBeP-PFA: sparse Bernoulli process Poisson factor analysis
• 3RBeP-PFA: Three-parameter Restricted Bernoulli process Poisson factor analysis
Results in static scenario
Capturing input sparsity structure
[Figure: inferred versus empirical values (both axes 0 to 400) for (a) Baseline, (b) BeP-PFA, (c) sBeP-PFA, (d) 3RBeP-PFA.]
Results in static scenario
Interpretability
Intro BNPs ADDP for Marathon Modeling C-IBP for Clinical Trial PFAs for International Trade Conclusions Temporal Dynamics Indonesia 1 F4 Capabilities F5 F0 Bias F9 F1 Agriculture 0 . 5 F2 F2 Clothing I F15 F3 Farming F4 Clothing II 0 F5 Electronics I 1965 1975 1985 1995 2005 F6 Processed Materials F7 Electronics II Egypt F8 Materials I F9 Machinery I 1 F10 Materials II F4 F11 Automobile F11 F12 Chemicals I 0 . 5 F1 F13 Chemicals II F2 F14 Machinery II F15 Miscellaneous 0 1965 1975 1985 1995 2005 Melanie F. Pradier (UC3M) Bayesian Nonparametric Models for Data Exploration 2017-09-15 34/50
Model extension: Dynamic PFA

Model extension:

$x_{nd}^{(t)} \sim \mathrm{Poisson}\left( Z_{n\bullet}^{(t)} B_{\bullet d} \right)$
$B_{kd} \sim \mathrm{Gamma}\left( \alpha_B, \frac{\mu_B}{\alpha_B} \right)$
$Z_{n\bullet}^{(\bullet)} \sim \mathrm{mIBP}(\alpha, \gamma, \delta)$

mIBP: Markov Indian buffet process (Van Gael et al., 2009)

[Figure: feature activation (0 to 1) over time (1965 to 2005). Indonesia panel: F1, F4, F7, F10, F13. Egypt panel: F3, F5, F7, F10.]
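The generative process on this slide can be sketched as a finite truncation with K features. This is an illustration under assumptions the slide does not state: the Gamma is read as shape α_B and scale μ_B/α_B; each feature k has transition probabilities a_k ~ Beta(α/K, 1) for switching on and b_k ~ Beta(γ, δ) for staying on, following the finite construction of the Markov IBP; every country n runs its own chains with those shared probabilities, starting inactive. The function names are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    if rate <= 0.0:
        return 0
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_dynamic_pfa(N, D, T, K, alpha, gamma, delta, alpha_B, mu_B, seed=0):
    """Draw (X, Z, B) from a K-truncated sketch of the dynamic PFA model.

    X[t][n][d] are counts, Z[t][n][k] are binary activations, B[k][d] loadings.
    """
    rng = random.Random(seed)
    # B_kd ~ Gamma(shape=alpha_B, scale=mu_B / alpha_B), so E[B_kd] = mu_B.
    B = [[rng.gammavariate(alpha_B, mu_B / alpha_B) for _ in range(D)]
         for _ in range(K)]
    # Per-feature transition probabilities (assumed finite mIBP construction).
    a = [rng.betavariate(alpha / K, 1.0) for _ in range(K)]  # P(0 -> 1)
    b = [rng.betavariate(gamma, delta) for _ in range(K)]    # P(1 -> 1)
    Z = [[[0] * K for _ in range(N)] for _ in range(T)]
    X = [[[0] * D for _ in range(N)] for _ in range(T)]
    for n in range(N):
        prev = [0] * K  # every chain starts in the inactive state
        for t in range(T):
            cur = [int(rng.random() < (b[k] if prev[k] else a[k]))
                   for k in range(K)]
            Z[t][n] = cur
            for d in range(D):
                # Poisson rate is the sum of loadings of the active features.
                rate = sum(cur[k] * B[k][d] for k in range(K))
                X[t][n][d] = sample_poisson(rate, rng)
            prev = cur
    return X, Z, B
```

For example, `sample_dynamic_pfa(N=3, D=4, T=5, K=6, alpha=2.0, gamma=5.0, delta=1.0, alpha_B=1.0, mu_B=1.0)` returns five time slices of 3 x 4 counts. With γ large relative to δ, active features tend to persist across years, which is the kind of smooth activation the Indonesia and Egypt panels display.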
Conclusions

BNPs
• useful BNP models for specific data exploration tasks
• Fair density estimation model
• Structured general latent feature model (global and group-specific factors)
• Flexible Poisson factor analysis models in static/dynamic scenarios

Sports science
• age-gender curves
• fair grading system
• running patterns over time

Cancer research
• subpopulation learning
• biomarker discovery
• clinico-genetic associations