Introduction to Survival Analysis
Kan Ren
Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University
Seminar Tutorial at Apex Lab
Outline
1. Background
   Probability
   Censored Data
   Challenges
2. Methodology
   Non-parametric Models
      Kaplan-Meier Estimator
      Survival Tree
   Parametric Model
      Cox Proportional Hazards Model
      Deep Survival Analysis
3. Evaluation
Probability
Probability density function (P.D.F.):
$$p_t(t) = \Pr(T = t). \quad (1)$$
Cumulative distribution function (C.D.F.):
$$w_t(t) = \Pr(T < t) = \int_0^t p_t(v)\,dv. \quad (2)$$
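A minimal sketch of Eq. (2) in discrete time, assuming integer time steps so that the integral becomes a running sum. The function name `cdf_from_pmf` is hypothetical and used only for illustration.

```python
from itertools import accumulate


def cdf_from_pmf(pmf):
    """Discrete analogue of Eq. (2).

    pmf[k] is Pr(T = k) for k = 0, 1, ..., K-1 (assumed discretization).
    Returns w with w[k] = Pr(T < k) for k = 0, 1, ..., K.
    """
    # Pr(T < 0) is 0; each later entry adds the mass of the previous step.
    return [0.0] + list(accumulate(pmf))


pmf = [0.2, 0.3, 0.5]
w = cdf_from_pmf(pmf)
```

Because the inequality in Eq. (2) is strict, `w[k]` excludes the mass at `k` itself.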
Censored Data
Right-censored data: the event happens after the observation time.
   $E$: the event; $t_{obsv}$: the observation time.
   Observed data: $\{(x, t_{obsv}, e = \text{True/False})\}$.
   Full data: $\{(x, T_E)\}$, where $T_E$ is the logged time at which the event happens.
Examples
   A patient's survival time.
   The true winning price of a bidding auction.
   The next visit time of a user.
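The mapping from full records $(x, T_E)$ to observed records $(x, t, e)$ can be sketched as follows. The sketch assumes a single shared observation window `t_obsv` and the common convention that an observed event stores its own event time. The function name `censor` and the sample data are hypothetical.

```python
def censor(records, t_obsv):
    """Turn full records (x, T_E) into right-censored records (x, t, e).

    Assumption: every individual shares the same observation window t_obsv.
    e is True when the event was seen inside the window, in which case t is
    the event time; otherwise e is False and t is the observation time.
    """
    observed = []
    for x, t_event in records:
        if t_event <= t_obsv:
            observed.append((x, t_event, True))
        else:
            # We only learn that T_E > t_obsv.
            observed.append((x, t_obsv, False))
    return observed


full = [("patient_a", 3), ("patient_b", 12), ("patient_c", 7)]
data = censor(full, t_obsv=10)
```

Here `patient_b` becomes a censored record: the data only say that the event time exceeds 10.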
Challenges
Partial data usage: a large part of the data is discarded for learning.
Right censorship: we only know that the event time is greater than the observation time window.
Evaluation: a proper evaluation metric is needed.
Modeling Right Censored Data in Display Ads
Losing and Winning in 2nd-price Auction (figure)
Modeling Right Censored Data
Right censorship: in a 2nd-price auction, if you lose, you only know that the market price is higher than your bid price, which results in right censorship.
Kaplan-Meier Estimator
Preliminaries
   $S(t) = \Pr(t < T_E)$: survival rate.
   $F(t) = 1 - S(t)$: failure rate.
Algorithm
   The estimator is given by
   $$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right), \quad (3)$$
   where $d_i$ is the number of events at time $t_i$ and $n_i$ is the number of individuals at risk at time $t_i$.
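Eq. (3) can be sketched in a few lines of plain Python. The sketch assumes the usual tie convention: an individual censored at time $t_i$ still counts as at risk at $t_i$. The function name `kaplan_meier` and the toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t), following Eq. (3).

    times[k]  : observed time of individual k (event or censoring time)
    events[k] : True if the event was observed, False if right-censored
    Returns a list of (t_i, S(t_i)) at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)  # d_i
    leaving = Counter(times)  # individuals leaving the risk set at each time
    at_risk = len(times)      # n_i, updated as time advances
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Events and censored individuals both leave the risk set after t.
        at_risk -= leaving[t]
    return curve


times = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
```

On this toy input the risk set shrinks from 5 to 4 to 2 at the three event times, so the factors are 4/5, 3/4 and 1/2.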
Survival Tree with Kaplan-Meier Methods
Cons of KM
   Coarse-grained: the estimate is the same for all individuals.
   Statistical method: it cannot produce personalized forecasts.
Question
   How can we apply an appropriate clustering method for an individual?
Tree-based Mapping
Goal: given the auction feature $x$, forecast the market price distribution $p_x(z)$ [a].
[a] Yuchen Wang, Kan Ren, Weinan Zhang, Yong Yu. Functional Bid Landscape Forecasting for Display Advertising. ECML-PKDD, 2016.
Tree-based Mapping: Methodology (figure)
Node Splitting (figure)
Node Splitting: KLD and Clustering
Kullback-Leibler Divergence (KLD): a measure of the difference between two probability distributions $P$ and $Q$.
Node splitting (one step): divide all the category values included in this node into two sets, maximizing the KLD between the two resulting sets.
Algorithm: use K-means clustering according to the KLD values.
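One node-splitting step can be sketched as below, under three stated assumptions. Each category value comes with a histogram of market prices. The objective is the symmetric sum KLD(P, Q) + KLD(Q, P). An exhaustive search over two-set partitions is swapped in for the K-means procedure named on the slide, which keeps the example short but is only feasible for a handful of categories. All function names and the toy histograms are hypothetical.

```python
import math
from itertools import combinations


def kld(p, q, eps=1e-12):
    """KL divergence D(P || Q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def pooled_distribution(histograms):
    """Sum the price histograms of several categories and normalize."""
    totals = [sum(column) for column in zip(*histograms)]
    n = sum(totals)
    return [c / n for c in totals]


def best_binary_split(category_hist):
    """Search every two-set partition and keep the one with the largest
    symmetric KLD between the pooled distributions of the two sets."""
    cats = sorted(category_hist)
    best = (-1.0, None, None)
    for r in range(1, len(cats) // 2 + 1):
        for left in combinations(cats, r):
            right = [c for c in cats if c not in left]
            p = pooled_distribution([category_hist[c] for c in left])
            q = pooled_distribution([category_hist[c] for c in right])
            score = kld(p, q) + kld(q, p)
            if score > best[0]:
                best = (score, set(left), set(right))
    return best


# Toy price histograms over three price buckets (illustrative data only).
hist = {"A": [8, 1, 1], "B": [7, 2, 1], "C": [1, 2, 7], "D": [1, 1, 8]}
score, left, right = best_binary_split(hist)
```

On this toy input the two low-price categories (A, B) are separated from the two high-price categories (C, D).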
Node Splitting: KLD and Clustering (figure)
Handling Censorship: Survival Model
For winning auctions: we have the true market price.
For lost auctions: we only know our own bid price and that the true market price is higher than it.
Intuition: most related work uses only the winning auctions and ignores the lost auctions, which also carry information for inferring the true distribution.
The auction logs are transformed into counts ordered by price:
$$(b_i, w_i, m_i)_{i=1,2,\dots,M} \longrightarrow (b_j, d_j, n_j)_{j=1,2,\dots,N}, \qquad b_j < b_{j+1},$$
where $d_j$ is the number of auctions won at price $b_j - 1$ and $n_j$ is the number of auctions that a bid of $b_j - 1$ would lose. So
$$w(b_x) = 1 - \prod_{b_j < b_x} \frac{n_j - d_j}{n_j}, \qquad p(z) = w(z+1) - w(z). \quad (4)$$
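A minimal sketch of the idea behind Eq. (4), assuming each log entry is a tuple (bid, won, market price), with the market price unknown (`None`) for lost auctions. For simplicity the product is indexed directly by observed market price $z$ rather than by the shifted $b_j - 1$ notation, so that $w(b)$ estimates $\Pr(\text{market price} < b)$. The function name `winning_probability` and the sample logs are hypothetical.

```python
def winning_probability(logs):
    """Kaplan-Meier style estimate of the winning probability w(b).

    logs: list of (bid, won, market_price); market_price is None when lost.
    A won auction is an observed event at its market price; a lost auction
    is right-censored at the bid price.
    Returns a function w with w(b) = estimated Pr(market price < b).
    """
    observed = [(m, True) if won else (b, False) for b, won, m in logs]
    prices = sorted({v for v, is_event in observed if is_event})
    factors = []
    for z in prices:
        d = sum(1 for v, e in observed if e and v == z)  # wins at price z
        n = sum(1 for v, _ in observed if v >= z)        # still at risk at z
        factors.append((z, (n - d) / n))

    def w(b):
        surv = 1.0
        for z, factor in factors:
            if z < b:
                surv *= factor
        return 1.0 - surv

    return w


logs = [(5, True, 3), (4, True, 2), (3, False, None),
        (6, True, 5), (2, False, None)]
w = winning_probability(logs)
p_2 = w(3) - w(2)  # probability mass at z = 2, as in p(z) = w(z+1) - w(z)
```

The two lost auctions never contribute an event, but they stay in the risk set up to their bid price, which is how the censored records inform the estimate.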
Survival Model (figure)
Cox Proportional Hazards Model
Hazard rate: the rate at which the event happens, given that it has not happened before.
Hazard function: the function $\lambda(t \mid x)$ that predicts the hazard rate with respect to the covariate input $x$.
Proportional hazards model: a hazard function in which the covariates scale a baseline hazard multiplicatively, $\lambda(t \mid x) = \lambda_0(t) \exp(h(x))$.
Example: the linear Cox model, $h(x) = \beta x$.
Question: what if $h(x)$ is non-linear?
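The proportional structure can be sketched numerically. The baseline hazard `baseline`, the coefficients `beta` and the covariate vectors below are made-up values for illustration only; the point is that the hazard ratio between two individuals does not depend on $t$.

```python
import math


def linear_risk(beta, x):
    """h(x) = beta . x for the linear Cox model."""
    return sum(b * xi for b, xi in zip(beta, x))


def hazard(t, x, beta, baseline):
    """lambda(t | x) = lambda_0(t) * exp(h(x))."""
    return baseline(t) * math.exp(linear_risk(beta, x))


def baseline(t):
    # Hypothetical baseline hazard lambda_0(t); any positive function works.
    return 0.1 * t


beta = [0.5, -1.0]
x_a, x_b = [1.0, 0.0], [0.0, 1.0]

# The ratio lambda(t | x_a) / lambda(t | x_b) equals exp(h(x_a) - h(x_b)).
ratio_early = hazard(1.0, x_a, beta, baseline) / hazard(1.0, x_b, beta, baseline)
ratio_late = hazard(5.0, x_a, beta, baseline) / hazard(5.0, x_b, beta, baseline)
```

Replacing `linear_risk` with any other function of `x` keeps this proportionality, which is the setting the question on the slide points to.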