
Semi-Supervised Inference: General Theory and Estimation of Means

Semi-Supervised Inference: General Theory and Estimation of Means. Anru Zhang, Department of Statistics, University of Wisconsin-Madison. Workshop in Honor of Larry Brown, Nov 30, 2018. Joint work with Larry Brown and Tony Cai.


  1. Semi-Supervised Inference: General Theory and Estimation of Means. Anru Zhang, Department of Statistics, University of Wisconsin-Madison. Workshop in Honor of Larry Brown, Nov 30, 2018. Joint work with Larry Brown and Tony Cai.

  2. In Memory of Larry. Figure: Anru’s PhD Thesis Defense, April 2015. Anru Zhang (UW-Madison), Semi-Supervised Inference, Nov 30, 2018.

  3. My Recent Research
     • Tensor Data Analysis
     • Singular Subspace Analysis, PCA
     • Human Microbiome Studies
     [Figure text: 10-100 trillion microbial cells; 37 trillion human cells; >10,000 microbial species; 23,000 human genes; 3.3 million microbial genes; 99.9% of human DNA is the same; 80-90% of the gut microbiome are different.]

  4. Introduction. Semi-supervised Inference
     • Semi-supervised settings often appear in machine learning and statistics.
     • Typical situation: labels are more difficult or expensive to acquire than unlabeled data.
     • Examples:
       ◮ Survey sampling
       ◮ Electronic health records
       ◮ Imaging classification
       ◮ ...

  5. Introduction. An “Assumption Lean” Framework
     • Assume $Y$ is the label and $X = (X_1, \ldots, X_p)$ is the $p$-dimensional covariate, with $(Y, X_1, \ldots, X_p) \sim P = P(dy, dx_1, \ldots, dx_p)$. No specific assumption is made on the relationship between $Y$ and $X$.
     • Observations:
       → $n$ “labeled” samples from the joint distribution $P$: $[Y, X] = \{Y_k, X_{k1}, \ldots, X_{kp}\}_{k=1}^{n}$;
       → $m$ “unlabeled” samples from the marginal distribution $P_X$: $X_{add} = \{X_{k1}, \ldots, X_{kp}\}_{k=n+1}^{n+m}$.
     • Goal: statistical inference for $\theta = \mathbb{E} Y$.

  6. Introduction. Motivations
     • Census of the homeless. [Figure: data layout with response $Y$ and covariates $X$ ($p = 7$): 244 pre-selected labeled samples; $n = 265$ random labeled samples; $m = 1545$ random unlabeled samples.]
     • Electronic health records: prevalence of a certain disease. Picture source: Jensen PB, Jensen LJ, and Brunak S. Nature Reviews, 2012.

  7. Methods. $m = \infty$: Ideal Semi-Supervised Inference
     • $m = \infty$: infinitely many unlabeled samples.
     • Baseline estimator: the sample mean $\bar{Y}$.
     • Least squares estimator: $\hat{\theta}_{LS} = \bar{Y} - \hat{\beta}_{(2)}^\top (\bar{X} - \mu)$.
       ◮ $\mu = \mathbb{E} X$ is known;
       ◮ $\bar{Y} = \frac{1}{n} \sum_{k=1}^{n} Y_k$, $\bar{X} = \frac{1}{n} \sum_{k=1}^{n} X_k$;
       ◮ $\hat{\beta} = (\vec{\mathbf{X}}^\top \vec{\mathbf{X}})^{-1} \vec{\mathbf{X}}^\top Y$ is the least squares estimator, $\hat{\beta} = [\hat{\beta}_1 \ \hat{\beta}_{(2)}^\top]^\top$, where
         $\vec{\mathbf{X}} = \begin{bmatrix} 1 & X_{11} & \cdots & X_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{bmatrix}$
         is the prediction matrix with intercepts.

  8. Methods. $m < \infty$: Ordinary Semi-Supervised Inference
     • $m < \infty$: finitely many unlabeled samples; $P_X$ is partially known.
     • Semi-supervised least squares estimator:
       $\hat{\theta}_{SSLS} = \bar{Y} - \hat{\beta}_{(2)}^\top (\bar{X} - \hat{\mu}), \quad \hat{\mu} = \frac{1}{n+m} \sum_{k=1}^{n+m} X_k$.
     • When $m = 0$, i.e., no unlabeled samples, $\hat{\theta}_{SSLS} = \bar{Y}$; when $m = \infty$, i.e., infinitely many unlabeled samples, $\hat{\theta}_{SSLS} = \hat{\theta}_{LS}$.
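The two estimators on slides 7 and 8 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names `ls_slopes`, `theta_ls`, and `theta_ssls` and the simulated data are assumptions made for the example.

```python
import numpy as np

def ls_slopes(X, Y):
    """OLS fit of Y on [1, X]; returns the slope part beta_hat_(2)."""
    design = np.column_stack([np.ones(len(Y)), X])  # n x (p+1), intercept first
    beta_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return beta_hat[1:]

def theta_ls(X, Y, mu):
    """Ideal case (m = infinity): mu = E X is treated as known."""
    return Y.mean() - ls_slopes(X, Y) @ (X.mean(axis=0) - mu)

def theta_ssls(X, Y, X_add):
    """Ordinary case: mu is estimated from all n + m covariate rows."""
    mu_hat = np.vstack([X, X_add]).mean(axis=0)
    return Y.mean() - ls_slopes(X, Y) @ (X.mean(axis=0) - mu_hat)

# Hypothetical simulated data, for illustration only (true theta = 1).
rng = np.random.default_rng(0)
n, m, p = 200, 2000, 3
X = rng.normal(size=(n, p))
X_add = rng.normal(size=(m, p))
Y = 1.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

est_mean = Y.mean()
est_ssls = theta_ssls(X, Y, X_add)
est_ls = theta_ls(X, Y, mu=np.zeros(p))
```

Passing an empty `X_add` (shape `(0, p)`) makes `mu_hat` equal to the labeled sample mean, so `theta_ssls` reduces to the sample mean of `Y`, matching the $m = 0$ case on the slide.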

  9. Methods. Interpretation: An Assumption-Lean Framework
     Define, with $\vec{X} = (1, X^\top)^\top$:
     • population slopes: $\beta = \arg\min_{\gamma} \mathbb{E}(Y - \vec{X}^\top \gamma)^2$;
     • linear deviations: $\delta = Y - \beta^\top \vec{X}$, $\tau^2 = \mathbb{E}\delta^2$.
     Picture source: Buja, Berk, Brown, George, Pitkin, Traskin, Zhao, and Zhang, Statistical Science, 2017.

  10. Methods. Interpretation: An Assumption-Lean Framework
     • Facts:
       $\theta = \beta_1 + \mu^\top \beta_{(2)}, \quad \hat{\theta}_{LS} = \hat{\beta}_1 + \mu^\top \hat{\beta}_{(2)}, \quad \hat{\theta}_{SSLS} = \hat{\beta}_1 + \hat{\mu}^\top \hat{\beta}_{(2)}$.
     • Thus, $\hat{\theta}_{LS}$ and $\hat{\theta}_{SSLS}$ can be seen as “plug-in” estimators:
       $\beta = \arg\min_{\gamma} \mathbb{E}(Y - \vec{X}^\top \gamma)^2, \quad \hat{\beta} = \arg\min_{\gamma} \sum_{k=1}^{n} (Y_k - \vec{X}_k^\top \gamma)^2$,
       $\mu = \mathbb{E} X, \quad \hat{\mu} = \frac{1}{n+m} \sum_{k=1}^{n+m} X_k$.
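The three facts can be checked directly from the definitions. The short derivation below is a sketch that uses only the first-order conditions of least squares with an intercept, at the population and the sample level:

```latex
\begin{align*}
\mathbb{E}\,\delta = 0
  \ &\Rightarrow\ \theta = \mathbb{E} Y = \beta_1 + \mu^\top \beta_{(2)}, \\
\bar{Y} &= \hat{\beta}_1 + \bar{X}^\top \hat{\beta}_{(2)}
  && \text{(sample normal equation for the intercept)} \\
\hat{\theta}_{LS} &= \bar{Y} - \hat{\beta}_{(2)}^\top (\bar{X} - \mu)
  = \hat{\beta}_1 + \mu^\top \hat{\beta}_{(2)}, \\
\hat{\theta}_{SSLS} &= \bar{Y} - \hat{\beta}_{(2)}^\top (\bar{X} - \hat{\mu})
  = \hat{\beta}_1 + \hat{\mu}^\top \hat{\beta}_{(2)}.
\end{align*}
```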

  11. Theoretical Properties. Theory: $\ell_2$ risks
     • Recall
       ◮ population slopes: $\beta = \arg\min_{\gamma} \mathbb{E}(Y - \vec{X}^\top \gamma)^2$, $\beta = [\beta_1 \ \beta_{(2)}^\top]^\top$;
       ◮ linear deviations: $\delta = Y - \beta^\top \vec{X}$;
       ◮ $\tau^2 = \mathbb{E}\delta^2$, $\mu = \mathbb{E} X$, $\Sigma = \mathrm{Cov}(X)$.
     Proposition ($\ell_2$ risk of $\bar{Y}$).
       $n\,\mathbb{E}(\bar{Y} - \theta)^2 = \tau^2 + \beta_{(2)}^\top \Sigma \beta_{(2)}$.
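A short derivation of the proposition, sketched under the assumption that the labeled samples are i.i.d. with finite second moments. It decomposes $\mathrm{Var}(Y)$ into the part explained by the best linear fit and the part left in $\delta$:

```latex
\begin{align*}
Y &= \beta^\top \vec{X} + \delta, \qquad \mathbb{E}[\vec{X}\,\delta] = 0
  && \text{(first-order condition defining } \beta\text{)} \\
\Rightarrow\ \mathbb{E}\,\delta &= 0, \qquad \mathrm{Cov}(X, \delta) = 0
  && \text{(the first entry of } \vec{X} \text{ is } 1\text{)} \\
\mathrm{Var}(Y) &= \mathrm{Var}\big(\beta_{(2)}^\top X\big) + \mathrm{Var}(\delta)
  = \beta_{(2)}^\top \Sigma \beta_{(2)} + \tau^2, \\
n\,\mathbb{E}(\bar{Y} - \theta)^2 &= n \cdot \frac{\mathrm{Var}(Y)}{n}
  = \tau^2 + \beta_{(2)}^\top \Sigma \beta_{(2)}.
\end{align*}
```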

  12. Theoretical Properties. Theory: $\ell_2$ risks
     Theorem ($\ell_2$ risk of $\hat{\theta}_{LS}$). Suppose we observe $n$ labeled samples and know $P_X$, $p = o(n^{1/2})$, and $\hat{\theta}^1_{LS}$ is a truncated version of $\hat{\theta}_{LS}$. Under finite moment conditions, we have
       $n\,\mathbb{E}(\hat{\theta}^1_{LS} - \theta)^2 = \tau^2 + s_n, \quad s_n = O(p^2/n)$.
     Theorem ($\ell_2$ risk of $\hat{\theta}_{SSLS}$). Suppose we observe $n$ labeled samples $\{Y_k, X_k\}_{k=1}^{n}$ and $m$ unlabeled samples $\{X_k\}_{k=n+1}^{n+m}$, $p = o(n^{1/2})$, and $\hat{\theta}^1_{SSLS}$ is a truncated version of $\hat{\theta}_{SSLS}$. Under finite moment conditions, we have
       $n\,\mathbb{E}(\hat{\theta}^1_{SSLS} - \theta)^2 = \tau^2 + \frac{n}{n+m}\, \beta_{(2)}^\top \Sigma \beta_{(2)} + s_{n,m}, \quad s_{n,m} = O(p^2/n)$.

  13. Theoretical Properties. Remark: $\ell_2$ Risk Theory
       $n\,\mathbb{E}(\bar{Y} - \theta)^2 = \tau^2 + \beta_{(2)}^\top \Sigma \beta_{(2)}$,
       $n\,\mathbb{E}(\hat{\theta}^1_{LS} - \theta)^2 = \tau^2 + O(p^2/n)$,
       $n\,\mathbb{E}(\hat{\theta}^1_{SSLS} - \theta)^2 = \tau^2 + \frac{n}{n+m}\, \beta_{(2)}^\top \Sigma \beta_{(2)} + O(p^2/n)$.
     Remark
     • $\mathbb{E}(\hat{\theta}^1_{SSLS} - \theta)^2 \approx \frac{n}{n+m}\, \mathbb{E}(\bar{Y} - \theta)^2 + \frac{m}{n+m}\, \mathbb{E}(\hat{\theta}^1_{LS} - \theta)^2$.
     • $\hat{\theta}^1_{LS}$ and $\hat{\theta}^1_{SSLS}$ are asymptotically better than $\bar{Y}$ in $\ell_2$ risk if $\beta_{(2)}^\top \Sigma \beta_{(2)} > 0$, i.e., $\mathbb{E}(Y \mid X)$ is significantly correlated with $X$.
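To see how the three leading-order risks relate, the sketch below evaluates them with the $O(p^2/n)$ remainders dropped and checks the interpolation in the remark. The function names and the values of `tau2` and `signal` (standing for $\tau^2$ and $\beta_{(2)}^\top \Sigma \beta_{(2)}$) are assumptions chosen for illustration; `n` and `m` reuse the sample sizes from the motivating homeless example.

```python
import math

def scaled_risk_mean(tau2, signal):
    """Leading term of n * E(Ybar - theta)^2."""
    return tau2 + signal

def scaled_risk_ls(tau2):
    """Leading term of n * E(theta_LS^1 - theta)^2 (remainder dropped)."""
    return tau2

def scaled_risk_ssls(tau2, signal, n, m):
    """Leading term of n * E(theta_SSLS^1 - theta)^2 (remainder dropped)."""
    return tau2 + n / (n + m) * signal

tau2, signal = 1.0, 3.0   # hypothetical values, not from the source
n, m = 265, 1545          # sample sizes from the motivating example

r_mean = scaled_risk_mean(tau2, signal)
r_ls = scaled_risk_ls(tau2)
r_ssls = scaled_risk_ssls(tau2, signal, n, m)

# Convex combination from the remark: weight n/(n+m) on the sample mean,
# weight m/(n+m) on the ideal least squares estimator.
w = n / (n + m)
interpolated = w * r_mean + (1 - w) * r_ls
```

The SSLS risk always sits between the other two, and moves toward the ideal risk as `m` grows relative to `n`.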

  14. Theory. Asymptotic Distribution of $\hat{\theta}_{LS}$
     Theorem (Fixed $p$, growing $n$ asymptotics of $\hat{\theta}_{LS}$). Assume $(Y, X) \sim P$, where $P$ is fixed and has finite and non-degenerate second moments, and $\tau^2 > 0$. Based on $n$ labeled samples, we have
       $\frac{\hat{\theta}_{LS} - \theta}{\tau / \sqrt{n}} \xrightarrow{d} N(0, 1), \quad \mathrm{MSE} / \tau^2 \xrightarrow{d} 1, \quad \text{as } n \to \infty$,
     where $\mathrm{MSE} := \frac{\sum_{i=1}^{n} (Y_i - \vec{X}_i^\top \hat{\beta})^2}{n - p - 1}$ and $\tau^2 = \mathbb{E}(Y - \vec{X}^\top \beta)^2$.
     • Berry-Esseen-type CLT: let the cdf of $\frac{\hat{\theta}_{LS} - \theta}{\tau / \sqrt{n}}$ be $F_n$;
       → $|F_n(x) - \Phi(x)| \le C n^{-1/4}$.
     • Under $p = p_n = o(\sqrt{n})$ and other moment conditions,
       → the asymptotic results still hold.

  15. Theory. Asymptotic Distribution of $\hat{\theta}_{SSLS}$
     Theorem (Fixed $p$, growing $n$ asymptotics of $\hat{\theta}_{SSLS}$). Assume $(Y, X) \sim P$, where $P$ is fixed and has finite and non-degenerate second moments, and $\tau^2 > 0$. Based on $n$ labeled samples and $m$ unlabeled samples,
       $\frac{\hat{\theta}_{SSLS} - \theta}{\nu / \sqrt{n}} \xrightarrow{d} N(0, 1), \quad \hat{\nu}^2 / \nu^2 \xrightarrow{d} 1, \quad \text{as } n \to \infty$,
     where
       $\nu^2 = \tau^2 + \frac{n}{n+m}\, \beta_{(2)}^\top \Sigma \beta_{(2)}, \quad \hat{\nu}^2 = \frac{m}{m+n}\, \mathrm{MSE} + \frac{n}{m+n}\, \hat{\sigma}_Y^2$,
       $\mathrm{MSE} = \frac{1}{n - p - 1} \sum_{k=1}^{n} (Y_k - \vec{X}_k^\top \hat{\beta})^2, \quad \hat{\sigma}_Y^2 = \frac{1}{n - 1} \sum_{k=1}^{n} (Y_k - \bar{Y})^2$.

  16. Theory. Inference for $\theta$
     • When $p = p_n = o(\sqrt{n})$, the $(1 - \alpha)$-level confidence intervals for $\theta$ are:
       $\hat{\theta}_{LS} \pm z_{1-\alpha/2} \sqrt{\frac{\mathrm{MSE}}{n}}$ (Ideal semi-supervised),
       $\hat{\theta}_{SSLS} \pm z_{1-\alpha/2} \sqrt{\frac{\frac{m}{m+n} \mathrm{MSE} + \frac{n}{m+n} \hat{\sigma}_Y^2}{n}}$ (Ordinary semi-supervised).
     • Traditional $z$-interval:
       $\left[ \bar{Y} - z_{1-\alpha/2} \sqrt{\frac{\hat{\sigma}_Y^2}{n}}, \ \bar{Y} + z_{1-\alpha/2} \sqrt{\frac{\hat{\sigma}_Y^2}{n}} \right]$.
     • Since $\mathrm{MSE} \xrightarrow{d} \tau^2$ and $\hat{\sigma}_Y^2 \xrightarrow{d} \tau^2 + \beta_{(2)}^\top \Sigma \beta_{(2)}$, with $\tau^2 < \tau^2 + \beta_{(2)}^\top \Sigma \beta_{(2)}$ whenever $\beta_{(2)}^\top \Sigma \beta_{(2)} > 0$, the LS-confidence intervals are asymptotically shorter!
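The ordinary semi-supervised interval and the traditional $z$-interval can be computed side by side as in the sketch below. The function names `ssls_interval` and `z_interval` and the simulated data are assumptions for illustration; the formulas follow the slide, with the normal quantile taken from Python's standard library.

```python
import numpy as np
from statistics import NormalDist

def ssls_interval(X, Y, X_add, alpha=0.05):
    """(1 - alpha) interval: theta_SSLS +/- z * sqrt(nu_hat^2 / n)."""
    n, p = X.shape
    m = X_add.shape[0]
    design = np.column_stack([np.ones(n), X])      # intercept column first
    beta_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)
    resid = Y - design @ beta_hat
    mse = resid @ resid / (n - p - 1)              # MSE as defined on slide 15
    sigma2_y = Y.var(ddof=1)                       # sample variance of Y
    mu_hat = np.vstack([X, X_add]).mean(axis=0)
    theta_hat = Y.mean() - beta_hat[1:] @ (X.mean(axis=0) - mu_hat)
    nu2_hat = m / (m + n) * mse + n / (m + n) * sigma2_y
    half = NormalDist().inv_cdf(1 - alpha / 2) * np.sqrt(nu2_hat / n)
    return theta_hat - half, theta_hat + half

def z_interval(Y, alpha=0.05):
    """Traditional z-interval around the sample mean."""
    half = NormalDist().inv_cdf(1 - alpha / 2) * np.sqrt(Y.var(ddof=1) / len(Y))
    return Y.mean() - half, Y.mean() + half

# Hypothetical simulated data with a strong linear signal.
rng = np.random.default_rng(1)
n, m, p = 200, 2000, 3
X = rng.normal(size=(n, p))
X_add = rng.normal(size=(m, p))
Y = 1.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

ci_ssls = ssls_interval(X, Y, X_add)
ci_z = z_interval(Y)
```

With no unlabeled rows (`X_add` of shape `(0, p)`) the weight on `mse` is zero and the two intervals coincide, which mirrors the $m = 0$ reduction of $\hat{\theta}_{SSLS}$ to $\bar{Y}$.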

  17. Semiparametric Efficient Estimator. Further Improvement
     • $\hat{\theta}_{LS}$ and $\hat{\theta}_{SSLS}$ exploit the linear relationship between $Y$ and $X$.
     • Further improvement: add non-linear covariates
       $X_k^{\bullet} = \big(X_{k1}, \ldots, X_{kp}, g_1(X_k), \ldots, g_q(X_k)\big)$.
     • Semi-supervised least squares estimators:
       $\hat{\theta}^{\bullet}_{LS} = \bar{Y} - (\hat{\beta}^{\bullet}_{(2)})^\top (\bar{X}^{\bullet} - \mu^{\bullet}), \quad \hat{\beta}^{\bullet} = \big((\vec{\mathbf{X}}^{\bullet})^\top \vec{\mathbf{X}}^{\bullet}\big)^{-1} (\vec{\mathbf{X}}^{\bullet})^\top Y$,
       $\hat{\theta}^{\bullet}_{SSLS} = \bar{Y} - (\hat{\beta}^{\bullet}_{(2)})^\top (\bar{X}^{\bullet} - \hat{\mu}^{\bullet}), \quad \hat{\mu}^{\bullet} = \frac{1}{n+m} \sum_{k=1}^{n+m} X_k^{\bullet}$.
     • Letting $q$ grow slowly ($q = o(n^{1/2})$), one can establish semiparametric efficiency and oracle optimality for $\hat{\theta}^{\bullet}_{LS}$ and $\hat{\theta}^{\bullet}_{SSLS}$.
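A minimal sketch of the augmentation step: extra columns $g_1(X), \ldots, g_q(X)$ are appended to both the labeled and the unlabeled covariates, and the same SSLS formula is applied to the augmented matrices. The helper names `augment` and `theta_ssls`, the choice of coordinate-wise squares for the $g_j$, and the simulated data are all assumptions for illustration; the slides do not specify which $g_j$ to use.

```python
import numpy as np

def augment(X, transforms):
    """Append g_1(X), ..., g_q(X) as columns; each g maps (rows, p) -> (rows,)."""
    return np.column_stack([X] + [g(X) for g in transforms])

def theta_ssls(X, Y, X_add):
    """SSLS estimator: Ybar - beta_hat_(2)^T (Xbar - mu_hat)."""
    design = np.column_stack([np.ones(len(Y)), X])
    beta_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)
    mu_hat = np.vstack([X, X_add]).mean(axis=0)
    return Y.mean() - beta_hat[1:] @ (X.mean(axis=0) - mu_hat)

# Hypothetical data where Y depends on X only through a square,
# so the purely linear fit explains almost nothing (true theta = 1).
rng = np.random.default_rng(2)
n, m, p = 200, 5000, 2
X = rng.normal(size=(n, p))
X_add = rng.normal(size=(m, p))
Y = X[:, 0] ** 2 + 0.5 * rng.normal(size=n)

# Hypothetical non-linear covariates: the square of each coordinate (q = p).
transforms = [lambda Z, j=j: Z[:, j] ** 2 for j in range(p)]
X_aug = augment(X, transforms)
X_add_aug = augment(X_add, transforms)

est_linear = theta_ssls(X, Y, X_add)
est_augmented = theta_ssls(X_aug, Y, X_add_aug)
```

The same transforms must be applied to the labeled and unlabeled rows so that $\hat{\mu}^{\bullet}$ averages the same augmented covariates that enter the regression.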
