From Statistical Transportability to Estimating the Effect of Stochastic Interventions

  1. From Statistical Transportability to Estimating the Effect of Stochastic Interventions. Juan D. Correa and Elias Bareinboim, {j.d.correa, eliasb}@columbia.edu

  6. Generalization Challenges
     • One of the main tasks in ML is to learn/train models of an underlying process using data generated by the same process.
     • In fact, whenever enough data is provided, several approaches are currently capable of learning the underlying distribution very accurately.
     • In practice, however, the environment in which the data is collected is almost never the same as the one where the model is intended to be used and will be deployed.
     • Under these constraints, the performance of the model depends on the underlying structural similarities between the training and target environments.

  12. Statistical Transportability
     Current Website (𝚸), training environment; New Website (𝚸*), target environment.
     [Diagram, drawn once per environment and linked by a "Generalization" arrow: W (age), X (type of ad), Y (bought)]
     We use a selection node to represent differences in mechanism or distribution.
     P(W) ≠ P*(W), hence P(y | x) ≠ P*(y | x)
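A minimal numeric sketch of why a shift in P(W) alone is enough to change P(y | x). The probability tables are made-up illustrative numbers, not values from the presentation, and the edges W → X, W → Y, X → Y are an assumption read off the factorization P(w) P(x|w) P(y|x,w) used in these slides.

```python
# Hypothetical binary example: W = age group, X = type of ad, Y = bought.
# All numbers below are illustrative assumptions.

P_X_GIVEN_W = {0: 0.8, 1: 0.3}             # P(X=1 | W=w), shared by both websites
P_Y_GIVEN_XW = {(0, 0): 0.1, (0, 1): 0.4,  # P(Y=1 | X=x, W=w), shared by both websites
                (1, 0): 0.3, (1, 1): 0.7}


def p_y_given_x(x, p_w1):
    """Return P(Y=1 | X=x) when P(W=1) = p_w1, by summing W out of the joint."""
    num = den = 0.0
    for w, p_w in ((0, 1.0 - p_w1), (1, p_w1)):
        p_x = P_X_GIVEN_W[w] if x == 1 else 1.0 - P_X_GIVEN_W[w]
        num += P_Y_GIVEN_XW[(x, w)] * p_x * p_w
        den += p_x * p_w
    return num / den


source = p_y_given_x(1, p_w1=0.2)  # current website: mostly W=0
target = p_y_given_x(1, p_w1=0.7)  # new website: mostly W=1
print(f"P(y=1 | x=1) = {source:.3f}, P*(y=1 | x=1) = {target:.3f}")
```

Only the distribution of W differs between the two calls, yet the two conditional probabilities come out clearly different, so a model of P(y | x) fitted on the current website would be miscalibrated on the new one.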

  15. Statistical Transportability
     [Diagram: W (age), X (type of ad), Y (bought) in the current website (𝚸, training environment) and the new website (𝚸*, target environment)]
     • How to generalize the model learned in the source environment to different (but related) target environments?
     • Do we need to obtain samples from 𝚸* and train a new model?

  20. Statistical Transportability
     [Diagram: W (age), X (type of ad), Y (bought) in the current website (𝚸, training environment) and the new website (𝚸*, target environment)]
     We observe P(x, y, w).
     We want to say something about P*(y | x).
     P(x, y, w) = P(w) P(x | w) P(y | x, w), where P(x | w) and P(y | x, w) are the same in both environments, which is implied by this causal model.
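Written out for both environments, the assumption can be sketched as follows. This assumes, as the slides state, that the only difference between the two websites is the distribution of W.

```latex
\begin{align*}
P(x, y, w)   &= P(w)\, P(x \mid w)\, P(y \mid x, w) \\
P^*(x, y, w) &= P^*(w)\, P^*(x \mid w)\, P^*(y \mid x, w)
              = P^*(w)\, P(x \mid w)\, P(y \mid x, w)
\end{align*}
```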

  27. Statistical Transportability
     [Diagram, new website (𝚸*, target environment): W (age), X (type of ad), Y (bought)]
     • The target distribution P*(y | x) can be expressed as:

     \[
     P^*(y \mid x) = \frac{P^*(y, x)}{P^*(x)}
       = \frac{\sum_w P^*(y \mid x, w)\, P^*(x \mid w)\, P^*(w)}{\sum_w P^*(x \mid w)\, P^*(w)}
       = \frac{\sum_w P(y \mid x, w)\, P(x \mid w)\, P^*(w)}{\sum_w P(x \mid w)\, P^*(w)}
     \]

       The last step uses the fact that P(y | x, w) and P(x | w) are the same in source and target.
     • Under the assumptions implied by the diagram, only P*(w) needs to be measured in the target environment, while the other distributions can be reused from the data collected in the source environment.
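The formula can be turned into a simple plug-in estimator. The sketch below is not the authors' implementation: `transport_estimate` and `simulate` are hypothetical helper names, the simulated model and its probabilities are illustrative assumptions, and every factor is estimated by plain counting.

```python
import random
from collections import Counter


def transport_estimate(source_xyw, target_w, x, y):
    """Plug-in estimate of P*(y | x) from source (x, y, w) tuples and target w values."""
    n_w = Counter(w for _, _, w in source_xyw)           # source counts of W=w
    n_xw = Counter((xi, w) for xi, _, w in source_xyw)   # source counts of (X, W)
    n_xyw = Counter(source_xyw)                          # source counts of (X, Y, W)
    num = den = 0.0
    for w, count in Counter(target_w).items():
        p_star_w = count / len(target_w)                 # P*(w), measured in the target
        if n_xw[(x, w)] == 0:
            continue  # (x, w) never seen in the source, so its factors cannot be estimated
        p_x_w = n_xw[(x, w)] / n_w[w]                    # P(x | w), reused from the source
        p_y_xw = n_xyw[(x, y, w)] / n_xw[(x, w)]         # P(y | x, w), reused from the source
        num += p_y_xw * p_x_w * p_star_w
        den += p_x_w * p_star_w
    return num / den


def simulate(n, p_w1, rng):
    """Draw n (x, y, w) tuples from an assumed model W -> X, W -> Y, X -> Y."""
    p_x = {0: 0.8, 1: 0.3}
    p_y = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.7}
    samples = []
    for _ in range(n):
        w = int(rng.random() < p_w1)
        x = int(rng.random() < p_x[w])
        y = int(rng.random() < p_y[(x, w)])
        samples.append((x, y, w))
    return samples


rng = random.Random(0)
source = simulate(50_000, p_w1=0.2, rng=rng)        # full data from the current website
target_full = simulate(50_000, p_w1=0.7, rng=rng)   # new website (normally not fully observed)
target_w = [w for _, _, w in target_full]           # only W is measured in the target

estimate = transport_estimate(source, target_w, x=1, y=1)
# For comparison only: what full target data would give directly.
direct = (sum(1 for xi, yi, _ in target_full if xi == 1 and yi == 1)
          / sum(1 for xi, _, _ in target_full if xi == 1))
print(f"transported: {estimate:.3f}, direct from target: {direct:.3f}")
```

The estimator never looks at X or Y in the target, which is the practical point of the formula. The function assumes the target value of x occurs at least once in the source, and silently skipping unseen (x, w) cells is a simplification that a real analysis would need to handle more carefully.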

  33. Deciding Transportability
     Inputs:
     • Selection diagram D over the source (𝚸) and the target (𝚸*)
     • P(v): distribution learned from 𝛒
     • P*(w): partial distribution from 𝛒*
     Question: is there a function f such that P*(y | x) = f(P(v), P*(w))?
     Output: yes (with f) 😁 / no ☹

  38. Proposed Strategy
     1. Encode the assumptions about the differences and commonalities across environments: selection diagrams (with selection nodes).
     2. Identify the stable mechanisms across environments.
     3. Determine the variables that need to be re-measured.
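For the running website example, the three steps can be sketched in a few lines. This is a deliberately naive illustration, not the decision procedure from the presentation: the dictionary encoding, the names `PARENTS`, `S_NODES` and `plan_transport`, and the assumed edges are all hypothetical. The rule used here (re-measure every factor whose variable has a selection node) is only a sufficient criterion when all variables are observed and there are no unobserved confounders, and it is not guaranteed to be minimal.

```python
# Step 1: encode the assumptions (hypothetical encoding of the running example).
PARENTS = {"W": [], "X": ["W"], "Y": ["X", "W"]}  # assumed edges W -> X, W -> Y, X -> Y
S_NODES = {"W"}                                   # variables whose mechanism may differ


def plan_transport(parents, s_nodes):
    """Split the factors P(v | parents(v)) into stable ones and ones to re-measure."""
    stable, remeasure = [], []
    for var, pa in parents.items():
        given = ", ".join(p.lower() for p in pa)
        factor = f"P({var.lower()} | {given})" if pa else f"P({var.lower()})"
        # Steps 2 and 3: a factor is stable unless a selection node points at its variable.
        if var in s_nodes:
            remeasure.append(factor)
        else:
            stable.append(factor)
    return stable, remeasure


stable, remeasure = plan_transport(PARENTS, S_NODES)
print("reuse from source:", stable)
print("measure in target:", remeasure)
```

With the selection node on W, the sketch reports P(x | w) and P(y | x, w) as reusable and P(w) as the only quantity to measure in the target, matching the conclusion of the worked example in the slides.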
