NIPS 2013 Workshop on Causality
Conditional distribution variability measures for causality detection
José A. R. Fonollosa
December 9, 2013
Outline
• Introduction
• Preprocessing
• Conditional distributions similarity measures
• Additional features
• Model
• Results
• Conclusions
Introduction
• Heterogeneous cause-effect pairs
• Statistical / machine learning approach (3 classes)
• Standard features
• Measures of the similarity of the 'shape' of the conditional distributions
• Robust estimation methods:
  – Limited number of samples
  – Noise
  – Quantization
  – Avoid overfitting
• Tree-based ensemble learning model (gradient boosting)
Preprocessing
• Mean and variance normalization: all the features are scale and mean invariant.
• Homogeneous set of features from mixed numerical/categorical data:
  – Discretization of numerical variables
  – Relabeling of categorical variables
[Figure: example histograms over numeric bins 0–3 and categories A–D; arbitrary labels or numbers are mapped to a common discrete representation.]
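A minimal sketch of this preprocessing, assuming NumPy; the equal-width binning strategy and the number of bins are assumptions for illustration, not taken from the slides:

```python
import numpy as np

def normalize(x):
    """Mean/variance normalization so features are scale and mean invariant."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

def discretize(x, max_bins=10):
    """Map a numerical variable to integer bin labels (equal-width bins here;
    the actual binning strategy is an assumption)."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), max_bins + 1)
    return np.digitize(x, edges[1:-1])  # labels in 0..max_bins-1

def relabel(x):
    """Replace arbitrary categorical labels by consecutive integers."""
    _, codes = np.unique(x, return_inverse=True)
    return codes
```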
Conditional distributions similarity
Rationale: the conditional distribution P(Y|X=x) is expected to be simpler to describe in the causal direction. Similar:
– Normalized shape/histogram for different values of the given variable x
– Similar entropy and moments
– Similar Bayesian error probability
Related to functional causal models y = f(x) + g_x(e), but f(x) is replaced by the conditional mean in an interval, and independence tests are replaced by similarity measures.
(Image from a presentation by Kun Zhang on functional causal models)
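As an illustration of the 'similar shape' idea, the following Python sketch measures how much a per-bin statistic of P(Y|X=x) varies across the values of x; the function name and the particular statistics shown are assumptions for illustration:

```python
import numpy as np
from scipy.stats import skew, kurtosis

def conditional_variability(x_bins, y, stat=np.std):
    """Spread of a per-bin statistic of P(Y | X = x) across the bins of X.
    A low spread suggests the conditionals share a similar 'shape', which is
    what we expect in the causal direction."""
    x_bins, y = np.asarray(x_bins), np.asarray(y)
    values = []
    for b in np.unique(x_bins):
        y_b = y[x_bins == b]
        if len(y_b) > 1:
            values.append(stat(y_b))
    return np.std(values) if values else 0.0

# Example: spread of the std, skewness and kurtosis of the conditionals
# features = [conditional_variability(xb, y, s) for s in (np.std, skew, kurtosis)]
```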
Additional Features (I)
• Information-theoretic measures
  – Discrete entropy and joint entropy
  – Discrete conditional entropy
  – Discrete mutual information (+ 2 normalized versions)
  – Adjusted (discrete) mutual information
  – Gaussian divergence (differential entropy)
  – Uniform divergence
• Slope-based Information Geometric Causal Inference (IGCI)
• Hilbert-Schmidt Independence Criterion (HSIC)
• Pearson R (adapted versions)
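A few of the listed information-theoretic features can be computed directly with SciPy and scikit-learn on the discretized variables; this sketch covers only the discrete entropy and mutual-information variants, and the function names are illustrative:

```python
import numpy as np
from scipy.stats import entropy
from sklearn.metrics import (mutual_info_score,
                             normalized_mutual_info_score,
                             adjusted_mutual_info_score)

def discrete_entropy(labels):
    """Plug-in entropy estimate from label counts (natural log)."""
    _, counts = np.unique(labels, return_counts=True)
    return entropy(counts)

def info_features(x_bins, y_bins):
    """A subset of the information-theoretic features listed above."""
    return {
        "entropy_x": discrete_entropy(x_bins),
        "entropy_y": discrete_entropy(y_bins),
        "mi": mutual_info_score(x_bins, y_bins),
        "nmi": normalized_mutual_info_score(x_bins, y_bins),
        "ami": adjusted_mutual_info_score(x_bins, y_bins),
    }
```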
Additional Features (II)
• Number of samples and number of unique samples
• Moments and mixed moments: skewness, kurtosis and mixed moments (1,2), (1,3)
• Polynomial fit (order 2)
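A possible reading of the moment and polynomial-fit features, assuming NumPy/SciPy; the exact definition of the mixed moments (which variable gets which power, and that both directions are computed) is an assumption:

```python
import numpy as np
from scipy.stats import skew, kurtosis

def moment_features(x, y):
    """Skewness, kurtosis, mixed moments and order-2 polynomial fit residual
    on normalized variables (a sketch of the feature list above)."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    feats = {
        "skew_x": skew(x), "kurt_x": kurtosis(x),
        "moment12": np.mean(x * y**2),   # mixed moment (1,2), assumed definition
        "moment13": np.mean(x * y**3),   # mixed moment (1,3), assumed definition
    }
    coeffs = np.polyfit(x, y, deg=2)
    feats["polyfit_error"] = np.mean((np.polyval(coeffs, x) - y) ** 2)
    return feats
```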
Model schemes
Ternary symmetric problem, single output: (+1) A is a cause of B, (-1) B is a cause of A, (0) neither.
• A single ternary classification model: features → P_a(+1), P_a(0), P_a(-1); output P_a(+1) - P_a(-1).
• Two binary models: a model for +1 versus -1 (P_c) and a model for 0 versus the rest (P_i); the two probabilities are combined into the final score.
• Two binary models: a model for class +1 versus the rest (P_s(+1)) and a model for -1 versus the rest (P_s(-1)); output ½ (P_s(+1) - P_s(-1)).
The three schemes give similar performance.
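A small sketch of the first and third scoring schemes (the combination rule of the second scheme is not fully legible on the slide, so it is left out); the class ordering in the probability vector is an assumption:

```python
def score_ternary(p_a):
    """Scheme 1: single ternary model. p_a = [P(-1), P(0), P(+1)]
    (assumed ordering). Output: P_a(+1) - P_a(-1)."""
    return p_a[2] - p_a[0]

def score_one_vs_rest(p_s_plus, p_s_minus):
    """Scheme 3: two one-vs-rest binary models.
    Output: 0.5 * (P_s(+1) - P_s(-1))."""
    return 0.5 * (p_s_plus - p_s_minus)
```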
Gradient Boosting Model (GBM)
• Gradient boosting
  – Large number of boosting stages = 500
  – Large tree size = 9 (higher-order interactions)
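A minimal scikit-learn sketch of such a model; mapping the slide's "tree size = 9" to max_leaf_nodes is an assumption (it could equally be a depth parameter), and the remaining hyperparameters are library defaults:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Gradient boosting with many stages and fairly large trees, so that
# higher-order feature interactions can be captured.
model = GradientBoostingClassifier(n_estimators=500, max_leaf_nodes=9)

# Typical usage on the feature matrix built from the cause-effect pairs:
# model.fit(X_train, y_train)          # y in {-1, 0, +1}
# probs = model.predict_proba(X_test)  # fed into one of the scoring schemes
```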
Results

Features: Score
Baseline (21): 0.742
Baseline (21) + Moment31 (2): 0.750
Baseline (21) + Moment21 (2): 0.757
Baseline (21) + Error probability (2): 0.749
Baseline (21) + Polyfit (2): 0.757
Baseline (21) + Polyfit error (2): 0.757
Baseline (21) + Skewness (2): 0.754
Baseline (21) + Kurtosis (2): 0.744
Baseline (21) + the above statistics set (14): 0.790
Baseline (21) + Standard deviation of conditional distributions (2): 0.779
Baseline (21) + Standard deviation of the skewness of conditional distributions (2): 0.765
Baseline (21) + Standard deviation of the kurtosis of conditional distributions (2): 0.759
Baseline (21) + Standard deviation of the entropy of conditional distributions (2): 0.759
Baseline (21) + Measures of variability of the conditional distributions (8): 0.789
Full set (43 features): 0.820

Training time: 45 minutes (4-core server). Test predictions: 12 minutes.
Conclusions
• A statistical machine learning approach to deal with heterogeneous cause-effect pairs
• Several features need to be combined to obtain good results (higher-order interactions)
• The proposed measures of the similarity of the conditional distributions provide a significant additional performance gain
• Competitive results, open source code, simple and fast
• Next step: detailed study of the performance on different types of data pairs