https://gilleskratzer.netlify.com/ http://www.r-bayesian-networks.org/ GILLES KRATZER, APPLIED STATISTICS GROUP, UZH CAUSALITY WORKSHOP, UZH 14.12.2018 BAYESIAN NETWORKS MEET OBSERVATIONAL DATA gilles.kratzer@math.uzh.ch
MOTIVATIONAL EXAMPLE: CREDIT CARD FRAUD DETECTION PREDICTION
MOTIVATIONAL EXAMPLE: CREDIT CARD FRAUD DETECTION PREDICTION
MOTIVATIONAL EXAMPLE: VETERINARY EPIDEMIOLOGY DATA VISUALISATION
MOTIVATIONAL EXAMPLE: SOCIAL SCIENCES DATA INTERPRETATION
BAYESIAN NETWORKS IN THE MACHINE LEARNING WORLD
OUTLINE OF THE TALK Objectif of the talk: How to learn Bayesian networks from observational data?
<latexit sha1_base64="Q5Hpx5WiI8bpSbx7+lIjwVGktpw=">ACbXicbVFNi9RAEO3Er3X8ioHP5DCQZwFGRIR9LKw6MVjhJ3dgckYOp3Kpmc7ndBdUYeQm7/Qm3/Bi3/BzuwsuLsWNDze9XV9TprlLQUhr8/8rVa9dv7Nwc3bp95+694P6DQ1u3RuBM1Ko284xbVFLjCQpnDcGeZUpPMpOPg760Vc0Vtb6gNYNLit+rGUhBSdHpcGPeJUnMqs6Ob97h4kjanztFvtRf2XTvfxZJ6ukrmcOaKeZ+uXkNyUCLxdLXrIOF36uBbiQahP28E6UFKhEsEtQFNygJjvAHtyANBiH03BTcBlEWzBm24rT4GeS16Kt3C1CcWsXUdjQsuOGpFDYj5LWYsPFCT/GhYOaV2iX3SatHl46JoeiNu5og37b0fHK2vXVeacwyr2ojaQ/9MWLRXvl53UTUuoxemgolVANQzRQy4NClJrB7gw0r0VRMkNF+Q+aORCiC6ufBkcvplG4T6/Ha8/2Ebxw57yl6wCYvYO7bPrGYzZhgv73Ae+w98f74j/xn/vNTq+9tex6yc+W/+gunZ7qj</latexit> <latexit sha1_base64="Q5Hpx5WiI8bpSbx7+lIjwVGktpw=">ACbXicbVFNi9RAEO3Er3X8ioHP5DCQZwFGRIR9LKw6MVjhJ3dgckYOp3Kpmc7ndBdUYeQm7/Qm3/Bi3/BzuwsuLsWNDze9XV9TprlLQUhr8/8rVa9dv7Nwc3bp95+694P6DQ1u3RuBM1Ko284xbVFLjCQpnDcGeZUpPMpOPg760Vc0Vtb6gNYNLit+rGUhBSdHpcGPeJUnMqs6Ob97h4kjanztFvtRf2XTvfxZJ6ukrmcOaKeZ+uXkNyUCLxdLXrIOF36uBbiQahP28E6UFKhEsEtQFNygJjvAHtyANBiH03BTcBlEWzBm24rT4GeS16Kt3C1CcWsXUdjQsuOGpFDYj5LWYsPFCT/GhYOaV2iX3SatHl46JoeiNu5og37b0fHK2vXVeacwyr2ojaQ/9MWLRXvl53UTUuoxemgolVANQzRQy4NClJrB7gw0r0VRMkNF+Q+aORCiC6ufBkcvplG4T6/Ha8/2Ebxw57yl6wCYvYO7bPrGYzZhgv73Ae+w98f74j/xn/vNTq+9tex6yc+W/+gunZ7qj</latexit> <latexit sha1_base64="Q5Hpx5WiI8bpSbx7+lIjwVGktpw=">ACbXicbVFNi9RAEO3Er3X8ioHP5DCQZwFGRIR9LKw6MVjhJ3dgckYOp3Kpmc7ndBdUYeQm7/Qm3/Bi3/BzuwsuLsWNDze9XV9TprlLQUhr8/8rVa9dv7Nwc3bp95+694P6DQ1u3RuBM1Ko284xbVFLjCQpnDcGeZUpPMpOPg760Vc0Vtb6gNYNLit+rGUhBSdHpcGPeJUnMqs6Ob97h4kjanztFvtRf2XTvfxZJ6ukrmcOaKeZ+uXkNyUCLxdLXrIOF36uBbiQahP28E6UFKhEsEtQFNygJjvAHtyANBiH03BTcBlEWzBm24rT4GeS16Kt3C1CcWsXUdjQsuOGpFDYj5LWYsPFCT/GhYOaV2iX3SatHl46JoeiNu5og37b0fHK2vXVeacwyr2ojaQ/9MWLRXvl53UTUuoxemgolVANQzRQy4NClJrB7gw0r0VRMkNF+Q+aORCiC6ufBkcvplG4T6/Ha8/2Ebxw57yl6wCYvYO7bPrGYzZhgv73Ae+w98f74j/xn/vNTq+9tex6yc+W/+gunZ7qj</latexit> <latexit sha1_base64="Q5Hpx5WiI8bpSbx7+lIjwVGktpw=">ACbXicbVFNi9RAEO3Er3X8ioHP5DCQZwFGRIR9LKw6MVjhJ3dgckYOp3Kpmc7ndBdUYeQm7/Qm3/Bi3/BzuwsuLsWNDze9XV9TprlLQUhr8/8rVa9dv7Nwc3bp95+694P6DQ1u3RuBM1Ko284xbVFLjCQpnDcGeZUpPMpOPg760Vc0Vtb6gNYNLit+rGUhBSdHpcGPeJUnMqs6Ob97h4kjanztFvtRf2XTvfxZJ6ukrmcOaKeZ+uXkNyUCLxdLXrIOF36uBbiQahP28E6UFKhEsEtQFNygJjvAHtyANBiH03BTcBlEWzBm24rT4GeS16Kt3C1CcWsXUdjQsuOGpFDYj5LWYsPFCT/GhYOaV2iX3SatHl46JoeiNu5og37b0fHK2vXVeacwyr2ojaQ/9MWLRXvl53UTUuoxemgolVANQzRQy4NClJrB7gw0r0VRMkNF+Q+aORCiC6ufBkcvplG4T6/Ha8/2Ebxw57yl6wCYvYO7bPrGYzZhgv73Ae+w98f74j/xn/vNTq+9tex6yc+W/+gunZ7qj</latexit> OUTLINE OF THE TALK Objectif of the talk: select How to learn Bayesian networks from observational data? Bayesian Networks are defined by two elements: Network structure: Directed Acyclic Graph (DAG): G = (V, A) in which each node vi ∈ V corresponds to a random variable Xi Probability distribution: Probability distribution X with parameters Θ , which can be factorised into smaller local probability distributions according to the arcs aij ∈ A present in the graph. A BN encodes the factorisation of the joint distribution n Y P ( X ) = P ( X j | Pa j , Θ j ) , where Pa j is the set of parents of X j j =1
OUTLINE OF THE TALK Objectif of the talk: How to learn Bayesian networks from observational data? Which approaches do exist? Which assumptions/limitations are involved when learning a Bayesian network form observational dataset? Theoretical limitations: ‣ BN learning is ill-posed on two levels ‣ Finite sample (any stats problem is ill-posed) ‣ Complete knowledge of observational distribution usually does not determine the underlying causal model
OUTLINE OF THE TALK Objectif of the talk: How to learn Bayesian networks from observational data? Which approaches do exist? Which assumptions/limitations are involved when learning a Bayesian network form observational dataset? Technical limitations: ‣ Approximate learning process ‣ Proxies ‣ Combinatorial wall!!! ‣ Simplification needed
COMBINATORIAL WALL Typical domain of interest # Nodes # DAGs Inference Exact inference 1 - 15 Nodes < 10 41 DAGs EPIDEMIOLOGY 16 - 25 Nodes < 10 100 DAGs Exact inference possible 26 - 50 Nodes < 10 400 DAGs Approximate inference GENOMICS PROTEOMICS 51 - 100 Nodes < 10 1700 DAGs Approximate inference 101 - 1000 Nodes < 10 100000 DAGs (very) approximative inference Approximations: ‣ limiting number of parents per node ‣ Decomposable scores/efficient algorithm ‣ Score equivalence
<latexit sha1_base64="RQnfYWTu5qOROz3pPyOd/SKOE=">ACGXicbVBPS8MwHE3nv1n/VT16CQ5hgzFaEfQiDL14nOC2wlpKmqVbNE1Lkgpj7mt48at48aCIRz35bUy3HnTzQcjvd8j+b0wZVQq2/42SkvLK6tr5XVzY3Nre8fa3evIJBOYtHCEuGSBJGOWkrqhxU0FQHDLSDe8uc797T4SkCb9Ro5T4MRpwGlGMlJYCy25V3cCpQ6+fKFmHbsBr8Bx6HjRzgz64wW1h1mZXYFXshj0FXCROQSqgQCuwPnUOZzHhCjMkZc+xU+WPkVAUMzIxvUySFOE7NCA9TmKifTH080m8EgrfRglQh+u4FT9nRijWMpRHOrJGKmhnPdy8T+vl6nozB9TnmaKcDx7KMoYVAnMa4J9KghWbKQJwoLqv0I8RAJhpcs0dQnO/MqLpHPcDS/Pqk0L4o6yuAHIqcMApaIr0AJtgMEjeAav4M14Ml6Md+NjNloyisw+APj6wcFkZ08</latexit> <latexit sha1_base64="RQnfYWTu5qOROz3pPyOd/SKOE=">ACGXicbVBPS8MwHE3nv1n/VT16CQ5hgzFaEfQiDL14nOC2wlpKmqVbNE1Lkgpj7mt48at48aCIRz35bUy3HnTzQcjvd8j+b0wZVQq2/42SkvLK6tr5XVzY3Nre8fa3evIJBOYtHCEuGSBJGOWkrqhxU0FQHDLSDe8uc797T4SkCb9Ro5T4MRpwGlGMlJYCy25V3cCpQ6+fKFmHbsBr8Bx6HjRzgz64wW1h1mZXYFXshj0FXCROQSqgQCuwPnUOZzHhCjMkZc+xU+WPkVAUMzIxvUySFOE7NCA9TmKifTH080m8EgrfRglQh+u4FT9nRijWMpRHOrJGKmhnPdy8T+vl6nozB9TnmaKcDx7KMoYVAnMa4J9KghWbKQJwoLqv0I8RAJhpcs0dQnO/MqLpHPcDS/Pqk0L4o6yuAHIqcMApaIr0AJtgMEjeAav4M14Ml6Md+NjNloyisw+APj6wcFkZ08</latexit> <latexit sha1_base64="RQnfYWTu5qOROz3pPyOd/SKOE=">ACGXicbVBPS8MwHE3nv1n/VT16CQ5hgzFaEfQiDL14nOC2wlpKmqVbNE1Lkgpj7mt48at48aCIRz35bUy3HnTzQcjvd8j+b0wZVQq2/42SkvLK6tr5XVzY3Nre8fa3evIJBOYtHCEuGSBJGOWkrqhxU0FQHDLSDe8uc797T4SkCb9Ro5T4MRpwGlGMlJYCy25V3cCpQ6+fKFmHbsBr8Bx6HjRzgz64wW1h1mZXYFXshj0FXCROQSqgQCuwPnUOZzHhCjMkZc+xU+WPkVAUMzIxvUySFOE7NCA9TmKifTH080m8EgrfRglQh+u4FT9nRijWMpRHOrJGKmhnPdy8T+vl6nozB9TnmaKcDx7KMoYVAnMa4J9KghWbKQJwoLqv0I8RAJhpcs0dQnO/MqLpHPcDS/Pqk0L4o6yuAHIqcMApaIr0AJtgMEjeAav4M14Ml6Md+NjNloyisw+APj6wcFkZ08</latexit> <latexit sha1_base64="RQnfYWTu5qOROz3pPyOd/SKOE=">ACGXicbVBPS8MwHE3nv1n/VT16CQ5hgzFaEfQiDL14nOC2wlpKmqVbNE1Lkgpj7mt48at48aCIRz35bUy3HnTzQcjvd8j+b0wZVQq2/42SkvLK6tr5XVzY3Nre8fa3evIJBOYtHCEuGSBJGOWkrqhxU0FQHDLSDe8uc797T4SkCb9Ro5T4MRpwGlGMlJYCy25V3cCpQ6+fKFmHbsBr8Bx6HjRzgz64wW1h1mZXYFXshj0FXCROQSqgQCuwPnUOZzHhCjMkZc+xU+WPkVAUMzIxvUySFOE7NCA9TmKifTH080m8EgrfRglQh+u4FT9nRijWMpRHOrJGKmhnPdy8T+vl6nozB9TnmaKcDx7KMoYVAnMa4J9KghWbKQJwoLqv0I8RAJhpcs0dQnO/MqLpHPcDS/Pqk0L4o6yuAHIqcMApaIr0AJtgMEjeAav4M14Ml6Md+NjNloyisw+APj6wcFkZ08</latexit> PLAN 1. From observationnal dataset deduce probabilistic model EXPONENTIAL FAMILY - Usually discrete BN or jointly Gaussian - Epidemiological constrain: mixture of distributions 2. From probabilistic model deduce structure Observational dataset Probabilistic model Network structure X1 X2 X3 … P ( X 1 , . . . , X n ) = 1 2 12 23 53 … P ( X i | X j , . . . ) . . . 32 31 23 … 10 16 45 … Independance Computing directly testing … … … …
Recommend
More recommend