  1. Provable Efficient Skeleton Learning of Encodable Discrete Bayes Nets in Poly-Time and Sample Complexity. ISIT 2020. Adarsh Barik, Jean Honorio. Purdue University.

  2. What are Bayesian networks? Example: Burglar Alarm [Russell'02]. "I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?" This is a system with multiple variables and interactions between them.

  3. What are Bayesian networks? How do we model variables and their interactions?
     • John calls: J takes two values, {Calls, Doesn't Call} = {1, 0}
     • Mary doesn't call: M ∈ {1, 0}
     • Alarm is ringing: A ∈ {1, 0}
     • Earthquake: E ∈ {1, 0}
     • Burglar: B ∈ {1, 0}
     • Need a joint probability distribution table: 2^n − 1 entries for n variables
     • Quickly becomes too large: 2^50 ∼ 10^15
     • Cannot handle even moderately big systems this way

  4. What are Bayesian networks? A Bayesian network is a Directed Acyclic Graph (DAG) that specifies a joint distribution over random variables as a product of conditional probability functions, one for each variable given its set of parents.
     • [Figure: DAG with edges B → A, E → A, A → J, A → M]
     • P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)
     • From 2^5 − 1 = 31 entries to 1 + 1 + 2^2 + 2 + 2 = 10 entries
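
A minimal Python sketch of this factorization for the alarm network; the CPT values are illustrative stand-ins (the slides don't give them), but the structure shows why only 10 numbers need to be stored:

```python
# Joint as a product of per-variable conditionals:
# P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
# CPT values below are illustrative, not taken from the slides.
P_B1 = 0.001                                                      # P(B=1)
P_E1 = 0.002                                                      # P(E=1)
P_A1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)
P_J1 = {1: 0.90, 0: 0.05}                                         # P(J=1 | A)
P_M1 = {1: 0.70, 0: 0.01}                                         # P(M=1 | A)

def bern(p, x):
    """P(X = x) for a binary variable with P(X=1) = p."""
    return p if x else 1 - p

def joint(b, e, a, j, m):
    """P(B)P(E)P(A|B,E)P(J|A)P(M|A): 1+1+4+2+2 = 10 stored numbers vs 2^5 - 1 = 31."""
    return (bern(P_B1, b) * bern(P_E1, e) * bern(P_A1[(b, e)], a)
            * bern(P_J1[a], j) * bern(P_M1[a], m))

print(joint(1, 0, 1, 1, 0))  # burglary, no quake, alarm rings, John calls, Mary doesn't
```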

  5. What's the problem? We want to learn the structure of a Bayesian network from data.
     • Realization of each random variable: X_1, X_2, X_3, ..., X_i, ..., X_n
     • Need not be ordered: X_3, X_n, X_i, ..., X_1, ..., X_2
     • N i.i.d. samples
     • Data: a table of N samples over n variables

  6. What's the problem? Can we recover the Bayesian network structure from data, i.e., go from the table of N samples over n variables to the graph over X_1, ..., X_n? Yes, but it is NP-hard [Chickering'04]!

  7. Recovering the structure of a Bayesian network from data: related work
     • Score maximization methods: [Friedman'99, Margaritis'00, Moore'03, Tsamardinos'06], [Koivisto'04, Jaakkola'10, Silander'12, Cussens'12]
     • Independence-test-based methods: [Spirtes'00, Cheng'02, Yehezkel'05, Xie'08]
     • Special cases with guarantees:
       • Linear structural equation models with additive noise: poly time and sample complexity [Ghoshal'17a, '17b]
       • Node ordering for ordinal variables: poly time and sample complexity [Park'15, '17]
       • Binary variables: poly sample complexity [Brenner'13]

  8. Our problem: we want to learn the skeleton of a Bayesian network from data, with
     • Correctness
     • Polynomial time complexity
     • Polynomial sample complexity
     [Ordyniak'13]: it is possible to learn the DAG from its skeleton efficiently under some technical conditions.

  9. Encoding variables
     • The type of the variables matters, as data is the only input we have for our mathematical model
     • Numerical variables, e.g. marks in an exam: 25 < 30 < 45 (natural order)
     • Categorical (nominal) variables, e.g. country name: USA, India, China (no natural order)

  10. Encoding variables
     • We use an encoding for categorical variables (see the sketch below)
     • Variable: country name ∈ {USA, India, China}
     • Dummy encoding: USA → (1, 0, 0), India → (0, 1, 0), China → (0, 0, 1)
     • Effects encoding: USA → (1, 0), India → (0, 1), China → (−1, −1)
     • X_r is encoded as E(X_r) ∈ R^k
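
A short sketch of the two encodings from this slide; the helper names are my own, not from the paper:

```python
import numpy as np

def dummy_encoding(levels):
    """Map each of the k levels to a k-dimensional indicator vector."""
    k = len(levels)
    return {lvl: np.eye(k)[i] for i, lvl in enumerate(levels)}

def effects_encoding(levels):
    """Map the first k-1 levels to (k-1)-dim indicators and the last level to all -1s."""
    k = len(levels)
    enc = {lvl: np.eye(k - 1)[i] for i, lvl in enumerate(levels[:-1])}
    enc[levels[-1]] = -np.ones(k - 1)
    return enc

print(dummy_encoding(["USA", "India", "China"]))    # USA -> (1,0,0), ..., China -> (0,0,1)
print(effects_encoding(["USA", "India", "China"]))  # USA -> (1,0), India -> (0,1), China -> (-1,-1)
```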

  11. Our method
     • X_−r = [X_1 ⋯ X_{r−1} X_{r+1} ⋯ X_n]^⊺
     • E(X_−r) = [E(X_1) ⋯ E(X_{r−1}) E(X_{r+1}) ⋯ E(X_n)]^⊺
     • π(r) is the set of parents of variable r and c(r) is the set of its children
     • Recover π(r) and c(r) for every node and then combine
     • Idea: express E(X_r) as a linear function of E(X_−r); assembling these matrices from data is sketched below
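
A sketch of building E(X_r) and E(X_−r) from raw samples, assuming for simplicity that every variable shares one level set and is dummy-encoded (the paper allows other encodings); names are illustrative:

```python
import numpy as np

def encode_column(col, levels):
    """Dummy-encode one variable's N observed values into an N x k matrix."""
    k = len(levels)
    lut = {lvl: np.eye(k)[i] for i, lvl in enumerate(levels)}
    return np.vstack([lut[v] for v in col])

def design_matrices(X, r, levels):
    """Return E(X_r) in R^{N x k} and E(X_-r) in R^{N x (n-1)k} from raw samples X (N x n)."""
    E_Xr = encode_column(X[:, r], levels)
    E_Xmr = np.hstack([encode_column(X[:, j], levels)
                       for j in range(X.shape[1]) if j != r])
    return E_Xr, E_Xmr
```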

  12. Our method: substitute the model in the population setting.
     • Let E(X_r) = W*^⊺ E(X_−r) + e(E(X_−r))
     • The error e(E(X_−r)) depends on X_−r; assume ∥𝔼(|e|)∥_∞ ⩽ µ and ∥e∥_∞ = 2σ
     • W stacks one block per remaining variable, W = [W_1^⊺ ⋯ W_n^⊺]^⊺ with block r omitted, where each W_i ∈ R^{k×k} and W ∈ R^{(n−1)k×k}
     • Solve for W* as argmin_W (1/2) 𝔼(∥E(X_r) − W^⊺ E(X_−r)∥²_2) such that W_i = 0 for all i ∉ π(r) ∪ c(r)

  13. Our method: substitute the model in the sample setting.
     • Solve Ŵ = argmin_W (1/2N) ∥E(X_r) − E(X_−r) W∥²_F + λ_N ∥W∥_{B,1,2}
     • Here E(X_r) ∈ R^{N×k} and E(X_−r) ∈ R^{N×(n−1)k} stack the N encoded samples
     • ∥W∥_{B,1,2} = Σ_{i∈B} ∥W_i∥_F, the sum of blockwise Frobenius norms
     • Idea: provide conditions on N and λ_N such that ∥W_i∥_F = 0 for all i ∉ π(r) ∪ c(r) and ∥W_i∥_F ≠ 0 for all i ∈ π(r) ∪ c(r); a solver sketch follows this list
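
The paper analyzes the estimator rather than a particular solver, but as a sketch, the group-lasso problem above could be solved by proximal gradient descent with block soft-thresholding; the function name and parameters here are assumptions, not the authors' implementation:

```python
import numpy as np

def estimate_neighbors(E_Xr, E_Xmr, k, lam, n_iter=1000, tol=1e-8):
    """Minimize (1/2N)||E(X_r) - E(X_-r)W||_F^2 + lam * sum_i ||W_i||_F,
    then read off {i : ||W_i||_F > 0} as the estimate of pi(r) U c(r)."""
    N, p = E_Xmr.shape                           # p = (n-1) * k
    W = np.zeros((p, E_Xr.shape[1]))
    step = N / np.linalg.norm(E_Xmr, 2) ** 2     # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = E_Xmr.T @ (E_Xmr @ W - E_Xr) / N
        V = W - step * grad
        W_new = np.zeros_like(W)
        for b in range(p // k):                  # block soft-threshold each k x k block
            blk = V[b * k:(b + 1) * k]
            nrm = np.linalg.norm(blk)            # Frobenius norm of the block
            if nrm > step * lam:
                W_new[b * k:(b + 1) * k] = (1 - step * lam / nrm) * blk
        done = np.linalg.norm(W_new - W) < tol
        W = W_new
        if done:
            break
    support = [b for b in range(p // k) if np.linalg.norm(W[b * k:(b + 1) * k]) > 0]
    return W, support
```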

  14. Our method: assumptions. Let H = 𝔼(E(X_−r) E(X_−r)^⊺) and Ĥ = (1/N) E(X_−r)^⊺ E(X_−r), and write S = π(r) ∪ c(r).
     1. Need a unique solution. Mathematically, Λ_min(H_{S,S}) = C > 0.
     2. Mutual incoherence: the large number of irrelevant covariates (non-parents and non-children of node r) should not exert an overly strong effect on the subset of relevant covariates (parents and children of node r). Mathematically, for some α ∈ (0, 1], ∥H_{S^c, S} H_{S,S}^{−1}∥_{B,∞,1} ⩽ 1 − α.
     3. Here ∥A∥_{B,∞,1} = max_{i∈B} ∥vec(A_i)∥_1.
     4. Both conditions also hold in the sample setting (with Ĥ) with high probability if N = O(k^5 d^3 log(kn)), where d = |π(r) ∪ c(r)|. A numerical check of conditions 1 and 2 is sketched below.
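
For intuition, a small numerical sketch (interface and names are my assumptions, not the paper's) of verifying the two population conditions for a given H and candidate support S:

```python
import numpy as np

def check_assumptions(H, S, k, alpha):
    """Check Lambda_min(H_{S,S}) > 0 and ||H_{S^c,S} H_{S,S}^{-1}||_{B,inf,1} <= 1 - alpha,
    where S lists the block indices in pi(r) U c(r) and every block has size k."""
    n_blocks = H.shape[0] // k
    rows = lambda blocks: np.concatenate([np.arange(b * k, (b + 1) * k) for b in blocks])
    Sc = [b for b in range(n_blocks) if b not in S]
    H_SS = H[np.ix_(rows(S), rows(S))]
    C = np.linalg.eigvalsh(H_SS).min()           # Assumption 1: Lambda_min(H_{S,S}) = C > 0
    if not Sc:                                   # no irrelevant blocks: incoherence holds trivially
        return C > 0, True
    M = H[np.ix_(rows(Sc), rows(S))] @ np.linalg.inv(H_SS)
    inc = max(np.abs(M[i * k:(i + 1) * k]).sum() for i in range(len(Sc)))
    return C > 0, inc <= 1 - alpha               # Assumption 2: ||.||_{B,inf,1} <= 1 - alpha
```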
