An Alphabet-Size Bound for the Information Bottleneck Function ISIT - PowerPoint PPT Presentation

An Alphabet-Size Bound for the Information Bottleneck Function
ISIT 2020
Christoph Hirche, Andreas Winter

What for?
DNNs, video processing, clustering.
C. Hirche – IBM bounds 2/16



Sufficient Statistics
Sufficient statistics are maps or partitions of X, $S(X)$, that capture all the information that X has about Y. Namely, $I(S(X);Y) = I(X;Y)$.
Minimal sufficient statistics, $T(X)$, are the simplest sufficient statistics:
$$T(X) = \arg\min_{S(X):\ I(S(X);Y) = I(X;Y)} I(S(X);X).$$
Approximate minimal sufficient statistics ⇔ Information Bottleneck:
$$\min_{S(X):\ I(S(X);Y) \ge a} I(S(X);X).$$
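A minimal numerical sketch of the sufficiency condition, assuming a hypothetical toy joint distribution in which $P_{Y|X}$ depends on $x$ only through its parity. The statistic $S(x) = x \bmod 2$ should then preserve $I(X;Y)$, while compressing X: since S is deterministic, $I(S(X);X) = H(S(X)) = 1$ bit instead of $H(X) = 2$ bits. The helpers `mutual_information` and `push_forward` and all numbers are illustrative, not taken from the talk.

```python
import math

def mutual_information(p_joint):
    """I(A;B) in bits for a joint distribution given as a nested list p_joint[a][b]."""
    p_a = [sum(row) for row in p_joint]
    p_b = [sum(col) for col in zip(*p_joint)]
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for a, row in enumerate(p_joint)
        for b, p in enumerate(row)
        if p > 0
    )

def push_forward(p_joint, stat, n_values):
    """Joint distribution of (S(X), Y), obtained by merging the rows x with equal S(x)."""
    n_y = len(p_joint[0])
    merged = [[0.0] * n_y for _ in range(n_values)]
    for x, row in enumerate(p_joint):
        for y, p in enumerate(row):
            merged[stat(x)][y] += p
    return merged

# Hypothetical toy example: X uniform on {0, 1, 2, 3}, P(Y=1 | X=x) depends only on x mod 2.
p_y1 = {0: 0.3, 1: 0.8}
p_xy = [[0.25 * (1 - p_y1[x % 2]), 0.25 * p_y1[x % 2]] for x in range(4)]

def parity(x):
    """The candidate sufficient statistic S(x) = x mod 2."""
    return x % 2

p_sy = push_forward(p_xy, parity, 2)
i_xy = mutual_information(p_xy)
i_sy = mutual_information(p_sy)
print(f"I(X;Y) = {i_xy:.6f} bits, I(S(X);Y) = {i_sy:.6f} bits")
```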

Application in ML

IB optimality?
From Shwartz-Ziv and Tishby: the DNN layers converge to fixed points of the IB equations.



Dimension Bounds
Generally known: $|W| \le |X| + 1$.
But can we get bounds in terms of $|Y|$? Maybe approximate ones?
$$I_{XY}(R,N) \le I_{XY}(R) \le I_{XY}(R,N) + \delta(\epsilon,|Y|)$$
for some $\delta(\epsilon,|Y|)$, with the bottleneck alphabet restricted to $|W| \le N(\epsilon,|Y|)$.
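For reference, a sketch of the standard form of the IB function (assumed here to match the slides' notation): the maximal relevant information at compression rate R. $I_{XY}(R,N)$ is read as the same optimization with the bottleneck alphabet capped at N, which is an interpretation of the slide's notation rather than a definition stated in it.

```latex
I_{XY}(R) = \max_{P_{W|X}} \left\{ I(Y;W) : I(X;W) \le R,\ Y - X - W \right\},
\qquad
I_{XY}(R,N) = \max_{P_{W|X},\ |W| \le N} \left\{ I(Y;W) : I(X;W) \le R,\ Y - X - W \right\}.
```

With this reading the lower bound $I_{XY}(R,N) \le I_{XY}(R)$ is immediate, since capping $|W|$ only shrinks the feasible set; the content of the question is the upper bound.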

Recoverability
Lemma. Given a joint distribution $P_{XY}$ of two random variables X and Y, assume that there exist N probability distributions $Q_1, \dots, Q_N$ on Y and a function $f: X \to [N]$ with the property that
$$\forall x: \quad \tfrac{1}{2} \left\| P_{Y|X=x} - Q_{f(x)} \right\|_1 \le \epsilon,$$
for some $\epsilon > 0$. Then there exists a recovery channel $S: [N] \to X$ such that the Markov chain $Y - X - X' - \hat{X}$ defined by $X' = f(X)$ and $P_{\hat{X}|X'} = S$ satisfies $P_X = P_{\hat{X}}$ and
$$\tfrac{1}{2} \left\| P_{XY} - P_{\hat{X}Y} \right\|_1 \le \epsilon' = 2\epsilon.$$
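A minimal sketch of one natural recovery channel, assuming $S = P_{X|X'}$: given the label $X' = i$, resample a letter from the cluster $f^{-1}(i)$ according to $P_X$. This is an illustrative choice, not necessarily the construction used in the paper. With it, $P_{\hat X} = P_X$ holds exactly and $P_{\hat X Y}(x,\cdot) = P_X(x)\,\bar P_{f(x)}$, where $\bar P_i$ is the $P_X$-weighted average of the conditionals $P_{Y|X=x}$ in cluster i. Since $\bar P_i$ is a convex combination of distributions within $\epsilon$ of $Q_i$, it is itself within $\epsilon$ of $Q_i$, and the triangle inequality gives the factor 2. The toy distributions and the helper names `tv` and `recovered_joint` are hypothetical.

```python
from collections import defaultdict

def tv(p, q):
    """Total variation distance 1/2 * ||p - q||_1 between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def recovered_joint(p_x, p_y_given_x, f):
    """Joint P_{X_hat Y} for the assumed recovery channel S = P_{X|X'} with X' = f(X).

    Assumes p_x[x] > 0 for every x, so that every cluster has positive weight.
    Returns a dict mapping x to the list P_{X_hat Y}(x, .).
    """
    clusters = defaultdict(list)
    for x in p_x:
        clusters[f[x]].append(x)
    n_y = len(next(iter(p_y_given_x.values())))
    joint = {}
    for members in clusters.values():
        weight = sum(p_x[x] for x in members)  # P_{X'}(i)
        # P_{Y|X'=i}: P_X-weighted average of the conditionals in the cluster.
        avg = [sum(p_x[x] * p_y_given_x[x][y] for x in members) / weight for y in range(n_y)]
        for x in members:
            # X_hat is drawn from P_{X|X'=i}, conditionally independent of Y given X'.
            joint[x] = [p_x[x] * a for a in avg]
    return joint

# Hypothetical toy data: three letters of X, binary Y, N = 2 clusters.
p_x = {0: 0.5, 1: 0.3, 2: 0.2}
p_y_given_x = {0: [0.9, 0.1], 1: [0.8, 0.2], 2: [0.1, 0.9]}
f = {0: 0, 1: 0, 2: 1}
Q = [[0.85, 0.15], [0.1, 0.9]]

eps = max(tv(p_y_given_x[x], Q[f[x]]) for x in p_x)  # smallest epsilon satisfying the hypothesis
joint_hat = recovered_joint(p_x, p_y_given_x, f)
dist = sum(tv([p_x[x] * p for p in p_y_given_x[x]], joint_hat[x]) for x in p_x)
print(f"epsilon = {eps:.4f}, 1/2||P_XY - P_XhatY||_1 = {dist:.4f}, bound 2*epsilon = {2 * eps:.4f}")
```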



Bounds on N?
How large does N need to be?
Easy: $N \le |X|$, but that is still too big.
In the worst case, we need to choose an $\epsilon$-net of the probability simplex $\mathcal{P}(Y)$ of all probability distributions on Y with respect to the total variation distance, which results in
$$N \le \left(\frac{2}{\epsilon}\right)^{|Y|}.$$
Generally, one can do much better (e.g. for deterministic data sets).
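One simple way to find such an f and $Q_1, \dots, Q_N$ in practice, sketched here as an assumption rather than as the authors' method, is a greedy cover: scan the conditionals $P_{Y|X=x}$ and open a new center only when no existing center is within $\epsilon$ in total variation. This always gives $N \le |X|$, and any two centers end up more than $\epsilon$ apart. For a deterministic data set every conditional is a point mass, so there are at most $|Y|$ distinct ones and $N \le |Y|$ even with $\epsilon = 0$. The function name `greedy_cover` and the data are hypothetical.

```python
def tv(p, q):
    """Total variation distance 1/2 * ||p - q||_1 between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def greedy_cover(p_y_given_x, eps):
    """Return (f, centers) with tv(P_{Y|X=x}, centers[f[x]]) <= eps for every x."""
    centers = []
    f = {}
    for x, row in p_y_given_x.items():
        for i, center in enumerate(centers):
            if tv(row, center) <= eps:
                f[x] = i
                break
        else:
            # No existing center is close enough: this conditional becomes a new center.
            centers.append(row)
            f[x] = len(centers) - 1
    return f, centers

# Hypothetical deterministic data set: six inputs, each labelled with one of three classes.
labels = {0: 0, 1: 2, 2: 1, 3: 0, 4: 2, 5: 1}
p_y_given_x = {x: [1.0 if y == c else 0.0 for y in range(3)] for x, c in labels.items()}

f, centers = greedy_cover(p_y_given_x, eps=0.0)
print(f"N = {len(centers)} clusters for |X| = {len(p_y_given_x)}")
```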

IBM Bound
Lemma. Let $Y - X - \hat{X}$ be a Markov chain. Then the IB function of $P_{XY}$ dominates the IB function of $P_{\hat{X}Y}$ pointwise:
$$I_{XY}(R) \ge I_{\hat{X}Y}(R) \quad \forall R.$$
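For R large enough that the rate constraint is inactive, the IB function reduces to the plain mutual information ($W = X$ is feasible and optimal), so this lemma contains the data processing inequality $I(X;Y) \ge I(\hat X;Y)$ as its large-R endpoint. A minimal numerical sanity check of that endpoint only, with a randomly drawn joint distribution and channel (hypothetical data, fixed seed):

```python
import math
import random

def mutual_information(p_joint):
    """I(A;B) in bits for a joint distribution given as a nested list p_joint[a][b]."""
    p_a = [sum(row) for row in p_joint]
    p_b = [sum(col) for col in zip(*p_joint)]
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for a, row in enumerate(p_joint)
        for b, p in enumerate(row)
        if p > 0
    )

def random_distribution(rng, n):
    """A random probability vector of length n (normalized positive weights)."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]

rng = random.Random(0)
n_x, n_y, n_xhat = 4, 3, 2

# Hypothetical joint P_XY and channel P_{X_hat|X}, giving the Markov chain Y - X - X_hat.
flat = random_distribution(rng, n_x * n_y)
p_xy = [flat[x * n_y:(x + 1) * n_y] for x in range(n_x)]
channel = [random_distribution(rng, n_xhat) for _ in range(n_x)]  # channel[x][xhat]

# P_{X_hat Y}(xhat, y) = sum_x P_{X_hat|X}(xhat|x) P_XY(x, y)
p_xhat_y = [[sum(channel[x][xh] * p_xy[x][y] for x in range(n_x)) for y in range(n_y)]
            for xh in range(n_xhat)]

print(f"I(X;Y) = {mutual_information(p_xy):.4f} >= I(X_hat;Y) = {mutual_information(p_xhat_y):.4f}")
```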

Alphabet-Size Bounds
Corollary. Under the assumptions of our main lemma,
$$I_{X'Y}(R) \le I_{XY}(R) \le I_{X'Y}(R) + \delta(\epsilon,|Y|),$$
where $\delta(\epsilon,|Y|) := \epsilon' \log|Y| + (1+\epsilon')\, h\!\left(\frac{\epsilon'}{1+\epsilon'}\right)$, with h the binary entropy.
Corollary. Under the assumptions of our main lemma,
$$I_{XY}(R,N) \le I_{XY}(R) \le I_{XY}(R,N) + \delta(\epsilon,|Y|),$$
where $\delta(\epsilon,|Y|) := \epsilon' \log|Y| + (1+\epsilon')\, h\!\left(\frac{\epsilon'}{1+\epsilon'}\right)$ and $N \le \left(\frac{2}{\epsilon}\right)^{|Y|}$.
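The error term and the alphabet-size bound of the second corollary are easy to tabulate. A minimal sketch, assuming $\epsilon' = 2\epsilon$ as in the recoverability lemma, logarithms to base 2, and h the binary entropy. Since $N \le |X|$ also holds trivially, the sketch reports $\min\{|X|, (2/\epsilon)^{|Y|}\}$. The function names are hypothetical.

```python
import math

def binary_entropy(p):
    """h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def delta(eps, y_size):
    """delta(eps, |Y|) = eps' log|Y| + (1 + eps') h(eps' / (1 + eps')) in bits, with eps' = 2 eps."""
    eps_p = 2 * eps
    return eps_p * math.log2(y_size) + (1 + eps_p) * binary_entropy(eps_p / (1 + eps_p))

def alphabet_bound(eps, y_size, x_size):
    """Upper bound on N: the smaller of |X| and the epsilon-net size (2 / eps)^|Y|."""
    return min(x_size, (2 / eps) ** y_size)

for eps in (0.01, 0.05, 0.1):
    print(f"eps={eps}: delta={delta(eps, 4):.3f} bits, N <= {alphabet_bound(eps, 4, 10**6):.0f}")
```

The trade-off is visible directly: shrinking $\epsilon$ lowers $\delta$ but inflates the net size exponentially in $|Y|$.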

Quantum IB

QIB
For a quantum state $\rho_{XY}$, we define
$$R_q(a) = \inf_{\substack{\mathcal{N}_{X \to W} \\ I(Y;W)_\sigma \ge a}} I(YR;W)_\sigma,$$
with $\sigma_{WYR} := (\mathcal{N}_{X \to W} \otimes \mathrm{id}_{YR})\,\Psi_{XYR}$ and $\Psi_{XYR}$ a purification of $\rho_{XY}$.

QIB
Lemma. For X and Y quantum, and W classical, an optimal solution for the quantum information bottleneck can be achieved with $|W| \le |Y|^2 |R|^2 + 1$.
Lemma. For Y quantum, but X and W classical, an optimal solution for the quantum information bottleneck can be achieved with $|W| \le |X| + 1$.

QIB
Lemma. Given a classical-quantum state
$$\rho_{XY} = \sum_x p(x)\, |x\rangle\langle x| \otimes \rho^x_Y, \tag{1}$$
assume that there exist N quantum states $\sigma^1_Y, \dots, \sigma^N_Y$ and a function $f: X \to [N]$ with the property that
$$\forall x: \quad \tfrac{1}{2} \left\| \rho^x_Y - \sigma^{f(x)}_Y \right\|_1 \le \epsilon, \tag{2}$$
for given $\epsilon > 0$. Then there exists a recovery channel $S: [N] \to X$ such that the Markov chain $Y - X - X' - \hat{X}$ defined by $X' = f(X)$ and $P_{\hat{X}|X'} = S$ satisfies $P_X = P_{\hat{X}}$ and
$$\tfrac{1}{2} \left\| \rho_{XY} - \rho_{\hat{X}Y} \right\|_1 \le \epsilon' = 2\epsilon.$$
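For qubit systems ($|Y| = 2$) the trace distance has a closed form, $\tfrac12\|\rho - \sigma\|_1 = \tfrac12 |\vec r - \vec s|$ for Bloch vectors $\vec r, \vec s$, and mixing states acts linearly on Bloch vectors. A minimal sketch that checks hypothesis (2) for hypothetical states and then evaluates $\tfrac12\|\rho_{XY} - \rho_{\hat X Y}\|_1$ under an assumed recovery channel $S = P_{X|X'}$, which resamples $\hat X$ within the cluster $f^{-1}(X')$ according to $p(x)$ (an illustrative choice, not necessarily the paper's). With it, $\rho_{\hat X Y} = \sum_x p(x)|x\rangle\langle x| \otimes \bar\rho^{f(x)}_Y$, where $\bar\rho^i_Y$ is the p-weighted average of the $\rho^x_Y$ in cluster i; both states are block diagonal in $\{|x\rangle\}$, so the distance splits into a p-weighted sum. All states and names are hypothetical.

```python
import math

def qubit_trace_distance(r, s):
    """1/2 ||rho - sigma||_1 for qubit states with Bloch vectors r and s: equals |r - s| / 2."""
    return 0.5 * math.dist(r, s)

def mix(vectors, weights):
    """Bloch vector of the mixture sum_k w_k rho_k / sum_k w_k (linear in Bloch vectors)."""
    total = sum(weights)
    return tuple(sum(w * v[k] for v, w in zip(vectors, weights)) / total for k in range(3))

# Hypothetical cq state: p(x) and Bloch vectors of rho^x_Y, plus N = 2 reference states sigma^i_Y.
p = {0: 0.5, 1: 0.3, 2: 0.2}
rho = {0: (0.0, 0.0, 1.0), 1: (0.1, 0.0, 0.99), 2: (0.0, 0.0, -1.0)}
sigma = [(0.05, 0.0, 0.995), (0.0, 0.0, -1.0)]
f = {0: 0, 1: 0, 2: 1}

# Smallest epsilon for which hypothesis (2) holds with this f.
eps = max(qubit_trace_distance(rho[x], sigma[f[x]]) for x in p)

# Assumed recovery channel S = P_{X|X'}: rho^x_Y is replaced by the p-weighted cluster average.
avg = {}
for i in set(f.values()):
    members = [x for x in p if f[x] == i]
    avg[i] = mix([rho[x] for x in members], [p[x] for x in members])

# Both cq states are block diagonal in |x>, so the trace distance is a p-weighted sum.
dist = sum(p[x] * qubit_trace_distance(rho[x], avg[f[x]]) for x in p)
print(f"epsilon = {eps:.4f}, 1/2||rho_XY - rho_XhatY||_1 = {dist:.4f}, bound 2*epsilon = {2 * eps:.4f}")
```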

QIB
For Y quantum, but X and W classical:
Corollary. Under the assumptions of the previous lemma,
$$I^{cq}_{X'Y}(R) \le I^{cq}_{XY}(R) \le I^{cq}_{X'Y}(R) + \delta(\epsilon,|Y|),$$
where $\delta(\epsilon,|Y|) := \epsilon' \log|Y| + (1+\epsilon')\, h\!\left(\frac{\epsilon'}{1+\epsilon'}\right)$.
Corollary. Under the assumptions of the previous lemma,
$$I^{cq}_{XY}(R,N) \le I^{cq}_{XY}(R) \le I^{cq}_{XY}(R,N) + \delta(\epsilon,|Y|),$$
where $\delta(\epsilon,|Y|)$ is as before and $N \le \left(\frac{3}{\epsilon}\right)^{2|Y|^2}$.
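The classical-quantum alphabet bound grows much faster with $|Y|$ than the classical $(2/\epsilon)^{|Y|}$, presumably because the net is now taken over $|Y| \times |Y|$ density matrices rather than over the probability simplex. Working with $\log_2 N$ avoids overflow when comparing the two. A minimal sketch with hypothetical function names:

```python
import math

def log2_n_classical(eps, y_size):
    """log2 of the classical bound N <= (2 / eps)^|Y|."""
    return y_size * math.log2(2 / eps)

def log2_n_cq(eps, y_size):
    """log2 of the classical-quantum bound N <= (3 / eps)^(2 |Y|^2)."""
    return 2 * y_size ** 2 * math.log2(3 / eps)

for y_size in (2, 4, 8):
    print(f"|Y|={y_size}: log2 N <= {log2_n_classical(0.05, y_size):.1f} (classical), "
          f"{log2_n_cq(0.05, y_size):.1f} (cq)")
```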

The End
Summary:
New approach to alphabet-size bounds via recoverability.
New bounds on approximating the IB with an alphabet size limited by $|Y|$ (instead of $|X|$).
Open questions:
Other applications of the recoverability approach.
Fully quantum case. (Stay tuned for more on this soon.¹)
Thanks!
¹ M. Christandl, C. Hirche, A. Winter, in preparation, 2020.
