

  1. Linear Models are Most Favorable among Generalized Linear Models
Kuan-Yun Lee and Thomas A. Courtade
Electrical Engineering and Computer Sciences, University of California, Berkeley
ISIT 2020 (talk: New Minimax Bound for the GLM)

  2. Overview
1. Introduction and Main Results
2. Key Points of Proof

  4. Introduction
Given X := (X_1, ..., X_n) ∼ f(·; θ). Examples:
- Linear regression: X = Mθ + Z
- Phase retrieval: X_i = ⟨m_i, θ⟩² + Z_i
- Group testing: X_i = δ(⟨m_i, θ⟩)
- Matrix retrieval: X_i = Tr(M_i^⊤ θ) when θ is a matrix
- ... and many other settings with sparsity, structural assumptions on M, etc.
Key Question: How well can we estimate θ from observations X ∼ f(·; θ)?
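
The observation models listed above are easy to simulate; a minimal NumPy sketch, where the dimensions, the Gaussian design, and the choice δ(t) = 1{t > 0} for group testing are illustrative assumptions rather than choices made in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
M = rng.standard_normal((n, d))      # design matrix with rows m_i
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)       # unit-norm parameter
Z = rng.standard_normal(n)           # additive Gaussian noise

X_linear = M @ theta + Z                     # linear regression: X = M theta + Z
X_phase = (M @ theta) ** 2 + Z               # phase retrieval: X_i = <m_i, theta>^2 + Z_i
X_group = (M @ theta > 0).astype(float)      # group testing with delta = 1{t > 0}

print(X_linear.shape, X_phase.shape, X_group.shape)
```

Each model observes θ only through the inner products ⟨m_i, θ⟩, which is what places them all in the generalized-linear-model family discussed next.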

  6. Introduction
Consider the classical linear model X = Mθ + Z under the constraint θ ∈ Θ.
Fundamental Question: Given a loss function L(·, ·), what is inf_θ̂ sup_{θ ∈ Θ} E L(θ, θ̂)?
Loss functions L(θ, θ̂):
- ‖θ − θ̂‖₂² (estimation error)
- ‖Mθ − Mθ̂‖₂² (prediction error)
- 𝟙(supp(θ) = supp(θ̂)) (support recovery)
Constraints on Θ:
- Θ is an ℓ_p ball
- Θ is a matrix space with rank constraints
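
The three losses can behave quite differently on the same estimate; a small worked comparison, with M, θ, and θ̂ chosen purely for illustration:

```python
import numpy as np

M = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
theta = np.array([1.0, 0.0])
theta_hat = np.array([0.8, 0.1])

est_err = np.sum((theta - theta_hat) ** 2)            # ||theta - theta_hat||_2^2
pred_err = np.sum((M @ theta - M @ theta_hat) ** 2)   # ||M theta - M theta_hat||_2^2
supp_ok = float(np.array_equal(theta != 0, theta_hat != 0))  # 1{supp(theta) = supp(theta_hat)}

print(est_err, pred_err, supp_ok)
```

Here the estimation error is small, yet the support-recovery loss is zero because θ̂ has a spurious nonzero coordinate, illustrating why the choice of loss matters.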

  8. Introduction
In this talk, we focus on the estimation error L(θ, θ̂) := ‖θ − θ̂‖₂².
Consider X = Mθ + Z ∈ ℝⁿ with fixed design matrix M ∈ ℝ^{n×d}, Θ = ℝ^d, and Z ∼ N(0, σ²·I_n). Suppose M has full column rank. Then the maximum-likelihood estimator
θ̂_MLE := (M⊤M)⁻¹ M⊤X
achieves the minimax error inf_θ̂ sup_{θ ∈ ℝ^d} E‖θ − θ̂‖₂², and
E‖θ − θ̂_MLE‖₂² = E‖(M⊤M)⁻¹ M⊤Z‖₂² = σ²·Tr((M⊤M)⁻¹).
Follow-up question: Can we generalize this?
- The Gaussian distribution falls into the exponential family.
- The linear model falls into the family of generalized linear models.
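
The closed-form risk σ²·Tr((M⊤M)⁻¹) can be checked against a Monte Carlo estimate; a sketch with arbitrary dimensions and noise level:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma, N = 100, 3, 0.5, 20000
M = rng.standard_normal((n, d))
theta = rng.standard_normal(d)

# Closed-form minimax risk of the MLE
exact = sigma**2 * np.trace(np.linalg.inv(M.T @ M))

# Monte Carlo: average ||theta_hat_MLE - theta||_2^2 over N fresh noise draws
X = theta @ M.T + sigma * rng.standard_normal((N, n))        # N datasets, one per row
theta_hats = np.linalg.solve(M.T @ M, M.T @ X.T).T           # (M^T M)^{-1} M^T X per dataset
mc = np.mean(np.sum((theta_hats - theta) ** 2, axis=1))

print(exact, mc)   # the two agree up to Monte Carlo error
```

Note that the risk does not depend on θ, consistent with the sup over θ ∈ ℝ^d being achieved everywhere.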

  11. Exponential Family
Density of X ∈ ℝ given natural parameter η ∈ ℝ:
f(x; η) = h(x) exp( (ηx − Φ(η)) / s(σ) )
- h : 𝒳 ⊆ ℝ → [0, ∞) (the base measure)
- Φ : ℝ → ℝ (the cumulant function)
- s(σ) > 0: the scale parameter
Examples:
- Bernoulli(1/(1 + e^{−η})): h(x) = 1, Φ(t) = log(1 + e^t), s(σ) = 1
- Gaussian(η, 1): h(x) = (1/√(2π)) e^{−x²/2}, Φ(t) = t²/2, s(σ) = 1
- Exponential(η): h(x) = 1, Φ(t) = −log t, s(σ) = 1
- Poisson(e^η): h(x) = 1/x!, Φ(t) = e^t, s(σ) = 1
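
The Bernoulli entry can be sanity-checked directly: with h(x) = 1, Φ(η) = log(1 + e^η), and s(σ) = 1, the density f(x; η) = exp(ηx − Φ(η)) reproduces the Bernoulli(1/(1 + e^{−η})) pmf:

```python
import math

def f(x, eta):
    # Exponential-family form h(x) * exp((eta*x - Phi(eta)) / s) with h = 1, s = 1
    Phi = math.log(1.0 + math.exp(eta))   # cumulant function of the Bernoulli family
    return math.exp(eta * x - Phi)

eta = 0.7
p = 1.0 / (1.0 + math.exp(-eta))   # Bernoulli success probability
print(f(1, eta), p)                # equal: e^eta / (1 + e^eta)
print(f(0, eta), 1 - p)            # equal: 1 / (1 + e^eta)
```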

  12. Generalized Linear Models
Density of X ∈ ℝⁿ given parameter Mθ ∈ ℝⁿ:
f(x; M, θ) = ∏_{i=1}^{n} h(x_i) exp( (⟨m_i, θ⟩ x_i − Φ(⟨m_i, θ⟩)) / s(σ) )
- h : 𝒳 ⊆ ℝ → [0, ∞) (the base measure)
- Φ : ℝ → ℝ (the cumulant function)
- s(σ) > 0: the scale parameter
Examples:
- Bernoulli(1/(1 + e^{−⟨m_i, θ⟩})): h(x) = 1, Φ(t) = log(1 + e^t), s(σ) = 1
- Gaussian(⟨m_i, θ⟩, 1): h(x) = (1/√(2π)) e^{−x²/2}, Φ(t) = t²/2, s(σ) = 1
- Exponential(⟨m_i, θ⟩): h(x) = 1, Φ(t) = −log t, s(σ) = 1
- Poisson(e^{⟨m_i, θ⟩}): h(x) = 1/x!, Φ(t) = e^t, s(σ) = 1

  13. Generalized Linear Models
Density of X ∈ ℝⁿ given parameter Mθ ∈ ℝⁿ:
f(x; M, θ) = ∏_{i=1}^{n} h(x_i) exp( (⟨m_i, θ⟩ x_i − Φ(⟨m_i, θ⟩)) / s(σ) )
We make one common assumption: Φ″ ≤ L.
The variance of X_i is s(σ)·Φ″(⟨m_i, θ⟩).
- In the Gaussian case, Φ″(t) = 1
- In the Bernoulli case, Φ″(t) = e^t/(1 + e^t)² ≤ 1
- In the Poisson case, Φ″(t) = e^t
- In the Exponential case, Φ″(t) = 1/t²
Bounding Φ″ corresponds to structural assumptions on M and Θ.
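
The Poisson entry illustrates the variance identity: with Φ(t) = e^t and s(σ) = 1, Φ″(t) = e^t is exactly the mean and variance of Poisson(e^t). A quick finite-difference check of Φ″ (the evaluation point and step size are arbitrary):

```python
import math

def phi_poisson(t):
    # Cumulant function of the Poisson family
    return math.exp(t)

def second_derivative(f, t, h=1e-4):
    # Central finite-difference approximation of f''(t)
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h**2

t = 0.3
numeric = second_derivative(phi_poisson, t)
exact = math.exp(t)   # Phi''(t) = e^t, the variance of Poisson(e^t)
print(numeric, exact)
```

For the Poisson and Exponential cases Φ″ is unbounded on ℝ, so the assumption Φ″ ≤ L effectively restricts where ⟨m_i, θ⟩ may lie, i.e. it constrains M and Θ.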

  16. Generalized Linear Models
Theorem. Given observations X ∈ ℝⁿ generated from the GLM with fixed M ∈ ℝ^{n×d},
f(x; M, θ) = ∏_{i=1}^{n} h(x_i) exp( (⟨m_i, θ⟩ x_i − Φ(⟨m_i, θ⟩)) / s(σ) ),
with Θ := B₂^d(1) := {θ ∈ ℝ^d : ‖θ‖₂ ≤ 1} and Φ″ ≤ L,
inf_θ̂ sup_{θ ∈ Θ} E‖θ − θ̂‖₂² ≳ min{ 1, (s(σ)/L)·Tr((M⊤M)⁻¹) }.
When M⊤M has a zero eigenvalue, adopt the convention Tr((M⊤M)⁻¹) := +∞.
The Gaussian linear model with X = LMθ + Z matches this bound with equality and is extremal in this family of GLMs.
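
For a concrete design the right-hand side is easy to evaluate; a sketch that ignores the absolute constant hidden in ≳, with arbitrary illustrative values for M, s(σ), and L:

```python
import numpy as np

def glm_minimax_lower_bound(M, s, L):
    """min{1, (s/L) * Tr((M^T M)^{-1})}, with Tr((M^T M)^{-1}) := +inf when M^T M is singular."""
    G = M.T @ M
    if np.linalg.matrix_rank(G) < G.shape[0]:
        trace_inv = np.inf
    else:
        trace_inv = np.trace(np.linalg.inv(G))
    return min(1.0, (s / L) * trace_inv)

M = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b1 = glm_minimax_lower_bound(M, s=1.0, L=1.0)
b2 = glm_minimax_lower_bound(np.ones((3, 2)), s=1.0, L=1.0)  # singular M^T M -> bound saturates at 1
print(b1, b2)
```

The min with 1 reflects the diameter of Θ = B₂^d(1): the error can never usefully exceed a constant on a bounded parameter set, which is exactly what the singular-design case shows.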

  19. Overview
1. Introduction and Main Results
2. Key Points of Proof

  20. Upper Bound on Mutual Information
Consider X ∼ f(·; θ). The Fisher information I_X(θ) is defined as
I_X(θ) := E_X[ |∇_θ f(X; θ)|² / f²(X; θ) ].
Regularity assumption: ∫_𝒳 ∇_θ f(x; θ) dλ(x) = 0 for almost every θ, and θ ↦ f(x; θ) is (weakly) differentiable for λ-a.e. x.
Theorem (Aras, Lee, Pananjady, Courtade, 2019). Let θ ∼ π, where π is log-concave on ℝ^d, and let X ∼ f(·; θ). If the regularity condition is satisfied, then
I(θ; X) ≤ d·φ( Tr(Cov(θ))·E_θ[I_X(θ)] / d² ),
where φ(x) := √x for 0 ≤ x < 1 and φ(x) := 1 + ½ log x for x ≥ 1.
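
The bound can be spot-checked in the scalar Gaussian case, which satisfies the theorem's hypotheses: for θ ∼ N(0, τ²) (log-concave) and X = θ + Z with Z ∼ N(0, 1), one has I(θ; X) = ½ log(1 + τ²), Tr(Cov(θ)) = τ², and I_X(θ) = 1, so with d = 1 the theorem asserts ½ log(1 + τ²) ≤ φ(τ²). A numeric check over a few arbitrary values of τ²:

```python
import math

def phi(x):
    # phi(x) = sqrt(x) on [0, 1), and 1 + (1/2) log x on [1, inf)
    return math.sqrt(x) if x < 1.0 else 1.0 + 0.5 * math.log(x)

for tau2 in [0.1, 0.5, 1.0, 4.0, 100.0]:
    mi = 0.5 * math.log(1.0 + tau2)   # exact I(theta; X) for the Gaussian channel
    bound = phi(tau2)                  # d = 1, Tr(Cov(theta)) * E[I_X(theta)] = tau^2
    assert mi <= bound
    print(tau2, mi, bound)
```

Both branches of φ matter: for small τ² the √x branch is tight up to constants, while for large τ² the logarithmic branch matches the growth of the Gaussian mutual information.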
