Linear Models are Most Favorable among Generalized Linear Models

Kuan-Yun Lee and Thomas A. Courtade
Electrical Engineering and Computer Sciences
University of California, Berkeley

ISIT 2020
Overview

1. Introduction and Main Results
2. Keypoints of Proof
Introduction

Given X := (X_1, ..., X_n) ∼ f(·; θ):
- Linear regression: X = Mθ + Z
- Phase retrieval: X_i = ⟨m_i, θ⟩² + Z_i
- Group testing: X_i = δ(⟨m_i, θ⟩)
- Matrix retrieval: X_i = Tr(M_i^⊤ θ) when θ is a matrix
- ... and many other settings with sparsity, structural assumptions on M, etc.

Key Question
How well can we estimate θ from observations X ∼ f(·; θ)?
Introduction

Consider the classical linear model X = Mθ + Z under the constraint θ ∈ Θ.

Fundamental Question
Given a loss function L(·, ·), what is inf_θ̂ sup_{θ ∈ Θ} E L(θ, θ̂)?

Loss functions L(θ, θ̂):
- ‖θ − θ̂‖₂² (estimation error)
- ‖Mθ − Mθ̂‖₂² (prediction error)
- 𝟙(supp(θ) = supp(θ̂))

Constraints on Θ:
- Θ is an L_p ball
- Θ is a matrix space with rank constraints
Introduction

In this talk, we will focus on the estimation error L(θ, θ̂) := ‖θ − θ̂‖₂².

Consider X = Mθ + Z ∈ ℝⁿ with fixed design matrix M ∈ ℝ^{n×d}, Θ = ℝ^d, and Z ∼ N(0, σ²·I_n).
Suppose M has full column rank. Then

    θ̂_MLE := (M⊤M)⁻¹ M⊤ X

achieves the minimax error inf_θ̂ sup_{θ ∈ ℝ^d} E‖θ − θ̂‖₂², and

    E‖θ − θ̂_MLE‖₂² = E‖(M⊤M)⁻¹ M⊤ Z‖₂² = σ² · Tr((M⊤M)⁻¹).

Follow-up question
Can we generalize this?
- The Gaussian distribution falls into the exponential family.
- The linear model falls into the family of generalized linear models.
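To make the Gaussian baseline concrete, here is a minimal simulation sketch (not from the original slides) that checks E‖θ − θ̂_MLE‖₂² = σ² · Tr((M⊤M)⁻¹) by Monte Carlo; the dimensions, noise level, and seed are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 5, 0.5                 # arbitrary illustration choices

M = rng.standard_normal((n, d))           # fixed design matrix (full column rank w.h.p.)
theta = rng.standard_normal(d)            # an arbitrary true parameter
gram_inv = np.linalg.inv(M.T @ M)         # (M^T M)^{-1}

# Monte Carlo estimate of E||theta - theta_hat_MLE||_2^2
errs = []
for _ in range(5000):
    X = M @ theta + sigma * rng.standard_normal(n)   # X = M theta + Z, Z ~ N(0, sigma^2 I_n)
    theta_hat = gram_inv @ M.T @ X                   # MLE: (M^T M)^{-1} M^T X
    errs.append(np.sum((theta_hat - theta) ** 2))

print("empirical risk          :", np.mean(errs))
print("sigma^2 Tr((M^T M)^{-1}):", sigma**2 * np.trace(gram_inv))
```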
Exponential Family

Density of X ∈ ℝ given natural parameter η ∈ ℝ:

    f(x; η) = h(x) · exp( (ηx − Φ(η)) / s(σ) )

- h : 𝒳 ⊆ ℝ → [0, ∞) (the base measure)
- Φ : ℝ → ℝ (the cumulant function)
- s(σ) > 0: scale parameter

Examples:
- Bernoulli(1 / (1 + e^{−η})): h(x) = 1, Φ(t) = log(1 + e^t), and s(σ) = 1
- Gaussian(η, 1): h(x) = (1/√(2π)) · e^{−x²/2}, Φ(t) = t²/2, and s(σ) = 1
- Exponential(η): h(x) = 1, Φ(t) = −log t, and s(σ) = 1
- Poisson(e^η): h(x) = 1/x!, Φ(t) = e^t, and s(σ) = 1
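As a quick check of the Bernoulli entry (a worked rewriting added here, not on the original slide): with success probability p = 1/(1 + e^{−η}),

```latex
\begin{aligned}
\Pr(X = x) &= p^{x}(1-p)^{1-x}
            = \exp\!\Bigl( x \log\tfrac{p}{1-p} + \log(1-p) \Bigr) \\
           &= \exp\!\bigl( \eta x - \log(1 + e^{\eta}) \bigr),
            \qquad x \in \{0, 1\},
\end{aligned}
```

so h(x) = 1, Φ(η) = log(1 + e^η), and s(σ) = 1, as listed above.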
Generalized Linear Models

Density of X ∈ ℝⁿ given parameter Mθ ∈ ℝⁿ:

    f(x; M, θ) = ∏_{i=1}^{n} h(x_i) · exp( (⟨m_i, θ⟩ x_i − Φ(⟨m_i, θ⟩)) / s(σ) )

- h : 𝒳 ⊆ ℝ → [0, ∞) (the base measure)
- Φ : ℝ → ℝ (the cumulant function)
- s(σ) > 0: scale parameter

Examples:
- Bernoulli(1 / (1 + e^{−⟨m_i, θ⟩})): h(x) = 1, Φ(t) = log(1 + e^t), and s(σ) = 1
- Gaussian(⟨m_i, θ⟩, 1): h(x) = (1/√(2π)) · e^{−x²/2}, Φ(t) = t²/2, and s(σ) = 1
- Exponential(⟨m_i, θ⟩): h(x) = 1, Φ(t) = −log t, and s(σ) = 1
- Poisson(e^{⟨m_i, θ⟩}): h(x) = 1/x!, Φ(t) = e^t, and s(σ) = 1
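For intuition, an illustrative sampling sketch (not from the original slides; the design M and parameter θ are arbitrary choices) showing how observations are generated under two of these GLMs:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 3
M = rng.standard_normal((n, d))       # arbitrary design matrix
theta = np.array([0.5, -0.2, 0.1])    # arbitrary parameter
eta = M @ theta                       # natural parameters <m_i, theta>

# Bernoulli GLM (logistic regression): X_i ~ Bernoulli(1 / (1 + exp(-<m_i, theta>)))
X_bernoulli = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# Poisson GLM: X_i ~ Poisson(exp(<m_i, theta>))
X_poisson = rng.poisson(np.exp(eta))
```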
Generalized Linear Models

Density of X ∈ ℝⁿ given parameter Mθ ∈ ℝⁿ:

    f(x; M, θ) = ∏_{i=1}^{n} h(x_i) · exp( (⟨m_i, θ⟩ x_i − Φ(⟨m_i, θ⟩)) / s(σ) )

We make one common assumption: Φ'' ≤ L.
- The variance of X_i is s(σ) · Φ''(⟨m_i, θ⟩).
- In the Gaussian case, Φ''(t) = 1.
- In the Bernoulli case, Φ''(t) = e^t / (1 + e^t)² ≤ 1.
- In the Poisson case, Φ''(t) = e^t.
- In the Exponential case, Φ''(t) = 1/t².
- The assumption corresponds to structural assumptions on M and Θ.
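As a sanity check of the variance identity (a worked example added here, not on the original slide), take the Poisson case with s(σ) = 1:

```latex
X_i \sim \mathrm{Poisson}\bigl(e^{\langle m_i, \theta \rangle}\bigr)
\quad\Longrightarrow\quad
\operatorname{Var}(X_i) = e^{\langle m_i, \theta \rangle}
= \Phi''\bigl(\langle m_i, \theta \rangle\bigr)
= s(\sigma)\,\Phi''\bigl(\langle m_i, \theta \rangle\bigr).
```

Note that Φ'' is unbounded on ℝ in the Poisson and Exponential cases, so there the assumption Φ'' ≤ L amounts to restricting the range of ⟨m_i, θ⟩, i.e., structural assumptions on M and Θ.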
Generalized Linear Models

Theorem
Given observations X ∈ ℝⁿ generated from the GLM with fixed M ∈ ℝ^{n×d},

    f(x; M, θ) = ∏_{i=1}^{n} h(x_i) · exp( (⟨m_i, θ⟩ x_i − Φ(⟨m_i, θ⟩)) / s(σ) ),

with Θ := B₂^d(1) := {θ ∈ ℝ^d : ‖θ‖₂ ≤ 1} and Φ'' ≤ L,

    inf_θ̂ sup_{θ ∈ Θ} E‖θ − θ̂‖₂² ≳ min{ 1, (s(σ)/L) · Tr((M⊤M)⁻¹) }.

- When M⊤M has a zero eigenvalue, adopt Tr((M⊤M)⁻¹) := +∞.
- The Gaussian linear model with X = L·Mθ + Z matches this bound with equality and is extremal in this family of GLMs.
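A small helper (an illustrative sketch, not from the paper) for evaluating the right-hand side min{1, (s(σ)/L)·Tr((M⊤M)⁻¹)} of the theorem, up to the universal constant hidden in ≳; the function name and example values are hypothetical.

```python
import numpy as np

def glm_minimax_lower_bound(M, s_sigma, L):
    """Evaluate min{1, (s(sigma)/L) * Tr((M^T M)^{-1})}, using the convention
    Tr((M^T M)^{-1}) := +inf when M^T M has a zero eigenvalue."""
    eigvals = np.linalg.eigvalsh(M.T @ M)
    if np.min(eigvals) <= 0:
        return 1.0                       # trace term is +inf, so the min is 1
    trace_inv = np.sum(1.0 / eigvals)    # Tr((M^T M)^{-1}) = sum of inverse eigenvalues
    return min(1.0, (s_sigma / L) * trace_inv)

# Example: the Gaussian linear model has Phi'' = 1 (so L = 1) and s(sigma) = sigma^2.
rng = np.random.default_rng(2)
M = rng.standard_normal((50, 4))
print(glm_minimax_lower_bound(M, s_sigma=0.25, L=1.0))
```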
Overview

1. Introduction and Main Results
2. Keypoints of Proof
Upper Bound on Mutual Information

Consider X ∼ f(·; θ). The Fisher information I_X(θ) is defined as

    I_X(θ) := E_X [ |∇_θ f(X; θ)|² / f²(X; θ) ]

Regularity assumption: ∫_𝒳 ∇_θ f(x; θ) dλ(x) = 0 for almost every θ, and θ ↦ f(x; θ) is (weakly) differentiable for λ-a.e. x.

Theorem (Aras, Lee, Pananjady, Courtade, 2019)
Let θ ∼ π, where π is log-concave on ℝ^d, and let X ∼ f(·; θ). If the regularity condition is satisfied, then

    I(θ; X) ≤ d · φ( Tr(Cov(θ)) · E_θ I_X(θ) / d² ),

where

    φ(x) := √x                if 0 ≤ x < 1
            1 + (1/2) · log x if x ≥ 1.
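A minimal sketch (not from the paper) of the function φ and the resulting bound d·φ(Tr(Cov(θ))·E_θ I_X(θ) / d²); the function names and example numbers are hypothetical.

```python
import math

def phi(x: float) -> float:
    """phi(x) = sqrt(x) for 0 <= x < 1, and 1 + (1/2) log x for x >= 1."""
    if x < 0:
        raise ValueError("phi is defined for x >= 0")
    return math.sqrt(x) if x < 1 else 1.0 + 0.5 * math.log(x)

def mutual_info_upper_bound(d: int, trace_cov: float, avg_fisher_info: float) -> float:
    """Upper bound I(theta; X) <= d * phi(Tr(Cov(theta)) * E_theta I_X(theta) / d^2),
    valid for a log-concave prior pi satisfying the regularity condition."""
    return d * phi(trace_cov * avg_fisher_info / d**2)

# Example with hypothetical numbers: d = 5, Tr(Cov(theta)) = 5, E_theta I_X(theta) = 40.
print(mutual_info_upper_bound(5, 5.0, 40.0))
```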