On Nonparametric Estimation of the Fisher Information

Wei Cao¹, Alex Dytso², Michael Fauß², H. Vincent Poor², and Gang Feng¹
¹ University of Electronic Science and Technology of China
² Princeton University

IEEE International Symposium on Information Theory (ISIT), June 2020
Presentation Outline
1 Introduction
2 The Bhattacharya Estimator
3 A Clipped Estimator
4 The Gaussian Noise Case
5 Conclusion
Introduction

Fisher information for location of a pdf f:

I(f) = \int_{t \in \mathbb{R}} \frac{(f'(t))^2}{f(t)} \, dt,   (1)

where f' is the derivative of f.

▶ An important quantity providing fundamental performance bounds
▶ In practice: no closed-form solutions, distributions rarely known exactly
▶ In Gaussian noise: the Fisher information of the received signal allows for optimal power allocation at the transmitter

Problem: Estimating I(f) based on n random samples Y_1, ..., Y_n independently drawn from f.
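As a sanity check on the definition, the following minimal Python sketch (not part of the slides; the function names are illustrative) evaluates (1) by numerical integration for a standard Gaussian density, for which I(f) = 1.

```python
# Minimal sketch: numerically evaluate I(f) in (1) for a known density.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def fisher_information(pdf, dpdf, support=(-10.0, 10.0)):
    """Integrate (f'(t))^2 / f(t) over the given support."""
    value, _ = quad(lambda t: dpdf(t) ** 2 / pdf(t), *support)
    return value

# Standard Gaussian: f'(t) = -t f(t), so the result should be close to 1.
print(fisher_information(norm.pdf, lambda t: -t * norm.pdf(t)))
```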
Introduction

Available estimators
▶ Bhattacharya estimator¹: kernel based, straightforward and easy to implement
▶ Donoho estimator²: lower bound on the FI over a neighborhood of the empirical CDF
▶ Asymptotic results with inexplicit constants

Main contributions
▶ Explicit and tighter non-asymptotic results for the Bhattacharya estimator
▶ A new estimator with better bounds on the convergence rate
▶ Evaluation for the case of a r.v. contaminated by Gaussian noise, and a consistent estimator for the MMSE

¹ P. K. Bhattacharya. "Estimation of a probability density function and its derivatives". In: Sankhyā: The Indian Journal of Statistics, Series A 29.4 (1967), pp. 373–382.
² David L. Donoho. "One-sided inference about functionals of a density". In: The Annals of Statistics 16.4 (1988), pp. 1390–1420.
The Bhattacharya Estimator

Kernel density estimator:

f_n(t) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{a} K\!\left(\frac{t - Y_i}{a}\right),   (2)

where a > 0 is the bandwidth parameter. The kernel K(·) is assumed to be a continuously differentiable pdf.

The Bhattacharya estimator:

I_n(f_n) = \int_{|t| \le k_n} \frac{(f_n'(t))^2}{f_n(t)} \, dt,   (3)

for some k_n ≥ 0.
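The following minimal Python sketch (not the authors' code; the bandwidth a and truncation k_n are ad hoc choices) implements (2)–(3) with a Gaussian kernel and a Riemann-sum approximation of the integral.

```python
# Minimal sketch of the plug-in (Bhattacharya) estimator (2)-(3) with a Gaussian kernel.
import numpy as np

def kde_and_derivative(t, samples, a):
    """Gaussian-kernel estimates of f_n(t) and f_n'(t) on a grid t."""
    u = (t[:, None] - samples[None, :]) / a            # shape (len(t), n)
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # K((t - Y_i) / a)
    f = phi.mean(axis=1) / a                           # f_n(t)
    df = (-u * phi).mean(axis=1) / a ** 2              # f_n'(t), since K'(u) = -u K(u)
    return f, df

def bhattacharya_estimator(samples, a, k_n, grid_size=2001):
    """Riemann-sum approximation of (3): integral of f_n'^2 / f_n over |t| <= k_n."""
    t = np.linspace(-k_n, k_n, grid_size)
    f, df = kde_and_derivative(t, samples, a)
    dt = t[1] - t[0]
    return float(np.sum(df ** 2 / np.maximum(f, 1e-300)) * dt)

# Example: standard Gaussian samples; for comparison, I(f) = 1.
rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)
print(bhattacharya_estimator(y, a=0.3, k_n=10.0))
```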
The Bhattacharya Estimator

Estimating the density and its derivatives

Theorem 1
Let r ∈ {0, 1}, v_r = \int |K^{(r+1)}(t)| \, dt, and \delta_{r,a} = \sup_{t \in \mathbb{R}} \left| \mathbb{E}\big[f_n^{(r)}(t)\big] - f^{(r)}(t) \right|. Then, for any ϵ > δ_{r,a} and any n ≥ 1 the following bound holds:

\mathbb{P}\left[ \sup_{t \in \mathbb{R}} \left| f_n^{(r)}(t) - f^{(r)}(t) \right| > \epsilon \right] \le 2\, e^{-2 n a^{2r+2} (\epsilon - \delta_{r,a})^2 / v_r^2}.   (4)

▶ Based on the proof by Schuster³
▶ Using the best possible constant for the DKW inequality⁴

³ Eugene F. Schuster. "Estimation of a probability density function and its derivatives". In: The Annals of Mathematical Statistics 40.4 (1969), pp. 1187–1195.
⁴ Pascal Massart. "The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality". In: The Annals of Probability (1990), pp. 1269–1283.
The Bhattacharya Estimator

Analysis of the Bhattacharya estimator

Theorem 2
Assume there exists a function ϕ such that \sup_{|t| \le x} \frac{1}{f(t)} \le \phi(x) for all x. Then, provided that \sup_{|t| \le k_n} |f_n^{(i)}(t) - f^{(i)}(t)| \le \epsilon_i, i ∈ {0, 1}, and ϵ_0 ϕ(k_n) < 1, the following bound holds:

|I(f) - I_n(f_n)| \le \frac{4 \epsilon_1 k_n \rho_{\max}(k_n) + 2 \epsilon_1^2 k_n \phi(k_n) + \epsilon_0 \phi(k_n) I(f)}{1 - \epsilon_0 \phi(k_n)} + c(k_n),   (5)

where \rho_{\max}(k_n) = \sup_{|t| \le k_n} \left| \frac{f'(t)}{f(t)} \right| and c(k_n) = \int_{|t| \ge k_n} \frac{(f'(t))^2}{f(t)} \, dt.

▶ A non-asymptotic refinement of the result in [1, Theorem 3], which contains ϵ_0 ϕ⁴(k_n)
▶ ϕ(k_n) increases with k_n (usually very fast, e.g., ϕ(k_n) increases exponentially with k_n for a r.v. contaminated by Gaussian noise), preventing the estimator from being practical (illustrated below)
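A concrete instance (not on the slide; assuming f is the standard Gaussian density) shows why the ϕ(k_n) term is problematic:

\frac{1}{f(t)} = \sqrt{2\pi}\, e^{t^2/2} \;\Rightarrow\; \phi(x) = \sqrt{2\pi}\, e^{x^2/2}, \qquad \rho_{\max}(x) = \sup_{|t| \le x} \left| \frac{f'(t)}{f(t)} \right| = x,

so ϕ(k_n) grows like e^{k_n²/2} while ρ_max(k_n) grows only linearly in k_n.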
A Clipped Estimator

Let ρ(t) = f'(t)/f(t), assume there exists a function \bar{\rho} such that |ρ(t)| ≤ \bar{\rho}(t) for all t ∈ ℝ, and let

\rho_n(t) = \frac{f_n'(t)}{f_n(t)}.   (6)

The clipped estimator:

I_n^c(f_n) = \int_{-k_n}^{k_n} \min\{ |\rho_n(t)|, \bar{\rho}(t) \} \, |f_n'(t)| \, dt.   (7)

▶ We can set \bar{\rho}(t) = \rho_{\max}(k_n) for |t| ≤ k_n, where \rho_{\max}(k_n) = \sup_{|t| \le k_n} \left| \frac{f'(t)}{f(t)} \right| (see the sketch below).   (8)
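A minimal Python sketch (not the authors' code) of (7): the argument rho_bar plays the role of \bar{\rho} on [-k_n, k_n] and must be supplied by the user, e.g., ρ_max(k_n) when it is known.

```python
# Minimal sketch of the clipped estimator (7); rho_bar upper-bounds |f'/f| on [-k_n, k_n].
import numpy as np

def kde_and_derivative(t, samples, a):
    """Gaussian-kernel estimates of f_n(t) and f_n'(t) on a grid t (as in the sketch above)."""
    u = (t[:, None] - samples[None, :]) / a
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return phi.mean(axis=1) / a, (-u * phi).mean(axis=1) / a ** 2

def clipped_estimator(samples, a, k_n, rho_bar, grid_size=2001):
    """Riemann-sum approximation of int min{|rho_n|, rho_bar} |f_n'| dt over |t| <= k_n."""
    t = np.linspace(-k_n, k_n, grid_size)
    f, df = kde_and_derivative(t, samples, a)
    rho_n = np.abs(df) / np.maximum(f, 1e-300)          # |f_n'(t) / f_n(t)|
    dt = t[1] - t[0]
    return float(np.sum(np.minimum(rho_n, rho_bar) * np.abs(df)) * dt)

# Example: standard Gaussian samples, where |f'(t)/f(t)| = |t| <= k_n on [-k_n, k_n].
rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)
print(clipped_estimator(y, a=0.3, k_n=10.0, rho_bar=10.0))
```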
A Clipped Estimator

Analysis of the clipped estimator

Theorem 3
Under the assumptions \sup_{|t| \le k_n} |f_n^{(i)}(t) - f^{(i)}(t)| \le \epsilon_i, i ∈ {0, 1}, it holds that

|I(f) - I_n^c(f_n)| \le 4 \epsilon_1 \Phi_{\max}^{1}(k_n) + 2 \epsilon_0 \Phi_{\max}^{2}(k_n) + c(k_n),   (9)

where

c(k_n) = \int_{|t| \ge k_n} \frac{(f'(t))^2}{f(t)} \, dt,   (10)

\Phi_{\max}^{m}(x) = \int_{-x}^{x} \left| \bar{\rho}^{\,m}(t) \right| \, dt.   (11)

▶ The proof is based on two auxiliary estimators that under- and overestimate I_n^c.
Estimation of the FI of a R.V. Contaminated by Gaussian Noise

Let f_Y denote the pdf of a random variable

Y = \sqrt{\mathrm{snr}}\, X + Z,   (12)

where:
▶ X has a finite second moment (a very mild assumption) but otherwise is an arbitrary random variable
▶ Z is a standard Gaussian random variable
▶ X and Z are independent

Estimating the Fisher information of f_Y.

▶ Gaussian kernel
▶ Lemma 1 evaluates the quantities appearing in Th. 2 and Th. 3
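Continuing the earlier sketches (illustrative only; the bandwidth, truncation, and score bound are ad hoc choices, and the functions bhattacharya_estimator and clipped_estimator are reused from the blocks above), samples from model (12) with a binary input can be generated and fed directly to both estimators:

```python
# Minimal sketch: estimate I(f_Y) for Y = sqrt(snr) X + Z with binary input X.
import numpy as np

rng = np.random.default_rng(0)
n, snr, k_n = 10_000, 4.0, 10.0

x = rng.choice([-1.0, 1.0], size=n)      # arbitrary input with finite second moment
z = rng.standard_normal(n)               # standard Gaussian noise
y = np.sqrt(snr) * x + z                 # model (12)

# For |X| <= 1, |f_Y'(t)/f_Y(t)| <= |t| + sqrt(snr), so k_n + sqrt(snr) bounds the score.
rho_bar = k_n + np.sqrt(snr)

print(bhattacharya_estimator(y, a=0.3, k_n=k_n))
print(clipped_estimator(y, a=0.3, k_n=k_n, rho_bar=rho_bar))
```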
Estimation of the FI of a R.V. Contaminated by Gaussian Noise

Convergence of the Bhattacharya estimator

Theorem 4
If a = n^{-w}, where w ∈ (0, 1/6), and k_n = \sqrt{u \log(n)}, where u ∈ (0, w), then

\mathbb{P}\left[ |I_n(f_n) - I(f_Y)| \ge \varepsilon_n \right] \le 2\, e^{-c_1 n^{1-4w}} + 2\, e^{-c_2 n^{1-6w}},   (13)

where

\varepsilon_n \le \frac{12\, c_4 c_5 \left( c_3 + \sqrt{u \log(n)} + 2 c_5\, n^{u-w} \right) \sqrt{u \log(n)}\; n^{-w}}{1 - n^{u-w} \sqrt{u \log(n)}} + n^{w-u-1}.   (14)

▶ I_n(f_n) converges to I(f_Y) with probability 1.
▶ u and w: a trade-off between the convergence rate and the precision.
Estimation of the FI of a R.V. Contaminated by Gaussian Noise

Convergence of the clipped estimator

Theorem 5
If a = n^{-w}, where w ∈ (0, 1/6), and k_n = n^u, where u ∈ (0, w/3), then

\mathbb{P}\left[ |I_n^c(f_n) - I(f_Y)| \ge \varepsilon_n \right] \le 2\, e^{-c_1 n^{1-4w}} + 2\, e^{-c_2 n^{1-6w}},   (15)

where

\varepsilon_n \le 12\, n^{u-w} \left( c_3 + 2 n^{u} + n^{2u} \right) + c_4\, n^{-u}.   (16)

▶ Improved precision: decaying polynomially in n instead of logarithmically (see the worked choice below)
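To make the trade-off between u and w concrete, here is one illustrative parameter choice (not taken from the slides) plugged into Theorem 5:

w = 0.15, \quad u = 0.04 < \tfrac{w}{3} \;\Longrightarrow\; \varepsilon_n \le 12\, n^{-0.11} \left( c_3 + 2 n^{0.04} + n^{0.08} \right) + c_4\, n^{-0.04} = O\!\left(n^{-0.03}\right),

with failure probability at most 2 e^{-c_1 n^{0.4}} + 2 e^{-c_2 n^{0.1}}, i.e., a polynomially decaying precision.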
Estimation of the MMSE

Brown's identity:

I(f_Y) = 1 - \mathrm{snr} \cdot \mathrm{mmse}(X \mid Y),   (17)

where

\mathrm{mmse}(X \mid Y) = \mathbb{E}\left[ (X - \mathbb{E}[X \mid Y])^2 \right].   (18)

An estimator for the MMSE:

\mathrm{mmse}_n(X \mid Y) = \frac{1 - I_n^c(f_n)}{\mathrm{snr}}.   (19)

Proposition 1
If a = n^{-w}, where w ∈ (0, 1/6), and k_n = n^u, where u ∈ (0, w/3), then

\mathbb{P}\left[ |\mathrm{mmse}_n(X \mid Y) - \mathrm{mmse}(X \mid Y)| \ge \varepsilon_n / \mathrm{snr} \right] \le 2\, e^{-c_1 n^{1-4w}} + 2\, e^{-c_2 n^{1-6w}}.   (20)
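A minimal sketch (illustrative) of the MMSE estimator (19), reusing clipped_estimator and the variables y, snr, k_n, and rho_bar from the Gaussian-noise sketch above:

```python
# Plug-in MMSE estimate via Brown's identity: (1 - I_n^c(f_n)) / snr.
def mmse_estimator(samples, snr, a, k_n, rho_bar):
    return (1.0 - clipped_estimator(samples, a, k_n, rho_bar)) / snr

print(mmse_estimator(y, snr=snr, a=0.3, k_n=k_n, rho_bar=rho_bar))
```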
Examples

Figure 1: Fisher information I(f_Y) and its estimates Î and Î^c (for several bandwidth choices a_0, a_1) versus snr, when n = 10^4 and k_n = 10, with: a) Gaussian input; and b) binary input.
Examples

Figure 2: Sample complexity log_10(n) of the Bhattacharya and clipped estimators with Gaussian input versus: a) fixed P_err = 0.2 and varying ε_n; and b) fixed ε_n = 0.5 and varying P_err.
Conclusion

Estimation of the Fisher information of a random variable
▶ Bhattacharya estimator: new, sharper convergence results
▶ A clipped estimator: better bounds on convergence rates

The case of a Gaussian-noise-contaminated random variable:
▶ Specialization of the results for both estimators
▶ A consistent estimator for the MMSE

Interesting future directions
▶ To study the Gaussian noise case with further assumptions
▶ Applications in power allocation problems⁵

⁵ Wei Cao, Alex Dytso, Michael Fauß, Gang Feng, and H. Vincent Poor. "Robust Power Allocation for Parallel Gaussian Channels with Approximately Gaussian Input Distributions". In: IEEE Transactions on Wireless Communications (Early Access) (2020).
A full version can be found at: W. Cao, A. Dytso, M. Fauß, H. V. Poor, and G. Feng, “Nonparametric estimation of the Fisher information and its applications.” Available: https://arxiv.org/pdf/2005.03622.pdf Email: clarissa.cao@hotmail.com, {adytso, mfauss, poor}@princeton.edu, fenggang@uestc.edu.cn Thank You