Extending Maximum Entropy Techniques to Entropy Constraints

Gang Xiang (1) and Vladik Kreinovich (2)

(1) Philips Healthcare, El Paso, Texas 79902, USA, gxiang@sigmaxi.net
(2) Department of Computer Science, University of Texas at El Paso,
    500 W. University, El Paso, TX 79968, USA, contact email vladik@utep.edu
1. Probabilities are Usually Imprecise: A Reminder

• Often, we have only partial (imprecise) information about the probabilities:
  – Sometimes, we have crisp (interval) bounds on probabilities (and/or other statistical characteristics).
  – Sometimes, we have fuzzy bounds, i.e., different interval bounds with different degrees of certainty.
• In this case, for each statistical characteristic, it is desirable to find:
  – the worst possible value of this characteristic,
  – the best possible value of this characteristic, and
  – the “typical” (“most probable”) value of this characteristic.
2. Maximum Entropy (MaxEnt) Approach

• By the “typical” value of a characteristic, we mean its value for a “typical” distribution.
• Usually, as such a “typical” distribution, we select the one with the largest value of the entropy S.
• Meaning: S = average number of “yes”-“no” questions (bits) that we need to ask to determine the exact value x_i.
• When we have n different values x_1, ..., x_n with probabilities p_1, ..., p_n, the entropy S(p) is defined (illustrated below) as
  S \overset{\text{def}}{=} -\sum_{i=1}^{n} p_i \cdot \log_2(p_i).
• For a pdf ρ(x),
  S \overset{\text{def}}{=} -\int \rho(x) \cdot \log_2(\rho(x))\, dx.
• In this continuous case, S is related to the average number of questions needed to determine x with a given accuracy ε > 0.
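A quick numerical illustration of the discrete definition (a minimal sketch; the example distributions and the helper name entropy_bits are illustrative only): for a uniform distribution over 8 values, S = 3 bits, i.e., 3 “yes”-“no” questions suffice.

```python
import numpy as np

def entropy_bits(p):
    """Discrete entropy S = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(entropy_bits([1/8] * 8))          # 3.0 bits: 3 yes/no questions identify 1 of 8 values
print(entropy_bits([0.5, 0.25, 0.25]))  # 1.5 bits on average
```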
3. MaxEnt Approach: Successes and Limitations

• Successes: when we know the values of ranges and moments.
• Example 1: if we only know that x \in [\underline{x}, \overline{x}], we get a uniform distribution on this interval.
• Example 2: if we only know the first two moments, we get a Gaussian distribution (see the sketch below).
• Problem: sometimes, we also know the value S_0 of the entropy itself.
• Why this is a problem:
  – all distributions satisfying this constraint S = S_0 have the same entropy;
  – hence the MaxEnt approach cannot select a single one.
• What we do: we show how to handle this constraint.
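A numerical sanity check of Example 2 (a sketch only; the grid, its bounds, and the helper second_moment are illustrative choices): it uses the standard Lagrange-multiplier fact that the moment-constrained entropy maximizer has the exponential-family form p_i ∝ exp(−b·x_i²) on a symmetric grid, and finds the b that matches the second-moment constraint; the answer comes out near 1/2, i.e., a discretized standard Gaussian e^{−x²/2}.

```python
import numpy as np

# Symmetric grid, so the zero-mean constraint holds automatically.
x = np.linspace(-6.0, 6.0, 1201)

def second_moment(b):
    """Second moment of the exponential-family candidate p_i ∝ exp(-b * x_i^2)."""
    w = np.exp(-b * x**2)
    p = w / w.sum()
    return np.sum(p * x**2)

# E[x^2] decreases in b, so bisection solves second_moment(b) = 1.
lo, hi = 0.01, 10.0
for _ in range(60):
    mid = (lo + hi) / 2
    if second_moment(mid) > 1.0:
        lo = mid          # moment too large: need a larger b
    else:
        hi = mid

print((lo + hi) / 2)      # ≈ 0.5, matching the Gaussian exponent x^2 / 2
```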
4. Main Idea and Its Consequences

• Fact: the actual probabilities p_1, ..., p_n are only approximately equal to the frequencies: p_i ≈ f_i.
• Idea: instead of selecting “typical” probabilities, let us select “typical” frequencies.
• Hence: since p_i ≈ f_i, we have S(p) ≈ S(f) = S_0.
• Idea: select f_i and consistent p_i for which the entropy S(p) is the largest possible.
• Asymptotically: each δ_i \overset{\text{def}}{=} p_i − f_i is normal, with mean 0 and variance σ_i² = f_i · (1 − f_i)/N, where N denotes the sample size.
• Thus, by the χ² argument (checked numerically below):
  \sum_{i=1}^{n} \frac{\delta_i^2}{\sigma_i^2} = \sum_{i=1}^{n} \frac{\delta_i^2}{f_i \cdot (1 - f_i)/N} \approx n.
• Resulting problem: find f_i and p_i that maximize S(p) under the above condition.
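A small Monte Carlo check of this normalization (a sketch; the true probabilities, sample size, and number of trials are arbitrary illustrative values): averaging Σ δ_i²/σ_i² over many samples gives a value of order n; the exact χ² degrees of freedom are n − 1, since the δ_i sum to 0.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.1, 0.2, 0.3, 0.4])    # true probabilities (arbitrary example)
n, N, trials = len(p), 10_000, 2_000

stats = []
for _ in range(trials):
    f = rng.multinomial(N, p) / N     # observed frequencies f_i
    delta = p - f                     # delta_i = p_i - f_i
    sigma2 = f * (1 - f) / N          # asymptotic variance of delta_i
    stats.append(np.sum(delta**2 / sigma2))

print(np.mean(stats))                 # ≈ n - 1 = 3, i.e. of order n
```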
5. Analysis of the Problem

• Problem: under the constraint
  \sum_{i=1}^{n} \frac{\delta_i^2}{f_i \cdot (1 - f_i)/N} = n,
  maximize
  S(p) = S(f_1 + \delta_1, \ldots, f_n + \delta_n) = -\sum_{i=1}^{n} (f_i + \delta_i) \cdot \log_2(f_i + \delta_i).
• For large N, the δ_i are small, so
  S(f_1 + \delta_1, \ldots, f_n + \delta_n) \approx S(f_1, \ldots, f_n) + \sum_{i=1}^{n} \frac{\partial S}{\partial f_i} \cdot \delta_i.
• Here, \frac{\partial S}{\partial f_i} = -\log_2(f_i) - \log_2(e), so the Lagrange multiplier method leads to maximizing
  S_0 - \sum_{i=1}^{n} (\log_2(f_i) + \log_2(e)) \cdot \delta_i + \lambda \cdot \sum_{i=1}^{n} \frac{\delta_i^2}{f_i \cdot (1 - f_i)}.
6. Analysis of the Problem (cont-d)

• Reminder: we maximize
  S_0 - \sum_{i=1}^{n} (\log_2(f_i) + \log_2(e)) \cdot \delta_i + \lambda \cdot \sum_{i=1}^{n} \frac{\delta_i^2}{f_i \cdot (1 - f_i)}.
• Analysis: equating the derivatives (with respect to the δ_i) to 0, we get each δ_i in terms of λ, and then λ in terms of the f_i; the resulting maximum is an increasing function of
  S_2 \overset{\text{def}}{=} \sum_{i=1}^{n} (\log_2(f_i) + \log_2(e))^2 \cdot f_i \cdot (1 - f_i)
  (see the derivation sketched below).
• Result:
  – if we have several distributions with the same value of the entropy S,
  – we should select the one with the largest value of the new characteristic S_2.
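The algebra behind “equating the derivatives to 0” can be spelled out as follows (a reconstruction under the constraint from the previous slide; the closed-form expression for the maximum is one reading of the result, not stated on the slide):

```latex
% Derivative of the objective with respect to delta_i set to 0:
\[
  -(\log_2(f_i) + \log_2(e)) + \frac{2\lambda\,\delta_i}{f_i\,(1-f_i)} = 0
  \;\;\Longrightarrow\;\;
  \delta_i = \frac{(\log_2(f_i)+\log_2(e))\cdot f_i\,(1-f_i)}{2\lambda}.
\]
% Substituting into the constraint \sum_i \delta_i^2 / (f_i (1-f_i)/N) = n:
\[
  \frac{N}{4\lambda^2}\sum_{i=1}^{n}(\log_2(f_i)+\log_2(e))^2\, f_i\,(1-f_i)
  = \frac{N\,S_2}{4\lambda^2} = n
  \;\;\Longrightarrow\;\;
  \lambda = \pm\frac{1}{2}\sqrt{\frac{N\,S_2}{n}}.
\]
% The resulting change in entropy, maximized over the sign of lambda:
\[
  S(p) - S_0 \approx -\sum_{i=1}^{n}(\log_2(f_i)+\log_2(e))\,\delta_i
  = -\frac{S_2}{2\lambda},
  \qquad
  \max S(p) \approx S_0 + \sqrt{\frac{n\,S_2}{N}}.
\]
```

For a fixed sample size N, this maximum depends on the distribution only through S_2 and grows with it, which is why the selection rule picks the distribution with the largest S_2.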
7. Continuous Case: Main Idea

• Situation: we have a continuous distribution, with the pdf
  \rho(x) \overset{\text{def}}{=} \lim_{\Delta x \to 0} \frac{p([x, x + \Delta x])}{\Delta x}.
• Idea:
  – we divide the interval of possible values of x into intervals [x_i, x_i + Δx] of small width Δx;
  – we consider the discrete distribution with these intervals as possible values.
• Fact: when Δx is small, by the definition of the pdf, we have p_i ≈ ρ(x_i) · Δx.
• Limit: then, we take the limit Δx → 0.
• Example: this is how we go from the discrete entropy S(p_1, ..., p_n) to the entropy S(ρ) of the continuous distribution.
8. How We Go From S(p_1, ..., p_n) to S(ρ): Reminder

• Reminder: S \overset{\text{def}}{=} -\sum_{i=1}^{n} p_i \cdot \log_2(p_i).
• Idea: take p_i = ρ(x_i) · Δx and take the limit Δx → 0:
  S = -\sum_{i=1}^{n} \rho(x_i) \cdot \Delta x \cdot \log_2(\rho(x_i) \cdot \Delta x)
    = -\sum_{i=1}^{n} \rho(x_i) \cdot \Delta x \cdot \log_2(\rho(x_i)) - \sum_{i=1}^{n} \rho(x_i) \cdot \Delta x \cdot \log_2(\Delta x).
• So (checked numerically below),
  S \sim -\int \rho(x) \cdot \log_2(\rho(x))\, dx - \log_2(\Delta x).
• Fact: the second term in this sum does not depend on the probability distribution at all.
• Corollary: maximizing the entropy S is equivalent to maximizing the integral in the above expression.
• Observation: the integral -\int \rho(x) \cdot \log_2(\rho(x))\, dx is exactly the entropy of the continuous distribution.
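A numerical check of this relation (a sketch; the standard Gaussian and the grid bounds are arbitrary illustrative choices): as Δx shrinks, the entropy of the discrete distribution p_i = ρ(x_i)·Δx tracks −∫ρ·log₂ρ dx − log₂(Δx).

```python
import numpy as np

def discretized_entropy(dx, half_width=10.0):
    """Entropy of the discrete distribution p_i = rho(x_i) * dx for a standard Gaussian rho."""
    x = np.arange(-half_width, half_width, dx)
    rho = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    p = rho * dx
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

continuous_S = 0.5 * np.log2(2 * np.pi * np.e)   # closed form: about 2.047 bits
for dx in (0.1, 0.01, 0.001):
    # the last two columns agree better and better as dx shrinks
    print(dx, discretized_entropy(dx), continuous_S - np.log2(dx))
```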
9. Continuous Analog of S_2

• Reminder: S_2 \overset{\text{def}}{=} \sum_{i=1}^{n} (\log_2(f_i) + \log_2(e))^2 \cdot f_i \cdot (1 - f_i).
• Idea: take f_i = ρ(x_i) · Δx and take the limit Δx → 0.
• Asymptotically:
  S_2 = \int (\log_2(\rho(x)))^2 \cdot \rho(x)\, dx - 2 \cdot (\log_2(\Delta x) + \log_2(e)) \cdot S + (\log_2(\Delta x) + \log_2(e))^2.
• The second and the third terms depend only on the step size Δx and on the entropy S, but not explicitly on ρ(x).
• Reminder: we assume that S is known.
• Corollary: maximizing the value S_2 is equivalent to maximizing the integral in the above expression.
• Conclusion: select the distribution with the largest value of
  S_2(\rho) \overset{\text{def}}{=} \int (\log_2(\rho(x)))^2 \cdot \rho(x)\, dx
  (see the sketch below).
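A small illustration of the resulting rule (a sketch; the two densities and the integration grid are illustrative choices): a standard Gaussian and a uniform density on an interval of length √(2πe) have the same entropy S ≈ 2.047 bits, so plain MaxEnt cannot distinguish them, but their values of S_2(ρ) differ, and the rule selects the one with the larger value; in this example the Gaussian comes out with S_2 ≈ 5.2 versus ≈ 4.2 for the uniform.

```python
import numpy as np

def S_and_S2(rho, dx):
    """Entropy S(rho) and S_2(rho) = ∫ (log2 rho)^2 * rho dx, by Riemann sums."""
    r = rho[rho > 0]
    S = -np.sum(r * np.log2(r)) * dx
    S2 = np.sum(np.log2(r)**2 * r) * dx
    return S, S2

dx = 1e-4
x = np.arange(-12.0, 12.0, dx)

gauss = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # sigma = 1, entropy 0.5*log2(2*pi*e)
L = np.sqrt(2 * np.pi * np.e)                    # uniform of the same entropy log2(L)
unif = np.where(np.abs(x) <= L / 2, 1.0 / L, 0.0)

print("Gaussian: S, S2 =", S_and_S2(gauss, dx))  # same S, larger S2 -> selected by the rule
print("Uniform:  S, S2 =", S_and_S2(unif, dx))
```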