Entropy Estimation for Non-IID Sources
Kerry McKay
kerry.mckay@nist.gov
Random Bit Generation Workshop 2016
2012 Recap
• 2012 draft of SP 800-90B included non-IID estimators based on entropic statistics
  – Theoretical bounds on IID data
• The methods (tests) were
  – Collision
  – Partial collection (removed)
  – Compression (altered s.d. calculation)
  – Markov
  – Frequency (removed; use Most Common Value estimate instead)
• For all, changed from 95% to 99% confidence interval in 2016
Why Add More?
• There were gaps in the 2012 methods
• We wanted to add estimators that were designed for IID and non-IID data and that wouldn't unfairly lower entropy estimates
  – Partial collection was often cruel to non-binary sources
• Two types added in the 2016 draft
  – Predictors
  – Tuple-based estimates
Predictability and Entropy
What is the next output?
• Shannon first investigated the relationship between entropy and predictability in 1951
• Used the ability of humans to predict the next character in the text to estimate the entropy per character
Predictors
• Predictors are a framework
• Attempt to mimic an adversary that has access to outputs only
• Predictor = model + prediction function
• Given past observations, try to guess the next output
• If the guess is correct, record 1; else, record 0
• Include the last observation in the model
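Below is a minimal Python sketch of this prediction loop. The predict()/update() interface is an illustrative assumption (SP 800-90B does not define it); it simply makes the record-a-1-or-0 bookkeeping concrete.

```python
def run_predictor(samples, predictor):
    """Evaluate a predictor over a sample sequence and return the 0/1
    correctness record described on this slide.  `predictor` is any object
    with predict() and update() methods (illustrative names)."""
    outcomes = []
    for x in samples:
        guess = predictor.predict()            # guess the next output from the current model
        outcomes.append(1 if guess == x else 0)
        predictor.update(x)                    # include the latest observation in the model
    return outcomes
```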
Benefits
• No need to violate assumptions about the source's underlying probability distribution
• Can account for changes over time
• Multiple ways of estimating entropy
Estimating Entropy
• After N predictions, have a sequence of 1's and 0's
• Interpret the sequence as the result of N independent Bernoulli trials
• We use two notions of predictability to derive an entropy estimate from the sequence
  – Global predictability
  – Local predictability
Global Predictability
• Considers how well a predictor is able to guess the next output on average
• P_global = (# correct predictions) / N
• P'_global is the upper bound of the 99% confidence interval on P_global
• Pretty straightforward
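A small Python sketch of this computation, assuming the normal-approximation upper bound with z = 2.576 and an N−1 divisor (the same form used in the worked example on a later slide):

```python
import math

def global_predictability(correct, n, z=2.576):
    """Point estimate and upper 99% confidence bound on global predictability.

    Sketch only; assumes the normal approximation p + z*sqrt(p*(1-p)/(n-1)),
    which matches the worked example later in these slides."""
    p_global = correct / n
    p_prime = min(1.0, p_global + z * math.sqrt(p_global * (1 - p_global) / (n - 1)))
    return p_global, p_prime

# Example: 14 correct predictions out of 20
print(global_predictability(14, 20))   # (0.7, ~0.9708)
```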
Local Predictability
• Considers how well a predictor is able to guess the next output, based on the longest run of correct predictions
• Useful if the entropy source falls into a highly predictable state
  – What if the DRBG were seeded from a predictable stream of outputs?
• We want to find the probability of success for each trial, P_local, that is consistent with our observations
• Specifically, we want to find P_local such that the probability that we observed the longest run of successes in N trials is 0.99
Local Predictability (cont.)
• Have an asymptotic approximation that tells us the probability that there are no runs of length r in N trials, given P_local
• We turn this around by performing a binary search on P_local until the result is sufficiently close to 0.99
  – Let r be the length of the longest run + 1
  – Solve for P_local from
      0.99 = (1 − P_local*x) / ((r + 1 − r*x)*q) * x^(−(N+1))
  – Where
    • q is 1 − P_local
    • x is the root of a polynomial that can be approximated by iterating a recurrence relation (see the sketch below)
Ref: Feller, W.: An Introduction to Probability Theory and its Applications, vol. 1, chap. 13. John Wiley and Sons, Inc. (1950)
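A Python sketch of the binary search described above, using Feller's approximation for the probability of no run of length r in N trials and an iterated recurrence for the root x. The iteration counts and step limits are illustrative choices, not the normative SP 800-90B procedure.

```python
def prob_no_run(p, r, n, iters=20):
    """Feller's asymptotic approximation for the probability of observing
    no run of r consecutive successes in n Bernoulli(p) trials."""
    q = 1.0 - p
    x = 1.0
    for _ in range(iters):                 # iterate the recurrence to approximate the root x
        x = 1.0 + q * (p ** r) * (x ** (r + 1))
    return (1.0 - p * x) / ((r + 1 - r * x) * q) * x ** -(n + 1)

def local_predictability(longest_run, n, target=0.99, steps=60):
    """Binary search for P_local such that prob_no_run(P_local, r, n) ~= target,
    where r = longest_run + 1."""
    r = longest_run + 1
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        # prob_no_run decreases as p grows, so narrow the interval toward the target
        if prob_no_run(mid, r, n) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Example from a later slide: longest run of 6 correct guesses out of 20 predictions
print(local_predictability(6, 20))         # ~0.3779
```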
Predictor Min-Entropy Estimate
• The min-entropy estimate for a predictor is –log2(max(P'_global, P_local))
• We expect most min-entropy estimates to be based on global predictability
  – Local predictability is intended for severe failures
Example
• Suppose that 14 of 20 guesses were correct
  – P_global = 0.7
  – P'_global = 0.7 + 2.576*sqrt(0.7*0.3/19) = 0.9708
• Suppose that the longest run of correct guesses is 6
  – Binary search finds that P_local = 0.3779
• P'_global > P_local
• Min-entropy estimate is –log2(P'_global) ≈ 0.0428
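A self-contained check of the arithmetic on this slide (a sketch; P_local is taken from the slide's binary-search result rather than recomputed):

```python
import math

p_global = 14 / 20                                                         # 0.7
p_global_ub = p_global + 2.576 * math.sqrt(p_global * (1 - p_global) / 19) # ~0.9708
p_local = 0.3779                                                           # from the binary search above

h_min = -math.log2(max(p_global_ub, p_local))
print(p_global_ub, h_min)    # p'_global ~0.9708, min-entropy ~0.043 bits per prediction
```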
Ensemble Predictors
• Several predictors can be combined into one
  – E.g., different parameters for model construction and/or prediction function
  – Call each one a subpredictor
• Ensemble predictor keeps track of the performance of each subpredictor in a scoreboard
• Best performing subpredictor is used for the next prediction
• The final entropy estimate is based on success of the ensemble predictor, not on the individual performance of the subpredictors
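A rough sketch of the scoreboard idea; the method names and the way the leading subpredictor is chosen are illustrative assumptions. It plugs into the run_predictor sketch shown earlier, so only the ensemble's own correctness record feeds the entropy estimate.

```python
class EnsemblePredictor:
    """Combine several subpredictors; the ensemble's guess comes from the
    subpredictor with the best scoreboard count so far (illustrative sketch)."""
    def __init__(self, subpredictors):
        self.subs = subpredictors
        self.scores = [0] * len(subpredictors)
        self.best = 0                                   # index of current leader
        self._last_guesses = [None] * len(subpredictors)

    def predict(self):
        self._last_guesses = [s.predict() for s in self.subs]
        return self._last_guesses[self.best]            # guess from the leading subpredictor

    def update(self, observation):
        for i, g in enumerate(self._last_guesses):
            if g == observation:
                self.scores[i] += 1                      # update the scoreboard
        self.best = max(range(len(self.subs)), key=lambda i: self.scores[i])
        for s in self.subs:
            s.update(observation)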
90B Predictors
• In the SP 800-90B strategy (take the lowest estimate), a predictor will only lower the awarded entropy estimate if it is good at guessing the next output
  – Bad models can't significantly lower the estimate
• Without source knowledge, difficult to make the best predictor
  – We can make generic predictors that perform reasonably well
90B Predictors
• SP 800-90B specifies four generic predictors:
  – Multi Most Common in Window (MultiMCW) Prediction
  – Lag Prediction
  – MultiMMC Prediction
  – LZ78Y Prediction
• MultiMCW, Lag, and MultiMMC are ensemble predictors
Multi Most Common in Window Predictor
• Each subpredictor keeps a window of the previous w observations
  – We use four window sizes: w = 63, 255, 1023, and 4095
  – Prediction is the most common value in the window
• Performs well in cases where there is a clear most common value, but the value may vary over time
  – E.g., due to environmental conditions such as operating temperature
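One MCW subpredictor might look like the sketch below (the draft's tie-breaking rules are omitted; the interface matches the earlier predictor sketches):

```python
from collections import Counter, deque

class MostCommonInWindow:
    """Sketch of a single MCW subpredictor: predict the most common value
    among the last w observations."""
    def __init__(self, w):
        self.window = deque(maxlen=w)       # keeps only the most recent w observations

    def predict(self):
        if not self.window:
            return None
        return Counter(self.window).most_common(1)[0][0]

    def update(self, observation):
        self.window.append(observation)
```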
Lag Predictor
• Each subpredictor predicts the value observed at a fixed lag, d
  – Example: if d = 1, the subpredictor predicts the last observed value
• The 90B lag predictor contains 128 subpredictors for lags from 1 to 128
• Performs well on sources with strong periodic behavior, if d is related to the period
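A sketch of a single lag subpredictor; the ensemble simply instantiates it for d = 1 through 128:

```python
from collections import deque

class LagPredictor:
    """Sketch of one lag subpredictor: predict the value seen d outputs ago."""
    def __init__(self, d):
        self.d = d
        self.history = deque(maxlen=d)      # only the last d observations are needed

    def predict(self):
        # once the buffer is full, the oldest entry is the value observed d steps back
        return self.history[0] if len(self.history) == self.d else None

    def update(self, observation):
        self.history.append(observation)
```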
MultiMMC Predictor
• Multiple Markov Model with Counting
• Each subpredictor constructs a Markov model from observed outputs
  – Records the observed frequencies of transitions (rather than probabilities)
  – Prediction follows the most frequently observed transition from the previous d outputs
• The MultiMMC ensemble predictor uses 16 Markov models with orders from 1 to 16
• Works well on sources where outputs are dependent on the previous 16 or fewer outputs
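A sketch of one counting Markov subpredictor of order d; any dictionary-size limits in the draft are omitted, and the interface names are illustrative:

```python
from collections import Counter, defaultdict

class MarkovCounting:
    """Order-d Markov model with counting: record how often each symbol follows
    each length-d context and predict the most frequent successor."""
    def __init__(self, d):
        self.d = d
        self.context = []                        # last d observations
        self.counts = defaultdict(Counter)       # context -> successor counts

    def predict(self):
        if len(self.context) < self.d:
            return None
        successors = self.counts.get(tuple(self.context))
        return successors.most_common(1)[0][0] if successors else None

    def update(self, observation):
        if len(self.context) == self.d:
            self.counts[tuple(self.context)][observation] += 1   # count the transition
        self.context = (self.context + [observation])[-self.d:]
```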
LZ78Y Predictor
• Shares concepts with MultiMMC, but applied differently
  – Both look at previous outputs and build a model with counts of next outputs
  – This is not an ensemble predictor
  – Prediction favors the longest string with the highest count, not the length that performed best in the past
  – Model (dictionary) construction is bounded
• Performs well on sources that would be efficiently compressed by LZ78-like compression algorithms
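A heavily simplified sketch of the LZ78Y flavor described here; the bounds B and MAX_ENTRIES are illustrative assumptions, and this is not the normative SP 800-90B algorithm:

```python
from collections import Counter, defaultdict

B = 16                 # maximum string length kept in the dictionary (illustrative bound)
MAX_ENTRIES = 65536    # cap on dictionary size (illustrative bound)

class LZ78Y:
    """Keep counts of which symbol follows each recently seen string (up to
    length B), bound the dictionary, and predict from the longest matching
    previous string."""
    def __init__(self):
        self.history = []
        self.dictionary = defaultdict(Counter)   # string -> next-symbol counts

    def predict(self):
        for j in range(min(B, len(self.history)), 0, -1):    # try the longest suffix first
            suffix = tuple(self.history[-j:])
            if suffix in self.dictionary:
                return self.dictionary[suffix].most_common(1)[0][0]
        return None

    def update(self, observation):
        for j in range(1, min(B, len(self.history)) + 1):
            suffix = tuple(self.history[-j:])
            if suffix in self.dictionary or len(self.dictionary) < MAX_ENTRIES:
                self.dictionary[suffix][observation] += 1     # bounded dictionary growth
        self.history.append(observation)
```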
Tuple-based Estimates
• Added two tuple-based estimates
  – t-tuple estimate
  – LRS estimate
• These tuple estimates attempt to capture global properties of the output sequence
t-Tuple Estimate
• Estimate based on frequencies of tuples
• t is the largest value such that the most common t-tuple appears at least 35 times in the sequence
• For i from 1 to t, calculate the proportion of the highest frequency of i-tuples to all i-tuples in the sequence
• P_max for each i is the i-th root of the proportion
• Entropy is calculated from the highest P_max
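A naive Python sketch of this computation; any adjustments in the draft beyond what the slide states (such as a confidence bound on the final probability) are omitted:

```python
import math
import random
from collections import Counter

def t_tuple_estimate(seq, cutoff=35):
    """Sketch of the t-tuple estimate: find t, take the i-th root of the
    most-common-tuple proportion for each i <= t, and use the largest."""
    L = len(seq)
    t = 0
    while True:
        counts = Counter(tuple(seq[j:j + t + 1]) for j in range(L - t))
        if counts and max(counts.values()) >= cutoff:
            t += 1                                    # most common (t+1)-tuple still frequent enough
        else:
            break
    p_max = 0.0
    for i in range(1, t + 1):
        counts = Counter(tuple(seq[j:j + i]) for j in range(L - i + 1))
        proportion = max(counts.values()) / (L - i + 1)   # highest i-tuple frequency / all i-tuples
        p_max = max(p_max, proportion ** (1.0 / i))       # i-th root of the proportion
    return -math.log2(p_max) if p_max > 0 else None

# Usage: uniform random bits should land close to 1 bit per sample
bits = [random.getrandbits(1) for _ in range(10000)]
print(t_tuple_estimate(bits))
```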
LRS Estimate
• Longest repeated substring
  – Estimates collision entropy
  – LRS concept also appears in IID testing, but does not award an entropy estimate there
• Find the length of the smallest repeated substring that occurs < 20 times, u
• Find the length of the longest repeated substring, v
• For W from u to v, estimate the collision probability and max probability of output
• Use the highest max probability to derive the min-entropy estimate
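A naive sketch of the LRS estimate as described on this slide; it recomputes substring counts with a Counter instead of an efficient suffix-based structure, approximates the max probability by the W-th root of the collision probability, and omits any final confidence-bound adjustment from the draft:

```python
import math
from collections import Counter

def lrs_estimate(seq, cutoff=20):
    """Sketch: collision-probability-based estimate over repeated substrings
    of lengths u..v, following the steps listed on this slide."""
    L = len(seq)

    def counts(w):
        return Counter(tuple(seq[j:j + w]) for j in range(L - w + 1))

    # u: smallest length whose most common substring occurs fewer than `cutoff` times
    u = 1
    while max(counts(u).values()) >= cutoff:
        u += 1
    # v: length of the longest repeated substring (longest length with any count >= 2)
    v = u
    while any(c >= 2 for c in counts(v + 1).values()):
        v += 1

    p_max = 0.0
    for w in range(u, v + 1):
        c = counts(w)
        n = L - w + 1
        p_coll = sum(k * (k - 1) for k in c.values()) / (n * (n - 1))  # collision probability
        p_max = max(p_max, p_coll ** (1.0 / w))                        # per-sample max probability
    return -math.log2(p_max) if p_max > 0 else None
```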
Summary
• The non-IID path now includes generic predictors and tuple-based estimates
• Predictors mimic an attacker guessing the next output based on previous outputs and simple models
• Tuple-based estimates capture global properties of the output sequence
• Both complement the entropic statistics approach