Random Bit Generation Workshop 2016, National Institute of Standards and Technology

Meltem Sonmez Turan, National Institute of Standards and Technology (meltem.turan@nist.gov)

What is the IID Assumption?
The IID assumption is critical in statistics, machine learning theory, entropy estimation, and related fields. In probability theory, a collection of random variables is independent and identically distributed (IID or i.i.d.) if
• each sample has the same probability distribution as every other sample, and
• all samples are mutually independent.
Examples: dice rolls, coin flips.
[Figure: two plots of sample value against sample index (about 100 samples each). Left panel: "IID - Uniformly distributed". Right panel: "Non-IID behavior".]
NIST RBG WORKSHOP, May 2016

Why is IID testing important for SP 800-90B?
SP 800-90B has two tracks for entropy estimation:
- IID track: if the noise source is IID, the entropy is estimated using the most common value estimate.
- Non-IID track: if the noise source is not IID, entropy estimation is more complex; ten estimators are used.
Determining the track: the IID track is used only if all of the following conditions are satisfied:
1. The following datasets are tested, and the IID assumption is verified for each:
 - Sequential dataset
 - Row and column datasets
 - Conditioned sequential dataset (if a non-vetted conditioning component is used)
2. The submitter makes an IID claim.
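The slide names the most common value (MCV) estimate used on the IID track but does not give its formula. The sketch below assumes the MCV estimator as written in the final (2018) version of SP 800-90B, which may differ in detail from the 2016 draft discussed here: take the proportion $\hat{p}$ of the most common value, raise it to a 99% upper confidence bound $p_u = \min\left(1, \hat{p} + 2.576\sqrt{\hat{p}(1-\hat{p})/(L-1)}\right)$, and report $-\log_2 p_u$ bits per sample. The function name `mcv_min_entropy` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mcv_min_entropy(s: Sequence[Hashable]) -> float:
    """Hypothetical sketch of the MCV min-entropy estimate (final SP 800-90B form, assumed)."""
    L = len(s)
    # Proportion of the most common value in the dataset.
    p_hat = max(Counter(s).values()) / L
    # Upper bound of a 99% confidence interval (z = 2.576) on that proportion.
    p_u = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (L - 1)))
    # Estimated min-entropy per sample, in bits.
    return -math.log2(p_u)
```

For example, for the 20-sample sequence (0, 1, 1, 2, 0, 1, 2, 2, 0, 1, 0, 1, 1, 0, 2, 2, 1, 0, 2, 1), the value 1 occurs 8 times, so $\hat{p} = 0.4$ and the sketch returns approximately 0.536 bits per sample. Real submissions use L ≥ 1,000,000 samples, which makes the confidence term much smaller.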

IID Testing
Input: the sequence $S = (s_1, \ldots, s_L)$, where $s_i \in A = \{x_1, \ldots, x_k\}$ and $L \ge 1{,}000{,}000$.
Output: a decision regarding the IID assumption: either the samples are not IID, or there is no evidence that the data are not IID.
Two types of tests:
1. Permutation testing (shuffling tests): based on test statistics with unknown distributions.
2. Chi-square tests: based on test statistics with approximated distributions.
If the hypothesis is rejected by any of the tests, the values in S are assumed to be non-IID.

Permutation Testing
[Figure: the test statistic T is applied to the input sequence S and to each of 10,000 shuffled copies of S, yielding $T_1, T_2, \ldots, T_{10{,}000}$; a plot shows the distribution of the resulting values.]

Permutation Testing
Input: $S = (s_1, \ldots, s_L)$
Output: Decision on the IID assumption
1. Set the counters $C_0$ and $C_1$ to zero.
2. Calculate the test statistic T on S; denote the result as t.
3. For j = 1 to 10,000:
 • Permute S using the Fisher-Yates shuffle algorithm.
 • Calculate the test statistic on the permuted data; denote the result as t′.
 • If t′ > t, increment $C_0$. If t′ = t, increment $C_1$.
4. If $(C_0 + C_1 \le 5)$ or $(C_0 \ge 9995)$, reject the IID assumption; otherwise, assume that the noise source outputs are IID.

Fisher-Yates shuffle
Input: $S = (s_1, \ldots, s_L)$
Output: Shuffled $S = (s_1, \ldots, s_L)$
1. i = L.
2. While (i > 1):
 a. Generate a random integer j that is uniformly distributed between 1 and i.
 b. Swap $s_j$ and $s_i$.
 c. i = i − 1.
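A minimal Python sketch of the two procedures above, assuming the test statistic is passed in as a function. The names `fisher_yates_shuffle` and `permutation_test` are hypothetical, indexing is 0-based (so j is drawn from [0, i]), and the randomness comes from Python's `random.Random`, not from any particular reference tool.

```python
import random
from typing import Callable, List, Sequence

N_PERMUTATIONS = 10_000  # the rejection thresholds below assume exactly this many


def fisher_yates_shuffle(s: List[int], rng: random.Random) -> None:
    """Shuffle s in place (0-indexed form of the Fisher-Yates steps above)."""
    i = len(s) - 1
    while i > 0:
        j = rng.randint(0, i)  # uniform over 0..i inclusive
        s[i], s[j] = s[j], s[i]
        i -= 1


def permutation_test(s: Sequence[int],
                     statistic: Callable[[Sequence[int]], float],
                     seed: int = 0) -> bool:
    """Return True if this statistic rejects the IID assumption."""
    rng = random.Random(seed)
    t = statistic(s)
    c0 = c1 = 0
    work = list(s)
    for _ in range(N_PERMUTATIONS):
        # Reshuffling the working copy still gives a uniform random permutation.
        fisher_yates_shuffle(work, rng)
        t_prime = statistic(work)
        if t_prime > t:
            c0 += 1
        elif t_prime == t:
            c1 += 1
    # Reject if t is among the most extreme values in either tail.
    return (c0 + c1 <= 5) or (c0 >= 9995)
```

A sorted input, checked with a statistic that counts decreases, has t = 0 while almost every shuffle has t′ > 0, so $C_0$ reaches 10,000 and the IID assumption is rejected. Pure Python is far too slow for 10,000 permutations of 1,000,000 samples; the sketch is meant to show the control flow only.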

Test Statistics for Permutation Testing
Eleven test statistics:
1. Excursion
2. Number of directional runs
3. Length of directional runs
4. Number of increases and decreases
5. Number of runs based on the median
6. Length of runs based on the median
7. Average collision
8. Maximum collision
9. Periodicity (5 parameters)
10. Covariance (5 parameters)
11. Compression

Binary vs. Non-Binary Samples
The number of distinct sample values (the size of A) significantly affects the distribution of the test statistics. Two conversions are used for binary data:
• Conversion I partitions the sequence into non-overlapping 8-bit blocks and counts the number of ones in each block. For example, S = (1,0,0,0,1,1,1,0, 1,1,0,1,1,0,1,1, 0,0,1,1) becomes (4, 6, 2).
• Conversion II partitions the sequence into non-overlapping 8-bit blocks and calculates the integer value of each block. For example, the same S becomes (142, 219, 48); the final, incomplete block 0011 is padded with zeros to 00110000 = 48.
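The two conversions can be sketched in Python as follows. The function names are hypothetical; the right-padding of a short final block with zeros is inferred from the Conversion II example (0011 becomes 48), and the bit order within a block is most significant bit first.

```python
from typing import List, Sequence


def conversion_i(bits: Sequence[int]) -> List[int]:
    """Number of ones in each non-overlapping 8-bit block."""
    return [sum(bits[i:i + 8]) for i in range(0, len(bits), 8)]


def conversion_ii(bits: Sequence[int]) -> List[int]:
    """Integer value of each non-overlapping 8-bit block (MSB first)."""
    out = []
    for i in range(0, len(bits), 8):
        block = list(bits[i:i + 8])
        block += [0] * (8 - len(block))  # pad a short final block on the right
        value = 0
        for b in block:
            value = (value << 1) | b
        out.append(value)
    return out


S = [1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1]
print(conversion_i(S))   # [4, 6, 2]
print(conversion_ii(S))  # [142, 219, 48]
```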

1. Excursion Test Statistic
Based on how far the running sum of sample values deviates from its average value at each point in the dataset.
Pseudocode:
1. Find $\bar{Y} = (s_1 + s_2 + \ldots + s_L)/L$.
2. For $i = 1$ to $L$, find $d_i = \left| \sum_{j=1}^{i} s_j - i \times \bar{Y} \right|$.
3. $T = \max(d_1, \ldots, d_L)$.
Example: Let S = (2, 15, 4, 10, 9). The average is 8.
$d_1 = |2 - 8| = 6$
$d_2 = |(2+15) - (2 \times 8)| = 1$
$d_3 = |(2+15+4) - (3 \times 8)| = 3$
$d_4 = |(2+15+4+10) - (4 \times 8)| = 1$
$d_5 = |(2+15+4+10+9) - (5 \times 8)| = 0$
$T = \max(6, 1, 3, 1, 0) = 6$.
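The excursion pseudocode maps onto a single pass that keeps a running sum. A minimal sketch, with the hypothetical name `excursion`:

```python
from typing import Sequence


def excursion(s: Sequence[float]) -> float:
    """T = max over i of |s_1 + ... + s_i - i * mean|."""
    mean = sum(s) / len(s)
    running = 0.0
    t = 0.0
    for i, x in enumerate(s, start=1):
        running += x
        t = max(t, abs(running - i * mean))
    return t


print(excursion([2, 15, 4, 10, 9]))  # 6.0
```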

2. Number of Directional Runs
Based on the number of runs constructed using the relations between consecutive samples.
Pseudocode:
1. Construct $T' = (t'_1, \ldots, t'_{L-1})$, where $t'_i = -1$ if $s_i > s_{i+1}$ and $t'_i = +1$ if $s_i \le s_{i+1}$, for $i = 1, \ldots, L-1$.
2. T = number of runs in $T'$.
Binary data: Apply Conversion I.
Example: Let S = (2, 2, 2, 5, 7, 7, 9, 3, 1, 4, 4). Then $T' = (+1, +1, +1, +1, +1, +1, -1, -1, +1, +1)$. There are three runs: (+1, +1, +1, +1, +1, +1), (−1, −1) and (+1, +1), so T = 3.

3. Length of Directional Runs
Based on the length of the longest run constructed using the relations between consecutive samples.
Pseudocode:
1. Construct $T' = (t'_1, \ldots, t'_{L-1})$, where $t'_i = -1$ if $s_i > s_{i+1}$ and $t'_i = +1$ if $s_i \le s_{i+1}$, for $i = 1, \ldots, L-1$.
2. T = length of the longest run in $T'$.
Binary data: Apply Conversion I.
Example: Let S = (2, 2, 2, 5, 7, 7, 9, 3, 1, 4, 4). Then $T' = (+1, +1, +1, +1, +1, +1, -1, -1, +1, +1)$. There are three runs: (+1, +1, +1, +1, +1, +1), (−1, −1) and (+1, +1). The longest run has length 6, so T = 6.

4. Number of Increases and Decreases
Based on the maximum of the number of increases and the number of decreases between consecutive sample values.
Pseudocode:
1. Construct $T' = (t'_1, \ldots, t'_{L-1})$, where $t'_i = -1$ if $s_i > s_{i+1}$ and $t'_i = +1$ if $s_i \le s_{i+1}$, for $i = 1, \ldots, L-1$.
2. T = max(number of −1's in $T'$, number of +1's in $T'$).
Binary data: Apply Conversion I.
Example: Let S = (2, 2, 2, 5, 7, 7, 9, 3, 1, 4, 4). Then $T' = (+1, +1, +1, +1, +1, +1, -1, -1, +1, +1)$. There are eight +1's and two −1's in $T'$, so T = max(8, 2) = 8.
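Statistics 2, 3 and 4 all start from the same sign sequence $T'$, so a sketch can build it once and derive each statistic from it. The function names are hypothetical; binary data would first go through Conversion I, which this sketch does not do.

```python
from itertools import groupby
from typing import List, Sequence


def directional_signs(s: Sequence[int]) -> List[int]:
    """T': -1 where s[i] > s[i+1], +1 where s[i] <= s[i+1]."""
    return [-1 if a > b else 1 for a, b in zip(s, s[1:])]


def num_directional_runs(s: Sequence[int]) -> int:
    # groupby yields one group per maximal run of equal signs.
    return sum(1 for _ in groupby(directional_signs(s)))


def longest_directional_run(s: Sequence[int]) -> int:
    return max((len(list(g)) for _, g in groupby(directional_signs(s))), default=0)


def increases_decreases(s: Sequence[int]) -> int:
    signs = directional_signs(s)
    return max(signs.count(1), signs.count(-1))


S = [2, 2, 2, 5, 7, 7, 9, 3, 1, 4, 4]
print(num_directional_runs(S), longest_directional_run(S), increases_decreases(S))  # 3 6 8
```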

5. Number of Runs Based on the Median
Based on the number of runs constructed with respect to the median of the input data.
Pseudocode:
1. Find the median $\tilde{Y}$ of S.
2. Construct $T' = (t'_1, \ldots, t'_L)$, where $t'_i = -1$ if $s_i < \tilde{Y}$ and $t'_i = +1$ if $s_i \ge \tilde{Y}$, for $i = 1, \ldots, L$.
3. T = number of runs in $T'$.
Binary data: The median is assumed to be 0.5.
Example: Let S = (5, 15, 12, 1, 13, 9, 4). The median is 9, so $T' = (-1, +1, +1, -1, +1, +1, -1)$. There are five runs: (−1), (+1, +1), (−1), (+1, +1), and (−1); T = 5.

6. Length of Runs Based on the Median
Based on the length of the longest run constructed with respect to the median of the input data.
Pseudocode:
1. Find the median $\tilde{Y}$ of $S = (s_1, \ldots, s_L)$.
2. Construct $T' = (t'_1, \ldots, t'_L)$, where $t'_i = -1$ if $s_i < \tilde{Y}$ and $t'_i = +1$ if $s_i \ge \tilde{Y}$, for $i = 1, \ldots, L$.
3. T = length of the longest run in $T'$.
Binary data: The median of the input data is assumed to be 0.5.
Example: Let S = (5, 15, 12, 1, 13, 9, 4). The median is 9, so $T' = (-1, +1, +1, -1, +1, +1, -1)$. The runs are (−1), (+1, +1), (−1), (+1, +1), and (−1). The longest run has length 2, so T = 2.
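Statistics 5 and 6 share the median-based sign sequence. In this hypothetical sketch, `binary=True` applies the slides' rule that the median of binary data is taken to be 0.5. For even-length input, `statistics.median` averages the two middle values; the slides do not say how the median is defined in that case, so treat that as an assumption.

```python
from itertools import groupby
from statistics import median
from typing import List, Sequence


def median_signs(s: Sequence[float], binary: bool = False) -> List[int]:
    """T': -1 where s_i < median, +1 where s_i >= median."""
    m = 0.5 if binary else median(s)
    return [-1 if x < m else 1 for x in s]


def num_median_runs(s: Sequence[float], binary: bool = False) -> int:
    return sum(1 for _ in groupby(median_signs(s, binary)))


def longest_median_run(s: Sequence[float], binary: bool = False) -> int:
    return max((len(list(g)) for _, g in groupby(median_signs(s, binary))), default=0)


S = [5, 15, 12, 1, 13, 9, 4]
print(num_median_runs(S), longest_median_run(S))  # 5 2
```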

7. Average Collision Test Statistic
Based on the number of successive sample values until a duplicate is found.
Pseudocode:
1. Let C be an empty list, and let i = 1.
2. While i < L:
 a. Find the smallest j such that $(s_i, \ldots, s_{i+j-1})$ contains two identical values. If no such j exists, break.
 b. Add j to the list C.
 c. i = i + j.
3. T = average of all values in C.
Binary data: Apply Conversion II.
Example: Let S = (2, 1, 1, 2, 0, 1, 0, 1, 1, 2). The first collision occurs for j = 3; add 3 to C. In the remaining sequence (2, 0, 1, 0, 1, 1, 2), the next collision occurs for j = 4; add 4 to C. The third sequence is (1, 1, 2), and j = 2. So C = [3, 4, 2], the average is 3, and T = 3.

8. Maximum Collision Test Statistic
Based on the number of successive sample values until a duplicate is found.
Pseudocode:
1. Let C be an empty list, and let i = 1.
2. While i < L:
 a. Find the smallest j such that $(s_i, \ldots, s_{i+j-1})$ contains two identical values. If no such j exists, break.
 b. Add j to the list C.
 c. i = i + j.
3. T = the maximum value in the list C.
Binary data: Apply Conversion II.
Example: Let S = (2, 1, 1, 2, 0, 1, 0, 1, 1, 2). C = [3, 4, 2] is computed as in the previous example, so T = max(3, 4, 2) = 4.
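Both collision statistics are computed from the same list C. The hypothetical sketch below uses 0-based indexing and advances past each block by j samples, which reproduces the worked example C = [3, 4, 2]. It assumes C is non-empty (a dataset with no repeated value at all would need separate handling); binary data would first go through Conversion II.

```python
from typing import Hashable, List, Sequence


def collision_lengths(s: Sequence[Hashable]) -> List[int]:
    """List C: length of each block of successive samples ending at the first duplicate."""
    c: List[int] = []
    i = 0
    n = len(s)
    while i < n - 1:
        seen = set()
        j = None
        for k in range(i, n):
            if s[k] in seen:
                j = k - i + 1  # block s[i..k] holds the first repeated value
                break
            seen.add(s[k])
        if j is None:
            break  # no further collision in the rest of the data
        c.append(j)
        i += j  # continue with the samples after this block
    return c


def average_collision(s: Sequence[Hashable]) -> float:
    c = collision_lengths(s)
    return sum(c) / len(c)


def maximum_collision(s: Sequence[Hashable]) -> int:
    return max(collision_lengths(s))


S = [2, 1, 1, 2, 0, 1, 0, 1, 1, 2]
print(collision_lengths(S), average_collision(S), maximum_collision(S))  # [3, 4, 2] 3.0 4
```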
