Generating Massive Amount of High-Quality Random Numbers using GPU
Wai-Man Pang, Tien-Tsin Wong, Pheng-Ann Heng
The Computer Science and Engineering Department
The Chinese University of Hong Kong
IEEE WCCI CIGPU 2008

Pseudo-random number generator (PRNG)
• Provide uniform random numbers
  • Example: rand() in C
• Important for stochastic algorithms
  • Evolutionary Computing
  • Photon-mapping rendering
• Huge amount
  • Speed
• Quality
  • Poor randomness leads to slow convergence

PRNG for Stochastic Rendering
• Artifacts from a poor-quality PRNG

PRNG for Stochastic Rendering
• Result from a high-quality PRNG

Some common PRNGs
• Linear congruential generator (LCG)
  • $R_{n+1} = a R_n + b \pmod{m}$
• Lagged Fibonacci generator
  • $R_n = R_{n-j} \,\#\, R_{n-k} \pmod{m}$ (where # is a binary operator)
• Both need high-precision integer arithmetic
  • Not available on all GPUs

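The two recurrences above can be sketched in a few lines of Python. This is only an illustration: the LCG constants (a = 1664525, b = 1013904223, m = 2^32) are a widely published textbook choice and the lags (24, 55) with addition as the binary operator are a common example; neither is taken from the slides.

```python
from collections import deque


def lcg(seed, a=1664525, b=1013904223, m=2**32):
    """Linear congruential generator: R_{n+1} = (a * R_n + b) mod m.

    The default constants are an assumed, commonly used example.
    """
    r = seed
    while True:
        r = (a * r + b) % m
        yield r


def lagged_fibonacci(seed_values, j, k, m):
    """Additive lagged Fibonacci: R_n = (R_{n-j} + R_{n-k}) mod m, with 0 < j < k.

    seed_values must hold the k most recent values, oldest first.
    """
    if not (0 < j < k) or len(seed_values) != k:
        raise ValueError("need 0 < j < k and exactly k seed values")
    state = deque(seed_values, maxlen=k)
    while True:
        value = (state[-j] + state[-k]) % m
        state.append(value)  # the oldest value drops out automatically
        yield value


if __name__ == "__main__":
    gen = lcg(0)
    seeds = [next(gen) for _ in range(55)]
    lfg = lagged_fibonacci(seeds, j=24, k=55, m=2**32)
    print([next(lfg) for _ in range(3)])
```

Both generators rely on 32-bit (or wider) integer multiplication or addition with wrap-around, which is the high-precision integer arithmetic the slide refers to.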
PRNG on GPU
• Cellular Automata-based PRNG [Wolfram]
  • No high-precision integer arithmetic
  • Homogeneous cell operation and connectivity
• Quality
  • Configure to produce a high-quality random sequence

CA-based PRNG
• Array of connected cells with homogeneous behavior
• Each cell has a state and a common cell equation
• Cell equation: takes the previous state values from neighbors (X) and gives the output state value c_i
[Figure: a cell receiving previous state values from its neighbors and producing its output state value]

Mechanism
• 4 cells, connectivity (-1, 2)
• Cell equation: step(1, 3 - c1 - 2*c2)
[Figure: cells A B C D hold states 1 0 0 1. For cell A the neighbors are cell D (state 1) and cell C (state 0), so its new state is step(1, 3 - 1 - 2*0) = 1]

Mechanism (cont'd)
[Figure: after one update, cells A B C D hold 1 0 1 1 and the random number generated is 111; after the next update they hold 0 0 1 1 and the random number generated is 011]

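The 4-cell example from the Mechanism slides can be replayed in Python as a check. The sketch below assumes that connectivity (-1, 2) means c1 is the cell one position to the left and c2 the cell two positions to the right, with wrap-around. Reading the output bits from cells A, C and D is an inference from the numbers 111 and 011 on the slide, not something the slides state.

```python
def step(a, x):
    """Cg-style step function: 1 when x >= a, otherwise 0."""
    return 1 if x >= a else 0


def cell_equation(c1, c2):
    """Cell equation from the Mechanism slide."""
    return step(1, 3 - c1 - 2 * c2)


def ca_step(state, offsets=(-1, 2)):
    """Update every cell at once from the previous state (ring topology)."""
    n = len(state)
    return [
        cell_equation(*(state[(i + d) % n] for d in offsets))
        for i in range(n)
    ]


def output_bits(state, cells=(0, 2, 3)):
    """Assumed output taps: cells A, C and D."""
    return "".join(str(state[i]) for i in cells)


state0 = [1, 0, 0, 1]          # cells A, B, C, D
state1 = ca_step(state0)
state2 = ca_step(state1)
print(state1, output_bits(state1))  # [1, 0, 1, 1] 111
print(state2, output_bits(state2))  # [0, 0, 1, 1] 011
```

Every cell applies the same equation to the same relative neighbors, which is why the update maps naturally onto one fragment program run over all texels.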
GPU Implementation Issues
• A cell resembles a texel on the GPU
  • 64 cells, 4-connected CA PRNG for 32-bit random numbers
• Cell equation evaluation
  • Fast table lookup
  • 4 connectivities = 4 inputs, $2^4 = 16$ table entries
• Reorganize bits
  • Bits of a random number are scattered among texels
  • Output floating point value $f$
    $f = (\cdots((r_0/2 + r_1)/2 + r_2)/2 + \cdots + r_{31})/2$
  • $r_i$ is the $i$-th bit in the random number

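As a rough illustration of the two steps above, the sketch below tabulates a cell equation into a lookup table and folds a list of bits into a float using the same add-then-halve loop as the pack shader. The index ordering (first neighbor as the most significant index bit) is an assumption, and the 2-input equation from the Mechanism slide is used only because its table is small enough to check by hand.

```python
def step(a, x):
    """Cg-style step function: 1 when x >= a, otherwise 0."""
    return 1 if x >= a else 0


def build_table(cell_equation, n_inputs):
    """Evaluate the cell equation once for each of the 2**n_inputs patterns."""
    table = []
    for index in range(2 ** n_inputs):
        # Assumed ordering: first neighbor is the most significant index bit.
        inputs = [(index >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs)]
        table.append(cell_equation(*inputs))
    return table


def lookup(table, neighbors):
    """Replace the cell equation by a single table read."""
    index = 0
    for bit in neighbors:
        index = (index << 1) | bit
    return table[index]


def pack_bits(bits):
    """f = (...((r_0/2 + r_1)/2 + r_2)/2 + ... + r_last)/2."""
    f = 0.0
    for r in bits:
        f = (f + r) / 2
    return f


table = build_table(lambda c1, c2: step(1, 3 - c1 - 2 * c2), 2)
print(table)                 # [1, 1, 1, 0]
print(pack_bits([1, 0, 1]))  # 0.625
```

With 4 inputs the same `build_table` call yields the 16-entry table mentioned on the slide. In `pack_bits` the last bit read carries weight 1/2, so the result always lies in [0, 1).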
Shader Code

    // One CA update: compute the new state of the cell at coords.
    float4 caprng(in half2 coords : TEX0,
                  in const uniform samplerRECT cells) : COLOR0
    {
        float2 Connector;
        float4 newState;
        float4 neighborStates[4];
        int i;
        // fetch the previous states of the 4 connected cells in the same row
        for (i = 0; i < 4; i++) {
            Connector.x = fmod(coords.x - connectivity(i), CA_SIZE);
            Connector.y = coords.y;
            neighborStates[i] = round(texRECT(cells, Connector));
        }
        // cell equation evaluation
        newState.x = celleqn(neighborStates);
        return newState;
    }

    // Gather 32 cell bits of one row into a floating point value.
    float4 pack(in half2 index : TEX0,
                in const uniform samplerRECT cells) : COLOR0
    {
        int i;
        float4 outbits;
        float4 states;
        float2 texindex;
        outbits = 0;
        // packing all 32 bits
        for (i = 0; i < 32; i++) {
            texindex.x = i * 2 + 1;   // every second cell
            texindex.y = index.y;
            states = texRECT(cells, texindex);
            outbits += states;
            outbits /= 2;
        }
        return outbits;
    }

Parallelized PRNG
• Fully utilize 4096 × 4096 texels (7800GTX)
• Each cell occupies a single bit in a texel
  • Why not pack more inside each texel?
  • Fully utilize the mantissa part of the texel
  • 23 × 4 random sequences simultaneously
• Combine the 2 schemes: 64 × 4096 × 92 PRNGs
[Figure: cells texture layout, with the cells of PRNG1, PRNG2, PRNG3, ... stored in texels TEX0 to TEX11, ...]

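The count on this slide can be reproduced with a little arithmetic. The reading used here is an assumption: 64 is taken to be the number of 64-cell PRNGs that fit in one 4096-texel row, and the factor 4 is taken to be the four components of a float4 texel. The helper functions are hypothetical and only show how several independent cell bits can share one mantissa.

```python
TEXTURE_WIDTH = 4096
TEXTURE_HEIGHT = 4096
CELLS_PER_PRNG = 64
MANTISSA_BITS = 23
CHANNELS = 4  # assumed: the four components of a float4 texel

prngs_per_row = TEXTURE_WIDTH // CELLS_PER_PRNG        # 64
sequences_per_texel = MANTISSA_BITS * CHANNELS         # 92
total_prngs = prngs_per_row * TEXTURE_HEIGHT * sequences_per_texel


def pack_cells(cell_bits):
    """Store one cell bit per mantissa bit plane (hypothetical helper)."""
    value = 0
    for plane, bit in enumerate(cell_bits):
        value |= bit << plane
    return value


def unpack_cell(value, plane):
    """Read back the cell bit that belongs to one bit plane."""
    return (value >> plane) & 1


print(prngs_per_row, sequences_per_texel, total_prngs)  # 64 92 24117248
```

A packed value built from 23 bits stays below 2^24, so it is exactly representable in a single-precision float, which is what makes the mantissa usable as 23 independent bit planes.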
Optimize for Quality
• Genetic Algorithm
  • Find the CA-based PRNG configuration with the best quality
• Initialize candidates
  • Encoded cell equation and connectivities
  • $2^n + n$ bits
• Evaluate candidates by the objective function
• Generate the next generation
  • Crossover
  • Mutation
• Repeat until the objective exceeds a certain threshold

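The loop above can be sketched as a generic bit-string genetic algorithm. Everything beyond the steps named on the slide is an assumption: truncation selection, keeping the best candidate unchanged, single-point crossover, the mutation rate, and the stand-in objective (counting ones) used in the usage lines. In the actual search the objective would score the PRNG that a candidate encodes.

```python
import random


def evolve(objective, genome_bits, population_size=20, threshold=None,
           max_generations=100, mutation_rate=0.02, seed=0):
    """Search for a bit string that maximizes objective (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialize candidates with random bit strings.
    population = [[rng.randint(0, 1) for _ in range(genome_bits)]
                  for _ in range(population_size)]
    best = max(population, key=objective)
    for _ in range(max_generations):
        # Stop once the objective exceeds the threshold.
        if threshold is not None and objective(best) >= threshold:
            break
        ranked = sorted(population, key=objective, reverse=True)
        parents = ranked[:population_size // 2]   # assumed: truncation selection
        children = [ranked[0][:]]                 # assumed: keep the best as-is
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_bits)   # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
        best = max(population, key=objective)
    return best


# Usage with a stand-in objective: maximize the number of 1 bits.
best = evolve(sum, genome_bits=20, threshold=20)
print(best, sum(best))
```

Because the best candidate is always carried over, the best score never decreases from one generation to the next.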
Objective Function
• Objective function
  • objective = $w_0 \times e$ + $w_1 \times$ (Diehard test result)
  • $w_i$ is the weighting
  • $e$ is the n-bit entropy
  • The second term is the result of the Diehard test

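A minimal sketch of the entropy term, assuming the n-bit entropy is the Shannon entropy of non-overlapping n-bit words divided by n so that it falls in [0, 1] (the values reported on the Convergence slide are all below 1, which fits this normalization but does not confirm it). The weights and the `diehard_score` argument are placeholders; the Diehard battery itself is not implemented here.

```python
import math
from collections import Counter


def n_bit_entropy(bits, n):
    """Normalized Shannon entropy of non-overlapping n-bit words."""
    words = [tuple(bits[i:i + n]) for i in range(0, len(bits) - n + 1, n)]
    total = len(words)
    counts = Counter(words)
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return h / n  # assumed normalization: 1.0 means all n-bit words equally likely


def objective(entropy, diehard_score, w0=0.5, w1=0.5):
    """Weighted sum from the slide; the weights here are placeholders."""
    return w0 * entropy + w1 * diehard_score


print(n_bit_entropy([0, 0, 0, 1, 1, 0, 1, 1], 2))  # 1.0
print(n_bit_entropy([0] * 16, 4))                   # 0.0
```

A constant sequence scores 0 and a sequence in which every n-bit word is equally frequent scores 1, so low-quality candidates are penalized even before the statistical tests run.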
Objective Function (cont'd)
• Diehard test
  • 14 tests (e.g. birthday spacings, GCD test, etc.)
  • Chi-square
• Overall p-value
  • Chi-square test on all p-values with Gaussian distribution
• Best 4-connected, 64-cell CA PRNG
  • Connectivity (56, 2, 21, 49)
  • Cell equation in tightly packed format (1001100110100101)

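For illustration, the reported configuration can be run as a software CA. Several details below are assumptions rather than facts from the slides: the offsets are subtracted from the cell index modulo 64 (as in the caprng shader), the packed cell equation string is read as a 16-entry table with the first neighbor as the most significant index bit, and the output takes every second cell (as in the pack shader) but collects the bits into a 32-bit integer instead of a float. With a different bit ordering the sequence would differ, so no particular output is claimed.

```python
import random

CA_SIZE = 64
CONNECTIVITY = (56, 2, 21, 49)
CELL_EQUATION = "1001100110100101"
TABLE = [int(ch) for ch in CELL_EQUATION]  # assumed entry order


def ca_step(state):
    """Update all 64 cells from the previous state using the lookup table."""
    new_state = []
    for x in range(CA_SIZE):
        index = 0
        for offset in CONNECTIVITY:
            index = (index << 1) | state[(x - offset) % CA_SIZE]
        new_state.append(TABLE[index])
    return new_state


def output_word(state):
    """Collect every second cell (1, 3, ..., 63) into a 32-bit integer."""
    word = 0
    for i in range(32):
        word = (word << 1) | state[2 * i + 1]
    return word


def generate(seed_state, count):
    """Run the CA for count steps and emit one word per step."""
    state = list(seed_state)
    words = []
    for _ in range(count):
        state = ca_step(state)
        words.append(output_word(state))
    return words


rng = random.Random(2008)  # arbitrary seed for the initial cell states
seed_state = [rng.randint(0, 1) for _ in range(CA_SIZE)]
words = generate(seed_state, 4)
print([f"{w:08x}" for w in words])
```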
Convergence
[Figure: renderings with 10,000 photons for a control image and for the best PRNG of successive generations]
Generation     e        Diehard result
1              0.2673   0.0
2              0.5852   0.0
4              0.5944   0.0
8              0.9464   0.143
11             0.9514   0.3513

Performance
• Performance comparison with CPU

Single PRNG
Random numbers generated   GPU CA-PRNG   Software CA-PRNG
1,000                      0.064s        0.004s
10,000                     0.942s        0.042s
100,000                    10.081s       0.391s
1,000,000                  100.082s      4.163s

1,000 Parallel PRNG
Random numbers generated   GPU CA-PRNG   Software CA-PRNG
10,000                     0.004s        0.043s
100,000                    0.031s        0.425s
1,000,000                  0.31s         4.274s
10,000,000                 3.098s        43.003s
100,000,000                31.875s       430s

Conclusion
• The CA architecture PRNG is highly suitable for the GPU
  • Parallel PRNG on GPU
  • Optimization for quality
  • High quality and a high performance gain
• Future work
  • Support for variable-precision random sequences
  • Experiments with Evolutionary Computing applications

End
Thanks for your attention

• Reference:
  • "Implementing High-Quality PRNG on GPU", W. M. Pang, T. T. Wong and P. A. Heng, ShaderX5: Advanced Rendering Techniques, edited by W. Engel, Charles River Media, 2007, pp. 579-590.