A Galois Field Arithmetic Library

Pakize ŞANAL, MSc Candidate
Supervisor: Asst. Prof. Hüseyin HIŞIL
Yasar University, Faculty of Engineering, Department of Computer Engineering
June 5, 2017

Outline
◮ Content of the bachelor thesis
◮ Studied assembly optimizations
◮ Test results

Content of the bachelor thesis
A Galois Field Arithmetic Library
◮ +, −, ∗.
◮ GF(2^w − c) where w = 127, 128, 255, 256, and GF(2^127 − 1).
◮ Constant-time AMD64 assembly.
◮ Extensive validation and performance tests.

1. By scheduling of the operations
Four-digit schoolbook (SCB) vs. one-level recursive schoolbook (RSCB) multiplication vs. ... (OSCB)

Cycle counts:

             SCB   RSCB   OSCB
2^256 − c     38     35     37

[Figure: a = (a3, a2, a1, a0) is multiplied by b = (b3, b2, b1, b0), and the 16 partial products ai · bj are summed to a · b. One diagram per variant (SCB, RSCB, OSCB) shows the order in which the partial products are scheduled. In SCB they are taken row by row: a0 · b0, a1 · b0, a2 · b0, a3 · b0, then a0 · b1, ..., a3 · b1, and so on up to a3 · b3.]
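The row-by-row schoolbook order can be sketched in C as follows. This is a minimal illustration and not the library's assembly: the names mul4x4_schoolbook and mul4x4_schoolbook_check, and the little-endian 64-bit limb layout, are assumptions made for this example, and the code relies on the unsigned __int128 extension of GCC and Clang. It produces the full 512-bit product and leaves out the reduction modulo 2^256 − c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the slides):
   r[0..7] = a[0..3] * b[0..3], little-endian 64-bit limbs,
   partial products taken row by row (one row per limb of b). */
void mul4x4_schoolbook(uint64_t r[8], const uint64_t a[4], const uint64_t b[4])
{
    for (int k = 0; k < 8; k++)
        r[k] = 0;

    for (int j = 0; j < 4; j++) {
        uint64_t carry = 0;
        for (int i = 0; i < 4; i++) {
            /* a[i]*b[j] + r[i+j] + carry is at most 2^128 - 1,
               so it always fits in 128 bits. */
            unsigned __int128 t =
                (unsigned __int128)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint64_t)t;
            carry = (uint64_t)(t >> 64);
        }
        r[j + 4] = carry;   /* this limb has not been written before row j */
    }
}

/* Usage check: (2^256 - 1)^2 = (2^256 - 2) * 2^256 + 1. */
void mul4x4_schoolbook_check(void)
{
    const uint64_t ones = 0xFFFFFFFFFFFFFFFFULL;
    const uint64_t a[4] = { ones, ones, ones, ones };
    uint64_t r[8];

    mul4x4_schoolbook(r, a, a);
    assert(r[0] == 1 && r[1] == 0 && r[2] == 0 && r[3] == 0);
    assert(r[4] == ones - 1 && r[5] == ones && r[6] == ones && r[7] == ones);
}
```

The recursive variant computes the same 16 partial products; only the order in which they are issued and accumulated changes, which is what the cycle counts on this slide compare.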

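1. By scheduling of the operations
One-level Karatsuba multiplication vs. one-level schoolbook multiplication

Cycle counts:

             Karatsuba   SCB
2^127 − 1        12        6
2^127 − c        17       13
2^128 − c        12       10

[Figure: a = (a1, a0) is multiplied by b = (b1, b0) with Karatsuba. The products a1 · b1, a0 · b0 and (a1 + a0) · (b1 + b0) are computed; a1 · b1 and a0 · b0 are subtracted from the third product to form the middle term, and the terms are summed to a · b.]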
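The Karatsuba identity on this slide, a1 · b0 + a0 · b1 = (a1 + a0) · (b1 + b0) − a1 · b1 − a0 · b0, can be checked with a small C sketch. To keep the carries easy to follow, the example splits 64-bit operands into 32-bit halves instead of the 64-bit limbs used by the library; the function names are hypothetical and unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* One-level schoolbook: four 32x32-bit multiplications. */
u128 mul64_schoolbook(uint64_t a, uint64_t b)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    u128 lo  = a0 * b0;
    u128 hi  = a1 * b1;
    u128 mid = (u128)(a1 * b0) + a0 * b1;      /* may need 65 bits */
    return lo + (mid << 32) + (hi << 64);
}

/* One-level Karatsuba: three multiplications, more additions. */
u128 mul64_karatsuba(uint64_t a, uint64_t b)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    u128 lo = a0 * b0;
    u128 hi = a1 * b1;
    /* (a1 + a0) and (b1 + b0) can be 33 bits wide, so the product is
       formed in 128 bits before hi and lo are subtracted. */
    u128 mid = (u128)(a1 + a0) * (b1 + b0) - hi - lo;
    return lo + (mid << 32) + (hi << 64);
}

/* Usage check against the compiler's own 64x64 -> 128-bit product. */
void mul64_check(void)
{
    const uint64_t x = 0xFFFFFFFFFFFFFFFFULL, y = 0x123456789ABCDEF0ULL;
    assert(mul64_karatsuba(x, y) == (u128)x * y);
    assert(mul64_schoolbook(x, y) == (u128)x * y);
    assert(mul64_karatsuba(x, x) == (u128)x * x);
}
```

Karatsuba trades one multiplication for several additions and subtractions. The cycle counts on this slide report it as slower than schoolbook for all three fields at this operand size.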

2. By making optimization
Register optimization

[Figure: the 16 partial products ai · bj of a = (a3, a2, a1, a0) and b = (b3, b2, b1, b0), summed to a · b.]

Listing 1: <GF(2^255 − c), ∗>

    // ...
    // Assumed here: %r8 -> a, %r9 -> b, %rdi -> result (64-bit limbs).
    movq 8*0(%r8), %rax
    mulq 8*0(%r9)            // rdx:rax = a0 * b0
    movq %rax, %rbx
    movq %rdx, %rsi          // rsi:rbx = a0 * b0
    movq 8*1(%r8), %rax
    mulq 8*1(%r9)            // rdx:rax = a1 * b1
    movq %rax, %r10
    movq %rdx, %r11          // r11:r10 = a1 * b1
    movq 8*1(%r8), %rax
    mulq 8*0(%r9)            // rdx:rax = a1 * b0
    addq %rax, %rsi
    adcq %rdx, %r10
    adcq $0, %r11            // r11:r10:rsi += a1 * b0
    movq 8*0(%r8), %rax
    mulq 8*1(%r9)            // rdx:rax = a0 * b1
    addq %rax, %rsi
    adcq %rdx, %r10
    adcq $0, %r11            // r11:r10:rsi += a0 * b1
    movq %rbx, 8*0(%rdi)     // store limb 0 of the product
    movq %rsi, 8*1(%rdi)     // store limb 1 of the product
    // ...

3. By using special instructions
The instruction cmovxx: Conditional Move

    if r13 = 0 then        if r12 = 0 then
        Return 0.              Return 0.
    else                   else
        Return r14.            Return r15.
    end                    end

[Figure: (r13, r12) is multiplied by (r15, r14). The partial products r12 · r14, r13 · r14, r12 · r15 and a fourth term marked "?" are summed to a · b.]

Listing 2: <GF(2^128 − c), ∗>

    // ...
    movq %r12, %rax
    mulq %r14               // rdx:rax = r12 * r14
    movq $0, %rbp
    cmp $0, %r13
    cmovz %rbp, %r14        // r14 = 0 if r13 = 0
    cmp $0, %r15
    cmovz %rbp, %r12        // r12 = 0 if r15 = 0
    andq %r13, %r15         // r15 = r13 & r15
    addq %r12, %rdx
    adcq $0, %rbp           // rbp = carry
    addq %r14, %rdx
    adcq %r15, %rbp         // rbp += r15 + carry
    // ...
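The effect of the conditional move, "return 0 if the flag is zero, otherwise return the operand", can be written without a branch in C by using a mask. The sketch below is an illustration only: select_or_zero and its check function are hypothetical names, and a C compiler gives no guarantee about which instructions it emits, which is one reason to write such code directly in assembly as the slides do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns x if flag != 0 and 0 otherwise,
   without branching on flag. */
uint64_t select_or_zero(uint64_t flag, uint64_t x)
{
    /* For any nonzero flag, flag or its two's-complement negation has
       the top bit set, so this folds flag to exactly 0 or 1. */
    uint64_t nonzero = (flag | (0 - flag)) >> 63;
    uint64_t mask = 0 - nonzero;      /* all zeros or all ones */
    return x & mask;
}

/* Usage check: when the flag is a single bit, the selection equals
   the product bit * lo. */
void select_or_zero_check(void)
{
    const uint64_t lo = 0x0123456789ABCDEFULL;
    for (uint64_t bit = 0; bit <= 1; bit++)
        assert(select_or_zero(bit, lo) == bit * lo);
    assert(select_or_zero(0x8000000000000000ULL, lo) == lo);
}
```

When one operand limb is known to be 0 or 1, this selection gives the same value as a multiplication by that limb.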

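3. By using special instructions
The instruction btxx: Bit Test and Reset

[Figure: the four-limb value (r11, r10, r9, r8) is split into a high part, kept in (r11, r10), and a low part (r9, r8). The two parts are added, and the bit carried out of the sum is added back, leaving the result in (r11, r10).]

Listing 3: <GF(2^127 − 1), ∗>

    // ...
    /* r11, r10, r9, r8 */
    shlq $1, %r11
    btrq $63, %r10          // CF = bit 63 of r10, then clear that bit
    adcq $0, %r11           // r11 = 2*r11 + CF
    shlq $1, %r10
    btrq $63, %r9           // CF = bit 63 of r9, then clear that bit
    adcq $0, %r10           // r10 = 2*r10 + CF

    addq %r8, %r10
    adcq %r9, %r11          // (r11, r10) += (r9, r8)

    btrq $63, %r11          // CF = bit 127 of the sum, then clear that bit
    adcq $0, %r10
    adcq $0, %r11           // add the carried-out bit back in
    // ...

Source: C. Costello, H. Hisil, and B. Smith, "Faster compact Diffie-Hellman: Endomorphisms on the x-line".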
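The reduction in Listing 3 uses 2^127 ≡ 1 (mod 2^127 − 1): the bits of the product above position 126 are shifted down and added to the low 127 bits, and the one bit that this addition can carry out is folded back in the same way. A C sketch of the same arithmetic follows; reduce127 and reduce127_check are hypothetical names, the little-endian limb layout is an assumption, and unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Hypothetical helper: reduce p = p[3]:p[2]:p[1]:p[0] < 2^254 modulo
   2^127 - 1. The result r[1]:r[0] is at most 2^127 - 1, so zero may
   come out in the non-canonical form 2^127 - 1. */
void reduce127(uint64_t r[2], const uint64_t p[4])
{
    const uint64_t mask63 = 0x7FFFFFFFFFFFFFFFULL;
    const u128 mask127 = ((u128)mask63 << 64) | 0xFFFFFFFFFFFFFFFFULL;

    assert((p[3] >> 62) == 0);            /* precondition: p < 2^254 */

    /* Low 127 bits of p. */
    u128 low = ((u128)(p[1] & mask63) << 64) | p[0];
    /* High part: p >> 127. */
    uint64_t h0 = (p[2] << 1) | (p[1] >> 63);
    uint64_t h1 = (p[3] << 1) | (p[2] >> 63);
    u128 high = ((u128)h1 << 64) | h0;

    u128 s = low + high;                  /* below 2^128, cannot overflow */
    s = (s & mask127) + (s >> 127);       /* fold bit 127 back in */

    r[0] = (uint64_t)s;
    r[1] = (uint64_t)(s >> 64);
}

/* Usage check on three values with known residues. */
void reduce127_check(void)
{
    uint64_t r[2];

    const uint64_t p1[4] = { 0, 0x8000000000000000ULL, 0, 0 };   /* 2^127 */
    reduce127(r, p1);
    assert(r[0] == 1 && r[1] == 0);

    const uint64_t p2[4] = { 0xFFFFFFFFFFFFFFFFULL,
                             0xFFFFFFFFFFFFFFFFULL, 0, 0 };      /* 2^128 - 1 */
    reduce127(r, p2);
    assert(r[0] == 1 && r[1] == 0);

    const uint64_t p3[4] = { 1, 0, 0xFFFFFFFFFFFFFFFFULL,
                             0x3FFFFFFFFFFFFFFFULL };            /* (2^127 - 1)^2 */
    reduce127(r, p3);
    assert(r[0] == 0xFFFFFFFFFFFFFFFFULL && r[1] == 0x7FFFFFFFFFFFFFFFULL);
}
```

The third case shows that a multiple of 2^127 − 1 can reduce to 2^127 − 1 itself instead of 0; whether and where the library maps this to the canonical zero is not stated on the slides.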

3. By using special instructions
Comparing with the MPFQ library, <GF(2^127 − 1), ∗>
MPFQ: 45 instructions, 9 clock cycles. This work: 33 instructions, 6 clock cycles.

Listing 4: My schoolbook's code, reduction part

    // ...
    /* r11, r10, r9, r8 */
    shlq $1, %r11
    btrq $63, %r10
    adcq $0, %r11
    shlq $1, %r10
    btrq $63, %r9
    adcq $0, %r10

    addq %r8, %r10
    adcq %r9, %r11

    btrq $63, %r11
    adcq $0, %r10
    adcq $0, %r11
    // ...

Listing 5: MPFQ schoolbook's code, reduction part

    // ...
    /* r11, r10, r9, r8 */
    movq $9223372036854775807, %rax   // rax = 2^63 - 1, mask for the low 63 bits
    movq %r9, %r12
    andq %rax, %r9
    shrq $63, %r12
    movq %r10, %rdx
    shlq $1, %r10
    orq %r10, %r12
    shlq $1, %r11
    shrq $63, %rdx
    orq %r11, %rdx
    addq %r12, %r8
    adcq %rdx, %r9
    movq %r9, %r12
    andq %rax, %r9
    shlq $1, %r12
    adcq $0, %r8
    adcq $0, %r9
    // ...

https://www.imsc.res.in/~ecc14/slides/hisil.pdf

Test Results
Timing benchmarks were taken on an Intel Core i7-6500U processor running Ubuntu 14.04.5 LTS, with TurboBoost disabled and all cores but one switched off (i.e. hyperthreading is disabled). The executables were built with GNU gcc version 4.8.4 with the -O2 flag set and GNU assembler version 2.24.

Clock cycles per multiplication:

             Karatsuba   Schoolbook (SCB)   Recursive SCB
2^127 − 1        12              6                -
2^127 − c        17             13                -
2^128 − c        12             10                -
2^255 − c         -             46               40
2^256 − c         -             38               34

Listing 6: A performance test

    /* libraries */
    #define TRIAL 100000000000
    int main() {
        long long st, fn;
        st = cpucycles();
        unsigned long an[2], bn[2], cn[2];
        /* Random operands, two 64-bit limbs each. */
        an[0] = (unsigned long) rand() * (unsigned long) rand();
        an[1] = (unsigned long) rand() * (unsigned long) rand();
        bn[0] = (unsigned long) rand() * (unsigned long) rand();
        bn[1] = (unsigned long) rand() * (unsigned long) rand();
        cn[0] = (unsigned long) rand() * (unsigned long) rand();
        cn[1] = (unsigned long) rand() * (unsigned long) rand();
        unsigned long int i;
        /* First pass: the routine under test; the limbs are shuffled
           after every call so that the operands keep changing. */
        for (i = 0; i < TRIAL; i++) {
            mul127_scb_v01(an, bn, cn);
            an[0] = bn[1];
            an[1] = cn[0];
            bn[0] = an[1];
            bn[1] = cn[1];
            cn[0] = an[1];
            cn[1] = bn[0];
        }
        fn = cpucycles();
        double first = ((double) fn - st) / TRIAL;
        /* Second pass: the same loop around mul127_scb_test; its
           average is subtracted from the first one below. */
        st = cpucycles();
        for (i = 0; i < TRIAL; i++) {
            mul127_scb_test(an, bn, cn);
            an[0] = bn[1];
            an[1] = cn[0];
            bn[0] = an[1];
            bn[1] = cn[1];
            cn[0] = an[1];
            cn[1] = bn[0];
        }
        fn = cpucycles();
        double second = ((double) fn - st) / TRIAL;
        printf("net clock cycle: %lf\n\n", first - second);
        return 0;
    }
