Speeding up characteristic 2: I. Linear maps. II. The M(n) game. III. Batching. IV. Normal bases

  1. Speeding up characteristic 2: I. Linear maps. II. The M(n) game. III. Batching. IV. Normal bases. D. J. Bernstein, University of Illinois at Chicago. NSF ITR–0716498

  2. Part I. Linear maps. Consider computing h0 = q0; h1 = q1; h2 = q2 ⊕ (p0 ⊕ q0 ⊕ r0); h3 = (p1 ⊕ q1 ⊕ r1); h4 = (p2 ⊕ q2 ⊕ r2) ⊕ r0; h5 = r1; h6 = r2. Easy: 8 additions. Can find these 8 additions in several papers. But 8 is not optimal!

  3. “Wasting brain power is bad for the environment.” Use existing algorithms to find addition chains. Apply, e.g., the greedy additive CSE algorithm from 1997 Paar: find the input pair i0, i1 with the most popular i0 ⊕ i1; compute i0 ⊕ i1; simplify using i0 ⊕ i1; repeat. This algorithm finds the repeated q2 ⊕ r0; uses 7 additions.
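
One way to schedule those 7 additions, as a minimal bitsliced C sketch (the function name and word type are illustrative, not from the slides; in characteristic 2 addition is xor, so each variable can hold a full machine word and the map is applied bitwise):

    #include <stdint.h>

    /* 7-xor schedule for the reconstruction, reusing the repeated q2 ^ r0. */
    void reconstruct7(const uint64_t p[3], const uint64_t q[3],
                      const uint64_t r[3], uint64_t h[7])
    {
        uint64_t t = q[2] ^ r[0];     /* shared subexpression:   1 xor  */
        h[0] = q[0];
        h[1] = q[1];
        h[2] = t ^ p[0] ^ q[0];       /* q2 ^ (p0 ^ q0 ^ r0):    2 xors */
        h[3] = p[1] ^ q[1] ^ r[1];    /*                         2 xors */
        h[4] = t ^ p[2] ^ r[2];       /* (p2 ^ q2 ^ r2) ^ r0:    2 xors */
        h[5] = r[1];
        h[6] = r[2];
    }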

  4. A new algorithm: “xor largest.” Start with the matrix mod 2 for the desired linear map. If two largest rows have same first bit, replace largest row by its xor with second-largest row. Otherwise change largest row by clearing first bit. In both cases, compute result recursively, and finish with one xor.
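
Here is a C sketch of the xor-largest planner, with the recursion unrolled into a loop (my own layout, not from the slides): each row is an m-bit integer with the leftmost column in the most significant bit, so comparing rows as integers puts the "first bit" first; reductions are recorded forward and replayed in reverse, each replayed reduction being the "one xor" that finishes its recursion level. Assumes GCC/Clang for __builtin_clz.

    #include <stdint.h>

    enum { MAXOPS = 4096 };
    static struct { int out, i, j; } ops[MAXOPS];  /* out ? h[i]^=h[j] : h[i]^=x[j] */
    static int nops;

    static void plan(uint32_t *rows, int n, int m)
    {
        nops = 0;
        for (;;) {
            int i = 0, j = -1;
            for (int k = 1; k < n; k++)                  /* largest row */
                if (rows[k] > rows[i]) i = k;
            if (rows[i] == 0) return;                    /* matrix is all zero */
            for (int k = 0; k < n; k++)                  /* second-largest row */
                if (k != i && (j < 0 || rows[k] > rows[j])) j = k;
            int top = 31 - __builtin_clz(rows[i]);       /* first bit of largest */
            if (j >= 0 && ((rows[j] >> top) & 1)) {      /* same first bit:       */
                rows[i] ^= rows[j];                      /* xor in second-largest */
                ops[nops].out = 1; ops[nops].i = i; ops[nops].j = j; nops++;
            } else {                                     /* else clear first bit  */
                rows[i] ^= (uint32_t)1 << top;
                ops[nops].out = 0; ops[nops].i = i; ops[nops].j = m - 1 - top; nops++;
            }
        }
    }

    /* Replay in reverse on data words; h[] starts zero, so the first write
       to an output register is a load or a copy. */
    static void apply(const uint32_t *x, uint32_t *h, int n)
    {
        for (int i = 0; i < n; i++) h[i] = 0;
        for (int t = nops - 1; t >= 0; t--)
            h[ops[t].i] ^= ops[t].out ? h[ops[t].j] : x[ops[t].j];
    }

Running plan() on rows {0b1011, 0b1111, 0b0110, 0b0101} with m = 4 reproduces the reduction steps on the next slides.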

  5. A small example: 1011 = x0 + x2 + x3; 1111 = x0 + x1 + x2 + x3; 0110 = x1 + x2; 0101 = x1 + x3. Replace the largest row by its xor with the second-largest row.

  6. Recursively compute 1011 = x0 + x2 + x3; 0100 = x1; 0110 = x1 + x2; 0101 = x1 + x3; plus 1 xor of first output into second output.

  7. Recursively compute 0011 0100 0110 0101 plus 1 input load, 2 xors.

  8. Recursively compute 0011 0100 0011 0101 plus 1 input load, 3 xors.

  9. Recursively compute 0011 0100 0011 0001 plus 1 input load, 4 xors.

  10. Recursively compute 0011 0000 0011 0001 plus 2 input loads, 4 xors. Note: this was just a copy.

  11. Recursively compute 0000 0000 0011 0001 plus 2 input loads, 4 xors.

  12. Recursively compute 0000 0000 0001 0001 plus 3 input loads, 5 xors.

  13. Recursively compute 0000 0000 0000 0001 plus 3 input loads, 5 xors.

  14. Recursively compute 0000 0000 0000 0000 plus 4 input loads, 5 xors.
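
Replaying the recorded reductions in reverse order yields the following straight-line program for this example (derived by hand from the steps above; out0..out3 are the outputs for rows 1011, 1111, 0110, 0101). The totals match the slide: 4 input loads, 5 xors, plus 2 register copies.

    out3 = x3;         /* load                               */
    out2 = out3;       /* copy                               */
    out2 ^= x2;        /* load-xor: out2 = x2 ^ x3           */
    out0 = out2;       /* copy                               */
    out1 = x1;         /* load                               */
    out3 ^= out1;      /* xor: out3 = x1 ^ x3        (0101)  */
    out2 ^= out3;      /* xor: out2 = x1 ^ x2        (0110)  */
    out0 ^= x0;        /* load-xor: out0 = x0^x2^x3  (1011)  */
    out1 ^= out0;      /* xor: out1 = x0^x1^x2^x3    (1111)  */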

  15. Memory friendliness: Algorithm writes only to the output registers. No temporary storage. n inputs, n outputs: 2n registers total with 0 loads, 0 stores. Or n + 1 registers with n loads, 0 stores: each input is read only once. Or n registers with n loads, 0 stores, if platform has a load-xor insn.

  16. Two-operand friendliness: A platform with a ← a ⊕ b but without a ← b ⊕ c: the algorithm uses only n extra copies. Naive column sweep also uses n + 1 registers, n loads, but usually many more xors. Input partitioning (e.g., 1956 Lupanov) uses somewhat more xors, copies; somewhat more registers. Greedy additive CSE uses somewhat fewer xors but many more copies, registers.

  17. For an n × m matrix (m inputs and n outputs): the xor-largest algorithm uses on average ≈ mn/lg n two-operand xors; n copies; m loads; n + 1 regs.

  18. For an n × m matrix (m inputs and n outputs): the xor-largest algorithm uses on average ≈ mn/lg n two-operand xors; n copies; m loads; n + 1 regs. Pippenger’s algorithm uses ≈ mn/lg mn three-operand xors but seems to need many regs. Pippenger proved that his algebraic complexity was near optimal for most matrices (at least without the mod-2 restriction), but didn’t consider regs, two-operand complexity, etc.

  19. Our original example: 000100000 000010000 100101100 010010010 001001101 000000010 000000001. Each row has the coefficients of p0, p1, p2, q0, q1, q2, r0, r1, r2.

  20. Our original example: 000100000 000010000 000101100 010010010 001001101 000000010 000000001 plus 1 xor, 1 input load.

  21. Our original example: 000100000 000010000 000101100 000010010 001001101 000000010 000000001 plus 2 xors, 2 input loads.

  22. Our original example: 000100000 000010000 000101100 000010010 000001101 000000010 000000001 plus 3 xors, 3 input loads.

  23. Our original example: 000100000 000010000 000001100 000010010 000001101 000000010 000000001 plus 4 xors, 3 input loads.

  24. Our original example: 000000000 000010000 000001100 000010010 000001101 000000010 000000001 plus 4 xors, 4 input loads.

  25. Our original example: 000000000 000010000 000001100 000000010 000001101 000000010 000000001 plus 5 xors, 4 input loads.

  26. Our original example: 000000000 000000000 000001100 000000010 000001101 000000010 000000001 plus 5 xors, 5 input loads.

  27. Our original example: 000000000 000000000 000001100 000000010 000000001 000000010 000000001 plus 6 xors, 5 input loads.

  28. Our original example: 000000000 000000000 000000100 000000010 000000001 000000010 000000001 plus 7 xors, 6 input loads.

  29. Our original example: 000000000 000000000 000000000 000000010 000000001 000000010 000000001 plus 7 xors, 7 input loads.

  30. Our original example: 000000000 000000000 000000000 000000000 000000001 000000010 000000001 plus 7 xors, 7 input loads.

  31. Our original example: 000000000 000000000 000000000 000000000 000000001 000000000 000000001 plus 7 xors, 8 input loads.

  32. Our original example: 000000000 000000000 000000000 000000000 000000000 000000000 000000001 plus 7 xors, 8 input loads.

  33. Our original example: 000000000 000000000 000000000 000000000 000000000 000000000 000000000 plus 7 xors, 9 input loads. Algorithm found the speedup.
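
In reverse, the recorded steps give this straight-line program (derived by hand from the slides above): 9 input loads, 7 xors, 2 copies. Note how h2 = q2 ^ r0 is formed once and folded into h4: the algorithm rediscovered the shared subexpression.

    h6 = r2;           /* load                                  */
    h4 = h6;           /* copy                                  */
    h5 = r1;           /* load                                  */
    h3 = h5;           /* copy                                  */
    h2 = r0;           /* load                                  */
    h2 ^= q2;          /* load-xor: h2 = q2 ^ r0, reused below  */
    h4 ^= h2;          /* xor                                   */
    h1 = q1;           /* load                                  */
    h3 ^= h1;          /* xor                                   */
    h0 = q0;           /* load                                  */
    h2 ^= h0;          /* xor                                   */
    h4 ^= p2;          /* load-xor: h4 = (p2 ^ q2 ^ r2) ^ r0    */
    h3 ^= p1;          /* load-xor: h3 = p1 ^ q1 ^ r1           */
    h2 ^= p0;          /* load-xor: h2 = q2 ^ (p0 ^ q0 ^ r0)    */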

  34. Part II. The M(n) game. Define M(n) as the minimum number of bit operations (ands, xors) needed to multiply n-bit polys f, g ∈ F2[x] (in standard representation). M(2) ≤ 5: e.g., to compute h0 + h1 x + h2 x^2 = (f0 + f1 x)(g0 + g1 x), can compute h0 = f0 g0, h1 = f0 g1 + f1 g0, h2 = f1 g1 with 4 ands, 1 xor.
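
As a bitsliced C sketch (each variable is one bit, or a word of independent bits; the function name is illustrative):

    #include <stdint.h>

    /* M(2) <= 5: (f0 + f1 x)(g0 + g1 x) with 4 ands, 1 xor. */
    void mul2(uint64_t f0, uint64_t f1, uint64_t g0, uint64_t g1,
              uint64_t *h0, uint64_t *h1, uint64_t *h2)
    {
        *h0 = f0 & g0;                 /* and 1           */
        *h1 = (f0 & g1) ^ (f1 & g0);   /* ands 2,3; 1 xor */
        *h2 = f1 & g1;                 /* and 4           */
    }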

  35. Schoolbook multiplication: M(n) ∈ O(n^2). 1963 Karatsuba: M(n) ∈ O(n^lg 3). 1963 Toom: M(n) ∈ n · 2^O(√lg n). 1971 Schönhage–Strassen: M(n) ∈ O(n lg n lg lg n). 2007 Fürer improves the lg lg n for integers but doesn’t help mod 2.

  36. What does this tell us about M(131) or M(251)? Absolutely nothing! Reanalyze algorithms to see exact complexity. Rethink algorithm design to find constant-factor (and sub-constant-factor) speedups that are not visible in the asymptotics.

  37. Schoolbook recursion: M(n + 1) ≤ M(n) + 4n. Hence M(n) ≤ 2n^2 − 2n + 1. Karatsuba recursion as commonly stated: M(2n) ≤ 3M(n) + 8n − 4. e.g., Karatsuba for n = 1: f = f0 + f1 x, g = g0 + g1 x, h0 = f0 g0, h2 = f1 g1, h1 = (f0 + f1)(g0 + g1) ⊕ h0 ⊕ h2; then f g = h0 + h1 x + h2 x^2.
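
The n = 1 case as a sketch, in the same style: 3 ands and 4 xors, i.e. 3·M(1) + 8·1 − 4 = 7 operations, trading one and for three xors.

    #include <stdint.h>

    /* Karatsuba for n = 1; in characteristic 2 the subtractions are xors. */
    void mul2_karatsuba(uint64_t f0, uint64_t f1, uint64_t g0, uint64_t g1,
                        uint64_t *h0, uint64_t *h1, uint64_t *h2)
    {
        *h0 = f0 & g0;                              /* and 1         */
        *h2 = f1 & g1;                              /* and 2         */
        *h1 = ((f0 ^ f1) & (g0 ^ g1)) ^ *h0 ^ *h2;  /* and 3; 4 xors */
    }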

  38. Karatsuba for n = 2: f = f0 + f1 x + f2 x^2 + f3 x^3, g = g0 + g1 x + g2 x^2 + g3 x^3, H0 = (f0 + f1 x)(g0 + g1 x), H2 = (f2 + f3 x)(g2 + g3 x), H1 = (f0 + f2 + (f1 + f3) x)(g0 + g2 + (g1 + g3) x) ⊕ H0 ⊕ H2; then f g = H0 + H1 x^2 + H2 x^4.

  39. Initial linear computation: f0 + f2, f1 + f3, g0 + g2, g1 + g3; cost 4. Three size-2 mults producing H0 = q0 + q1 x + q2 x^2; H2 = r0 + r1 x + r2 x^2; H0 + H1 + H2 = p0 + p1 x + p2 x^2. Final linear reconstruction: H1 = (p0 ⊕ q0 ⊕ r0) + (p1 ⊕ q1 ⊕ r1) x + (p2 ⊕ q2 ⊕ r2) x^2, cost 6; f g = H0 + H1 x^2 + H2 x^4, cost 2.

  40. Let’s look more closely at the reconstruction: f g = h0 + h1 x + ··· + h6 x^6 with h0 = q0; h1 = q1; h2 = q2 + (p0 ⊕ q0 ⊕ r0); h3 = (p1 ⊕ q1 ⊕ r1); h4 = (p2 ⊕ q2 ⊕ r2) + r0; h5 = r1; h6 = r2.

  41. Let’s look more closely at the reconstruction: f g = h0 + h1 x + ··· + h6 x^6 with h0 = q0; h1 = q1; h2 = q2 + (p0 ⊕ q0 ⊕ r0); h3 = (p1 ⊕ q1 ⊕ r1); h4 = (p2 ⊕ q2 ⊕ r2) + r0; h5 = r1; h6 = r2. We’ve seen this before! Reduce 6 + 2 = 8 ops to 7 ops by reusing q2 ⊕ r0.
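
Putting Parts I and II together, a sketch of the n = 2 step that calls the mul2 and reconstruct7 sketches from earlier, so the reconstruction costs 7 xors instead of 6 + 2 = 8:

    /* n = 2 Karatsuba: 4 xors of initial linear computation, three size-2
       mults, then the 7-xor reconstruction. Assumes mul2 and reconstruct7
       as sketched above. */
    void mul4_karatsuba(const uint64_t f[4], const uint64_t g[4], uint64_t h[7])
    {
        uint64_t q[3], r[3], p[3];
        mul2(f[0], f[1], g[0], g[1], &q[0], &q[1], &q[2]);   /* H0           */
        mul2(f[2], f[3], g[2], g[3], &r[0], &r[1], &r[2]);   /* H2           */
        mul2(f[0] ^ f[2], f[1] ^ f[3],                       /* H0 + H1 + H2 */
             g[0] ^ g[2], g[1] ^ g[3], &p[0], &p[1], &p[2]);
        reconstruct7(p, q, r, h);    /* h0..h6: coefficients of f*g */
    }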

  42. 2000 Bernstein: M(2n) ≤ 3M(n) + 7n − 3. 2009 Bernstein: new bounds on M(n) from further improvements to Karatsuba, Toom, etc.: binary.cr.yp.to/m.html. Typically 20% smaller than 2003 Rodríguez-Henríquez–Koç, 2005 Chang–Kim–Park–Lim, 2006 Weimerskirch–Paar, 2006 von zur Gathen–Shokrollahi, 2007 Peter–Langendörfer.

  43. So far have focused on M(n) for small n, but different techniques are better for large n. I’m now exploring impact of 2008 Gao–Mateer. For F2 ⊆ Fq ⊆ k: 1988 Wang–Zhu, 1989 Cantor diagonalize k[t]/(t^q + t) using ≈ 0.5 q lg q mults in k, ≈ 0.5 q (lg q)^lg 3 adds in k. 2008 Gao–Mateer use ≈ 0.5 q lg q mults, ≈ 0.25 q lg q lg lg q adds.

  44. “Who cares?” Conventional wisdom: detailed M(n) analysis has very little relevance to software speed. We multiply f by g by looking up 4 bits of f in a size-16 table of precomputed multiples of g; looking up the next 4 bits; etc. One table lookup replaces many bit operations! Might use Karatsuba etc., but only for large n.
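
A sketch of that table-lookup strategy for word-sized operands (the 32-bit size and the function name are illustrative):

    #include <stdint.h>

    /* Multiply f by g in F_2[x]: precompute the 16 multiples of g by all
       polynomials of degree < 4, then consume f four bits at a time. */
    uint64_t clmul32(uint32_t f, uint32_t g)
    {
        uint64_t table[16];
        table[0] = 0;
        for (int i = 1; i < 16; i++)     /* table[i] = (poly with bits i) * g */
            table[i] = (table[i >> 1] << 1) ^ ((i & 1) ? (uint64_t)g : 0);

        uint64_t h = 0;
        for (int shift = 28; shift >= 0; shift -= 4)   /* top 4 bits first */
            h = (h << 4) ^ table[(f >> shift) & 0xF];
        return h;
    }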
