
Madison Embedded Systems & Architectures Laboratory (MESA)
Department of Electrical and Computer Engineering

Decimal Floating-Point Adder and Multifunction Unit with Injection-Based Rounding
Liang-Kai Wang and Michael J. Schulte
University of Wisconsin-Madison
ARITH-18, Montpellier, France

This research is supported by the UW-Madison Graduate School and IBM

Outline
• Motivation
• Related Research
• Algorithm for Decimal Floating-Point (DFP) Adder and Multifunction Unit
• Hardware Design
• Experimental Results and Analysis
• Conclusions

Motivation
• Important in business applications: $0.2_{10} = 0.00110011\ldots_2$, so common decimal fractions have no exact binary representation
• The IEEE P754 floating-point standard
  – Three DFP formats: 34-digit decimal128 format, 16-digit decimal64 format (this paper), and 7-digit decimal32 format
• Decimal floating-point software is slow
• Decreasing transistor costs
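The representation gap noted on this slide can be checked directly. The following is a minimal sketch that uses Python's standard `decimal` module as a stand-in for decimal floating-point software; it is an illustration only and not part of the presented hardware design.

```python
from decimal import Decimal

# A binary double stores the nearest base-2 fraction to 0.2, not 0.2 itself.
binary_value = Decimal(0.2)      # exact value held by the binary double
decimal_value = Decimal("0.2")   # exact decimal value

print(binary_value == decimal_value)                       # False
print(0.1 + 0.2 == 0.3)                                    # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
```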

Previous Research and Proposed Design
• Previous designs
  – Focus on fixed-point addition and subtraction
    • For example, [Adiletta89], [Schmookler71]
  – [Thompson04] presents the first IEEE P754 compliant DFP adder
• We propose a DFP multifunction unit that
  – Supports eight DFP operations
    • add, sub, quantize, sameQuantum, roundToIntegral, minNum, maxNum, and compare
  – Optimizes significand alignment
  – Applies decimal injection-based rounding
  – Uses a decimal flag-tracing mechanism

DFP Adder and Multifunction Unit
• Notation
  – SA, SB = signs of A and B
  – EA, EB = exponents of A and B
  – CA, CB = significands of A and B
• Datapath stages, from operands A and B to result S
  – Forward format conversion
  – Operand alignment
  – Pre-correction
  – Carry propagation network
  – Post-correction
  – Overflow detection
  – Shift and round
  – Backward format conversion

Operand Alignment
• Decimal operands are not normalized
• Operand alignment calculation
  – Inputs: exponents (EA and EB) and lengths of leading zeros (LA and LB)
  – If EA < EB: swap CA and CB
  – If LA < |EA - EB|: left shift CA by LA and right shift CB by min(|EA - EB| - LA, 19)
  – Otherwise: left shift CA; the shift amount is not recoverable from the extracted flowchart, which shows only the term (LA - |EA - EB|) on this branch
• E.g. LA = 5, EA – EB = 9
  – A = CA × 10^EA, where CA = 0…0 a_{i-1} … a_0 has LA leading zeros: left shifting CA by 5 digits gives a_{i-1} … a_0 0 0 0 0 0 × 10^(EA-5)
  – B = CB × 10^EB, where CB = 0…0 b_{k-1} … b_0 has LB leading zeros: right shifting CB by min(9 - 5, 19) = 4 digits gives 0…0 b_{k-1} … b_4 × 10^(EB+4), with the shifted-out digits b_3 b_2 … falling into the G, R, S positions
  – Result is scaled by 10^(EB+4)
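The alignment calculation on this slide can be sketched as a small function. This is a minimal illustration, not the authors' implementation: the function name `alignment_shifts` is hypothetical, and the shift amount on the branch where LA >= |EA - EB| (left shift by |EA - EB|, no right shift) is an assumption chosen because it makes the two exponents equal; the slide text does not confirm it.

```python
def alignment_shifts(ea, eb, la, lb, max_right_shift=19):
    """Return (swapped, left_shift_of_CA, right_shift_of_CB).

    ea, eb: exponents of A and B; la, lb: leading-zero counts of CA and CB.
    """
    swapped = ea < eb
    if swapped:
        # Swap operands so that A always carries the larger exponent.
        ea, eb = eb, ea
        la, lb = lb, la
    diff = ea - eb
    if la < diff:
        # Not enough leading zeros in CA: use them all, then shift CB right.
        left = la
        right = min(diff - la, max_right_shift)
    else:
        # ASSUMPTION (not legible on the slide): shifting CA left by the full
        # exponent difference aligns the operands without touching CB.
        left = diff
        right = 0
    return swapped, left, right


# The slide's example: LA = 5, EA - EB = 9.
print(alignment_shifts(ea=9, eb=0, la=5, lb=0))  # (False, 5, 4)
```

With the slide's numbers the result exponent is EA - 5 = EB + 4, which matches the scaling shown for both operands.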

Pre-correction
• Effective operation = SA ⊕ SB ⊕ OP
• Placing the operands according to the effective operation simplifies result shifting
• Injecting a value into digit positions R and S, chosen by the rounding mode, replaces rounding with truncation
[Figure: effective add under roundTiesToAway, showing operands A and B and the result laid out against digit positions L, G, R, S.]

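Pre-correction
• Injection value (X = don't care)

| Sign_inj | Rounding Mode | Injection Value (R, S) |
|----------|---------------|------------------------|
| X        | TowardZero    | (0, 0)                 |
| X        | TieToAway     | (5, 0)                 |
| X        | TieToZero     | (4, 9)                 |
| X        | TieToEven     | (5, 0)                 |
| -        | Toward +∞     | (0, 0)                 |
| +        | Toward +∞     | (9, 9)                 |
| -        | Toward -∞     | (9, 9)                 |
| +        | Toward -∞     | (0, 0)                 |
| X        | AwayZero      | (9, 9)                 |

• Operands are corrected to generate the correct carry-out, where EOP is the effective operation and the overbar denotes bitwise inversion of a digit:

$$
(CA_3)_i =
\begin{cases}
(CA'_2)_i + 6 & \text{if } EOP = \text{add} \\
(CA'_2)_i & \text{otherwise}
\end{cases}
\qquad
(CB_3)_i =
\begin{cases}
(CB'_2)_i & \text{if } EOP = \text{add} \\
\overline{(CB'_2)_i} & \text{otherwise}
\end{cases}
$$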
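The idea behind the injection table can be sketched in a few lines: adding the injection value to the R and S digits and then truncating gives the rounded result. This is a minimal sketch on plain integers, assuming the two lowest decimal digits of the input are R and S; the names `INJECTION` and `round_by_injection` are hypothetical, and TieToEven and the sign-dependent modes are left out to keep the example short.

```python
# (R, S) injection values from the table, written as two-digit integers.
INJECTION = {
    "TowardZero": 0,   # (0, 0)
    "TieToAway": 50,   # (5, 0)
    "TieToZero": 49,   # (4, 9)
    "AwayZero": 99,    # (9, 9)
}


def round_by_injection(value_with_rs, mode):
    """Round a non-negative magnitude whose two lowest digits are R and S.

    The injection is added first; dropping R and S afterwards is a plain
    truncation, and any carry out of R has already bumped the kept digits.
    """
    return (value_with_rs + INJECTION[mode]) // 100


print(round_by_injection(123450, "TieToAway"))   # 1235
print(round_by_injection(123450, "TieToZero"))   # 1234
print(round_by_injection(123401, "AwayZero"))    # 1235
print(round_by_injection(123499, "TowardZero"))  # 1234
```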

Carry Propagation Network
• Kogge-Stone parallel prefix network
• Two sets of flags
  – Flag F1 handles the digit increment in the post-correction stage.
  – Flag F2 handles the carry propagation from the injection correction value.
[Figure: 19-digit network over digit positions 18 to 0, with positions L, G, R, S marked, drawn as rows 0 to 10. Labeled elements: original K-S network producing the carry-out (C1), flags (F1), and sum digits (UCR); 16-digit post-correction producing CR1; injection correction; trailing nine detection network; F2 block; shift and round unit producing CR2.]
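For readers unfamiliar with the Kogge-Stone scheme, the sketch below computes all decimal carries in a logarithmic number of combining steps. It is a simplified software model under stated assumptions: it works on digit-level generate and propagate signals rather than on the pre-corrected BCD bits the slides describe, it omits the F1 and F2 flags, and the function name `kogge_stone_carries` is hypothetical.

```python
def kogge_stone_carries(a, b):
    """Return carry-out flags per digit for a + b.

    a, b: equal-length lists of decimal digits, least significant first.
    """
    n = len(a)
    g = [x + y > 9 for x, y in zip(a, b)]    # digit generates a carry
    p = [x + y == 9 for x, y in zip(a, b)]   # digit propagates an incoming carry
    dist = 1
    while dist < n:
        new_g, new_p = g[:], p[:]
        for i in range(dist, n):
            # Merge the span ending at i with the span ending at i - dist.
            new_g[i] = g[i] or (p[i] and g[i - dist])
            new_p[i] = p[i] and p[i - dist]
        g, p = new_g, new_p
        dist *= 2
    return g  # g[i] is the carry out of digit i


# 999 + 1 = 1000: the carry ripples through every nine.
print(kogge_stone_carries([9, 9, 9, 0], [1, 0, 0, 0]))  # [True, True, True, False]
```

For 19 digits the loop runs five times (distances 1, 2, 4, 8, 16), which is the property that makes the prefix network attractive for a wide decimal adder.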

Post-correction
• Compensate the result from the K-S network
• Rule 1: effective operation is ADD
  – Subtract 6 from digit i for which $(C_1)_{i+1} = 0$
• Rule 2: effective operation is SUB
  – If the result is positive
    • Increment the result using F1
    • Subtract 6 from digit i for which $(C_1)_{i+1} \oplus (F_1)_i = 0$
  – If the result is negative
    • Invert all bits of the result
    • Subtract 6 from digit i for which $(C_1)_{i+1} = 1$
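Rule 1 can be illustrated with a digit-serial model of BCD addition: each digit of one operand is biased by 6 before a 4-bit binary add, and the bias is removed again wherever no carry left the digit. This is a minimal sketch assuming a simple ripple loop in place of the K-S network; only the effective-add case is covered, and the name `bcd_add` is hypothetical.

```python
def bcd_add(a_digits, b_digits):
    """Add two BCD numbers given as digit lists, least significant first.

    Returns (sum_digits, carry_out).
    """
    carry = 0
    out = []
    for a, b in zip(a_digits, b_digits):
        total = (a + 6) + b + carry   # pre-corrected digit, added in binary
        carry = total >> 4            # carry out of the 4-bit digit
        nibble = total & 0xF
        if carry == 0:
            nibble -= 6               # Rule 1: no carry out, so remove the bias
        out.append(nibble)
    return out, carry


# 4567 + 5678 = 10245
print(bcd_add([7, 6, 5, 4], [8, 7, 6, 5]))  # ([5, 4, 2, 0], 1)
```

The bias of 6 makes a decimal carry (digit sum of 10 or more) coincide with a binary carry out of the 4-bit digit, which is why an ordinary binary carry network can be reused.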

Shift and Round
• Most significant digit is zero
  – No action is needed
• Most significant digit is non-zero
  – Requires an injection correction step
[Figure: effective add under TieToEven with P = 16 digits and positions L, G, R, S. The predicted result carries the injection in R and S; when the real result has a non-zero most significant digit, the significand is right shifted by 1 digit and the exponent is incremented.]

Shift and Round
• Injection correction value for different rounding modes (X = don't care)

| Sign_inj | Rounding Mode | Injection Correction Value (G, R, S) |
|----------|---------------|--------------------------------------|
| X        | TowardZero    | (0, 0, 0)                            |
| X        | TieToAway     | (4, 5, 0)                            |
| X        | TieToZero     | (4, 5, 0)                            |
| X        | TieToEven     | (4, 5, 0)                            |
| -        | Toward +∞     | (0, 0, 0)                            |
| +        | Toward +∞     | (9, 0, 0)                            |
| -        | Toward -∞     | (9, 0, 0)                            |
| +        | Toward -∞     | (0, 0, 0)                            |
| X        | AwayZero      | (9, 0, 0)                            |

• Injection correction value may trigger carry propagation
• Flag F2 eliminates carry propagation
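One way to read the correction table: when the result gains a digit, the rounding point moves one position left, so the injection that should have been applied is the original one scaled by ten (with a trailing 9 extended). The correction is the difference between that and what was already injected. The sketch below checks this reading against the table; the derivation and the names `INJECTION` and `injection_correction` are assumptions for illustration, not taken from the slides.

```python
# (R, S) injections as two-digit integers, as listed on the pre-correction slide.
INJECTION = {"TowardZero": 0, "TieToAway": 50, "TieToZero": 49,
             "TieToEven": 50, "AwayZero": 99}


def injection_correction(inj):
    """Correction, as a three-digit (G, R, S) integer, for a result that is one digit longer."""
    # Same rounding threshold one position higher; a trailing 9 stays a run of 9s.
    shifted = inj * 10 + (9 if inj % 10 == 9 else 0)
    return shifted - inj


for mode, inj in INJECTION.items():
    print(mode, injection_correction(inj))

# Worked case, keeping 4 digits: 1234567 was injected with 50 for TieToAway,
# but has 7 digits, so three digits must be dropped instead of two.
t = 1234567 + INJECTION["TieToAway"]
print((t + injection_correction(50)) // 1000)  # 1235
```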

Comparison

|                         | Thompson’s Design                                   | This Design                                                                          |
|-------------------------|-----------------------------------------------------|--------------------------------------------------------------------------------------|
| Supported DFP Operations | 2: add, subtract                                    | 8: add, subtract, minNum, maxNum, compare, quantize, sameQuantum, roundToIntegral    |
| Internal format         | Excess-3 encoding                                   | BCD encoding                                                                         |
| Operand Alignment       | Exponent computation and LZD in series              | Exponent computation and LZD in parallel                                             |
| Carry-propagate network | Kogge-Stone with flag tracing for post-correction   | Two extra flags for rounding                                                         |
| Rounding                | Random logic and decimal incrementer                | Injection-based rounding with correction                                             |
| Overflow Detection      | After result is rounded                             | Before the result is rounded                                                         |

Extension to Support More DFP Operations
• ToIntegralValue(A)
  – Round A to an integer value
    • ToIntegralValue(13545 × 10^-3) = 14 with round-ties-to-even
  – Design strategy
    • Set CB1 and EB1 to zero
    • Enable right shift even if CB1 = 0
    • Set effective operation to ADD
• Quantize(A, B)
  – Change EA to EB
    • Quantize(12345 × 10^-4, 1 × 10^-2) = 123 × 10^-2 with round-down
  – Design strategy
    • Set CB1 to zero
    • Enable right shift even if CB1 = 0
    • Set effective operation to ADD
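The two numeric examples on this slide can be reproduced in software with Python's standard `decimal` module, which offers the corresponding operations. This is only a behavioural cross-check of the examples; it says nothing about how the hardware unit computes them.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

# ToIntegralValue(13545 x 10^-3) with round-ties-to-even
a = Decimal("13.545")
print(a.to_integral_value(rounding=ROUND_HALF_EVEN))  # 14

# Quantize(12345 x 10^-4, 1 x 10^-2) with round-down: the result takes B's exponent
x = Decimal("1.2345")
q = x.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
print(q)             # 1.23
print(q.as_tuple())  # DecimalTuple(sign=0, digits=(1, 2, 3), exponent=-2)
```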

Extension to Support More DFP Operations
• SameQuantum(A, B)
  – Check if EA ≡ EB
  – Generate an extra flag in the operand alignment stage
• minNum, maxNum, and compare use the original datapath
• Many changes are made to exception flag logic
• A post-processing unit is added to handle special operands such as infinity and Not-a-Number
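The behaviour of these operations, including the special-operand handling mentioned in the last bullet, can be observed with Python's standard `decimal` module. This is a software illustration under the assumption that the module's `same_quantum`, `min`, `max`, and `compare` methods correspond to the operations named on the slide; it is not a model of the hardware.

```python
from decimal import Decimal

a = Decimal("1.50")   # 150 x 10^-2
b = Decimal("2.75")   # 275 x 10^-2
c = Decimal("1.5")    # 15 x 10^-1, numerically equal to a

print(a.same_quantum(b))  # True: both exponents are -2
print(a.same_quantum(c))  # False: exponents differ although the values are equal

print(a.min(b), a.max(b))      # 1.50 2.75
print(a.min(Decimal("NaN")))   # 1.50: a quiet NaN is ignored in favour of the number
print(a.compare(b))            # -1
```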

Block Diagram of the DFP Adder and Multifunction Unit
[Figure: block diagram. Inputs: Op A, Op B, Operation, Rounding Mode. Blocks: Forward Format Conversion; Operand Alignment Calculation and Swapping; Barrel Shifters; Pre-correction and Operand Placement; K-S Network; Post-correction; Shift and Round; Sign; Overflow; IEEE P754 Backward Format Conversion; Post-processing. Labeled signals: SA1, SB1, EA1, EB1, CA1, CB1, CA2, CB2, RSA, LSA, CAS, CBS, CA3, CB3, C1, F1, F2, UCR, CR1, CR2, ER1, ER2, SR1, R1, overflow. Output: Result (Z).]

Hardware Implementation
• Modeled using RTL Verilog and simulated using ModelSim
• Synthesized using LSI Logic’s 0.11 µm Standard Cell Library and Synopsys Design Compiler
• Tested using a comprehensive testbench generator and the decNumber library 3.32

Delay and Area Comparison
• Combinational circuit designs

Table 1. Improvement over Thompson’s Design

| Metric        | Thompson’s adder       | Injection-based adder  | Improvement |
|---------------|------------------------|------------------------|-------------|
| Delay (comb.) | 3.50 ns, 63.6 FO4      | 2.76 ns, 50.2 FO4      | 21.0%       |
| Area          | 22443 NAND eq. gates   | 22086 NAND eq. gates   | 1.6%        |

Table 2. Overhead of the Multifunction Unit Compared to the Injection-based Adder

| Metric | Injection-based adder  | Multifunction Unit     | Overhead |
|--------|------------------------|------------------------|----------|
| Delay  | 2.76 ns, 50.2 FO4      | 2.84 ns, 51.6 FO4      | 2.8%     |
| Area   | 22086 NAND eq. gates   | 24233 NAND eq. gates   | 9.7%     |
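The percentages in the two tables follow from the raw figures as relative changes. The short sketch below recomputes them from the FO4 delays and gate counts; the helper name `percent_change` is hypothetical, and the recomputed values agree with the tables to within rounding.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Table 1: improvement of the injection-based adder over Thompson's adder.
delay_improvement = -percent_change(63.6, 50.2)    # FO4 delays
area_improvement = -percent_change(22443, 22086)   # NAND eq. gates

# Table 2: overhead of the multifunction unit over the injection-based adder.
delay_overhead = percent_change(50.2, 51.6)
area_overhead = percent_change(22086, 24233)

print(f"{delay_improvement:.2f} {area_improvement:.2f}")
print(f"{delay_overhead:.2f} {area_overhead:.2f}")
```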

Cycle Times vs. Pipeline Depth
• Synthesized using the pipeline_design command from the Synopsys Design Compiler
[Figure: two plots against the number of pipeline stages (1 to 6): cycle time in FO4 and area in NAND2 gate equivalents.]

Conclusion
• A 16-digit DFP adder and multifunction unit compliant with the IEEE P754 standard
• Novel features:
  – Delay optimization in the operand alignment, rounding, and overflow detection units
  – A modified injection-based rounding method
  – Extensions to support multiple DFP operations
• Design analysis
  – 21% delay improvement over Thompson’s design
  – 2.8% delay overhead for DFP multifunction unit

Questions?
