A 19.4 nJ/Decision 364K Decisions/s In-Memory Random Forest Classifier in 6T SRAM Array
Mingu Kang, Sujan Gonugondla, Naresh Shanbhag
University of Illinois at Urbana-Champaign

Machine Learning under Resource Constraints
- Embedded statistical inference: IoT, sensor-rich platforms
- Decision making under resource constraints
- Limited form factor, battery-powered, real-time

The Random Forest (RF) Algorithm
- Random Forest [1]
  - Ensemble of many (a few hundred) decision trees
  - High accuracy
  - Simple computation (only comparisons)
  - Suitable for multi-class classification
  - Inherent error resiliency (from its ensemble nature)
(Figure: RF algorithm)

[1] L. Breiman, Machine Learning, 2001
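As a minimal sketch of the comparison-only inference listed above (not the authors' implementation), each tree node compares one feature against a threshold and the forest takes a majority vote over the trees. The `Node` layout, the function names, and the three toy trees are hypothetical and exist only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One tree node; a leaf carries only `label` (hypothetical layout)."""
    feature: int = 0                 # index of the feature (pixel) to test
    threshold: int = 0               # comparison threshold
    left: Optional["Node"] = None    # taken when x[feature] < threshold
    right: Optional["Node"] = None   # taken otherwise
    label: Optional[int] = None      # class label, set only on leaves


def classify_tree(root: Node, x: Sequence[int]) -> int:
    """Walk from the root to a leaf using comparisons only."""
    node = root
    while node.label is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label


def classify_forest(trees: Sequence[Node], x: Sequence[int]) -> int:
    """Majority vote over the per-tree decisions."""
    votes = Counter(classify_tree(tree, x) for tree in trees)
    return votes.most_common(1)[0][0]


# Three toy one-level trees (made-up thresholds).
TREES = [
    Node(feature=0, threshold=100, left=Node(label=0), right=Node(label=1)),
    Node(feature=1, threshold=50, left=Node(label=0), right=Node(label=1)),
    Node(feature=1, threshold=90, left=Node(label=0), right=Node(label=1)),
]

# The third tree votes for class 0 here, but the majority still picks class 1.
decision = classify_forest(TREES, [150, 80])
```

The single dissenting tree being outvoted is a small-scale picture of the ensemble's error resiliency.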

Implementation Challenges
- Implementation challenges
  - Non-uniform tree structure: variations in depth, # of nodes, symmetry
  - Frequent memory access: memory dominates the system efficiency
  - Irregular data access pattern
- Prior art
  - Software and FPGA implementations; no ASIC
  - Fails to take advantage of the inherent error resiliency of the RF algorithm

Proposed Solution: Deep In-memory Architecture (DIMA) with DSS
- DIMA [2-4]
  - Embedded analog processing
  - Storage density and normal read & write functions preserved
  - FR: functional read
  - BLP: bitline processor (subtraction, comparison)
  - CBLP: cross BLP (aggregation)
  - RDL: ADC & residual digital logic
- Deterministic sub-sampling (DSS)
  - Regularizes the memory access pattern

[2] M. Kang, et al., ICASSP 2014
[3] M. Kang, et al., arXiv 2016
[4] M. Kang, et al., US Patent no. 9,697,877

RF Chip Architecture
- SRAM bitcell array
  - Stores up to 42 groups
  - Each group has 4 sub-groups (1 sub-group = 1 tree)
- Input buffer
  - Stores 4:1 sub-sampled pixels in 4 sections for DSS
- Cross bar (CB)
  - 31 CB units per sub-group enabled in parallel
- Comparator (COMP)
  - 128 analog comparators (operating on the bitline swings ΔV_BL)
(Figure: proposed architecture. IREG: pixel index register, RSREG: RSS register)

Functional Read (FR)
(Figure: functional read (FR) vs. conventional read, showing the bitline swing ΔV_BL. B: bit precision, L: column mux ratio)
- Fetches the stored data and computes its linear combination in the analog domain
- (LB) times more data accessed per read & precharge
- Savings in energy & delay at the cost of reduced SNR
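The contrast between a conventional read and a functional read can be pictured with a small numerical model. This is an idealized sketch under stated assumptions, not the circuit behavior reported in the slides: binary weighting of the bits and additive Gaussian noise are assumptions, and both function names are hypothetical.

```python
import random


def conventional_read(column_bits):
    """Digital read: recover the stored word exactly from its bits (MSB first)."""
    value = 0
    for bit in column_bits:
        value = (value << 1) | bit
    return value


def functional_read(column_bits, noise_sigma=0.0, rng=None):
    """Idealized FR: a single analog quantity proportional to a weighted sum
    of the stored bits. The binary weights and the Gaussian noise term are
    modeling assumptions of this sketch."""
    rng = rng or random.Random(0)
    n = len(column_bits)
    analog = sum(bit * 2 ** (n - 1 - i) for i, bit in enumerate(column_bits))
    return analog + rng.gauss(0.0, noise_sigma)


bits = [1, 0, 1, 1, 0, 0, 1, 0]                  # an 8-bit word, MSB first
exact = conventional_read(bits)                   # digital value
ideal = functional_read(bits)                     # noiseless analog value
noisy = functional_read(bits, noise_sigma=2.0)    # reduced-SNR analog value
```

The noise term stands in for the reduced SNR that the slide names as the cost of the energy and delay savings.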

In-memory Bitline Processing
- Subtraction: both operands (T and X) are held in the same column, and the result appears as the bitline swing ΔV_BL
- Comparison: performed on the bitline swings
(Figure: a column of the SRAM array)
(Figure: measured subtraction in 65 nm CMOS. V_BL (V) vs. T_MSB − X_MSB, from −15 to 15. Inset: variation due to possible combinations of (T_MSB, X_MSB) at a given T_MSB − X_MSB value)

Deterministic Sub-sampling (DSS)
- Random sub-sampling (RSS)
  - Requires a complex cross bar (e.g., 256:1 for a 256-pixel image)
- Deterministic sub-sampling (DSS) before RSS
  - Sub-samples the image to generate four sub-images
  - Reduces cross bar complexity (e.g., 256:1 → 64:1)
  - More than 3× energy and 4× layout area savings
  - 4:1 chosen due to the accuracy vs. sub-sampling ratio trade-off
(Figure: proposed RF algorithm)
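The effect of DSS on the cross bar can be sketched as follows. The slides do not state which pixels go to which sub-image, so the 2 x 2 interleaving below is an assumption, as are the function names and the use of 31 picks per tree (borrowed from the 31 CB units per sub-group on the architecture slide).

```python
import random

SIDE = 16  # 16 x 16 input image, i.e. 256 pixels


def deterministic_subsample(image):
    """Split a flat 256-pixel image into four 64-pixel sub-images (4:1 DSS).

    The 2 x 2 interleaving used here is an assumption of this sketch.
    """
    subs = [[] for _ in range(4)]
    for row in range(SIDE):
        for col in range(SIDE):
            phase = 2 * (row % 2) + (col % 2)
            subs[phase].append(image[row * SIDE + col])
    return subs


def random_subsample(sub_image, indices):
    """RSS inside one sub-image: each index selects among 64 pixels, not 256."""
    return [sub_image[i] for i in indices]


image = list(range(SIDE * SIDE))        # pixel value = pixel position, for clarity
subs = deterministic_subsample(image)

rng = random.Random(0)
indices = [rng.randrange(len(subs[0])) for _ in range(31)]  # 31 picks: an assumption
picked = random_subsample(subs[0], indices)
```

Because every index now ranges over 64 positions instead of 256, each selector is a 64:1 rather than a 256:1 multiplexer, which is the complexity reduction the slide describes.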

Application & Measured Results
- Training (off-chip)
  - 200 images per class employed for training
  - Bit precision: 8, tree depth: 6, 64 trees
- Testing
  - 200 randomly chosen testing images from the test data set
- Dataset: KUL Belgium traffic sign dataset

Platform (65 nm CMOS) | # of trees | Max tree depth | Classification rate (decisions/ms) | Energy per decision (nJ) | Energy delay product (fJ·s) | Accuracy (%)
Conv. Arch. | 64 | 6 | 167 / bank | 60.4 | 361.6 | 93.5
Proposed Arch. | 64 | 6 | 364 / bank | 19.4 | 53.2 | 94

EDP reduction by 6.8×
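The energy delay product figures can be cross-checked from the energy and classification-rate columns. The sketch below assumes that the delay per decision is the reciprocal of the per-bank classification rate; the function and variable names are illustrative only.

```python
def edp_fj_s(energy_nj, rate_per_ms):
    """Energy delay product in fJ*s from energy (nJ) and rate (decisions/ms)."""
    delay_s = 1.0 / (rate_per_ms * 1e3)        # seconds per decision
    return energy_nj * 1e-9 * delay_s * 1e15   # J*s converted to fJ*s


conv = {"energy_nj": 60.4, "rate_per_ms": 167}   # conventional architecture
prop = {"energy_nj": 19.4, "rate_per_ms": 364}   # proposed architecture

conv_edp = edp_fj_s(**conv)
prop_edp = edp_fj_s(**prop)

delay_gain = prop["rate_per_ms"] / conv["rate_per_ms"]
energy_gain = conv["energy_nj"] / prop["energy_nj"]
edp_gain = conv_edp / prop_edp

print(f"EDP: conv {conv_edp:.1f} fJ*s, proposed {prop_edp:.1f} fJ*s")
print(f"gains: delay {delay_gain:.1f}x, energy {energy_gain:.1f}x, EDP {edp_gain:.1f}x")
```

Under that assumption the computed values agree with the reported 361.6 and 53.2 fJ·s to within rounding, and the delay, energy, and EDP gains round to 2.2×, 3.1×, and 6.8×.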

Measured Energy vs. Accuracy Trade-off
(Figure: accuracy vs. # of trees vs. ΔV_BL)
(Figure: accuracy vs. energy w.r.t. BL swing (ΔV_BL)*)
- Accuracy, BL swing, and energy are coupled: a larger BL swing gives higher accuracy at higher energy
- More trees → more error resiliency → allows a lower BL swing → higher energy efficiency
* ΔV_BL for the conventional architecture is 10 × "ΔV_BL per LSB"

Chip Summary & Comparison
(Figure: chip micrograph)

Chip summary
Technology | 65 nm CMOS
Die size | 1.2 × 1.2 mm
SRAM capacity | 16 KB (512 × 256 bit-cells)
Bit-cell size | 2.11 × 0.92 µm²
CLK freq. | CTRL: 1 GHz
Supply voltage (V) | CORE: 1.0, CTRL: 0.75

Comparison with state-of-the-art
Prior art | Process | Algorithm | Dataset | Input size (8b) | Throughput (decision/s) | Energy (nJ/decision) | EDP (fJ·s/decision) | Accuracy
[5] | 130 nm CMOS | Support vector machine | Traffic sign video | 320 × 240 | 33 [40K]* | 1.5M [1250]* | 45G [31250]* | 90%
[6] | 14 nm tri-gate | K-nearest neighbor | Not reported | 128 | 21.5M [498.8K]* | 3.4 [145.3]* | 0.2 [292.3]* | Not reported
Ours (M = 64) | 65 nm CMOS | Random forest | KUL traffic signs | 16 × 16 | 364.4K | 19.4 (w/ CTRL) | 52.4 | 94%

* scaled to 65 nm CMOS
[5] J. Park, JSSC 2012
[6] H. Kaul, ISSCC 2016

Conclusions
- First ASIC implementation of the RF algorithm
  - Low-SNR processing via DIMA and DSS
- Energy & speed benefits
  - 2.2× smaller delay and 3.1× smaller energy → 6.8× smaller EDP compared to digital ASIC
- Higher potential in large-scale applications
  - # of trees up to a few hundred in real-life applications → higher error resiliency → more room to scale ΔV_BL for energy efficiency
- Future work
  - On-chip training to compensate for process variations
  - Different algorithms (e.g., boosted ensemble classifier)

Acknowledgment
- This work was supported by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by SRC and DARPA.
