EVA²: Exploiting Temporal Redundancy in Live Computer Vision
Mark Buckler, Philip Bedoukian, Suren Jayasuriya, Adrian Sampson — PowerPoint PPT presentation


  1. EVA²: Exploiting Temporal Redundancy in Live Computer Vision Mark Buckler, Philip Bedoukian, Suren Jayasuriya, Adrian Sampson International Symposium on Computer Architecture (ISCA) Tuesday June 5, 2018

  2. Convolutional Neural Networks (CNNs) 2


  4. Embedded Vision Accelerators — FPGA research: Zhang et al., Suda et al., Qiu et al., Farabet et al., many more. ASIC research: ShiDianNao, Eyeriss, EIE, SCNN, many more. Industry adoption. 4

  5. Temporal Redundancy — Input change per frame: Frame 0 High; Frames 1–3 Low. 5

  6. Temporal Redundancy — Input change: Frame 0 High, Frames 1–3 Low; yet the cost to process every frame is High. 6

  7. Temporal Redundancy — Goal: keep the cost High only for Frame 0 and make it Low for Frames 1–3, matching their low input change. 7

  8. Talk Overview Background Algorithm Hardware Evaluation Conclusion 8


  10. Common Structure in CNNs Image Classification Object Detection Semantic Segmentation Image Captioning 10

  11. Common Structure in CNNs — Each frame runs a CNN prefix (high energy), producing intermediate activations, followed by a CNN suffix (low energy); Frame 0 and Frame 1 each pay the full cost. #MakeRyanGoslingTheNewLenna 11

  12. Common Structure in CNNs — Frame 0 becomes a "key frame" and Frame 1 a "predicted frame": the predicted frame's intermediate activations are approximated (≈) from the key frame's activations via motion. #MakeRyanGoslingTheNewLenna 12

  13. Common Structure in CNNs — With motion-derived activations, the predicted frame skips the high-energy prefix and runs only the low-energy suffix. #MakeRyanGoslingTheNewLenna 13
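The prefix/suffix split above can be sketched schematically (a hypothetical illustration; `layers` stands in for any sequential CNN, not the talk's actual models):

```python
def run(layers, x):
    """Apply a sequence of layers to an input."""
    for layer in layers:
        x = layer(x)
    return x

def split_cnn(layers, split_at):
    """Split a sequential model into a high-energy prefix and a
    low-energy suffix at a chosen layer index."""
    prefix = lambda x: run(layers[:split_at], x)
    suffix = lambda acts: run(layers[split_at:], acts)
    return prefix, suffix

# Toy stand-in layers; a real CNN would have conv/pool layers here.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
prefix, suffix = split_cnn(layers, 2)
acts = prefix(5)  # intermediate activations (the quantity AMC reuses)
assert suffix(acts) == run(layers, 5)
```

Reusing `acts` across frames is the whole point: the predicted frame only needs the suffix.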

  14. Talk Overview Background Algorithm Hardware Evaluation Conclusion 14

  15. Activation Motion Compensation (AMC) — Timeline: at time t, the key frame runs the full CNN prefix and suffix, and the intermediate activations are stored. At time t+k, the predicted frame runs motion estimation (producing a motion vector field) and motion compensation (producing predicted activations), then only the CNN suffix. 15

  16. Activation Motion Compensation (AMC) — Same pipeline, annotated with costs: the CNN prefix costs ~10¹¹ MACs, while motion compensation costs ~10⁷ adds. 16
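The slide's order-of-magnitude counts imply roughly a 10,000× gap between computing the prefix and compensating its stored output (a back-of-envelope check using the slide's rough figures, not measured numbers):

```python
prefix_macs = 1e11  # ~10^11 multiply-accumulates for the CNN prefix (key frame)
mc_adds = 1e7       # ~10^7 adds for motion compensation (predicted frame)
ratio = prefix_macs / mc_adds
print(f"~{ratio:.0e}x fewer arithmetic ops on predicted frames")
```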

  17. AMC Design Decisions • How to perform motion estimation? • How to perform motion compensation? • Which frames are key frames? 17


  22. Motion Estimation — We need to estimate the motion of activations by using pixels: motion estimation is performed on pixels, while motion compensation is performed on activations (between the CNN prefix and suffix). 22

  23. Pixels to Activations — Input image → 3x3 conv (64) → intermediate activations → 3x3 conv (64) → intermediate activations. 23

  24. Pixels to Activations: Receptive Fields — Input image (C=3) → 3x3 conv → intermediate activations (C=64, w=h=8) → 3x3 conv → intermediate activations (C=64). 24

  25. Pixels to Activations: Receptive Fields — Each activation depends on a 5x5 "receptive field" of input pixels. • Estimate motion of activations by estimating motion of receptive fields 25
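The 5x5 figure follows from stacking two 3x3 convolutions; a small helper (a generic formula, not from the talk) reproduces it:

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of stacked conv layers.
    The field grows by (k - 1) * jump per layer, where jump is the
    cumulative stride seen so far."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([3, 3]))  # two stacked 3x3 convs -> 5
```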

  26. Receptive Field Block Motion Estimation (RFBME) — The key frame and the predicted frame are divided into receptive-field-sized blocks. 26

  27. Receptive Field Block Motion Estimation (RFBME) — Blocks are indexed 0–3 along each axis in both the key frame and the predicted frame. 27

  28. Receptive Field Block Motion Estimation (RFBME) — Each predicted-frame block is matched against candidate positions in the key frame. 28

  29. AMC Design Decisions • How to perform motion estimation? • How to perform motion compensation? • Which frames are key frames? 29

  30. Motion Compensation — Example: a vector X = 2.5, Y = 2.5 maps stored activations (C=64) to predicted activations (C=64). • Subtract the vector to index into the stored activations • Interpolate when necessary 30
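The subtract-and-interpolate step can be sketched for a single channel (an illustrative implementation with edge clamping; the hardware applies this across all channels):

```python
import numpy as np

def compensate(stored, vx, vy):
    """Predict activations by shifting stored activations by a motion
    vector (vx, vy), bilinearly interpolating at fractional offsets.
    Out-of-range indices are clamped to the border."""
    H, W = stored.shape
    ys = np.clip(np.arange(H) - vy, 0, H - 1)  # subtract the vector...
    xs = np.clip(np.arange(W) - vx, 0, W - 1)  # ...to index stored activations
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = (1 - wx) * stored[y0][:, x0] + wx * stored[y0][:, x1]
    bot = (1 - wx) * stored[y1][:, x0] + wx * stored[y1][:, x1]
    return (1 - wy) * top + wy * bot
```

An integer vector reduces to a plain shift; a fractional one (like X = 2.5) blends the four nearest stored activations.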

  31. AMC Design Decisions • How to perform motion estimation? • How to perform motion compensation? • Which frames are key frames? 31

  32. When to Compute Key Frame? • System needs a new key frame when motion estimation fails: • De-occlusion • New objects • Rotation/scaling • Lighting changes 32

  33. When to Compute Key Frame? • System needs a new key frame when motion estimation fails: de-occlusion, new objects, rotation/scaling, lighting changes • So, compute a key frame when RFBME error exceeds a set threshold. Flowchart: input frame → motion estimation → (error > threshold? yes: CNN prefix as key frame; no: motion compensation) → CNN suffix → vision result 33
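The decision logic on this slide amounts to a small control loop (a schematic sketch; `prefix`, `suffix`, `estimate`, and `compensate` are hypothetical stand-ins for the real components):

```python
def process_frame(frame, state, threshold, prefix, suffix, estimate, compensate):
    """Run AMC for one frame: fall back to a full key-frame computation
    only when motion-estimation error exceeds the threshold."""
    if state["key"] is not None:
        vectors, error = estimate(state["key"], frame)
        if error <= threshold:
            # Predicted frame: warp the stored activations, run only the suffix.
            return suffix(compensate(state["acts"], vectors))
    # Key frame: run the full CNN and cache the frame and its activations.
    state["key"] = frame
    state["acts"] = prefix(frame)
    return suffix(state["acts"])

# Toy stand-ins: numbers play the role of frames and activations.
state = {"key": None, "acts": None}
prefix = lambda f: f * 2
suffix = lambda a: a + 1
estimate = lambda key, f: (None, abs(f - key))  # "error" = frame difference
compensate = lambda acts, vectors: acts
process_frame(10, state, 3, prefix, suffix, estimate, compensate)  # key frame
process_frame(11, state, 3, prefix, suffix, estimate, compensate)  # predicted
process_frame(20, state, 3, prefix, suffix, estimate, compensate)  # new key frame
```

Raising the threshold trades accuracy for energy by making predicted frames more common, which is the adjustable knob mentioned in the results.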

  34. Talk Overview Background Algorithm Hardware Evaluation Conclusion 34

  35. Embedded Vision Accelerator — A global buffer feeds Eyeriss (conv; runs the CNN prefix) and EIE (fully connected; runs the CNN suffix). Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks"; S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, "EIE: Efficient inference engine on compressed deep neural network." 35

  36. Embedded Vision Accelerator Accelerator (EVA²) — EVA² sits alongside Eyeriss (conv) and EIE (fully connected) on the global buffer, adding motion estimation and motion compensation units between the CNN prefix and suffix. 36

  37. Embedded Vision Accelerator Accelerator (EVA²) Frame 0 37

  38. Embedded Vision Accelerator Accelerator (EVA²) Frame 0: Key frame 38

  39. Embedded Vision Accelerator Accelerator (EVA²) Frame 1 Motion Estimation 39

  40. Embedded Vision Accelerator Accelerator (EVA²) Frame 1: Predicted frame Motion Estimation Motion Compensation • EVA² leverages sparse techniques to save 80–87% storage and computation 40

  41. Talk Overview Background Algorithm Hardware Evaluation Conclusion 41

  42. Evaluation Details • Train/validation datasets: YouTube Bounding Box (object detection & classification) • Evaluated networks: AlexNet; Faster R-CNN with VGGM and VGG16 • Hardware baseline: Eyeriss & EIE performance scaled from papers • EVA² implementation: written in RTL, synthesized with 65nm TSMC 42

  43. EVA² Area Overhead — Total 65nm area: 74mm²; EVA² takes up only 3.3% 43

  44. EVA² Energy Savings — Bar chart: normalized energy on Eyeriss + EIE for the original ("orig") networks AlexNet, Faster16, and FasterM, each running input frame → CNN prefix → CNN suffix → vision result. 44

  45. EVA² Energy Savings — The chart adds predicted-frame ("pred") bars: motion estimation, motion compensation, and the CNN suffix only. 45

  46. EVA² Energy Savings — The chart adds average ("avg") bars combining key and predicted frames under the adaptive error-threshold key-frame policy. 46

  47. High Level EVA² Results
  Network | Vision Task | Keyframe % | Accuracy Degradation | Average Latency Savings | Average Energy Savings
  AlexNet | Classification | 11% | 0.8% top-1 | 86.9% | 87.5%
  Faster R-CNN VGG16 | Detection | 36% | 0.7% mAP | 61.7% | 61.9%
  Faster R-CNN VGGM | Detection | 37% | 0.6% mAP | 54.1% | 54.7%
  • EVA² enables 54–87% savings while incurring <1% accuracy degradation • Adaptive key frame choice metric can be adjusted 47

  48. Talk Overview Background Algorithm Hardware Evaluation Conclusion 48

  49. Conclusion • Temporal redundancy is an entirely new dimension for optimization • AMC & EVA² improve efficiency and are highly general • Applicable to many different… • CNN applications (classification, detection, segmentation, etc.) • Hardware architectures (CPU, GPU, ASIC, etc.) • Motion estimation/compensation algorithms 49

  50. EVA²: Exploiting Temporal Redundancy in Live Computer Vision Mark Buckler, Philip Bedoukian, Suren Jayasuriya, Adrian Sampson International Symposium on Computer Architecture (ISCA) Tuesday June 5, 2018

  51. Backup Slides 51
