Shao-Yi Chien (簡韶逸) Professor Media IC & System Lab Graduate Institute of Electronics Engineering National Taiwan University
Outline  AI edge: distributed intelligence  Tensor transform for memory-efficient operations  Implementation results  Conclusion
Internet-of-AI-Things
[Diagram: AI, Big Data, and IoT converging]
Where Should Computing be Located?
 Data from the Internet: big data → cloud servers
 Data from IoT: ultra-big data!
 AI on the cloud?
 AI on the edge?
[Diagram: cloud servers – aggregators – smart devices]
Distributed Intelligence: AI Edge
 Sensor: large data from each sensor; low semantic level; process: data filtering; hardware: light-weight engine with HSA, NPU, DSP
 Aggregator/Gateway: small data; low semantic level; process: context inferring; hardware: CPU/GPU/FPGA
 Cloud: high semantic level; process: learning/recognition; hardware: cloud servers with neural processors
Deep Learning Ecosystem
 Memory efficiency is the most important optimization target
Unroll: Fast and Simple
Formulation of Unrolling
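As a concrete illustration of the formulation, here is a minimal NumPy sketch of im2col-style unrolling in 1-D (the function name `unroll_im2col` is my own, not from the talk): each row of the unrolled matrix holds one sliding window, so convolution reduces to an inner product per row.

```python
import numpy as np

def unroll_im2col(x, k):
    """Unroll a 1-D signal so convolution becomes a matrix product.

    Row i of the unrolled matrix holds the k-element window
    starting at position i (the classic im2col formulation).
    """
    n = len(x) - k + 1
    return np.stack([x[i:i + k] for i in range(n)])

# Convolution as one inner product per unrolled row.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])
y = unroll_im2col(x, 3) @ w
```

This is fast and simple because the inner product over unrolled rows maps directly onto a highly optimized matrix multiply.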
Unroll: More than Conv.
Unrolling: Where and Who?
 Where is the unrolling operation employed?
 Everywhere in optimized parallel computing systems!
 CPU, GPU, DSP, VPU, ASIC
 Who executes unrolling in a system?
 General-purpose processors: the software developers need to handle it
 VPU and ASIC: it is embedded in the hardware for specific applications
Problem of Unrolling
[Diagram: main-memory traffic]
Unroll is a Fast Blackbox
[Diagram: unroll blackbox between main memory and processors]
Efficient Blackbox: Unroll as Late as Possible
Naïve Unrolling
Unroll at Shared Memory
Unroll Upon Computation
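The idea of unrolling upon computation can be sketched as follows: instead of materializing the whole unrolled matrix in main memory, each window is gathered into fast storage right before its inner product. This NumPy sketch only models the dataflow (on a GPU the transient window would live in registers or shared memory; the function name is illustrative):

```python
import numpy as np

def conv_unroll_on_the_fly(x, w):
    """Unroll upon computation: each window is formed transiently
    just before its inner product, so the full unrolled matrix
    is never written to main memory."""
    k = len(w)
    n = len(x) - k + 1
    y = np.empty(n)
    for i in range(n):
        y[i] = x[i:i + k] @ w   # window exists only for this step
    return y

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])
y = conv_unroll_on_the_fly(x, w)
```

The result matches naïve unrolling, but the peak memory footprint stays at one window instead of the whole unrolled matrix.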
A Useful Unrolling Framework Requires
 Formulation of unrolling
 Building algorithms by unrolling
 DNN
 CV, ML
 …
 Memory-efficient unrolling
 GPUs
 ASICs
UMI (Unrolled Memory Inner-Products) Operator
 You simply write code for
 describing the unroll pattern and
 defining what to do for each row.
 The efficient blackbox makes your code fast.
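A toy model of this two-part interface, in Python (the real UMI operator is a CUDA blackbox; `umi`, `pattern`, and `row_op` here are my own illustrative names, not the library's API): the user only supplies the unroll pattern and the per-row operation, and the blackbox owns the gathering.

```python
import numpy as np

def umi(data, pattern, row_op):
    """Toy UMI-style operator: `pattern` maps each output row to
    the input indices it reads; `row_op` defines what to do with
    each gathered row. The blackbox (here a plain loop) owns the
    memory-efficient gathering."""
    return np.array([row_op(data[idx]) for idx in pattern])

# 3-tap moving average: the pattern describes the unroll,
# the row op describes the math.
x = np.arange(6, dtype=float)
pattern = [np.arange(i, i + 3) for i in range(4)]
y = umi(x, pattern, lambda row: row.mean())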
Memory-Efficient Unrolling
 A smooth dataflow must consider:
 1. DRAM reuse
 2. Bank conflicts
 Both can be analyzed by the same formula.
UMI: Experimental Results
 UMI blackbox vs. baselines: OpenCV, Parboil, and Caffe
 A CUDA version is available on GitHub
 Code reduction: 2--4x
 Speed-up: 1.4--26x
 A hardware implementation is coming soon
Ref: Y. S. Lin, W. C. Chen, and S. Y. Chien, "Unrolled Memory Inner-Products: An Abstract GPU Operator for Efficient Vision-Related Computations," ICCV 2017.
ASIC Design
 TAU: 32-core parallel processor
 Scales up linearly
Conclusion
 AI edge: distributed intelligence
 Memory access optimization is the key to efficient CNN computing
 Unrolling plays an important role in memory optimization and can also benefit other operations
 An unrolling framework, tensor transform for memory-efficient operations, is developed to decouple unrolling operations
 Implementation results: code reduction 2--4x; speed-up 1.4--26x
Using the UMI Operator is…