SCOPES 2003

Tailoring Software Pipelining For Effective Exploitation Of Zero Overhead Loop Buffer
Gang-Ryung Uh
CS Department, Boise State University

Outline
1. Low-power DSP 16000 and ZOLB
2. Compiler Mission
3. Conventional Approach
4. Alternative approach
5. Intermediate Results
6. Conclusion

DSP (Digital Signal Processor)
- Programmable processor for mathematical operations to manipulate signals with
  1. High performance
  2. Minimal power consumption
  3. Minimal memory footprint

Signal Processing Algorithm
FIR: $y_n = \sum_k b_k x_{n-k}$, for $n = 0, \ldots, N$
FFT: $y_k = \sum_j w^{jk} x_j$, for $j = 0, \ldots, N-1$, where $w = e^{-2\pi i/N}$
2D-DCT: $F(u,v) = \frac{1}{N} \sum_m \sum_n f(m,n) \cos[(2m+1)u\pi/2N] \cos[(2n+1)v\pi/2N]$, for $m,n = 0, \ldots, N-1$
I. Heavy arithmetic computations
II. Can be easily programmed into Tight Small Loops

Finite Impulse Response (FIR)
At the least, compute one Tap in a Single Cycle

Lucent DSP16000 Architecture Features
1. Harvard Architecture
2. Separate AGU from DALU for rich addressing modes
3. Zero-wait State High Speed Memory
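To make "one Tap" concrete, the FIR formula above can be written as a plain C loop in which each inner iteration is one tap, that is, one multiply-accumulate. This is only an illustrative sketch: the function name `fir_direct`, its parameters, and the choice to treat samples before the start of the input as zero are assumptions, not something the slides specify.

```c
#include <stddef.h>

/* Direct-form FIR: y[n] = sum over k of b[k] * x[n-k].
 * Assumption: samples before the start of x are treated as zero,
 * so the sum stops at k == n. */
void fir_direct(const short *x, size_t num_samples,
                const short *b, size_t num_taps,
                long *y)
{
    for (size_t n = 0; n < num_samples; n++) {
        long acc = 0;
        for (size_t k = 0; k < num_taps && k <= n; k++) {
            /* One tap: a single multiply-accumulate. */
            acc += (long)b[k] * x[n - k];
        }
        y[n] = acc;
    }
}
```

Feeding a unit impulse through this function returns the coefficients themselves, which is a quick way to check an implementation.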

Lucent DSP16000 Architecture Features (cont)
4. Compiler (Programmer) Controlled On-Core Instruction Cache: ZOLB (Zero Overhead Loop Buffer) to support high performance with minimal power dissipation

    do cloop {
        instruction 1
        ....
        instruction n
    }
    ...
    redo k

[Figure: ZOLB instruction buffer holding Instruction 1 through Instruction 31, with the cloop, cstate, and zolbpc registers]

Lucent DSP16000 Instruction Set Design
In order to achieve performance and higher code density:
16-bit word instruction

    A0 = A0 + P0    P0 = Xh*Yh    P1 = Xl*Yl    Y = *R0++    X = *PT0++

- Permissible order of operations is very limited
- The register usage is restricted to only a few different registers

Compiler Mission!
Where are the compound/complex instructions?

    // EDN Benchmarks
    /* N and ORDER are constants defined by the benchmark */
    void fir(const short array1[], const short coeff[], short output[])
    {
        int i, j, sum;
        for (i = 0; i < N-ORDER; i++) {
            sum = 0;
            for (j = 0; j < ORDER; j++) {
                sum += array1[i+j] * coeff[j];
            }
            output[i] = sum >> 15;
        }
    }

Generated code for the inner loop:

    a2 = 0
    j = a4
    do 50 {
        /* inst 1 */ xh = *(r0 + j)
        /* inst 2 */ yh = *r3++
        /* inst 3 */ r4 = j
        /* inst 4 */ p0 = xh*yh    p1 = xl*yl
        /* inst 5 */ a2 = a2+p0
        /* inst 6 */ j = r4+1
    }

Compound instruction:

    A0 = A0 + P0    P0 = Xh*Yh    P1 = Xl*Yl    Y = *R0++    X = *PT0++

Experience with Iterative Modulo Scheduling Techniques
EDN Benchmark: FIR Filter
(C source and loop body as on the previous slide)

Step 1: Resource Initiation Interval
MII = MAX(RecII, ResII)
ResII: Smallest Loop Initiation Interval to meet the system resource requirement
ResII (Resource Initiation Interval): 2
[Figure: the loop body, Inst 1 through Inst 6, listed in order]

Step 2: Recurrence Initiation Interval
MII = MAX(RecII, ResII)
RecII: Smallest Integer Loop Initiation Interval to meet all the deadlines imposed by data dependence circuits.
[Figure: data dependence graph over Inst 1 through Inst 6 of the loop body; edges are labeled (1,0), (0,1), or (1,1); legend: True Dependence, Output Dependence, Anti Dependence]
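The bound MII = MAX(RecII, ResII) from Steps 1 and 2 can be sketched in C. This is a minimal sketch under stated assumptions, not the author's implementation: it assumes each edge label in the dependence graph is a (distance, delay) pair, it takes ResII = 2 from the slide rather than deriving it from a resource model, and the four circuits it lists are one reading of the graph. The names `struct circuit`, `rec_ii`, `mii`, and `example_mii` are hypothetical.

```c
#include <stddef.h>

/* One dependence circuit: summed delay and summed iteration distance. */
struct circuit {
    int delay;
    int distance;   /* assumed > 0 for every circuit */
};

/* Ceiling of a / b for a >= 0 and b > 0. */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* RecII: the largest ceil(delay / distance) over all circuits, at least 1. */
int rec_ii(const struct circuit *c, size_t n)
{
    int best = 1;
    for (size_t i = 0; i < n; i++) {
        int need = ceil_div(c[i].delay, c[i].distance);
        if (need > best) {
            best = need;
        }
    }
    return best;
}

/* MII = MAX(RecII, ResII). */
int mii(int rec, int res)
{
    return rec > res ? rec : res;
}

/* Assumed circuits for the FIR loop body (a reading of the graph):
 *   Inst 1 -> Inst 4 -> Inst 1 : delay 1, distance 1
 *   Inst 2 -> Inst 4 -> Inst 2 : delay 1, distance 1
 *   Inst 4 -> Inst 5 -> Inst 4 : delay 1, distance 1
 *   Inst 3 -> Inst 6 -> Inst 3 : delay 2, distance 1
 */
int example_mii(void)
{
    static const struct circuit circuits[] = {
        {1, 1}, {1, 1}, {1, 1}, {2, 1}
    };
    int res = 2;    /* ResII as given on the Step 1 slide */
    return mii(rec_ii(circuits, sizeof circuits / sizeof circuits[0]), res);
}
```

Under these assumptions the circuit through Inst 3 and Inst 6 is the binding one, so the recurrence bound agrees with the resource bound of 2.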

Step 2: RecII (cont)
[Figure: dependence graph extended with Start and End nodes; the edges leaving Start are labeled (0,0)]

Adjacency Matrix (row = source, column = destination, X = no edge)

            Start   Inst-1  Inst-2  Inst-3  Inst-4  Inst-5  Inst-6  End
    Start   X       (0,0)   (0,0)   (0,0)   (0,0)   (0,0)   (0,0)   (0,0)
    Inst-1  X       X       X       X       (0,1)   X       X       (0,0)
    Inst-2  X       X       X       X       (0,1)   X       X       (0,0)
    Inst-3  X       X       X       X       X       X       (0,1)   (0,0)
    Inst-4  X       (1,0)   (1,0)   X       X       (0,1)   X       (0,0)
    Inst-5  X       X       X       X       (1,0)   X       X       (0,0)
    Inst-6  X       (1,1)   X       (1,1)   X       X       X       (0,0)
    End     X       X       X       X       X       X       X       X

Step 2: Compute MinDIST Matrix
Floyd Algorithm: MinDist[i,i] ≤ 0 with II (Initiation Interval) 2

            Start   Inst-1  Inst-2  Inst-3  Inst-4  Inst-5  Inst-6  End
    Start   X        0       0       0       1       2       1       2
    Inst-1  X       -1      -1       X       1       2       X       2
    Inst-2  X       -1      -1       X       1       2       X       2
    Inst-3  X        0      -1       0       1       2       1       2
    Inst-4  X       -2      -2       X      -1       1       X       1
    Inst-5  X       -4      -4       X      -2      -1       X       0
    Inst-6  X       -1      -2      -1       0       1       0       1
    End     X        X       X       X       X       X       X       X

Step 3: Slack Scheduling by computing Estart and Lstart
Floyd Algorithm: MinDist[i,i] ≤ 0 with II (Initiation Interval) 2
(MinDist matrix as in Step 2)

Legal Partial Schedule based on Estart and Lstart

    Operation   Slack               Issue Time
                Estart   Lstart
    Inst-1      0        1          0
    Inst-2      0        1          1
    Inst-3      0        1          0
    Inst-4      1        1          1
    Inst-5      0        1          1
    Inst-6      1        1          1

Why Modulo Scheduling is not suitable?
Legal Partial Schedule based on SLACK

    // inst-1 && inst-3
    xh=*(r0+j)    r4=j

    // inst-2 && inst-4 && inst-5 && inst-6
    yh=*r3++    p0=xh*yh  p1=xl*yl    a2=a2+p0    j=r4+1

No Legal Encoding

Why Modulo Scheduling is not suitable?
Due to limited encoding space, the DSP16000 provides compound instructions that account for {Inst-i, Inst-j, Inst-k}, but there is NO legal encoding to capture any subset of {Inst-i, Inst-j, Inst-k}.

How to Overcome?
- Software pipelining optimization must be sensitive to Instruction Selection
- This requires that Instruction Selection perform the following tasks in a demand-driven manner:
  - proactively perform Register Renaming
  - proactively introduce additional micro-operations on the fly
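The MinDist computation from Step 2 can be sketched in C as an all-pairs longest-path pass in the style of the Floyd algorithm. This is a hedged sketch, not the author's code: it assumes each adjacency entry is a (distance, delay) pair, that an edge contributes weight delay - II * distance, and that an II is feasible when no diagonal entry of MinDist is positive. The edge list is a reading of the Step 2 adjacency matrix, and the names `struct edge`, `min_dist`, and `ii_is_feasible` are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

#define NODES 8            /* 0 = Start, 1..6 = Inst-1..Inst-6, 7 = End */
#define NO_PATH INT_MIN    /* stands for the "X" entries */

struct edge {
    int from, to, distance, delay;
};

/* Dependence edges between instructions (assumed reading of the matrix). */
static const struct edge edges[] = {
    {1, 4, 0, 1}, {2, 4, 0, 1}, {3, 6, 0, 1},
    {4, 1, 1, 0}, {4, 2, 1, 0}, {4, 5, 0, 1},
    {5, 4, 1, 0}, {6, 1, 1, 1}, {6, 3, 1, 1}
};

/* Fill d with the longest-path distances for initiation interval ii. */
void min_dist(int ii, int d[NODES][NODES])
{
    for (int i = 0; i < NODES; i++)
        for (int j = 0; j < NODES; j++)
            d[i][j] = NO_PATH;

    /* Start reaches every other node, and every node reaches End: (0,0). */
    for (int i = 1; i < NODES; i++)
        d[0][i] = 0;
    for (int i = 0; i < NODES - 1; i++)
        d[i][NODES - 1] = 0;

    for (size_t e = 0; e < sizeof edges / sizeof edges[0]; e++) {
        int w = edges[e].delay - ii * edges[e].distance;
        if (w > d[edges[e].from][edges[e].to])
            d[edges[e].from][edges[e].to] = w;
    }

    /* Floyd-style relaxation, maximizing instead of minimizing. */
    for (int k = 0; k < NODES; k++)
        for (int i = 0; i < NODES; i++)
            for (int j = 0; j < NODES; j++)
                if (d[i][k] != NO_PATH && d[k][j] != NO_PATH &&
                    d[i][k] + d[k][j] > d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}

/* An II is feasible when MinDist[i][i] <= 0 for every node i. */
int ii_is_feasible(int ii)
{
    int d[NODES][NODES];
    min_dist(ii, d);
    for (int i = 0; i < NODES; i++)
        if (d[i][i] > 0)
            return 0;
    return 1;
}
```

With II = 2 this reading gives no positive diagonal entry and a Start-to-End distance of 2, while II = 1 leaves a positive entry on the diagonal for Inst-3 and Inst-6, so 2 is the smallest interval that satisfies the recurrence constraint under these assumptions.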
