High-Performance Embedded Systems-on-a-Chip
Lecture 17: Scheduling
Sanjay Rajopadhye
Computer Science, Colorado State University

Limitations of Systolic Arrays
Only a (very small) proper subset of SAREs can be implemented as systolic arrays, namely those that:
- are Serializable,
- are Localizable,
- correspond to a Single Equation, and
- admit a One-dimensional schedule.
Question: What is beyond systolic arrays?

(Silicon) Compilation
- For each point in the domain of each variable, determine:
  - A time instant: the schedule
  - A place: the processor and memory allocation
- Transform the P-SARE so that indices denote either time, processor, or memory address
- Generate code (or HDL, we hope)

Two Orthogonal Issues
- Static Analysis: what transformation to apply
  - scheduling
  - processor (& memory) allocation
- Program Transformation: manipulating the SARE
  - Rules to modify the SARE (Change of Basis)
  - Code Generation (how to interpret the transformed SARE)

Golden Rule of Static Analysis
The dependence graph cannot be explicitly constructed:
- Too large
- Not (fully) known at compile time (parameters)
- Explicitly constructed results are not useful
Implication: use compact information

Compact information
Reduced Dependence (Multi) Graph (RDG)
- Nodes: variables in the SARE
- Edges: for each occurrence of $Y$ on the rhs of the equation for $X$, an edge from $X$ to $Y$
- Each edge is labeled with:
  - the dependence function,
  - the (sub)domain of $X$ where it occurs
- Miscellaneous info (e.g., duration)

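As a rough illustration of how small the RDG is compared with the full dependence graph, the sketch below stores one as a plain multigraph. The class names `Edge` and `RDG`, the two-variable SURE, and the restriction to uniform dependences (each label is just a vector $d$) are assumptions made for this example, not part of the lecture.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Edge:
    src: str    # variable being defined (left-hand side of the equation)
    dst: str    # variable read on the right-hand side
    dep: tuple  # uniform dependence vector d: src[z] reads dst[z - d]
    domain: str = "whole domain"  # (sub)domain of src where the dependence holds

@dataclass
class RDG:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # a list, so parallel edges are allowed

    def add_edge(self, src, dst, dep, domain="whole domain"):
        self.nodes.update((src, dst))
        self.edges.append(Edge(src, dst, tuple(dep), domain))

# Hypothetical SURE (illustration only):
#   X[i,j] = f(X[i-1,j], Y[i,j-1])
#   Y[i,j] = g(X[i,j],   Y[i,j-1])
rdg = RDG()
rdg.add_edge("X", "X", (1, 0))
rdg.add_edge("X", "Y", (0, 1))
rdg.add_edge("Y", "X", (0, 0))
rdg.add_edge("Y", "Y", (0, 1))

print(len(rdg.nodes), len(rdg.edges))
```

The graph has one node per variable and one edge per right-hand-side occurrence, so its size does not depend on the size parameters of the domain.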
Key Problem: Scheduling
- Definition: A function $\theta$ such that whenever $X[z]$ depends on $Y[z']$, then $\theta(X, z) > \theta(Y, z')$.
- Affine schedules: $\theta(X, z) = \tau_X^T z + \alpha_X$
- Geometric interpretation: all points executed at time $t$ belong to an isotemporal hyperplane with normal vector $\tau_X$

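The definition can be checked literally, point by point, on a small domain. The sketch below does this for a single hypothetical variable on an n x n square with uniform dependences; the function names and the example dependences are assumptions for illustration.

```python
from itertools import product

def theta(z, tau, alpha=0):
    """Affine schedule theta(z) = tau . z + alpha."""
    return sum(t * zi for t, zi in zip(tau, z)) + alpha

def is_valid_pointwise(n, deps, tau, alpha=0):
    """Check theta(z) > theta(z - d) at every point of an n x n domain."""
    domain = set(product(range(n), repeat=2))
    for z in domain:
        for d in deps:
            src = tuple(zi - di for zi, di in zip(z, d))
            # Only dependences whose source lies inside the domain count.
            if src in domain and not theta(z, tau, alpha) > theta(src, tau, alpha):
                return False
    return True

# Hypothetical example: X[i,j] reads X[i-1,j] and X[i,j-1].
deps = [(1, 0), (0, 1)]
print(is_valid_pointwise(4, deps, tau=(1, 1)))   # True
print(is_valid_pointwise(4, deps, tau=(1, -1)))  # False
```

This check visits every point of the domain, so its cost grows with the domain size and it cannot be run when the size is a symbolic parameter. That is the situation the golden rule of static analysis rules out.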
Scheduling a (single) URE
$X[z] = f(X[z - d_1], \ldots, X[z - d_k])$ for $z \in \mathcal{D}$
Its RDG is just one node, with $k$ self loops, each labeled with a vector $d_i$.

Scheduling a single URE
- $\theta(z) = \tau^T z + \alpha$ is valid iff for all $z \in \mathcal{D}$ and $i = 1, \ldots, k$, $\theta(z) > \theta(z - d_i)$, i.e., $\tau^T d_i > 0$
- Finite number of constraints, independent of domain size.
- Scheduling $\equiv$ Linear Programming
- Geometric view: choose the hyperplanes so that dependences point backwards

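A minimal sketch of the finite check follows. It tests one inequality per dependence vector, then picks the fastest valid schedule on a square domain by exhaustive search over small integer vectors. The search is a stand-in for the linear program, and the function names, the search bound, and the example dependence vectors are assumptions for illustration.

```python
from itertools import product

def is_valid(tau, deps):
    """tau is valid iff tau . d > 0 for every dependence vector d."""
    return all(sum(t * di for t, di in zip(tau, d)) > 0 for d in deps)

def latency(tau, n):
    """Span of tau . z over the square 0 <= i, j < n (max minus min)."""
    return sum(abs(t) * (n - 1) for t in tau)

def best_schedule(deps, n, bound=3):
    """Exhaustive search over small integer tau; stands in for the LP."""
    candidates = [tau for tau in product(range(-bound, bound + 1), repeat=2)
                  if is_valid(tau, deps)]
    if not candidates:
        return None
    # Ties on latency are broken by the tuple order of tau.
    return min(candidates, key=lambda tau: (latency(tau, n), tau))

print(best_schedule([(1, 0), (0, 1)], n=10))           # (1, 1)
print(best_schedule([(1, 0), (1, 1), (1, -1)], n=10))  # (1, 0)
```

The validity test never looks at `n`: the number of constraints equals the number of dependence vectors. The domain enters only through the objective.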
Example
[Figure: dependence diagram for the example]
- Schedule validity conditions
- Optimal schedule

Scheduling an SURE
- Single schedule for all variables: $\theta(z) = \tau^T z + \alpha$
  - Not general enough: some well-defined SUREs don't admit such a schedule (e.g., the convolution example)
- Shifted linear schedules
  - Allow the $\alpha$ to be different for each variable, $\alpha_X$, but keep the same $\tau$
  - Also not general enough
- Variable dependent schedules (different slopes for different variables)
  - Not general enough either
- Multidimensional schedules
  - Most general, but still not enough

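For shifted linear schedules, each RDG edge from $X$ to $Y$ with dependence vector $d$ contributes one constraint, $\tau^T d + \alpha_X - \alpha_Y > 0$, obtained by expanding $\theta(X, z) > \theta(Y, z - d)$. The sketch below checks these constraints for a hypothetical two-variable SURE; the edge list, function names, and the chosen $\tau$ and $\alpha$ values are assumptions for illustration.

```python
def edge_slack(tau, alpha, edge):
    """tau . d + alpha[src] - alpha[dst] for one RDG edge (src, dst, d)."""
    src, dst, d = edge
    return sum(t * di for t, di in zip(tau, d)) + alpha[src] - alpha[dst]

def is_valid_shifted(tau, alpha, edges):
    """A shared tau with per-variable shifts is valid iff every slack is positive."""
    return all(edge_slack(tau, alpha, e) > 0 for e in edges)

# Hypothetical SURE (illustration only):
#   X[i,j] = f(X[i-1,j], Y[i,j-1])
#   Y[i,j] = g(X[i,j],   Y[i,j-1])
edges = [
    ("X", "X", (1, 0)),
    ("X", "Y", (0, 1)),
    ("Y", "X", (0, 0)),  # Y reads X at the same point
    ("Y", "Y", (0, 1)),
]

print(is_valid_shifted((1, 2), {"X": 0, "Y": 0}, edges))  # False
print(is_valid_shifted((1, 2), {"X": 0, "Y": 1}, edges))  # True
```

With equal shifts, the edge from Y to X with $d = 0$ has zero slack whatever $\tau$ is, so no single schedule works for this system. Delaying Y by one step relative to X repairs it.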
Limits of shifted linear schedules
[Equations of the example SURE]
- This SURE cannot be scheduled with same-slope lines for both of its variables.
- But apply a simple CoB (transpose one of the variables) and now it can.

A less contrived example
[Figure: dependence diagram for the example]
- Optimal solution

Variable dependent schedules