
A System-on-a-Chip Lock Cache with Task Preemption Support



  1. A System-on-a-Chip Lock Cache with Task Preemption Support
     By Bilge S. Akgul, Jaehwan Lee and Vincent J. Mooney
     Georgia Institute of Technology, School of Electrical and Computer Engineering

  2. Outline
     - Introduction
     - Background
     - Lock Synchronization Problems
     - Our Methodology
     - Hardware and Software Designs
     - Experiments and Results
     - Conclusion

  3. Introduction
     - Multi-processor shared-memory SoC
     - Intertask/interprocess synchronization
     - Lock synchronization overheads
       - Lock delay, lock latency
       - Memory bandwidth consumption
     - Aim:
       - Reduce overheads
       - Improve Real Time (RT) predictability

  4. Background
     - Critical Section
       - Code section where shared data between multiple execution units is accessed
       - E.g., multiple readers and multiple writers
       - A lock is necessary to guarantee the consistency of shared data (e.g., global variables)
     - Lock Delay
       - Time between release and acquisition of a lock
     - Lock Latency
       - Time to acquire a lock in the absence of contention
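The two timing definitions on this slide can be made concrete with a toy event trace. The following is a minimal sketch, assuming a hypothetical trace format of `(cycle, event, task)` tuples; the function names and the trace values are illustrative and do not come from the slides.

```python
def lock_delays(trace):
    """Return the cycles elapsed between each release and the next acquisition."""
    delays = []
    last_release = None
    for cycle, event, _task in trace:
        if event == "release":
            last_release = cycle
        elif event == "acquire" and last_release is not None:
            delays.append(cycle - last_release)
            last_release = None
    return delays


def lock_latency(request_cycle, acquire_cycle):
    """Cycles needed to acquire a lock when nobody else holds or wants it."""
    return acquire_cycle - request_cycle


# Hypothetical trace: three tasks pass one lock between them.
trace = [
    (0, "acquire", "task1"),
    (100, "release", "task1"),
    (140, "acquire", "task2"),
    (300, "release", "task2"),
    (325, "acquire", "task3"),
]

print(lock_delays(trace))      # hand-off gaps, in cycles
print(lock_latency(10, 42))    # uncontended acquire cost, in cycles
```

Lock delay is only meaningful when a waiter already exists at release time, which the toy trace assumes.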

  5. Problems
     - Ensuring mutual exclusiveness
     - Communication bandwidth consumption
     - Eliminate busy-wait problems
       - Busy-wait: if the lock is busy, processors spin on the memory bus
     - Effective lock hand-off necessary
       - Fair
       - Predictive

  6. Previous Work
     - Spin-lock alternatives (Anderson '90)
       - Spin-on-read (spin on cache), delays in spin-loops
     - Queue based software locks
       - Array based queuing (Anderson '90)
       - MCS locks (Mellor-Crummey, Scott '91)
       - LH and M locks (Ladin, Hagerston, Magnusson '94)
     - Queue based hardware locks
       - QOLBY (Kagi '99): makes use of collocation
     - Cache-based locks (Ramachandran '96)
       - Memory consistency model
       - New cache design, extra cache states for locks

  7. Methodology
     - Custom hardware unit: SoC Lock Cache
     - Utilize advantages of SoC design
     - Short critical sections covered in DATE '01
     - Critical sections may be long or short
     - Support preemption of tasks when necessary
     - Hardware-interrupt triggered notification
     - Lock requests handled on a processor-by-processor basis
     - Separate the lock variables according to the critical section lengths

  8. SoC Lock Cache Hardware Mechanism
     [Figure: processors P1, P2, ..., PN connected to Memory and to the SoC Lock Cache, with Arbitration Logic]

  9. Methodology
     - Multiple application tasks
     - Atalanta-RTOS
     - Multi-processor set-up with MPC750s
     - SoCLC provides lock synchronization among processors
     [Figure: software layers (Application Software (Tasks), Atalanta-RTOS Extension, Atalanta-RTOS) above hardware (four MPC750s and the SoC Lock Cache)]

  10. Hardware Simulation Set-up
      - Seamless CVE from Mentor Graphics
      - 4 MPC750s
      - SoC Lock Cache Unit (SoCLC)
      - Shared Memory
      - Interface Logic

  11. Software
      In the case of long critical sections, non-preemptive synchronization causes inefficient CPU utilization among tasks.
      [Figure: two execution timelines for Processor 1 and Processor 2. Busy-wait case: Task 1 accesses the CS while Task 2 tries to access the CS and busy-waits before its own CS access, delaying Task 3. Preemptive case: Task 2 is preempted, Task 3 runs, and an interrupt (context switch and ISR) returns Task 2 to its CS access, giving an execution time improvement.]

  12. Software
      - Assume 64 tasks
      - Each lock keeps a lock-wait table of 64-bit entries
      - Expandable to > 64
      - Tables accessed by ISR
      [Figure: Lock-wait table 1 and Lock-wait table 2, each an 8 x 8 grid of bits numbered 0 to 63, kept per lock (Lock 1, Lock 2, Lock 3, Lock 4, ..., Lock n)]
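The lock-wait table can be pictured as one bit per task for every lock. Below is a minimal sketch in Python, assuming one 64-bit entry per lock and assuming that a lower task number means higher priority; the class name, method names and the priority rule are hypothetical and are not stated on the slide.

```python
NUM_TASKS = 64                      # slide assumption: 64 tasks
MASK = (1 << NUM_TASKS) - 1         # keeps entries within 64 bits


class LockWaitTable:
    """One 64-bit entry per lock; bit i set means task i waits for that lock."""

    def __init__(self, num_locks):
        self.entries = [0] * num_locks

    @staticmethod
    def _check(task_id):
        if not 0 <= task_id < NUM_TASKS:
            raise ValueError("task id out of range")

    def add_waiter(self, lock_id, task_id):
        self._check(task_id)
        self.entries[lock_id] |= 1 << task_id

    def remove_waiter(self, lock_id, task_id):
        self._check(task_id)
        self.entries[lock_id] &= ~(1 << task_id) & MASK

    def next_waiter(self, lock_id):
        """Return the lowest-numbered waiting task, or None if nobody waits.

        Choosing the lowest set bit is an assumed priority rule.
        """
        entry = self.entries[lock_id]
        if entry == 0:
            return None
        return (entry & -entry).bit_length() - 1


table = LockWaitTable(num_locks=4)
for task in (5, 63, 12):
    table.add_waiter(1, task)
print(table.next_waiter(1))         # lowest-numbered waiter on lock 1
```

A bitmap keeps the ISR's work small: marking or clearing a waiter is a single bitwise operation, and supporting more than 64 tasks only needs a wider entry.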

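  13. Software
      [Figure: Lock_longCS flow across PE1 and PE2 with tasks task1 to task4. A task calls Lock_longCS, which reads the lock (Read_lock). If the lock is free, the task executes the long CS holding the lock, then calls UnLock, which releases the lock and raises an interrupt. If the task fails to acquire the lock, it is removed from the ready table and a context switch lets a new task execute without holding the lock. On the interrupt, the interrupt handler executes the ISR and the waiting task returns from Lock_longCS.]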
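The Lock_longCS flow on this slide can be walked through with a small software model. This is only a sketch under stated assumptions: the class and method names are hypothetical, waiters are woken in FIFO order (the slide does not say which waiter is chosen), and a woken task retries the lock rather than receiving it directly.

```python
from collections import deque


class LongCSLockModel:
    """Toy model of one long-CS lock plus the RTOS ready table."""

    def __init__(self, tasks):
        self.ready = set(tasks)   # stand-in for the RTOS ready table
        self.owner = None         # task currently holding the lock
        self.waiters = deque()    # suspended tasks (FIFO order is an assumption)

    def lock_long_cs(self, task):
        """Read the lock; acquire it if free, otherwise suspend the task."""
        if self.owner is None:
            self.owner = task
            return True
        # Failed to acquire: leave the ready table so another task can run.
        self.ready.discard(task)
        self.waiters.append(task)
        return False

    def unlock(self, task):
        """Release the lock and model the interrupt that follows."""
        if self.owner != task:
            raise ValueError("only the owner may unlock")
        self.owner = None
        return self._isr()

    def _isr(self):
        """Interrupt handler: put one waiter back into the ready table."""
        if not self.waiters:
            return None
        woken = self.waiters.popleft()
        self.ready.add(woken)
        return woken


model = LongCSLockModel(["task1", "task2", "task3"])
model.lock_long_cs("task1")          # task1 enters the long CS
model.lock_long_cs("task2")          # task2 fails and is suspended
print(sorted(model.ready))           # task3 can still run meanwhile
print(model.unlock("task1"))         # ISR wakes the suspended task
```

The point of the flow is visible in the ready table: while task1 holds the lock, task2 consumes no CPU time and task3 remains schedulable.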

  14. Experiments
      - With Atalanta RTOS
      - With 4 MPC750s
      - Database example: client-server application (run with 40 tasks)
      [Figure: Database Application (database object flow): a Server and a Client, each with local memory and its own address space (server address space, client address space), exchanging shared data through Shared Memory]

  15. Experiments
      Observed performance improvement with the Lock Cache Unit:
      • 100% speedup in lock delay
      • 32% speedup in lock latency
      • 27% speedup in total execution time
      [Figure: Example Database Application Transactions]

  16. Experiments
      Long CS lock results
      • Atalanta RTOS
      • 40 tasks
      • 4 PEs

      | Metric (clk cycles) | Without SoCLC | With SoCLC | Speedup |
      | --- | --- | --- | --- |
      | Lock Latency | 1200 | 908 | 1.32x |
      | Lock Delay | 47,264 | 23,590 | 2.00x |
      | Exe. Time | 36.9M | 29M | 1.27x |

  17. Experiments
      Small CS lock results
      • Atalanta RTOS
      • 40 tasks
      • 4 PEs

      | Metric (clk cycles) | Without SoCLC | With SoCLC | Speedup |
      | --- | --- | --- | --- |
      | Lock Latency | 884 | 32 | 27x |
      | Lock Delay | 8936 | 102 | 87.6x |
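The speedup figures on slides 16 and 17 follow from dividing the cycle count without SoCLC by the count with SoCLC. A short sketch that recomputes them from the reported numbers; the dictionary layout and the `speedup` helper are illustrative only.

```python
# Cycle counts as reported on the slides: (without SoCLC, with SoCLC).
long_cs = {
    "lock latency": (1200, 908),
    "lock delay": (47264, 23590),
    "execution time": (36.9e6, 29e6),
}
small_cs = {
    "lock latency": (884, 32),
    "lock delay": (8936, 102),
}


def speedup(without_soclc, with_soclc):
    """Speedup factor: how many times fewer cycles the SoCLC case needs."""
    return without_soclc / with_soclc


for label, results in (("long CS", long_cs), ("small CS", small_cs)):
    for metric, (without, with_) in results.items():
        print(f"{label} {metric}: {speedup(without, with_):.2f}x")
```

The long-CS ratios come out at about 1.32, 2.00 and 1.27, matching the table. For small critical sections the lock latency ratio is 27.625, so the slide's 27x appears to be rounded down, and the lock delay ratio is about 87.6.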
