Do we still care about single thread performance?

ACACES 2008, Dean Tullsen

Speedup

Execution time          Speedup
.1 + .9   = 1.0         1.0
.1 + .45  = .55         1/.55 = 1.82
.1 + .225 = .325        1/.325 = 3.07
.1                      < 10
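The numbers on this slide are consistent with Amdahl's law for a program whose serial portion is .1 of the original run time. The sketch below reproduces them under one assumption that the slide does not state: that .45 and .225 come from dividing the .9 parallel portion across 2 and 4 processors. The function name and parameters are illustrative only.

```python
def amdahl_speedup(serial_fraction: float, n_processors: float) -> float:
    """Speedup when only the non-serial fraction is divided across processors."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Assumed processor counts (inferred from .9/2 = .45 and .9/4 = .225).
for n in (1, 2, 4):
    print(f"{n} processor(s): speedup {amdahl_speedup(0.1, n):.3f}")

# With unlimited processors only the serial .1 remains, so the bound is 1/.1.
print(f"upper bound: {1.0 / 0.1:.1f}")
```

Note that 1/.325 is 3.0769..., so the slide's 3.07 is a truncated value rather than a rounded one. The final row is the point of the slide: no processor count pushes the speedup to 10, because the serial .1 is never reduced.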


• Parallelism – Use multiple contexts to achieve better performance than possible on a single context.
• Traditional parallelism – We use extra threads/processors to offload computation. Threads divide up the execution stream.
• Non-traditional parallelism – Extra threads are used to speed up computation without necessarily off-loading any of the original computation.
    • Primary advantage – nearly any code, no matter how inherently serial, can benefit from parallelization.
    • Another advantage – threads can be added or subtracted without significant disruption.

[Figure: Thread 1, Thread 2, Thread 3, Thread 4]

• Speculative precomputation, dynamic speculative precomputation, many others.
• Most commonly – prefetching, possibly branch pre-calculation.

• Chappell, Stark, Kim, Reinhardt, Patt, “Simultaneous Subordinate Micro-threading”, 1999
    • Use microcoded threads to manipulate the microarchitecture to improve the performance of the main thread.
• Zilles 2001, Collins 2001, Luk 2001
    • Use a regular SMT thread, with code distilled from the main thread, to support the main thread.

• Speculative Precomputation [Collins, et al. 2001 – Intel/UCSD]
• Dynamic Speculative Precomputation
• Event-Driven Simultaneous Optimization
• Value Specialization
• Inline Prefetching
• Thread Prefetching

[Figure: Speedup with Perfect Memory vs. Perfect Delinquent Loads (10) for art, equake, gzip, mcf, health, mst]

Dean Tullsen, Processor Architecture and Compilation Lab


• In SP, a p-slice is a thread derived from a trace of execution between a trigger instruction and the delinquent load.
• All instructions upon which the load’s address is not dependent are removed (often 90-95%).
• Live-in register values (typically 2-6) must be explicitly copied from main thread to helper thread.
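As a rough illustration of how a p-slice might be extracted, the sketch below walks a trace backwards from the delinquent load and keeps only the instructions that feed the load's address. Everything here is hypothetical: the `Instr` record, the toy register names, and the example trace are invented for illustration, and the sketch tracks register dependences only (it ignores dependences through memory, which a real slicer would have to handle).

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str               # human-readable form, used only for display
    dest: Optional[str]     # register written, if any
    srcs: Tuple[str, ...]   # registers read


def build_p_slice(trace: List[Instr], load_index: int) -> Tuple[List[Instr], List[str]]:
    """Return (slice, live_ins) for the load at trace[load_index].

    The slice keeps only instructions the load's address depends on.
    Live-ins are registers the slice reads but never produces itself.
    """
    needed = set(trace[load_index].srcs)
    kept = [load_index]
    for i in range(load_index - 1, -1, -1):
        ins = trace[i]
        if ins.dest is not None and ins.dest in needed:
            needed.discard(ins.dest)   # this instruction produces the value
            needed.update(ins.srcs)    # so its own inputs are now needed
            kept.append(i)
    kept.reverse()
    return [trace[i] for i in kept], sorted(needed)


# Hypothetical trace from a trigger instruction to a delinquent load.
trace = [
    Instr("add r1, r1, 8", "r1", ("r1",)),           # trigger
    Instr("mul r5, r6, r7", "r5", ("r6", "r7")),
    Instr("ld  r2, 0(r1)", "r2", ("r1",)),
    Instr("add r8, r5, r9", "r8", ("r5", "r9")),
    Instr("shl r3, r2, 3", "r3", ("r2",)),
    Instr("sub r9, r9, 1", "r9", ("r9",)),
    Instr("add r4, r3, r10", "r4", ("r3", "r10")),
    Instr("ld  r11, 0(r4)", "r11", ("r4",)),         # delinquent load
]

p_slice, live_ins = build_p_slice(trace, len(trace) - 1)
for ins in p_slice:
    print(ins.text)
print("live-ins:", live_ins)
```

In this toy trace three of the eight instructions are dropped and two live-in registers remain to be copied to the helper thread. Real traces are much longer, which is how the removal rate can reach the 90-95% range quoted above.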

[Figure: timeline labeled Trigger instruction, Spawn thread, Prefetch, Memory latency, Delinquent load]
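One way to read this timeline is as a timeliness condition: the prefetch helps only to the extent that it is issued early enough for the memory latency to overlap with main-thread work before the delinquent load. The sketch below is a simplified model, not something taken from the slides. All parameter names and cycle counts are hypothetical, and it assumes a prefetch that is issued after the main thread has already reached the load provides no benefit.

```python
def remaining_stall(trigger_to_load: int, spawn_overhead: int,
                    slice_cycles: int, memory_latency: int) -> int:
    """Cycles the main thread still waits at the delinquent load.

    All times are in cycles, measured from the trigger instruction.
    """
    prefetch_issue = spawn_overhead + slice_cycles    # helper issues the prefetch
    data_arrival = prefetch_issue + memory_latency    # line arrives in the cache
    stall = max(0, data_arrival - trigger_to_load)
    # A late prefetch cannot be worse than the main thread's own miss.
    return min(memory_latency, stall)


print(remaining_stall(500, 20, 60, 300))  # trigger far enough ahead
print(remaining_stall(200, 20, 60, 300))  # partially hidden latency
print(remaining_stall(50, 20, 60, 300))   # trigger too close to the load
```

Under this model the trigger should sit far enough ahead of the load to cover the spawn cost, the slice's execution, and the memory latency, which is one reason short slices matter.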


• Because SP uses actual program code, it can precompute addresses that fit no predictable pattern.
• Because SP runs in a separate thread, it can interfere with the main thread much less than software prefetching. When it isn’t working, it can be killed.
• Because it is decoupled from the main thread, the prefetcher is not constrained by the control flow of the main thread.
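The first point can be illustrated with a toy simulation, sketched here under stated assumptions. It compares a simple last-stride predictor against executing the pointer-chasing code itself on a linked list whose nodes sit at shuffled addresses. The address range, function names, and the dictionary standing in for memory are all hypothetical, and no timing or cache behavior is modeled.

```python
import random
from typing import Dict, List, Optional, Tuple


def make_linked_list(n_nodes: int, seed: int = 0) -> Tuple[List[int], Dict[int, Optional[int]]]:
    """Return (traversal order, next-pointer map) for nodes at shuffled addresses."""
    rng = random.Random(seed)
    order = rng.sample(range(0x1000, 0x100000, 64), n_nodes)
    next_of: Dict[int, Optional[int]] = {a: b for a, b in zip(order, order[1:])}
    next_of[order[-1]] = None
    return order, next_of


def stride_hits(addresses: List[int]) -> int:
    """Count accesses that a last-stride predictor would have predicted."""
    hits = 0
    for i in range(2, len(addresses)):
        stride = addresses[i - 1] - addresses[i - 2]
        if addresses[i - 1] + stride == addresses[i]:
            hits += 1
    return hits


def slice_prefetches(head: int, next_of: Dict[int, Optional[int]], depth: int) -> List[int]:
    """Run the slice `node = node.next` up to `depth` nodes ahead of `head`."""
    prefetched: List[int] = []
    node = next_of[head]
    while node is not None and len(prefetched) < depth:
        prefetched.append(node)
        node = next_of[node]
    return prefetched


order, next_of = make_linked_list(1000)
print("stride predictor hits:", stride_hits(order), "of", len(order) - 2)
print("slice matches next 8 nodes:", slice_prefetches(order[0], next_of, 8) == order[1:9])
```

The stride predictor finds almost nothing because consecutive node addresses have no fixed spacing, while the slice computes the exact addresses by running the same `next` dereferences the main thread will run. On real hardware those dereferences are themselves loads that may miss, so the slice's advantage comes from running ahead, not from being free.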

