Exploiting Multi-Core Architectures for Fast Modular Synthesis

Exploiting Multi-Core Architectures for Fast Modular Synthesis LAC2008 Feb 29, 2008 Jürgen Reuter

Multiple Cores ● CPU speed only slowly growing ● Multi-Core CPUs now pervade market ● Ideal for thread-parallel compute-intensive tasks ● Most existing applications not yet parallelized ● Linux kernel support grown from SMP experience ● This talk: How to parallelize modular synthesis?

Module Topology Model ● Hierarchy of modules ● Input terminals ● Output terminals ● Primitive modules ● Composed modules ● No connections between different submodules
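The hierarchy described on this slide can be sketched as a small tree of Java types. This is only an illustration: `ModuleTreeSketch`, `SynthModule`, `Primitive`, `Composed` and `collectPrimitives` are hypothetical names and are not taken from the SoundPaint code base.

```java
import java.util.ArrayList;
import java.util.List;

// All type and method names here are hypothetical, chosen for illustration.
final class ModuleTreeSketch {

    /** A node of the module tree with a fixed number of terminals. */
    abstract static class SynthModule {
        final String name;
        final int inputTerminals;
        final int outputTerminals;

        SynthModule(String name, int inputTerminals, int outputTerminals) {
            this.name = name;
            this.inputTerminals = inputTerminals;
            this.outputTerminals = outputTerminals;
        }

        /** Appends the primitive modules below this node in depth-first order. */
        abstract void collectPrimitives(List<SynthModule> out);
    }

    /** A leaf of the tree: a module that is not built from other modules. */
    static final class Primitive extends SynthModule {
        Primitive(String name, int in, int out) {
            super(name, in, out);
        }

        @Override
        void collectPrimitives(List<SynthModule> out) {
            out.add(this);
        }
    }

    /** An inner node: groups submodules behind its own terminals. */
    static final class Composed extends SynthModule {
        private final List<SynthModule> submodules = new ArrayList<>();

        Composed(String name, int in, int out) {
            super(name, in, out);
        }

        Composed add(SynthModule m) {
            submodules.add(m);
            return this;
        }

        @Override
        void collectPrimitives(List<SynthModule> out) {
            for (SynthModule m : submodules) {
                m.collectPrimitives(out);
            }
        }
    }

    /** Builds a small example tree and returns its leaf names in depth-first order. */
    static List<String> exampleLeafNames() {
        Composed voice = new Composed("voice", 1, 1)
            .add(new Primitive("osc", 1, 1))
            .add(new Primitive("filter", 2, 1));
        Composed root = new Composed("root", 0, 1)
            .add(voice)
            .add(new Primitive("mixer", 2, 1));
        List<SynthModule> leaves = new ArrayList<>();
        root.collectPrimitives(leaves);
        List<String> names = new ArrayList<>();
        for (SynthModule m : leaves) {
            names.add(m.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(exampleLeafNames());
    }
}
```

Flattening the tree depth-first keeps the primitives of one composed module next to each other, which is one plausible basis for a tree-based assignment of modules to threads.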

Module Tree Representation

Module Timing Model ● Goal: sample-synchronous operation – One time step per sample ● Compute module output from module inputs ● Transfer module output samples to connected module inputs

Two-Phase Compute / Update ● Use multiple threads ● Goal: sample-synchronous update ● But: dependencies between modules ● => Order of update significant ● Separate into phases: compute & update

    while (true) do {
      // Phase 1: Compute
      for all modules do {
        compute outputs for next time step
        in terms of other modules' outputs,
        but keep results private to this module
      }
      // Phase 2: Update
      for all modules do {
        publish outputs to other modules
      }
    }
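A minimal single-threaded Java sketch of the two-phase loop shows why separating compute from update makes the processing order irrelevant. The class `TwoPhaseSketch`, its nested `Node` type, and the toy signal rule (next output = source output + bias) are assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical names and a toy signal rule; only the two-phase structure matters.
final class TwoPhaseSketch {

    static final class Node {
        private double published;   // output visible to other modules
        private double pending;     // next output, private until update()
        private Node source;        // module feeding this one
        private final double bias;

        Node(double bias) {
            this.bias = bias;
        }

        void connect(Node source) {
            this.source = source;
        }

        // Phase 1: read only published outputs, write only private state.
        void compute() {
            pending = (source == null ? 0.0 : source.published) + bias;
        }

        // Phase 2: publish the privately computed result.
        void update() {
            published = pending;
        }

        double output() {
            return published;
        }
    }

    /** Runs a two-module feedback loop, optionally visiting modules in reverse order. */
    static double[] run(boolean reversed, int steps) {
        Node a = new Node(1.0);
        Node b = new Node(0.0);
        a.connect(b);
        b.connect(a);
        Node[] order = reversed ? new Node[] {b, a} : new Node[] {a, b};
        for (int s = 0; s < steps; s++) {
            for (Node n : order) {
                n.compute();
            }
            for (Node n : order) {
                n.update();
            }
        }
        return new double[] {a.output(), b.output()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(false, 3)));
        System.out.println(Arrays.toString(run(true, 3)));
    }
}
```

Both visiting orders give the same outputs, because no module can observe a value that another module computed during the same time step.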

Barrier Synchronization ● Start phase 2 only after phase 1 completed in all threads ● And vice versa ● => Use barriers to synchronize threads
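One way to express this in Java is `java.util.concurrent.CyclicBarrier`, which can be reused for every phase boundary. The sketch below is an assumption about how such a loop might look, not the code in `org.soundpaint.modsynth.syntest.Master`; the class name `BarrierSketch`, the ring of toy modules, and the arithmetic are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical sketch: each worker owns every threadCount-th module of a ring.
final class BarrierSketch {

    static double[] run(int moduleCount, int threadCount, int steps) {
        final double[] published = new double[moduleCount];
        final double[] pending = new double[moduleCount];
        final CyclicBarrier barrier = new CyclicBarrier(threadCount);
        Thread[] workers = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try {
                    for (int s = 0; s < steps; s++) {
                        // Phase 1: compute from published values only.
                        for (int i = id; i < moduleCount; i += threadCount) {
                            pending[i] = 0.5 * published[(i + 1) % moduleCount] + i;
                        }
                        barrier.await(); // all threads have finished computing
                        // Phase 2: publish this thread's own results.
                        for (int i = id; i < moduleCount; i += threadCount) {
                            published[i] = pending[i];
                        }
                        barrier.await(); // all threads have finished publishing
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[t].start();
        }

        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return published;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(run(10, 1, 50), run(10, 4, 50)));
    }
}
```

The two `await()` calls per time step correspond to the two bullets above: phase 2 starts only after phase 1 has completed in all threads, and vice versa. `CyclicBarrier` also guarantees that writes made before `await()` are visible to the other threads after it returns, so plain arrays suffice here.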

Round-Robin Scheduling ● Spawn one thread per module? ● Bad idea: – OS schedules threads onto CPUs – => numerous task switches ● Solution: handle multiple modules per thread

Module-to-Thread Mapping (1) ● How many threads to spawn? – Few threads: bad exploitation of cores – Many threads: thread switch overhead – => find trade-off
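A simple way to keep the thread count adjustable is to accept a requested value and fall back to the number of available cores. The helper below is a hypothetical sketch; `ThreadCountSketch` and `resolve` are illustrative names, and using the core count as the default is an assumption.

```java
// Hypothetical helper: picks the number of worker threads.
final class ThreadCountSketch {

    /**
     * Returns the requested thread count if it is positive,
     * otherwise the number of processors reported by the JVM.
     */
    static int resolve(int requested) {
        if (requested > 0) {
            return requested;
        }
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int requested = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        System.out.println("worker threads: " + resolve(requested));
    }
}
```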

Module-to-Thread Mapping (2) ● Which modules to assign to which application thread? – Bad data locality => fewer cache hits – Bad load balancing => CPUs idle at barrier ● Two approaches here – Round-robin (i.e. pseudo-random) assignment of modules to threads => better load balancing? – Assignment according to module tree representation => better data locality?
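The two approaches can be contrasted in a short Java sketch. The slides do not spell out the tree-based scheme, so the `contiguous` method below rests on an assumption: modules are numbered in depth-first tree order and each thread receives one contiguous block. `ModuleDistributionSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of two ways to distribute module indices among threads.
final class ModuleDistributionSketch {

    private static List<List<Integer>> emptyBuckets(int threadCount) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            buckets.add(new ArrayList<>());
        }
        return buckets;
    }

    /** Round-robin: module i goes to thread i mod threadCount. */
    static List<List<Integer>> roundRobin(int moduleCount, int threadCount) {
        List<List<Integer>> buckets = emptyBuckets(threadCount);
        for (int i = 0; i < moduleCount; i++) {
            buckets.get(i % threadCount).add(i);
        }
        return buckets;
    }

    /**
     * Contiguous blocks: assumes modules are numbered in depth-first tree
     * order, so neighbours in the tree tend to land on the same thread.
     */
    static List<List<Integer>> contiguous(int moduleCount, int threadCount) {
        List<List<Integer>> buckets = emptyBuckets(threadCount);
        for (int i = 0; i < moduleCount; i++) {
            int thread = (int) ((long) i * threadCount / moduleCount);
            buckets.get(thread).add(i);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println("round-robin: " + roundRobin(10, 4));
        System.out.println("contiguous:  " + contiguous(10, 4));
    }
}
```

Both schemes give every thread roughly the same number of modules; they differ in which modules end up together, which is where data locality could come into play.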

Evaluation ● Implementation in Java – Adjustable number of application threads – Java threads map to Linux native threads ● Compare distributions of modules among threads – Round-robin distribution – Topological distribution ● Dummy synth with array of ~2000 oscillators ● B/W SoundPaint run with ~1650 modules ● Run on Intel Core 2 Quad Q6600 CPU @ 2.40 GHz

Multi-Array Synth Parallel Synthesis

Multi-Array Synth Speed-Up

B/W SoundPaint Performance

B/W SoundPaint Speed-Up

Observations ● Only small overhead of multi-threaded algorithm (run with 1 thread) over sequential algorithm ● Optimal speed @ 4 threads (= number of cores) ● “Real-life” SoundPaint data not as clear as dummy synth array – Perhaps due to more irregular compute time? ● Topological distribution on average much better than round-robin distribution

Future Work ● Reason for higher performance of topological distribution of modules still unclear – Bad data locality of round-robin distribution? – CPUs running idle at barriers? ● Overall speed-up still not satisfactory – Investigate load balancing / idle CPUs at barriers – Let idle CPUs pre-compute samples (e.g. for modules without input terminals) – Merge local lightweight modules into heavier ones ● Complete SoundPaint implementation

Conclusion ● Spawn multiple threads to exploit multi-cores ● Don't spawn too many threads (thread switching overhead!) ● => Support an adjustable number of threads ● Carefully distribute the work among the threads – Data locality (avoid cache misses) – Load balancing (avoid idle CPUs at barriers)

Questions? ● Code (still very experimental) available at www.soundpaint.org/modsynth ● Relevant code for barrier synchronization and module distribution currently in class org.soundpaint.modsynth.syntest.Master
