Benchmarking the Memory Interface Controller bus of the Cell


  1. Outline: CBEA, The benchmarks, Getting the full bandwidth, Bandwidth graphs, Results analysis, What’s missing? Title: Benchmarking the Memory Interface Controller bus of the Cell processor. Nathalie Casati, EPFL, December 18, 2007.

  6. Cell Broadband Engine Architecture: The Elements. The main elements of the processor are:
     • The Power Processing Element (PPE)
     • The 6 (usable) Synergistic Processing Elements (SPEs)
     • The Element Interconnect Bus (EIB)
     • The I/O Controller (XIO)
     • The Memory Interface Controller (MIC)

  10. Cell Broadband Engine Architecture: Other important facts.
     • Each SPE has a 256 KB local store and no cache
     • Each element is connected to the EIB with 25.6 GB/s of bandwidth (ingoing / outgoing)
     • The 256 MB main memory is external two-channel Rambus XDR
     • Each data transfer between an SPE and main memory is an explicit DMA operation of up to 16 KB
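Because a single DMA operation moves at most 16 KB, a larger block has to be issued as several requests. The following is a minimal Python sketch of that splitting, not code from the presentation; the function name `split_into_dma_requests` and the constant `MAX_DMA_BYTES` are hypothetical, and the sketch only models request sizes, it does not issue real DMA commands.

```python
MAX_DMA_BYTES = 16 * 1024  # per-request limit stated on the slide


def split_into_dma_requests(total_bytes, max_bytes=MAX_DMA_BYTES):
    """Return (offset, size) pairs covering total_bytes.

    Every request is at most max_bytes long; only the last one may be
    shorter. This is a model of the bookkeeping, not a DMA driver.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")

    requests = []
    offset = 0
    while offset < total_bytes:
        size = min(max_bytes, total_bytes - offset)
        requests.append((offset, size))
        offset += size
    return requests


if __name__ == "__main__":
    # A 40 KB block needs three requests: 16 KB, 16 KB and 8 KB.
    for offset, size in split_into_dma_requests(40 * 1024):
        print(offset, size)
```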

  11. The main question: What happens if we use n SPEs at the same time while the MIC has only 25.6 GB/s of bandwidth? (for a DMA get operation)
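One way to frame the question is as a simple upper bound: each SPE can receive at most its EIB port rate, and all SPEs together cannot exceed the MIC rate. The Python sketch below works through that arithmetic. It assumes the MIC bandwidth is shared evenly between the active SPEs, which is an idealisation and not something the slides state; the function name `upper_bound_per_spe` is hypothetical.

```python
MIC_PEAK_GBPS = 25.6  # MIC bandwidth quoted on the slide
EIB_PORT_GBPS = 25.6  # per-element EIB bandwidth quoted on the slides


def upper_bound_per_spe(n_spes):
    """Best-case GB/s per SPE, ASSUMING the MIC is shared evenly."""
    if n_spes < 1:
        raise ValueError("n_spes must be at least 1")
    return min(EIB_PORT_GBPS, MIC_PEAK_GBPS / n_spes)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "SPE(s):", round(upper_bound_per_spe(n), 2), "GB/s each at most")
```

This is only a ceiling. The measured curves are what show how close real transfers get to it.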

  14. How to get the full bandwidth?
     • Sequential accesses read or write equal amounts of data to all memory banks
     • Both the effective address and the local store address are 128-byte aligned
     • Other factors, such as avoiding TLB misses
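The 128-byte alignment condition is easy to check with integer arithmetic. The sketch below, in Python for illustration, tests whether both addresses of a transfer are aligned and rounds an address up to the next boundary. The helper names (`is_aligned`, `align_up`, `transfer_is_well_aligned`) are hypothetical and not part of any Cell SDK.

```python
ALIGNMENT = 128  # bytes, as stated on the slide


def is_aligned(address, alignment=ALIGNMENT):
    """True if address is a multiple of alignment."""
    return address % alignment == 0


def align_up(address, alignment=ALIGNMENT):
    """Round address up to the next multiple of alignment."""
    return (address + alignment - 1) // alignment * alignment


def transfer_is_well_aligned(effective_address, local_store_address):
    """True if both ends of a transfer are 128-byte aligned."""
    return is_aligned(effective_address) and is_aligned(local_store_address)


if __name__ == "__main__":
    print(transfer_is_well_aligned(0x1000, 0x80))  # both multiples of 128
    print(transfer_is_well_aligned(0x1010, 0x80))  # effective address is off by 16
    print(hex(align_up(0x1010)))                   # next 128-byte boundary
```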

  15. Bandwidth graphs, part 1. [Figure: four plots of bandwidth in GB/s (axis ticks from 0.1 to 25.6) against read DMA transfer size (16 B to 16 KB), for 1, 2, 3 and 4 requests. Each plot has one curve per SPE count (1 to 6 SPEs) and a line marking the maximum bandwidth.]

  16. Bandwidth graphs, part 2. [Figure: four plots of bandwidth in GB/s (axis ticks from 0.1 to 25.6) against read DMA transfer size (16 B to 16 KB), for 5, 6, 7 and 8 requests. Each plot has one curve per SPE count (1 to 6 SPEs) and a line marking the maximum bandwidth.]

  19. Results analysis and conclusion
     • One SPE never gets the whole bandwidth (only about half) → favours parallel accesses
     • The EIB is optimized for larger transfers
     • It is a good idea to use multibuffering
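The multibuffering recommendation can be illustrated with a simple timing model: with one buffer, each block is fetched and then processed in sequence; with two buffers, the fetch of the next block overlaps the processing of the current one. The Python sketch below is an idealised model under assumed constant per-block DMA and compute times. It is not the benchmark code, and the function names are hypothetical.

```python
def single_buffer_time(n_blocks, t_dma, t_compute):
    """Total time when every block is fetched, then processed, in turn."""
    return n_blocks * (t_dma + t_compute)


def double_buffer_time(n_blocks, t_dma, t_compute):
    """Total time when fetching block i+1 overlaps processing block i.

    Idealised: the first fetch and the last compute cannot be hidden,
    and every step in between costs the slower of the two activities.
    """
    if n_blocks == 0:
        return 0.0
    return t_dma + (n_blocks - 1) * max(t_dma, t_compute) + t_compute


if __name__ == "__main__":
    # Hypothetical numbers: 10 blocks, 2 time units per fetch, 3 per compute.
    print(single_buffer_time(10, 2.0, 3.0))
    print(double_buffer_time(10, 2.0, 3.0))
```

In this model the gain is largest when fetch and compute times are similar, since neither side then waits long for the other.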

  20. What’s missing? The same benchmark with a DMA put operation.
