
Processing in Storage Class Memory (Joel Nider, Craig Mustard): PowerPoint presentation



  1. Processing in Storage Class Memory: Joel Nider, Craig Mustard, Andrada Zoltan, Alexandra Fedorova

  2. Embedding Processors in SCM (diagram: CPU, non-volatile RAM)

  3. Storage Latency Is Decreasing

  4-5. Scaling Compute with Storage (diagram: tiers ordered by latency; persistent: Storage Arrays, Smart Disks / SSD, SCM; volatile: PIM in RAM, Smart Caches, CPU + registers)

  6. Benefits of PIM on SCM (diagram: CPU, memory bus, DRAM, SCM, DPU)

  7-8. Benefits of PIM on SCM (diagram: CPU, memory bus)

  9. Benefits of PIM on SCM (diagram: CPU, memory bus, DPU, SCM)

  10. Benefits of PIM on SCM: Core Density (SCM capacity: 4 GB; DPU count: 64; ratio: 1 DPU per 64 MB of SCM; diagram: CPU, memory bus)

  11. Benefits of PIM on SCM (SCM capacity: 8 GB; DPU count: 128; ratio: 1 DPU per 64 MB of SCM; diagram: CPU, memory bus)

  12. Benefits of PIM on SCM (SCM capacity: 16 GB; DPU count: 256; ratio: 1 DPU per 64 MB of SCM; diagram: CPU, memory bus)

  13. Benefits of PIM on SCM (SCM capacity: 32 GB; DPU count: 512; ratio: 1 DPU per 64 MB of SCM; diagram: CPU, memory bus)

  14. Benefits of PIM on SCM (diagram: CPU, memory bus)

  15. PIM Design Points: Address Translation, Inter-PIM Communication, Instruction Set, Core Density

  16. UPMEM Architecture and Limitations (diagram: DPU, DRAM)

  17. UPMEM Architecture and Limitations (diagram: DPU, SRAM, control, DRAM, DDR interface, external bus)

  18. Interleaved Multithreading

  19. UPMEM Architecture and Limitations (diagram: input data ABCDEFGHIJKLMNOPQRSTUV on the host side of the memory bus; DPU 0, DPU 1, DPU 2 on the other side)

  20. UPMEM Architecture and Limitations (diagram: bytes A B C D E F G H on the memory bus, remaining input IJKLMNOPQRSTUVWXYZabcd; DPU 0 receives A, DPU 1 receives B, DPU 2 receives C)

  21. UPMEM Architecture and Limitations (diagram: bytes A B C D E F G H I J K L M N O P on the memory bus, remaining input QRSTUVWXYZabcdefghijkl; DPU 0 holds A and I, DPU 1 holds B and J, DPU 2 holds C and K)

  22. Raw Performance: Throughput. 9 ranks x 64 DPUs = 576 DPUs, each with 64 KB of SRAM and 64 MB of DRAM, for 576 DPUs x 64 MB = 36 GB of DRAM. DRAM-to-DPU transfer of all 36 GB: 0.16 s = 252 GB/s, using 16 threads @ 2 KB per transfer. For comparison, the top speed of a DDR4-2400 channel is 19 GB/s.

  23. Use Case: Compression

      File        Size  DPUs
      spamfile   84 MB   172
      mozilla    50 MB   105
      nci        30 MB    64
      dickens    10 MB    35
      sao         7 MB    21
      xml         5 MB    15
      world192    1 MB     4
      plrabn12  0.5 MB     2
      terror2   0.1 MB     1

  24. Wishlist: Concurrent Memory Access, Data Triggered Functions, Mix of Memory Types, Tuning for Performance

  25. Future Directions: Hyperdimensional Computing, Regular Expression Search, ?

  26. Thank you for watching. Joel Nider (joel@ece.ubc.ca), Craig Mustard (craigm@ece.ubc.ca), Andrada Zoltan (zoltandrada@gmail.com), Alexandra Fedorova (sasha@ece.ubc.ca)
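
Slides 10 to 13 keep a fixed ratio of one DPU per 64 MB of SCM, so the DPU count grows linearly with capacity, from 64 DPUs at 4 GB to 512 DPUs at 32 GB. A minimal Python sketch of that arithmetic follows. It assumes 1 GB = 1024 MB, which is what the slide figures imply, and the function name `dpus_for_capacity` is hypothetical.

```python
# Sketch of the fixed-ratio scaling on the "Core Density" slides (10-13):
# every DPU is paired with 64 MB of SCM, so the DPU count grows linearly
# with capacity. The 64 MB ratio is from the slides; the function name is
# hypothetical, and 1 GB = 1024 MB is assumed.

MB_PER_DPU = 64  # slide ratio: 1 DPU per 64 MB of SCM


def dpus_for_capacity(capacity_gb: int, mb_per_dpu: int = MB_PER_DPU) -> int:
    """Number of DPUs needed to cover capacity_gb at a fixed MB-per-DPU ratio."""
    capacity_mb = capacity_gb * 1024
    # Ceiling division, so a partial 64 MB slice would still get its own DPU.
    return -(-capacity_mb // mb_per_dpu)


if __name__ == "__main__":
    for gb in (4, 8, 16, 32):
        print(f"{gb:>2} GB SCM -> {dpus_for_capacity(gb):>3} DPUs")
```

Running it reproduces the four slide data points (64, 128, 256 and 512 DPUs). Because the ratio is fixed, compute grows in step with capacity.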
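
Slide 18 names interleaved multithreading as the way a DPU keeps its pipeline busy. The toy model below is an illustration, not UPMEM's actual scheduler. It makes three assumptions: each hardware thread has at most one instruction in flight; an issued instruction occupies its thread for a hypothetical `depth` cycles; and on each cycle the core issues from the next ready thread in round-robin order. Under these assumptions utilization is roughly min(1, threads / depth).

```python
def pipeline_utilization(num_threads: int, depth: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which some thread issues an instruction.

    Toy model (assumptions, not UPMEM's documented design): each thread has
    at most one instruction in flight, and an issued instruction blocks its
    thread for `depth` cycles before that thread may issue again.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may issue next
    next_thread = 0               # round-robin starting point
    issued = 0
    for cycle in range(cycles):
        # Scan threads round-robin, starting after the last one that issued.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + depth
                next_thread = (t + 1) % num_threads
                issued += 1
                break  # at most one issue per cycle
    return issued / cycles


if __name__ == "__main__":
    for threads in (1, 2, 4, 8, 16):
        print(threads, pipeline_utilization(threads, depth=8))
```

With `depth=8`, 1, 2, 4 and 8 threads give utilization 0.125, 0.25, 0.5 and 1.0. Adding threads beyond the pipeline depth buys nothing further in this model.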
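
Slides 19 to 21 show input bytes fanned out across DPUs in turn: DPU 0 receives A and then I, DPU 1 receives B and then J, and so on. The sketch below models that byte-wise interleaving and its inverse. The inverse is the transpose a host would need so that each DPU ends up with a contiguous buffer. The lane count of 8 is an assumption inferred from the slides (A through H go out before I wraps back to DPU 0), and the helper names are hypothetical.

```python
from typing import List

LANES = 8  # assumed: inferred from slides 19-21 (A..H fan out before I wraps to DPU 0)


def interleave(data: bytes, lanes: int = LANES) -> List[bytes]:
    """Model of what each DPU receives when a host buffer crosses the bus byte-wise."""
    return [data[i::lanes] for i in range(lanes)]


def deinterleave(chunks: List[bytes]) -> bytes:
    """Inverse of interleave: rebuild bus order from per-DPU chunks.

    Passing one contiguous block per DPU yields the buffer a host would
    write so that DPU k receives blocks[k] unchanged. Chunk lengths must
    match what interleave would produce (e.g. all equal); otherwise the
    slice assignment raises ValueError.
    """
    lanes = len(chunks)
    out = bytearray(sum(len(c) for c in chunks))
    for i, chunk in enumerate(chunks):
        out[i::lanes] = chunk  # byte j of chunk i lands at position i + j * lanes
    return bytes(out)


if __name__ == "__main__":
    print(interleave(b"ABCDEFGHIJKLMNOP")[:3])
```

Here `interleave(b"ABCDEFGHIJKLMNOP")[:3]` returns `[b'AI', b'BJ', b'CK']`, which matches slide 21. Giving DPU k the contiguous block `blocks[k]` means writing `deinterleave(blocks)` to the bus instead.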
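
The slide 22 figures can be re-derived in a few lines. This Python sketch uses only the numbers on the slide plus the standard 64-bit (8-byte) width of a DDR4 channel. It follows the slides' loose units: capacity in binary units (576 x 64 MB / 1024 = 36 GB) and channel bandwidth in decimal units. Note that 36 GB over 0.16 s works out to 225 GB/s rather than the 252 GB/s printed on the slide, so treat the headline figure as approximate. The SRAM line assumes each thread holds one 2 KB transfer buffer.

```python
# Re-derivation of the slide 22 numbers. Constants come from the slide
# except CHANNEL_WIDTH_BYTES (standard 64-bit DDR4 channel).
RANKS = 9
DPUS_PER_RANK = 64
DRAM_PER_DPU_MB = 64
SRAM_PER_DPU_KB = 64
THREADS = 16
TRANSFER_KB = 2
ELAPSED_S = 0.16

DDR4_2400_MTS = 2400e6   # DDR4-2400: 2400 million transfers per second
CHANNEL_WIDTH_BYTES = 8  # one DDR4 channel moves 64 bits per transfer


def summarize() -> dict:
    dpus = RANKS * DPUS_PER_RANK
    dram_gb = dpus * DRAM_PER_DPU_MB / 1024  # binary units, as on the slide
    return {
        "dpus": dpus,
        "dram_gb": dram_gb,
        "aggregate_gb_per_s": dram_gb / ELAPSED_S,
        "ddr4_channel_gb_per_s": DDR4_2400_MTS * CHANNEL_WIDTH_BYTES / 1e9,
        # Assumption: each thread holds one 2 KB buffer in the DPU's SRAM.
        "sram_in_flight_kb": THREADS * TRANSFER_KB,
        "sram_per_dpu_kb": SRAM_PER_DPU_KB,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:g}")
```

The results are 576 DPUs, 36 GB, about 225 GB/s aggregate, 19.2 GB/s for one DDR4-2400 channel, and 32 KB of in-flight buffers per DPU, which is half of its 64 KB SRAM.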
