IN-MEMORY COMPUTING AT SCALE? LOOK BEYOND PHYSICAL DRAM!
Iacovos G. Kolokasis, Anastasios Papagiannis, Polyvios Pratikakis, and Angelos Bilas
October 25, 2019
Institute of Computer Science (ICS), Foundation for Research and Technology – Hellas (FORTH) & Computer Science Department, University of Crete
ANNUAL SIZE OF THE GLOBAL DATASPHERE / DRAM SCALING TREND
[Figure, two panels: annual size of the global datasphere (zettabytes vs. year, 2010 to 2025); DRAM scaling trend (megabits per chip vs. year, 1985 to 2015). The panels are annotated with the growth rates "2X/1.5 years" and "2X/3 years".]
Data is growing faster, while DRAM scaling is getting difficult
ANNUAL SIZE OF THE GLOBAL DATASPHERE / NAND FLASH SCALING TREND
[Figure, two panels: annual size of the global datasphere (zettabytes vs. year, 2010 to 2025); NAND flash scaling trend (density in TB vs. year).]
NAND flash capacity continues to scale
DATA-INTENSIVE APPLICATIONS
[Figure: example workloads: DNA/protein synthesis, virtual reality, image analysis, in-memory frameworks.]
More demand for memory
APACHE SPARK IN-MEMORY COMPUTING
[Figure: a pipeline of RDDs transformed by Operation 1 through Operation n, with DISK and RAM labels marking where the RDDs are stored.]
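Spark's in-memory model keeps intermediate RDDs in RAM between operations instead of writing each one to disk. The minimal Python sketch below illustrates that difference and is not Spark's API. It assumes an RDD is modeled as a plain list and each operation as a list-to-list function. The name `run_pipeline` and its `keep_in_memory` flag are hypothetical.

```python
import os
import pickle
import tempfile


def run_pipeline(data, operations, keep_in_memory=True):
    """Apply operations in order to a list-based stand-in for an RDD.

    Hypothetical helper. Returns (final_result, disk_round_trips). When
    keep_in_memory is False, every intermediate result is pickled to a file
    and read back before the next operation, mimicking a disk-based pipeline.
    """
    current = list(data)
    round_trips = 0
    # Scratch directory for spilled stages (unused when keep_in_memory is True).
    with tempfile.TemporaryDirectory() as spill_dir:
        for i, op in enumerate(operations):
            current = op(current)
            if not keep_in_memory:
                path = os.path.join(spill_dir, f"stage_{i}.pkl")
                with open(path, "wb") as f:
                    pickle.dump(current, f)   # materialize the intermediate RDD
                with open(path, "rb") as f:
                    current = pickle.load(f)  # next operation reads it back
                round_trips += 1
    return current, round_trips


ops = [
    lambda rdd: [x * 2 for x in rdd],            # "Operation 1": a map
    lambda rdd: [x for x in rdd if x % 3 == 0],  # "Operation n": a filter
]
print(run_pipeline(range(10), ops))                        # ([0, 6, 12, 18], 0)
print(run_pipeline(range(10), ops, keep_in_memory=False))  # ([0, 6, 12, 18], 2)
```

Both runs produce the same result. The disk-based run also pays one write and one read per operation, and this is the cost that in-memory computing avoids.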
INTRODUCTION TO SPARK IN-MEMORY COMPUTING
[Figure: RDD partitions under three storage options: MEMORY_AND_DISK (partitions split between memory and disk), MEMORY_ONLY (partitions in memory), and SERIALIZE (serialized partitions in memory).]
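The toy Python model below contrasts the three caching options. We assume that the slide's SERIALIZE case corresponds to Spark's `MEMORY_ONLY_SER` level. `ToyPartitionCache` is a hypothetical class, not Spark code. To keep the model simple, its memory budget is counted in whole partitions rather than bytes. The model follows Spark's documented behavior: under `MEMORY_ONLY`, partitions that do not fit are recomputed; under `MEMORY_AND_DISK`, they are spilled to disk; `MEMORY_ONLY_SER` stores compact bytes that must be deserialized on every read.

```python
import pickle

LEVELS = ("MEMORY_ONLY", "MEMORY_AND_DISK", "MEMORY_ONLY_SER")


class ToyPartitionCache:
    """Hypothetical model of caching RDD partitions under a storage level."""

    def __init__(self, level, max_partitions_in_memory):
        if level not in LEVELS:
            raise ValueError(f"unknown level {level!r}")
        self.level = level
        self.capacity = max_partitions_in_memory
        self.memory = {}  # partition id -> list of objects, or bytes for _SER
        self.disk = {}    # partition id -> pickled bytes (stand-in for files)
        self.deserializations = 0

    def put(self, pid, records):
        if len(self.memory) < self.capacity:
            if self.level == "MEMORY_ONLY_SER":
                self.memory[pid] = pickle.dumps(records)  # compact, decoded on read
            else:
                self.memory[pid] = list(records)          # ready-to-use objects
        elif self.level == "MEMORY_AND_DISK":
            self.disk[pid] = pickle.dumps(records)        # spill what does not fit
        # MEMORY_ONLY / MEMORY_ONLY_SER: a partition that does not fit is not cached.

    def get(self, pid, recompute):
        if pid in self.memory:
            value = self.memory[pid]
            if isinstance(value, bytes):
                self.deserializations += 1  # CPU cost paid on every access
                return pickle.loads(value)
            return value
        if pid in self.disk:
            self.deserializations += 1
            return pickle.loads(self.disk[pid])
        return recompute(pid)  # cache miss: rebuild the partition from its lineage


def make_partition(pid):
    return [pid * 10 + i for i in range(3)]


for level in LEVELS:
    cache = ToyPartitionCache(level, max_partitions_in_memory=2)
    for pid in range(3):
        cache.put(pid, make_partition(pid))
    rows = [cache.get(pid, make_partition) for pid in range(3)]
    print(level, len(cache.memory), len(cache.disk), cache.deserializations)
# MEMORY_ONLY 2 0 0
# MEMORY_AND_DISK 2 1 1
# MEMORY_ONLY_SER 2 0 2
```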
LET’S EXPLOIT THE CAPACITY OF STORAGE DEVICES
JVM-based analytics frameworks can use storage devices through:
• Serialization / deserialization
• Memory-mapped file I/O
We explore both approaches
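The two approaches differ in how one record is read back from the device. The Python sketch below stands in for the JVM mechanisms. `pickle` plays the role of Java serialization, and the standard `mmap` module plays the role of memory mapping (in Java, `FileChannel.map`). `via_serialization` and `via_mmap` are hypothetical helpers, not the authors' implementation. The mmap path assumes fixed-width 8-byte integer records, so an element can be read in place without decoding the whole file.

```python
import mmap
import os
import pickle
import struct
import tempfile


def via_serialization(path, values, index):
    """Write the whole collection serialized; reading one element decodes all."""
    with open(path, "wb") as f:
        pickle.dump(values, f)     # serialize the entire collection
    with open(path, "rb") as f:
        restored = pickle.load(f)  # must deserialize everything...
    return restored[index]         # ...to read a single element


def via_mmap(path, values, index):
    """Write fixed-width int64 records; read one element in place via mmap."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}q", *values))  # native 8-byte ints
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # View the mapped pages as an int64 array; the OS pages in only what
        # is touched. Both views are released before the mapping closes.
        with memoryview(mm) as raw, raw.cast("q") as ints:
            return ints[index]


values = list(range(0, 1000, 7))
with tempfile.TemporaryDirectory() as d:
    print(via_serialization(os.path.join(d, "a.pkl"), values, 42))  # 294
    print(via_mmap(os.path.join(d, "b.bin"), values, 42))           # 294
```

The mapped version trades flexibility for cheaper access. It needs a fixed in-file layout, but it avoids rebuilding every object just to touch one of them.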
SERIALIZATION / DESERIALIZATION (LIMITATIONS)
• Out-of-memory errors due to small heap sizes
• Large intermediate results are generated while processing a record
• Serialization / deserialization adds CPU overhead
• GC overhead to reclaim long-lived, accumulated objects
• Iterative applications
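ON-GOING WORK
[Figure: a JVM whose heap spans a DRAM heap and a device heap. The device heap is mapped from a storage device via fmap, while the DRAM heap and other non-heap memory reside in DRAM.]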
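As a rough illustration of a heap region that lives on a storage device, the sketch below backs a bump-pointer allocator with a memory-mapped file. It substitutes Python's standard `mmap` for the slides' fmap mechanism. `ToyDeviceHeap` is hypothetical, and it stores length-prefixed byte strings rather than real JVM objects.

```python
import mmap
import os
import struct
import tempfile


class ToyDeviceHeap:
    """Hypothetical bump-pointer region backed by a memory-mapped file."""

    def __init__(self, path, size):
        self._file = open(path, "w+b")
        self._file.truncate(size)  # reserve the region's space on the device
        self._map = mmap.mmap(self._file.fileno(), size)
        self._top = 0              # next free byte (bump pointer)

    def allocate(self, payload: bytes) -> int:
        """Copy payload into the region and return its offset (its 'address')."""
        needed = 4 + len(payload)  # 4-byte length header + payload
        if self._top + needed > len(self._map):
            raise MemoryError("device heap exhausted")
        addr = self._top
        struct.pack_into("<I", self._map, addr, len(payload))
        self._map[addr + 4 : addr + needed] = payload
        self._top += needed
        return addr

    def read(self, addr: int) -> bytes:
        (length,) = struct.unpack_from("<I", self._map, addr)
        return self._map[addr + 4 : addr + 4 + length]

    def close(self):
        self._map.flush()  # push dirty pages back to the file
        self._map.close()
        self._file.close()


with tempfile.TemporaryDirectory() as d:
    heap = ToyDeviceHeap(os.path.join(d, "device_heap.bin"), size=4096)
    a = heap.allocate(b"cached partition 0")
    b = heap.allocate(b"cached partition 1")
    print(a, b, heap.read(b))  # 0 22 b'cached partition 1'
    heap.close()
```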
ON-GOING WORK
• Data placement policy inside the JVM to place objects
  • Short-lived data objects on the DRAM heap
  • Long-lived data objects on the storage-device heap
• Add an extra storage level in Apache Spark to support caching RDDs on the storage heap
• Thorough evaluation on SSD, NVMe, and Optane devices
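The slides do not say how an object's lifetime is determined. The sketch below therefore assumes a generational-GC-style rule: an object that survives `threshold` collections in DRAM is treated as long-lived and moved to the device heap. `ToyObject` and `TwoHeapPlacement` are hypothetical names. An alternative design would take placement hints from Spark itself, for example from the proposed extra storage level.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    name: str
    age: int = 0  # number of collections survived


@dataclass
class TwoHeapPlacement:
    """Hypothetical age-based placement across a DRAM heap and a device heap."""

    threshold: int
    dram_heap: dict = field(default_factory=dict)
    device_heap: dict = field(default_factory=dict)

    def allocate(self, obj):
        self.dram_heap[obj.name] = obj  # every new object starts in DRAM

    def collect(self, live_names):
        """Simulate one collection of the DRAM heap; the device heap is not scanned."""
        for name in list(self.dram_heap):
            obj = self.dram_heap.pop(name)
            if name not in live_names:
                continue  # short-lived garbage: reclaimed in DRAM
            obj.age += 1
            if obj.age >= self.threshold:
                self.device_heap[name] = obj  # long-lived: move off DRAM
            else:
                self.dram_heap[name] = obj


policy = TwoHeapPlacement(threshold=2)
policy.allocate(ToyObject("cached_rdd_block"))
policy.allocate(ToyObject("temp_record"))
policy.collect(live_names={"cached_rdd_block"})  # temp_record dies young
policy.collect(live_names={"cached_rdd_block"})  # the block reaches age 2
print(sorted(policy.dram_heap), sorted(policy.device_heap))  # [] ['cached_rdd_block']
```

In this model, objects on the device heap are not rescanned by later DRAM collections. That reflects the GC-overhead concern raised for long-lived, accumulated objects.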
CONTACT INFORMATION
Iacovos G. Kolokasis
MSc Student, Computer Science Department, University of Crete
kolokasis@ics.forth.gr
Institute of Computer Science (ICS), Foundation for Research and Technology – Hellas (FORTH)
www.ics.forth.gr