Beyond the Wall: Near-Data Processing for Databases
Sam Xi, Ore Babarinsa, Manos Athanassoulis, Stratos Idreos
Harvard University
Memory Wall
Row store vs. column store [figure: tuple layout in each]
Memory-optimized data systems
Data access remains the bottleneck
σ Σ π
We are not the first to visit this pyramid!
Near-data processing: Intelligent RAM, DIVA, RADram, logic-in-memory, Terasys
Why did NDP not take off? Fabrication processes are incompatible:
  Property          DRAM   Logic
  Leakage           Low    High
  Switching speed   Slow   Fast
Moore's Law + Dennard scaling provided consistent performance scaling for years
  Metric   Scaling factor
  Area     1/κ^2
  Delay    1/κ
  Power    1
Moore's Law. Dennard scaling. Not the case anymore!
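The scaling factors on this slide can be checked with a short worked derivation. This is a sketch under textbook assumptions that the slide itself does not spell out: dynamic power is modeled as P = C V^2 f, capacitance and voltage each shrink by 1/κ, frequency grows by κ, and the "Power" entry is read as power per unit area.

```latex
\begin{align*}
P' &= C' V'^{2} f'
    = \frac{C}{\kappa}\left(\frac{V}{\kappa}\right)^{2}(\kappa f)
    = \frac{P}{\kappa^{2}}, \\
A' &= \frac{A}{\kappa^{2}}, \\
\frac{P'}{A'} &= \frac{P/\kappa^{2}}{A/\kappa^{2}} = \frac{P}{A}.
\end{align*}
```

Under these assumptions the power density stays constant (scaling factor 1) while area shrinks by 1/κ^2, which is consistent with the table entries.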
HARP, Q100, Widx, Ibex, and our approach
Outline
  Intro
  NDP for data systems: Past and present
  The architecture of JAFAR
  Experimental results
  Conclusion
Opportunity for NDP [figure: Query, Host server, Database, lots of data]
  Many rows fail the query predicate and are discarded.
  Filter data before it is sent to CPU.
JAFAR: "Just" A Filtering Accelerator on Relations [figure: CPU x4, last level cache, system bus + memory controller, JAFAR x2, DRAM x2]
[figure: DRAM organization: Rank, Chip, Bank 0, row address decoder, sense amps, column address decoder]
[figure: DRAM organization: Rank, Bank, Bank 0, Array 0 to Array 3, row address decoder, sense amps, column address decoder]
JAFAR: Overall design [figure: CPU x4, last level cache, system bus + memory controller, JAFAR x2, DRAM x2]
JAFAR context [figure: RAS, CAS, requests from CPU, memory access arbiter, Bank 0, sense amps, IO buffer, JAFAR]
JAFAR architecture [figure: input from IO buffer; data latch; left and right ALUs, each with an opcode; "Comparison is true?"; write enable; page offset counter; page offset bitmask; output buffer]
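The datapath labels on this slide can be read as a simple loop, sketched below as a software model. Everything here is an assumption made for illustration: the opcode names, the 32-bit word width, the rule that a word passes only when both ALU comparisons are true, and the one-bit-per-offset bitmask layout are not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes for the two comparison ALUs. */
typedef enum { OP_GE, OP_GT, OP_LE, OP_LT, OP_EQ } jafar_op;

static int alu_compare(jafar_op op, int32_t value, int32_t operand) {
    switch (op) {
    case OP_GE: return value >= operand;
    case OP_GT: return value >  operand;
    case OP_LE: return value <= operand;
    case OP_LT: return value <  operand;
    case OP_EQ: return value == operand;
    }
    return 0;
}

/* Model one pass over n words arriving from the IO buffer.
 * bitmask must hold at least (n + 7) / 8 bytes.
 * Returns the number of words that passed both comparisons. */
static size_t filter_page(const int32_t *words, size_t n,
                          jafar_op left_op, int32_t left_operand,
                          jafar_op right_op, int32_t right_operand,
                          uint8_t *bitmask) {
    size_t hits = 0;
    assert(words != NULL && bitmask != NULL);
    memset(bitmask, 0, (n + 7) / 8);
    for (size_t offset = 0; offset < n; offset++) {   /* page offset counter */
        int32_t latched = words[offset];              /* data latch */
        int pass = alu_compare(left_op, latched, left_operand) &&
                   alu_compare(right_op, latched, right_operand);
        if (pass) {                                   /* write enable */
            bitmask[offset / 8] |= (uint8_t)(1u << (offset % 8));
            hits++;
        }
    }
    return hits;
}
```

In this model a range predicate such as 7 <= x <= 12 would use OP_GE with operand 7 on the left ALU and OP_LE with operand 12 on the right ALU.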
Programming JAFAR
  int select_jafar(            /* returns an errno-style status code */
      void* col_data,
      int range_low,
      int range_high,
      uint8_t* out_buf,
      size_t num_input_rows,
      size_t* num_output_rows);
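To make the calling convention concrete, here is a minimal software stand-in with the same parameter list. It is not the JAFAR driver: the name `select_jafar_sw`, the inclusive range bounds, the interpretation of `out_buf` as a one-bit-per-row bitmask, the `EINVAL` return on null arguments, and the helper `bitmask_to_positions` are all assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU-side stand-in for select_jafar.
 * Assumes col_data holds int values and out_buf has room for
 * (num_input_rows + 7) / 8 bytes. Returns 0 on success. */
int select_jafar_sw(void *col_data, int range_low, int range_high,
                    uint8_t *out_buf, size_t num_input_rows,
                    size_t *num_output_rows) {
    if (!col_data || !out_buf || !num_output_rows)
        return EINVAL;
    const int *col = (const int *)col_data;
    size_t hits = 0;
    memset(out_buf, 0, (num_input_rows + 7) / 8);
    for (size_t i = 0; i < num_input_rows; i++) {
        if (col[i] >= range_low && col[i] <= range_high) {
            out_buf[i / 8] |= (uint8_t)(1u << (i % 8));
            hits++;
        }
    }
    *num_output_rows = hits;
    return 0;
}

/* Expand the bitmask into qualifying row positions.
 * positions must have room for every set bit. */
size_t bitmask_to_positions(const uint8_t *mask, size_t num_rows,
                            size_t *positions) {
    size_t k = 0;
    assert(mask != NULL && positions != NULL);
    for (size_t i = 0; i < num_rows; i++) {
        if ((mask[i / 8] >> (i % 8)) & 1u)
            positions[k++] = i;
    }
    return k;
}
```

A caller would allocate the bitmask, invoke the select, check the status code, and then use the positions to fetch the qualifying values from other columns.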
Handling multiple modules [figure: CPU x4, last level cache, system bus + memory controller, JAFAR x2, DRAM x2]
Handling multiple modules: Fill up each module first
Handling multiple modules: Interleave data across modules
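The two placement options named on these slides can be sketched as address mappings. The block granularity, the struct, and both function names are hypothetical; the slides do not say how data is actually divided between modules.

```c
#include <assert.h>
#include <stddef.h>

/* Where a logical data block lands: which module, and where inside it. */
typedef struct {
    size_t module;
    size_t offset;
} placement;

/* Fill-first: module 0 takes blocks until it is full, then module 1, ... */
placement place_fill_first(size_t block, size_t blocks_per_module) {
    assert(blocks_per_module > 0);
    placement p = { block / blocks_per_module, block % blocks_per_module };
    return p;
}

/* Interleaved: consecutive blocks rotate across the modules. */
placement place_interleaved(size_t block, size_t num_modules) {
    assert(num_modules > 0);
    placement p = { block % num_modules, block / num_modules };
    return p;
}
```

Under this model, interleaving spreads a contiguous column over all modules, so each module's JAFAR unit could filter its share, whereas fill-first keeps a small column inside a single module.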
Coordinating memory access
  The CPU and JAFAR cannot simultaneously attempt to access memory.
  CPU grants JAFAR ownership of a DRAM rank for a period of time.
  Possible mechanism: DRAM mode registers
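The ownership handoff described above can be modeled as a time-limited lease on a rank. This is only a software model written for illustration: the type and function names are hypothetical, time is counted in abstract cycles, and the DRAM mode register mechanism mentioned on the slide is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OWNER_CPU, OWNER_JAFAR } rank_owner;

typedef struct {
    rank_owner owner;
    uint64_t   lease_end;  /* cycle at which ownership returns to the CPU */
} rank_state;

/* The CPU hands the rank to JAFAR for a fixed number of cycles. */
void grant_to_jafar(rank_state *r, uint64_t now, uint64_t cycles) {
    assert(r != NULL && r->owner == OWNER_CPU);
    r->owner = OWNER_JAFAR;
    r->lease_end = now + cycles;
}

/* Advance time; ownership reverts to the CPU once the lease expires. */
void tick(rank_state *r, uint64_t now) {
    assert(r != NULL);
    if (r->owner == OWNER_JAFAR && now >= r->lease_end)
        r->owner = OWNER_CPU;
}

/* Exactly one side may touch the rank at any time. */
bool cpu_may_access(const rank_state *r)   { return r->owner == OWNER_CPU; }
bool jafar_may_access(const rank_state *r) { return r->owner == OWNER_JAFAR; }
```

Because the owner field has a single value, the two access checks can never both be true, which captures the constraint that the CPU and JAFAR do not access the rank simultaneously.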
Experimental setup: Simulation framework
  gem5
  Out-of-order CPU
  Classic cache model
  SimpleDRAM
Experimental setup: Queries, input data, and database
  select * from table where column < n;
  In-house column store database
  4 million rows of unsorted integers (figure labels: 0, 1M)
Experimental results
Memory contention
  Scheduling of ownership transfers will be important.
  What would JAFAR's performance look like without a scheduler?
Memory contention [figure: CPU timeline with memory requests, an idle period, then more memory requests; JAFAR can execute during the idle period]
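One way to look for such idle periods in a trace is to scan the gaps between consecutive CPU memory requests. The sketch below assumes a sorted array of request timestamps in cycles and a caller-chosen minimum gap length; the function name, the struct, and the trace format are hypothetical and do not describe the measurement method used in the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t start;   /* cycle of the request that begins the gap */
    uint64_t length;  /* cycles until the next request */
} idle_period;

/* Scan sorted request timestamps and record gaps of at least min_len.
 * Writes at most out_cap entries and returns how many were written. */
size_t find_idle_periods(const uint64_t *req_cycles, size_t n,
                         uint64_t min_len,
                         idle_period *out, size_t out_cap) {
    size_t found = 0;
    assert(req_cycles != NULL && out != NULL);
    for (size_t i = 1; i < n && found < out_cap; i++) {
        assert(req_cycles[i] >= req_cycles[i - 1]);  /* trace must be sorted */
        uint64_t gap = req_cycles[i] - req_cycles[i - 1];
        if (gap >= min_len) {
            out[found].start = req_cycles[i - 1];
            out[found].length = gap;
            found++;
        }
    }
    return found;
}
```

A gap only helps if it is long enough to cover the ownership transfer plus useful filtering work, which is why the threshold is left as a parameter.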
Idle periods on TPC-H
JAFAR as a framework: More operators
  Aggregations
  Projections
  Sort
  Joins?
JAFAR as a framework: Data types and layouts
  Row-stores and hybrids: multiple filters per row, efficient projections
  Variable length datatypes: process on CPU?
NDP is an exciting opportunity for innovation in data systems
NDP is a promising solution to the memory wall for data systems. JAFAR provides up to 9x speedup on simple select queries. JAFAR is built on an extensible framework for accelerating data systems.
Thank you