Doppelgänger: A Cache for Approximate Computing
Joshua San Miguel, Jorge Albericio, Andreas Moshovos, Natalie Enright Jerger
Cache Hierarchy
Diagram (top to bottom): main memory, shared last-level cache, private caches, processor core.
Accessing main memory costs 10x to 100x more latency and energy than accessing a private cache, so a hierarchy of large caches is needed.
But the last-level cache consumes substantial energy and takes up 30% to 50% of chip area.
Higher efficiency via Approximate Computing…
Summary
Doppelgänger Cache:
Identifies approximate similarity in data block values. 77% cache storage savings of approximable data.
Effectively compresses storage of approximately similar blocks. 3x better compression ratio than state-of-the-art techniques.
Significantly reduces area and energy consumption. Reduces total on-chip cache area by 1.36x.
Outline
Approximate Computing
Approximate Similarity
Doppelgänger Cache
Cache Architecture
Similarity Mapping
Evaluation
Approximate Computing
Not all data/computations need to be precise.
Example domains: data mining, computer vision, audio and video processing, gaming, machine learning, dynamical simulation.
Image sources: http://www.zentut.com/ http://www.cc.gatech.edu/~cnieto6/ http://themusicparlour.blogspot.ca/ http://www.businessweek.com/ http://www.analyticbridge.com/ http://www.scientific-computing.com/
Approximate Similarity
Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.
Block 1: 92 131 183 91 132 186
Block 2: 90 131 185 93 133 184
Block 3: 35 31 29 43 38 37
Blocks 1 and 2 are approximately similar.
Allows for 77% cache storage savings of approximable data!
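The definition above is stated in terms of final application output, which cannot be checked from the block values alone. As a rough illustration only, the following sketch uses an element-wise tolerance as a stand-in proxy. The function name `approx_similar` and the tolerance `tol=4` are assumptions made for this example and are not part of the presented design.

```python
# Illustrative proxy only: the slides define similarity by acceptable
# application output, not by a fixed numeric tolerance.
def approx_similar(a, b, tol=4):
    """Return True if the blocks have equal length and every pair of
    corresponding values differs by at most tol (hypothetical threshold)."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


# Block values taken from the example on the slide.
block1 = [92, 131, 183, 91, 132, 186]
block2 = [90, 131, 185, 93, 133, 184]
block3 = [35, 31, 29, 43, 38, 37]

print(approx_similar(block1, block2))
print(approx_similar(block1, block3))
```

Under this proxy, blocks 1 and 2 compare as similar and block 3 does not, which matches the grouping shown on the slide. A real system would have to tie the threshold to output quality for each application.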
Doppelgänger Cache
How can we exploit approximate similarity to save area and energy in the last-level cache?
Diagram (top to bottom): main memory, shared LLC (split into a precise LLC and a Doppelgänger LLC), private caches, processor core.
Conventional Cache
Diagram: address from L2, tag array, data array, data from memory.
One-to-one mapping of data values to memory locations.
But the fundamental goal of a processor is to process data values, not memory locations…
Multiple copies of approximately similar blocks.
Doppelgänger Cache
Diagram: address from L2, tag array, approximate data array, data from memory.
Smaller data array allows for substantial area and energy savings.
Tag array: tag 0, tag 1, tag 2, and tag 3 each hold map X.
Approximate data array: map X holds data block A.
Doppelgänger Cache - Lookups
Tag array: tag 0, tag 1, tag 2, and tag 3 each hold map X.
Approximate data array: map X holds data block A.
Address 0 arrives from L2 at the tag array.
Map X is read from the tag array and sent to the approximate data array.
Data A is returned to L2.
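The lookup sequence above amounts to two chained table reads: address to map, then map to data. A minimal sketch of that indirection follows, with plain dictionaries standing in for the hardware arrays. The names `tag_array`, `approx_data_array`, and `lookup` are chosen for illustration, and returning `None` on a tag miss is an assumption, since the slides show only the hit case.

```python
# Hypothetical software model of the two-step lookup; not the hardware design.
tag_array = {0: "X", 1: "X", 2: "X", 3: "X"}  # address tag -> map value
approx_data_array = {"X": "data block A"}     # map value -> shared data block


def lookup(address):
    """Return the data block for the address, or None on a tag miss."""
    map_value = tag_array.get(address)
    if map_value is None:
        return None  # assumed behavior: the request would be served by memory
    return approx_data_array[map_value]


print(lookup(0))
print(lookup(3))
```

Both calls return the same stored block, which is the point of the design: several tags share one data entry.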
Doppelgänger Cache - Insertions
Tag array: tag 0, tag 1, tag 2, and tag 3 each hold map X.
Approximate data array: map X holds data block A.
Address 5 arrives from L2 and a new tag 5 is added to the tag array.
Data B is fetched from memory and sent to L2.
Map Y is generated from data B and stored with tag 5.
Map Y is looked up in the approximate data array.
Miss! No entry for map Y is present.
Doppelgänger Cache - Insertions (Miss)
State at the miss: tag 5 holds map Y, and the approximate data array holds only map X with data block A.
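The insertion steps can be sketched in software as follows. Two parts of this sketch are assumptions: the map function `generate_map` (a quantized average, used here only as a placeholder because this section does not say how the map is computed) and the miss handling (allocating a new data entry under the new map, since the section ends at the miss). The class and method names are illustrative, and the block values are borrowed from the approximate similarity example rather than taken from the insertion slides.

```python
def generate_map(block, step=16):
    """Placeholder map function (assumption): quantized average of the values."""
    return (sum(block) // len(block)) // step


class DoppelgangerCache:
    """Hypothetical software model of the tag array and approximate data array."""

    def __init__(self):
        self.tag_array = {}          # address tag -> map value
        self.approx_data_array = {}  # map value -> data block

    def insert(self, address, block):
        """Record a block fetched from memory.

        Returns True if a block with the same map was already stored.
        """
        map_value = generate_map(block)
        self.tag_array[address] = map_value
        hit = map_value in self.approx_data_array
        if not hit:
            # Assumed miss handling: allocate a new entry for this map.
            self.approx_data_array[map_value] = block
        return hit


cache = DoppelgangerCache()
block_a = [92, 131, 183, 91, 132, 186]
block_b = [35, 31, 29, 43, 38, 37]
cache.insert(0, block_a)
print(cache.insert(5, block_b))
```

With this placeholder map, the two blocks land in different entries, so the second insertion misses in the approximate data array, as in the slide sequence.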