Coping with the Memory Hierarchy the Cache-Oblivious Way
Rolf Fagerberg, University of Aarhus
Imada, SDU, February 18, 2004
Overview
• The memory hierarchy
• The I/O-model
• The cache-oblivious model
• Examples of cache-oblivious algorithms
  • Double for-loop (with applications)
  • Searching
  • Sorting
• Theoretical limits of cache-obliviousness
Fagerberg: The Cache-Oblivious Way 2
The Memory Hierarchy
Modern computers: CPU, Registers, Cache1, Cache2, Cache3, RAM, Disk, Tertiary Storage

            Access time          Volume
Registers   1 cycle              1 KB
Cache       10 cycles            512 KB
RAM         100 cycles           512 MB
Disk        20,000,000 cycles    80 GB

Gap increases over time.
Real problems of Gigabyte, Terabyte, and even Petabyte size: Databases (finance, phone companies, banks, weather, geology, geography, astronomy), WWW, GIS systems, computer graphics.
Classic RAM Model
The RAM model (CPU + RAM):
• Add: O(1)
• Branch: O(1)
• Mem access: O(1)
Increasingly inadequate
Overview
√ The memory hierarchy
• The I/O-model
• The cache-oblivious model
• Examples of cache-oblivious algorithms
  • Double for-loop (with applications)
  • Searching
  • Sorting
• Theoretical limits of cache-obliviousness
I/O Model
Aggarwal and Vitter 1988
Two layers: CPU + Memory, and External Memory
N = problem size
M = memory size
B = I/O block size
Cost: number of I/Os.
Example

            CPU time   In-place   Worst-case   I/O
Heapsort    N log N    √          √            N log N
Quicksort   N log N    √                       (N log N)/B
Mergesort   N log N               √            (N log N)/B

Random memory access ⇒ page fault at every access.
Sequential memory access ⇒ page fault every B accesses.
Typically, B ∼ 10^3
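The contrast between random and sequential access can be checked with a small simulation. This is a minimal sketch, not part of the original slides: it assumes an LRU replacement policy, and the names `count_ios`, `N`, `B`, and `MEMORY_BLOCKS` as well as the chosen sizes are illustrative only.

```python
import random
from collections import OrderedDict


def count_ios(addresses, block_size, memory_blocks):
    """Count block transfers for an address trace, assuming LRU replacement."""
    cache = OrderedDict()  # block number -> True, ordered from oldest to newest use
    ios = 0
    for address in addresses:
        block = address // block_size
        if block in cache:
            cache.move_to_end(block)       # hit: mark block as most recently used
        else:
            ios += 1                       # miss: one I/O brings the block in
            cache[block] = True
            if len(cache) > memory_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return ios


# Illustrative parameters (assumptions): 100 blocks of data, room for 4 in memory.
N, B, MEMORY_BLOCKS = 100_000, 1_000, 4

rng = random.Random(0)  # fixed seed so the run is repeatable
sequential_ios = count_ios(range(N), B, MEMORY_BLOCKS)
random_ios = count_ios([rng.randrange(N) for _ in range(N)], B, MEMORY_BLOCKS)

print("sequential:", sequential_ios, "I/Os")
print("random:    ", random_ios, "I/Os")
```

With these parameters the sequential scan faults exactly once per block (N/B times), while the random trace faults on nearly every access because only 4 of the 100 blocks are resident at any time.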
I/O-Optimal Sorting
Binary Mergesort: $\frac{N}{B}\log_2 N$ I/Os
Multi-Way Merging: Maximal merge degree ≈ M/B
Multi-Way Mergesort: $\frac{N}{B}\log_{M/B}\frac{N}{M}$ I/Os
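To get a feeling for the gap between the two bounds, they can be evaluated for concrete sizes. The sketch below simply plugs numbers into the two formulas above and ignores constant factors; the function names and the values of `N`, `M`, and `B` are assumptions chosen for illustration.

```python
import math


def binary_mergesort_ios(n, b):
    """(N/B) * log_2 N, the binary mergesort bound, constants ignored."""
    return (n / b) * math.log2(n)


def multiway_mergesort_ios(n, m, b):
    """(N/B) * log_{M/B}(N/M), the multi-way mergesort bound, constants ignored."""
    return (n / b) * math.log(n / m, m / b)


# Illustrative sizes (assumptions): 2^30 elements, memory 2^20, block 2^10.
N, M, B = 2**30, 2**20, 2**10

binary = binary_mergesort_ios(N, B)
multiway = multiway_mergesort_ios(N, M, B)

print(f"binary mergesort:    {binary:,.0f} I/Os")
print(f"multi-way mergesort: {multiway:,.0f} I/Os")
print(f"ratio:               {binary / multiway:.1f}")
```

For these sizes M/B = N/M = 2^10, so the multi-way logarithm is 1 and the binary bound is larger by a factor of log_2 N = 30.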
I/O Model Facts
• Scanning: $\Theta(N/B)$ I/Os.
• Searching: $\Theta(\log_B N)$ I/Os by B-trees.
• Sorting: $\Theta\left(\frac{N}{B}\log_{M/B}\frac{N}{M}\right)$ I/Os by $\frac{M}{B}$-way merge-sort.
• Permuting: $\Theta\left(\min\left\{N, \frac{N}{B}\log_{M/B}\frac{N}{M}\right\}\right)$ I/Os by direct move or sorting.
1988-2004: Many algorithms and data structures for problems from computational geometry, graphs, strings, ...
Overview
√ The memory hierarchy
√ The I/O-model
• The cache-oblivious model
• Examples of cache-oblivious algorithms
  • Double for-loop (with applications)
  • Searching
  • Sorting
• Theoretical limits of cache-obliviousness
Computer Models
Reality: CPU, L1 cache, L2 cache, RAM, Disk (increasing access time)
Models:
• RAM model
• I/O model
• Multi-level models
• New model: Cache-obliviousness
Cache-Oblivious Model
Frigo, Leiserson, Prokop, Ramachandran, FOCS’99
• Program in the RAM model
• Analyze in the I/O model for arbitrary B and M
• Optimal off-line cache replacement strategy
Advantages:
• Optimal on arbitrary level ⇒ optimal on all levels
• Portability
• Simplicity of model.
Cache-Oblivious Results
Scanning ⇒ stack, queue, selection, ...
Matrix multiplication, FFT: FOCS’99
Sorting: FOCS’99, ICALP’02, ALENEX’04
Search trees: Prokop 99, FOCS’00, WAE’01, SODA’02 × 2, ESA’02, FOCS’03
Priority queues: STOC’02, ISAAC’02
Graph algorithms: STOC’02, BRICS-04-2
Computational geometry: 2 × ICALP’02, SCG’03
Scanning dynamic sets: ESA’02
Power of cache-obliviousness: STOC’03
Overview
√ The memory hierarchy
√ The I/O-model
√ The cache-oblivious model
• Examples of cache-oblivious algorithms
  • Double for-loop (with applications)
  • Searching
  • Sorting
• Theoretical limits of cache-obliviousness
Double for-loop
X, Y arrays of length n (index i runs over X, index j runs over Y)
I/O complexity: $n \times \frac{n}{B} = \frac{n^2}{B}$
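The loop structure behind this bound can be sketched as follows. This is an illustrative reconstruction, not the original slide code: the name `all_pairs_plain` and the `visit` callback are assumptions.

```python
def all_pairs_plain(xs, ys, visit):
    """Call visit(x, y) for every x in xs and every y in ys."""
    for x in xs:        # outer loop: one element of X at a time
        for y in ys:    # inner loop: a full scan of Y for each x
            visit(x, y)


# Example use: collect the pairs with equal elements (a tiny join).
xs = [3, 1, 4, 1, 5]
ys = [1, 5, 9, 2, 6]
matches = []


def record_if_equal(x, y):
    if x == y:
        matches.append((x, y))


all_pairs_plain(xs, ys, record_if_equal)
print(matches)
```

Each of the n iterations of the outer loop scans all of Y, which costs about n/B I/Os once Y no longer fits in memory; this is where the n × n/B total comes from.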
Double for-loop
More efficient version in the I/O-model: process X and Y in chunks of size M
I/O complexity: $\frac{n}{M} \times \frac{n}{M} \times \frac{M}{B} = \frac{n^2}{MB}$
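A chunked (cache-aware) variant might look like the sketch below. It is not the original slide code: the name `all_pairs_blocked` and the `chunk` parameter are assumptions, and `chunk` has to be chosen by the caller so that one chunk of X and one chunk of Y fit in memory together.

```python
def all_pairs_blocked(xs, ys, visit, chunk):
    """Call visit(x, y) for every pair, one chunk-by-chunk tile at a time."""
    for i0 in range(0, len(xs), chunk):
        x_chunk = xs[i0:i0 + chunk]          # slicing copies; fine for a sketch
        for j0 in range(0, len(ys), chunk):
            y_chunk = ys[j0:j0 + chunk]
            # Both chunks are now assumed to be in memory, so the
            # chunk * chunk pairs below cause no further I/Os.
            for x in x_chunk:
                for y in y_chunk:
                    visit(x, y)


pairs = []
all_pairs_blocked(list(range(5)), list(range(7)),
                  lambda x, y: pairs.append((x, y)), chunk=2)
print(len(pairs), "pairs visited")
```

The same pairs are visited as in the plain double loop, only in a different order. The drawback is that `chunk` depends on the memory size M, which is exactly the parameter a cache-oblivious algorithm is not allowed to know.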
Double for-loop
Cache-oblivious version: split X and Y into halves of size n/2 + recursion
I/O complexity: again $\frac{n^2}{MB}$
Double for-loop
Cache-oblivious version [code listing]
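The recursive scheme can be sketched as below, together with a small simulation that compares block misses against the plain double loop. This is an illustrative reconstruction, not the original slide code. The function names, the address layout (X at addresses 0..n-1, Y at n..2n-1), the LRU policy, and the parameter values are all assumptions.

```python
from collections import OrderedDict


def all_pairs_recursive(x_lo, x_hi, y_lo, y_hi, visit):
    """Call visit(i, j) for all i in [x_lo, x_hi) and j in [y_lo, y_hi)."""
    nx, ny = x_hi - x_lo, y_hi - y_lo
    if nx <= 0 or ny <= 0:
        return
    if nx == 1 and ny == 1:
        visit(x_lo, y_lo)
        return
    # Halve both ranges and recurse on the four combinations.
    x_mid = x_lo + (nx + 1) // 2
    y_mid = y_lo + (ny + 1) // 2
    for xa, xb in ((x_lo, x_mid), (x_mid, x_hi)):
        for ya, yb in ((y_lo, y_mid), (y_mid, y_hi)):
            all_pairs_recursive(xa, xb, ya, yb, visit)


def all_pairs_plain(n, visit):
    """The plain double loop over index pairs, for comparison."""
    for i in range(n):
        for j in range(n):
            visit(i, j)


def count_misses(run, n, block_size, memory_blocks):
    """Run a pair enumeration and count block misses under LRU."""
    cache = OrderedDict()
    misses = 0

    def touch(address):
        nonlocal misses
        block = address // block_size
        if block in cache:
            cache.move_to_end(block)
        else:
            misses += 1
            cache[block] = True
            if len(cache) > memory_blocks:
                cache.popitem(last=False)   # evict least recently used

    def visit(i, j):
        touch(i)        # read X[i]
        touch(n + j)    # read Y[j]

    run(visit)
    return misses


n, block_size, memory_blocks = 256, 8, 8
plain = count_misses(lambda v: all_pairs_plain(n, v),
                     n, block_size, memory_blocks)
recursive = count_misses(lambda v: all_pairs_recursive(0, n, 0, n, v),
                         n, block_size, memory_blocks)
print("plain misses:    ", plain)
print("recursive misses:", recursive)

# Correctness check on a small, non-square instance.
seen = []
all_pairs_recursive(0, 5, 0, 3, lambda i, j: seen.append((i, j)))
```

The recursion never refers to the block size or the memory size. Once a subproblem is small enough for both of its ranges to fit in memory, all of its pairs are handled without further misses, which is the intuition behind the n²/(MB) bound.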
Experiments
[Plot: time in seconds (log scale, 1 to 10000) against log2 of array size in bytes (15 to 21); curves: plain, cache-aware (L1), cache-aware (L2), cache-oblivious]
Sizes within RAM (element size 4 bytes)
366 MHz Pentium II, 128 MB RAM, 256 KB Cache, gcc -O3, Linux
Experiments
[Plot: time in seconds (log scale, 0.1 to 1000) against log2 of array size in bytes (19 to 27); curves: plain, cache-aware (L2), cache-aware (RAM), cache-oblivious]
Sizes exceeding RAM (element size 1 KB)
366 MHz Pentium II, 128 MB RAM, 256 KB Cache, gcc -O3, Linux
For-loop Applications
• Join in databases
• Dynamic programming (bioinformatics)
• Matrix multiplication (scientific computing)
Overview
√ The memory hierarchy
√ The I/O-model
√ The cache-oblivious model
• Examples of cache-oblivious algorithms
  √ Double for-loop (with applications)
  • Searching
  • Sorting
• Theoretical limits of cache-obliviousness
Static Cache-Oblivious Trees
Recursive memory layout (van Emde Boas layout), Prokop 1999
Binary tree of height h: split into a top tree A of height ⌊h/2⌋ and bottom trees B1, ..., Bk of height ⌈h/2⌉
Memory layout: A B1 ... Bk, each part laid out recursively in the same way
Searches use $O(\log_B N)$ I/Os
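The recursive layout can be made concrete by listing the nodes of a complete binary tree in the order they would be stored. The sketch below is an illustration under stated assumptions: nodes are named by their breadth-first (heap) index, with root 1 and children 2i and 2i+1, and the function name `veb_order` is hypothetical.

```python
def veb_order(height, root=1):
    """Heap indices of a complete binary tree with `height` levels,
    listed in van Emde Boas layout order."""
    if height == 1:
        return [root]
    top_height = height // 2                 # top tree A: floor(h/2) levels
    bottom_height = height - top_height      # bottom trees B1..Bk: ceil(h/2) levels
    order = veb_order(top_height, root)      # lay out A first
    # The bottom-tree roots are the descendants of `root` at depth top_height.
    first_bottom_root = root << top_height
    for bottom_root in range(first_bottom_root,
                             first_bottom_root + (1 << top_height)):
        order.extend(veb_order(bottom_height, bottom_root))
    return order


layout = veb_order(4)
print(layout)
# → [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

In the height-4 example the top tree {1, 2, 3} is stored first, followed by the four bottom trees rooted at 4, 5, 6, and 7. Every small subtree ends up contiguous in memory, so a root-to-leaf path touches few blocks for any block size B.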