BY THEIR FRUITS SHALL YE KNOW THEM
A DATA ANALYST'S PERSPECTIVE ON MASSIVELY PARALLEL SYSTEM DESIGN
Holger Pirk, Sam Madden, Mike Stonebraker
A CRUCIAL DISTINCTION ≠
INSPIRATION
MY PLEDGE OF LOYALTY
SCIENTIFIC RATIONALE
GENE AMDAHL TAUGHT US THAT SYSTEMS NEED TO BE BALANCED
[Chart: processed instructions per second (1T to 1E) vs. processed bytes per instruction (1 to 100), with a GB/s processing-rate diagonal]
NVIDIA AND AMD PROCESS LOTS OF SMALL DATA WORDS
[Chart: same axes; AMD and Nvidia sit in the high-instruction-rate, small-word region]
[Diagram: an instruction scheduler feeding SIMT cores, backed by SIMT memory]
INTEL PROCESSES FEWER, LARGER DATA WORDS
[Chart: same axes; Intel sits at fewer instructions per second but more bytes per instruction]
MANY-CORE SIMD
[Diagram: many Pentium-derived cores, each with a 512-bit SIMD unit, sharing memory]
SIMD WITH SCATTER/GATHER
[Diagram: SIMD cores accessing memory through a scatter/gather unit]
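The scatter/gather unit is what lets one SIMD instruction load from several unrelated addresses at once. A minimal scalar sketch of gather semantics (plain C, not Intel's intrinsic API; the names and lane count are illustrative, mirroring a 512-bit vector of 64-bit elements):

```c
#include <stdint.h>

enum { LANES = 8 };

/* For every lane whose mask bit is set, load base[idx[lane]] into the
 * destination vector. Hardware does all active lanes in one instruction;
 * this scalar loop only emulates the semantics. */
static void emulated_gather(double dst[LANES], const double *base,
                            const int32_t idx[LANES], uint8_t mask)
{
    for (int lane = 0; lane < LANES; ++lane)
        if (mask & (1u << lane))
            dst[lane] = base[idx[lane]];
}
```

Masked-off lanes are left untouched, which matters later when gathers complete only partially.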
ALL OF THEM CAN PROCESS WAY MORE DATA THAN THEY CAN LOAD
[Chart: same axes; AMD, Nvidia, and Intel all lie above the memory-bandwidth diagonal]
SPEC BANDWIDTH-WISE, PHI OUTPERFORMS CURRENT GPUS
[Bar chart: memory bandwidth in GB/s (0 to 400), Xeon Phi vs. GTX 780]
OUR QUESTION: DOES IT MATTER? DOES PHI CHANGE ANYTHING?
[Chart: same instructions-per-second vs. bytes-per-instruction plot with AMD, Nvidia, and Intel]
THE OBSTACLE COURSE
DATA-CENTRIC APPLICATIONS HAVE TYPICAL CHOKEPOINTS
[Query-plan diagram (Γ aggregation over π projection of Facts and Dimension inputs) annotated with chokepoints: synchronization, computation, bandwidth, capacity]
DATA-CENTRIC APPLICATIONS HAVE TYPICAL CHOKEPOINTS
[Same query-plan diagram, annotated with the factors behind each chokepoint: hash complexity, number of conflicts, tuple width, access locality]
PHI VS. GTX 780
FIRST CHOKEPOINT: BANDWIDTH
[Query-plan diagram with the bandwidth chokepoint highlighted]
BANDWIDTH OF PHI LOOKS SIMILAR TO GPU AT FIRST GLANCE
[Chart: time per access in ns (log scale, 0.04 to 1.28) vs. stride in bytes (4 to 512), GTX 780 and Xeon Phi]
A SECOND GLANCE REVEALS SOMETHING ODD…
[Same chart, annotated: a non-linear cost function]
A SECOND GLANCE REVEALS SOMETHING ODD…
[Same chart, annotated: not dominated (only) by cache misses]
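The stride experiment behind these plots is simple to reproduce. A minimal sketch (a plain C kernel, not the authors' actual benchmark harness): read one 32-bit word every `stride` elements and time the loop for strides from 4 B up to 512 B.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum one 32-bit word every stride_words elements of buf. Timing this
 * loop over a large buffer for growing strides traces out the
 * time-per-access curves: larger strides waste more of each fetched
 * cache line, so effective bandwidth drops. */
static uint64_t strided_sum(const uint32_t *buf, size_t n_words,
                            size_t stride_words)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n_words; i += stride_words)
        sum += buf[i];
    return sum;
}
```

Returning the sum keeps the compiler from optimizing the loads away.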
SECOND CHOKEPOINT: CAPACITY
[Query-plan diagram with the capacity chokepoint highlighted]
PHI BENEFITS FROM LARGER CACHES
[Chart: time per access in ns (log scale, 0.02 to 1.28) vs. size of lookup table in bytes (64 B to 16 MB); GTX 780 and Xeon Phi with their respective lower bounds]
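The capacity experiment is a random-lookup kernel: as the lookup table outgrows each cache level, time per access steps up. A sketch under assumed details (LCG index stream, power-of-two table; the actual benchmark may differ):

```c
#include <stddef.h>
#include <stdint.h>

/* Random lookups into a power-of-two-sized table. The LCG generates a
 * cheap pseudo-random index stream so the measurement is dominated by
 * the table accesses, not by index generation. */
static uint64_t lookup_kernel(const uint32_t *table, uint32_t index_mask,
                              size_t n_lookups)
{
    uint64_t sum = 0;
    uint32_t x = 1;
    for (size_t i = 0; i < n_lookups; ++i) {
        x = x * 1664525u + 1013904223u;   /* Numerical Recipes LCG */
        sum += table[x & index_mask];     /* mask = table_size - 1 */
    }
    return sum;
}
```

Sweeping the table size from 64 B to 16 MB while timing this loop reproduces the staircase shape of the curves.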
THIRD CHOKEPOINT: COMPUTATION
[Query-plan diagram with the computation chokepoint highlighted]
COMPUTATION PERFORMANCE IS VERY SIMILAR…
[Chart: time per hash in ns (log scale, 0.05 to 0.80) vs. number of Murmur rehashes (1 to 32), Xeon Phi and GTX 780]
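The "Murmur rehashes" knob scales arithmetic intensity by applying a Murmur-style mixing step repeatedly. A sketch using MurmurHash3's 64-bit finalizer (assuming this is the kernel; the experiment's exact hash function is not shown on the slide):

```c
#include <stdint.h>

/* MurmurHash3 64-bit finalizer (fmix64): a short sequence of xor-shifts
 * and multiplies that fully avalanches a 64-bit key. */
static uint64_t fmix64(uint64_t k)
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

/* Apply the finalizer `rounds` times; the benchmark's x-axis sweeps
 * rounds from 1 to 32 to scale compute per element. */
static uint64_t rehash(uint64_t k, int rounds)
{
    for (int i = 0; i < rounds; ++i)
        k = fmix64(k);
    return k;
}
```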
FOURTH CHOKEPOINT: SYNCHRONIZATION
[Query-plan diagram with the synchronization chokepoint highlighted]
…AND SO IS HASH-BUILDING
[Chart: time per access in ns (0 to 15) vs. number of values per bucket (1 to 31), GTX 780 and Xeon Phi]
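Hash-building with a varying number of values per bucket can be sketched as a fixed-capacity bucket append (single-threaded here for clarity; in the measured experiment the append on a contended bucket must be synchronized, e.g. via an atomic counter increment, which is where the two architectures could have diverged but largely do not):

```c
#include <stdint.h>

#define BUCKET_CAP 32

struct bucket {
    int n;                       /* current fill level */
    uint64_t vals[BUCKET_CAP];   /* stored values */
};

/* Append v to the bucket; returns 1 on success, 0 if the bucket is
 * full. A parallel build would replace n++ with an atomic fetch-add. */
static int bucket_insert(struct bucket *b, uint64_t v)
{
    if (b->n >= BUCKET_CAP)
        return 0;
    b->vals[b->n++] = v;
    return 1;
}
```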
RECAP
• Phi & GPU are mostly on par in:
  • Computation
  • Synchronization
  • Cache utilization
• But what is going on with the memory access?
PHI IN DEPTH
SCATTER/GATHER
LET'S LOOK AT THE DOCUMENTATION
[Excerpt from the Intel Xeon Phi ISA reference manual (Reference Number 327364-001), Chapter 6, Instruction Descriptions:]

VGATHERDPD - Gather Float64 Vector With Signed Dword Indices
Opcode: MVEX.512.66.0F38.W1 92 /r /vsib
Instruction: vgatherdpd zmm1 {k1}, Uf64(mvt)
Description: Gather float64 vector Uf64(mvt) into float64 vector zmm1 using doubleword indices and k1 as completion mask.

A set of 8 memory locations pointed to by base address BASE_ADDR and doubleword index vector VINDEX with scale SCALE are converted to a float64 vector. The result is written into float64 vector zmm1. Note the special mask behavior: only a subset of the active elements of write mask k1 is actually operated on (as denoted by the function SELECT_SUBSET). There are only two guarantees about that function: (a) the destination mask is a subset of the source mask (identity included), and (b) on a given invocation of the instruction, at least one element (the least significant enabled mask bit) will be selected from the source mask. Programmers should always enforce re-execution of a gather/scatter instruction (via a loop) until the full sequence has completed, i.e. all elements have been loaded/stored and all write-mask bits are zero. Each accessed element will always access 64 bytes of memory; the region accessed per element lies between element_linear_address & (~0x3F) and (element_linear_address & (~0x3F)) + 63. This instruction has special disp8*N and alignment rules, where N is the size of a single vector element before up-conversion. The corresponding bits in write mask k1 are reset as each destination element is updated, which allows conditional re-triggering of the instruction until all elements under a given write mask have been successfully loaded. The instruction will #GP fault if the destination vector zmm1 is the same as the index vector VINDEX.

[Operation pseudocode omitted; garbled in extraction.]
LET'S LOOK AT THE DOCUMENTATION
[The same VGATHERDPD manual excerpt again, with the confusing passages called out ("???"): the SELECT_SUBSET completion-mask behavior means a single gather may load only some lanes, so software must loop until write mask k1 is all zero.]
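The documented contract (each gather invocation completes at least the least-significant enabled lane, and software loops until the completion mask is zero) can be emulated in scalar C. This sketch models the documented worst case of one lane per invocation; real hardware usually completes more (`__builtin_ctz` assumes GCC/Clang):

```c
#include <stdint.h>

/* Re-execute a "partial gather" until the completion mask is zero,
 * mirroring the retry loop the manual requires around vgatherdpd.
 * Each iteration completes exactly the least-significant enabled lane,
 * the minimum the hardware guarantees. Returns the iteration count. */
static int gather_until_done(double dst[8], const double *base,
                             const int32_t idx[8], uint8_t mask)
{
    int iterations = 0;
    while (mask) {
        int lane = __builtin_ctz(mask);   /* lowest set mask bit */
        dst[lane] = base[idx[lane]];
        mask &= mask - 1;                 /* hardware clears completed bits */
        ++iterations;
    }
    return iterations;
}
```

This is why gather cost on Phi can be data-dependent: the number of retries depends on how the accessed addresses fall across cache lines.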