Communication-Efficient String Sorting

Communication-Efficient String Sorting
Timo Bingmann, Peter Sanders, Matthias Schimek · 2020-05-18 @ IPDPS’20
Institute of Theoretical Informatics – Algorithmics
example strings (each terminated by 0): s0 = Antidisestablishmentarianism, s1 = Floccinaucinihilipilification, s2 = Honorificabilitudinitatibus
Video and More Information: https://panthema.net/2020/0518-distributed-string-sorting/
www.kit.edu
KIT – The Research University in the Helmholtz Association

Why String Sorting?
string: array of characters over alphabet Σ, terminated by 0 (e.g. "string" followed by 0)
sorted string set: sorted lexicographically ⇒ like in a dictionary
characteristics of string sets: #strings n, #characters N, sum of distinguishing prefix lengths D ⇒ multidimensional data
example string set: s0 = algorithm, s1 = compare, s2 = comparison, s3 = prefix
only published distributed string sorting algorithm: one paragraph in [Fischer and Kurpicz, ALENEX’19]
Timo Bingmann, Peter Sanders, Matthias Schimek – Communication-Efficient String Sorting, slide 2 / 10, Institute of Theoretical Informatics – Algorithmics, May 18th, 2020
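The characteristics n, N and D can be illustrated on the four example strings. The following is a minimal sketch under two assumptions that are not stated on the slide: N counts characters without the 0 terminator, and the distinguishing prefix length of a string is its longest common prefix with any other string plus one. The function names are hypothetical.

```python
def lcp(a, b):
    """Length of the longest common prefix of a and b."""
    i = 0
    limit = min(len(a), len(b))
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def distinguishing_prefix_lengths(strings):
    """Return (string, distinguishing prefix length) pairs in sorted order.

    In sorted order the longest common prefix with any other string is
    always reached at one of the two neighbours, so only those are checked.
    """
    order = sorted(strings)
    result = []
    for i, s in enumerate(order):
        left = lcp(order[i - 1], s) if i > 0 else 0
        right = lcp(s, order[i + 1]) if i + 1 < len(order) else 0
        result.append((s, max(left, right) + 1))
    return result


strings = ["algorithm", "compare", "comparison", "prefix"]
n = len(strings)                      # number of strings
N = sum(len(s) for s in strings)      # number of characters (assumption: no terminators)
dist = distinguishing_prefix_lengths(strings)
D = sum(d for _, d in dist)           # sum of distinguishing prefix lengths
print(n, N, D, dist)
```

For this set only "compare" and "comparison" need a long prefix (seven characters each) to be told apart; the other two strings are distinguished by their first character.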

String Sorting Toolbox
Sequential Sorting: String Radix Sort, Multikey Quicksort, ... [Kärkkäinen et al., SPIRE’08], [Bentley and Sedgewick, SODA’97]
evaluation of many sequential algorithms in [Bingmann ’18]
needed: string sorting + Longest Common Prefix (LCP) array computation
example (LCP with predecessor, string): (⊥, algorithm), (2, alpha), (5, alphabet), (0, character), (1, complete), (4, computer), (6, computing), (2, copy)
Multiway Merging: LCP Losertree [Bingmann et al., Algorithmica’17]
exploit LCP values to save character comparisons
figure: LCP-Merge operating on (LCP, string) pairs such as (2, aab), (1, acb), (2, aac), (0, bca)
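How LCP values save character comparisons can be sketched with a two-way merge. This is a simplified illustration, not the K-way LCP losertree from the cited paper: it merges two sorted runs that carry their LCP arrays and compares characters only when both run heads share equally long prefixes with the last output string. All names are hypothetical, and the first LCP entry of a run is stored as 0 instead of ⊥.

```python
def compare_from(x, y, start):
    """Compare x and y, skipping the first `start` characters (known equal).

    Returns (x <= y, lcp(x, y)).
    """
    k = start
    limit = min(len(x), len(y))
    while k < limit and x[k] == y[k]:
        k += 1
    if k == limit:
        return len(x) <= len(y), k
    return x[k] < y[k], k


def lcp_array(sorted_strings):
    """LCP of every string with its predecessor; 0 for the first string."""
    lcps = [0] * len(sorted_strings)
    for i in range(1, len(sorted_strings)):
        lcps[i] = compare_from(sorted_strings[i - 1], sorted_strings[i], 0)[1]
    return lcps


def lcp_merge(a, lcp_a, b, lcp_b):
    """Merge two sorted runs; returns the merged strings and their LCP array."""
    out, out_lcp = [], []
    i = j = 0
    ha = hb = 0  # LCP of each run head with the last string written to out
    while i < len(a) and j < len(b):
        if ha > hb:
            take_a = True            # a[i] shares more with the last output, so it is smaller
        elif ha < hb:
            take_a = False
        else:
            # Only here are characters inspected, starting after the known prefix.
            take_a, h = compare_from(a[i], b[j], ha)
            if take_a:
                hb = h               # loser's LCP with the new last output
            else:
                ha = h
        if take_a:
            out.append(a[i]); out_lcp.append(ha)
            i += 1
            ha = lcp_a[i] if i < len(a) else 0
        else:
            out.append(b[j]); out_lcp.append(hb)
            j += 1
            hb = lcp_b[j] if j < len(b) else 0
    if i < len(a):                   # copy the remaining run, keeping its LCPs
        out.extend(a[i:]); out_lcp.append(ha); out_lcp.extend(lcp_a[i + 1:])
    if j < len(b):
        out.extend(b[j:]); out_lcp.append(hb); out_lcp.extend(lcp_b[j + 1:])
    return out, out_lcp


run_a, run_b = ["aab", "acb"], ["aac", "bca"]
merged, merged_lcp = lcp_merge(run_a, lcp_array(run_a), run_b, lcp_array(run_b))
print(list(zip(merged_lcp, merged)))
```

In this example only the very first step inspects characters; every later decision follows from comparing two LCP values.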

String Sorting Toolbox: LCP Compression

LCP   string      compressed
⊥     algorithm   algorithm
2     alpha       pha
5     alphabet    bet
0     character   character
1     complete    omplete
4     computer    uter
6     computing   ing
2     copy        py

each longest common prefix is sent only once
compression: iterate over strings + LCP array
decompression: iterate over compressed strings + LCP array
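The two passes described above can be sketched as follows. This is a minimal illustration with hypothetical function names, assuming the strings are already sorted and that the ⊥ entry of the LCP array is stored as 0; it does not show how the data is serialized for the actual string exchange.

```python
def lcp_array(sorted_strings):
    """LCP of every string with its predecessor; 0 for the first string."""
    lcps = [0] * len(sorted_strings)
    for i in range(1, len(sorted_strings)):
        prev, cur = sorted_strings[i - 1], sorted_strings[i]
        k = 0
        while k < min(len(prev), len(cur)) and prev[k] == cur[k]:
            k += 1
        lcps[i] = k
    return lcps


def lcp_compress(sorted_strings, lcps):
    """Keep only the characters that follow the prefix shared with the predecessor."""
    return [s[h:] for s, h in zip(sorted_strings, lcps)]


def lcp_decompress(suffixes, lcps):
    """Rebuild each string from the first h characters of its predecessor."""
    out = []
    prev = ""
    for suffix, h in zip(suffixes, lcps):
        prev = prev[:h] + suffix
        out.append(prev)
    return out


strings = ["algorithm", "alpha", "alphabet", "character",
           "complete", "computer", "computing", "copy"]
lcps = lcp_array(strings)
compressed = lcp_compress(strings, lcps)
restored = lcp_decompress(compressed, lcps)
print(compressed)
```

The receiver needs both the compressed strings and the LCP array, which it can afterwards reuse for LCP-aware merging.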

Distributed Merge String Sort (MS)
Local Sorting (on every PE): String Radix Sort; new: String Radix Sort + LCP array
Distributed Partitioning Algorithm
String Exchange: no compression; new: LCP compression
Merging (on every PE): plain losertree; new: LCP losertree

Distributed Merge String Sort (MS): Partitioning
equidistant sampling: regular sampling on every PE produces the sample sets
Sorting of Sample Sets + Final Splitter Selection: gather + seq. sort; new: hypercube quicksort [Axtmann and Sanders, ALENEX’17]
broadcast final splitters: p − 1 final splitters
partitioning (on every PE)
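The partitioning steps can be sketched as a sequential simulation in which each PE is a plain Python list. This is an illustration under assumptions, not the implementation from the talk: it uses the simple "gather + sequential sort" variant instead of hypercube quicksort, and the function names and the choice of p − 1 samples per PE are hypothetical.

```python
from bisect import bisect_right


def regular_samples(sorted_local, num_samples):
    """Pick num_samples equidistant elements from a locally sorted list."""
    if not sorted_local:
        return []
    step = len(sorted_local) / (num_samples + 1)
    return [sorted_local[int((k + 1) * step)] for k in range(num_samples)]


def select_splitters(all_samples, p):
    """Sort the gathered samples and choose p - 1 equidistant final splitters."""
    return regular_samples(sorted(all_samples), p - 1)


def partition(sorted_local, splitters):
    """Split a locally sorted list into len(splitters) + 1 buckets, one per PE."""
    bounds = [0] + [bisect_right(sorted_local, s) for s in splitters] + [len(sorted_local)]
    return [sorted_local[bounds[k]:bounds[k + 1]] for k in range(len(bounds) - 1)]


# Three simulated PEs, each holding a locally sorted string set.
pes = [
    sorted(["banana", "apple", "cherry", "date"]),
    sorted(["fig", "grape", "elder", "kiwi"]),
    sorted(["lemon", "mango", "apricot", "blueberry"]),
]
p = len(pes)

samples = [s for pe in pes for s in regular_samples(pe, p - 1)]   # gather
splitters = select_splitters(samples, p)                           # broadcast

received = [[] for _ in range(p)]                                  # string exchange
for pe in pes:
    for dest, bucket in enumerate(partition(pe, splitters)):
        received[dest].append(bucket)

# Each PE would now merge its received sorted buckets; sorted() stands in for that.
result = [s for buckets in received for s in sorted(x for b in buckets for x in b)]
print(splitters, result)
```

Because every bucket arrives already sorted, the final step on each PE is a multiway merge rather than a full sort.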

Prefix Doubling String Merge Sort (PDMS)
example: PE1 holds Antidisestablishmentarianism, PE2 holds Floccinaucinihilipilification, PE3 holds Honorificabilitudinitatibus
same main structure as before
use distributed Single-Shot Bloom Filter (dSBF) [Sanders et al., IEEE BigData’13] to approximate distinguishing prefixes with distributed duplicate detection
only operate on those characters
calculate only the permutation for sorting (exchanging further characters is optional).
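The idea of approximating distinguishing prefixes by duplicate detection can be sketched on a single machine. In this illustration a local `Counter` over hash values stands in for the distributed Single-Shot Bloom Filter, so it shows only the prefix-doubling loop and not the communication pattern; the function name and the starting length are hypothetical.

```python
from collections import Counter


def approx_distinguishing_prefixes(strings, start=1):
    """Approximate each string's distinguishing prefix length by prefix doubling.

    In every round the hashes of the current prefixes of all still-active
    strings are counted. A string whose prefix hash is unique is finished;
    the others try again with a prefix twice as long.
    """
    result = [None] * len(strings)
    active = list(range(len(strings)))
    k = start
    while active:
        counts = Counter(hash(strings[i][:k]) for i in active)
        still_active = []
        for i in active:
            s = strings[i]
            if counts[hash(s[:k])] == 1 or k >= len(s):
                # Unique prefix found, or the whole string has been used.
                result[i] = min(k, len(s))
            else:
                still_active.append(i)
        active = still_active
        k *= 2
    return result


strings = ["algorithm", "compare", "comparison", "prefix"]
approx = approx_distinguishing_prefixes(strings)
print(dict(zip(strings, approx)))
```

The result can overestimate the true distinguishing prefix by up to a factor of two because lengths are only tested at powers of two, and a hash collision can only make a prefix longer, never too short.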

Experimental Evaluation – Setup
Input Data: D/N-Generator; weak scaling with D/N-Generator
example D/N-Generator(n = 9, ℓ = 6, D/N = 0.5), each string terminated by 0: s0 = aaaaa, s1 = aabaa, s2 = aacaa, s3 = abaaa, s4 = abbaa, s5 = abcaa, s6 = acaaa, s7 = acbaa, s8 = accaa
Hardware (ForHLR I at KIT): 2 Deca-core Intel Xeon E5-2670 v2 (2.5 GHz) and 64 GB RAM per compute node; InfiniBand 4X FDR interconnect
Algorithms:
FKmerge: from Fischer and Kurpicz [ALENEX’19]
hQuick: distributed quicksort
our merge sort: MS-simple (no LCP-comp), MS (LCP-comp)
our prefix doubling merge sort: PDMS-Golomb, PDMS

[Figure: D/N-Generator(n = p · 500K, ℓ = 500), one panel per D/N ratio (0.0, 0.25, 0.5, 0.75, 1.0). Top row: time (s); bottom row: bytes sent per string; x-axis: # of PEs (20 to 1,280). Algorithms shown: FKmerge, hQuick, MS-simple, MS, PDMS-Golomb, PDMS.]

Conclusion
Summary
two new communication-efficient string sorting algorithms: distributed string merge sort (MS) and distributed prefix-doubling string merge sort (PDMS)
theory and experimental evaluation
different strategies best for low and high D/N-ratios
Source code and recording of talk: https://panthema.net/2020/0518-distributed-string-sorting
Future Work
improve balancing by considering strings and characters
can one show lower bounds?
Questions via email to bingmann@kit.edu
