DFG PP 1307: Algorithm Engineering
DFG Priority Program: nationwide funding program over 6 years for up to 30 individual projects
PP 1307: Algorithm Engineering
• 28 research projects
• 267 publications
• 17 software projects, e.g.:
  • Multi-Core STL (MCSTL) – now gcc parallel mode
  • STL for Extra Large Datasets (STXXL)
2014-10-27 TRIP REPORT: ALGORITHM ENGINEERING

Recap: Algorithm Engineering
1. realistic models: hardware and problem
2. design: efficient, implementable algorithms
3. analyze: beyond worst-case
4. implement: with hardware peculiarities in mind
5. experiment: repeatable, thorough interpretation
“The distance between theory and practice is closer in theory than in practice” [Y. Matias (Google) in his invited talk at ESA ’12]

Final Meeting (17.09.2014)
9 talks, covering a wide range of topics
◦ route planning in road and public transport networks
◦ graph clustering and partitioning
◦ data compression
◦ linear and mixed integer optimization
◦ sequence analysis
No Indico used, slides only partially available

Summer School (18.-19.09.2014)
Two days of lectures and hands-on sessions
◦ data compression (lecture only)
◦ linear and mixed integer optimization
◦ network analysis - graph clustering and partitioning
◦ shortest paths algorithms (lecture only)
About 30 PhD students
Lots of discussion among students and lecturers

Network Analysis
Networks are everywhere
◦ Computer networks
◦ Social networks
◦ …

Network Analysis
Network analysis is mainly concerned with complex networks
◦ Small diameter
◦ Varying degree distribution
◦ Lots of triangles

Network Analysis
GRAPH CLUSTERING
◦ Find (non-overlapping) internally dense, externally sparse subgraphs
◦ Unknown: number of subgraphs, their size
◦ Goals / Applications:
  o Uncover community structure (analysis, ...)
  o Prepartition network (distributed storage, ...)
GRAPH PARTITIONING
◦ Partition vertex set into k (nearly) equally sized blocks
◦ Objective functions aim at small interfaces
◦ Applications:
  ◦ Numerical simulations
  ◦ Route planning
  ◦ Distributed graph algorithms

Network Analysis
GRAPH CLUSTERING
Algorithms:
◦ Label propagation algorithm
◦ Louvain greedy method
GRAPH PARTITIONING
Algorithms:
◦ Size-constrained label propagation
◦ Diffusion-based partitioning
Many different metrics:
◦ Conductance
◦ Expansion
◦ Modularity
◦ …

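To make the label propagation idea and the modularity metric concrete, here is a minimal sketch in plain Python. It is not the NetworKit implementation: the function names, the fixed visiting order and the "largest label wins" tie-break are assumptions chosen for reproducibility, whereas the usual algorithm visits vertices and breaks ties at random.

```python
from collections import Counter


def label_propagation(adj, max_rounds=100):
    """Asynchronous label propagation on an undirected graph.

    adj maps each vertex to a collection of its neighbours.
    Assumption: vertices are sortable and ties go to the largest label,
    which makes the run deterministic (results depend on this choice).
    """
    labels = {v: v for v in adj}              # every vertex starts as its own cluster
    for _ in range(max_rounds):
        changed = False
        for v in sorted(adj):                 # fixed visiting order
            if not adj[v]:
                continue                      # isolated vertex keeps its label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:                       # stable labelling reached
            break
    return labels


def modularity(adj, labels):
    """Modularity Q = sum over clusters of e_c/m - (deg_c / 2m)^2."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    intra = Counter()                         # intra-cluster edges, each counted twice
    degree = Counter()                        # total degree per cluster
    for v, nbrs in adj.items():
        degree[labels[v]] += len(nbrs)
        for u in nbrs:
            if labels[u] == labels[v]:
                intra[labels[v]] += 1
    return sum(intra[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)
```

On two 4-cliques joined by a single edge, this sketch separates the two cliques. Other graphs or other tie-breaking rules can collapse into a single cluster, which is one reason the quality metrics listed above matter.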
Network Analysis
NetworKit:
◦ Toolkit developed during the project for network analysis – C++ with Python bindings
◦ Includes wide range of tools for graph analysis
◦ Excellent IPython notebook-based tutorial
◦ Includes algorithms proposed for evolving networks
◦ Analyze changing social networks – e.g. ITI email graph
Interest for CERN:
◦ Community detection on the grid → planning of file transfers
◦ Track reconstruction → ongoing work

Shortest Paths and Routing
Problem: find shortest path between s and t in weighted graph G
Algorithms:
◦ Dijkstra’s algorithm: too slow for large graphs
◦ Manifold speedup techniques [survey]
  ◦ A*: search with Euclidean bounds (classic)
  ◦ ALT: A* search with landmarks, preprocessing computes distances to landmarks
  ◦ Contraction Hierarchies: introduce shortcuts between “important” vertices of the graph
  ◦ Hub Labeling: every vertex stores distance to several hubs, covering the graph
◦ Most techniques rely on (more or less) expensive pre-computations

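The ALT idea listed above can be sketched as follows: distances from a few landmarks are precomputed with Dijkstra's algorithm, and the triangle inequality then yields a lower bound that guides an A* search. This is a minimal sketch, assuming an undirected, connected graph with non-negative weights; the function names and the adjacency-list layout are illustrative choices, not taken from the talks.

```python
import heapq


def dijkstra(adj, source):
    """Distances from source; adj[v] is a list of (neighbour, weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                          # stale heap entry
        for u, w in adj[v]:
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist


def alt_query(adj, landmark_dist, s, t):
    """A* from s to t with landmark lower bounds.

    landmark_dist is a list of distance maps, one per landmark, as returned
    by dijkstra(adj, landmark). Returns (distance, number of settled vertices).
    Assumes every vertex is reachable from every landmark.
    """
    def lower_bound(v):
        # Triangle inequality: |d(L, t) - d(L, v)| <= d(v, t) for every landmark L.
        return max((abs(d[t] - d[v]) for d in landmark_dist), default=0)

    g = {s: 0}
    heap = [(lower_bound(s), s)]
    settled = set()
    while heap:
        _, v = heapq.heappop(heap)
        if v == t:
            return g[v], len(settled)
        if v in settled:
            continue
        settled.add(v)
        for u, w in adj[v]:
            ng = g[v] + w
            if ng < g.get(u, float("inf")):
                g[u] = ng
                heapq.heappush(heap, (ng + lower_bound(u), u))
    return float("inf"), len(settled)
```

The second return value lets one compare how many vertices the guided search settles against a plain Dijkstra run. How large the saving is depends heavily on the landmark placement.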
Shortest Paths and Routing
Problem: user-defined cost functions render pre-computations futile
Solution: three-stage processing [Delling et al. 2013]
1. Metric-independent pre-processing (≈ hours)
   Recursively partition graph
   Generate arcs between entry and exit nodes to neighboring partitions
2. Metric-dependent pre-processing (≈ seconds)
   Compute the metric for all shortcut arcs
3. Query (≈ μs)
   Find shortest path in contracted graph and unpack it in the original one

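The three stages can be illustrated with a single-level toy version. This is only a sketch under simplifying assumptions, not the method of [Delling et al. 2013] itself: the partition is given as a `cell` map instead of being computed recursively, the graph is undirected, the query returns only the distance (no path unpacking), and all function names are hypothetical.

```python
import heapq


def dijkstra(neighbors, source):
    """Dijkstra where neighbors(v) returns a list of (vertex, weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for u, w in neighbors(v):
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist


def boundary_vertices(adj, cell):
    """Stage 1 (metric-independent): vertices with an arc into another cell."""
    return {v for v in adj if any(cell[u] != cell[v] for u, _ in adj[v])}


def customize(adj, cell, boundary):
    """Stage 2 (metric-dependent): shortcut weights between boundary vertices.

    Rerun this stage, and only this stage, whenever the weights in adj change.
    """
    overlay = {}
    for b in boundary:
        def inside(v, home=cell[b]):
            return [(u, w) for u, w in adj[v] if cell[u] == home]
        dist = dijkstra(inside, b)            # search restricted to b's cell
        shortcuts = [(u, d) for u, d in dist.items() if u in boundary and u != b]
        cut_arcs = [(u, w) for u, w in adj[b] if cell[u] != cell[b]]
        overlay[b] = shortcuts + cut_arcs
    return overlay


def query(adj, cell, overlay, s, t):
    """Stage 3: search the source cell, the target cell and the overlay only."""
    def neighbors(v):
        if cell[v] in (cell[s], cell[t]):
            return adj[v]                     # full detail near s and t
        return overlay[v]                     # elsewhere: boundary vertices only
    return dijkstra(neighbors, s).get(t, float("inf"))
```

The design point the slide makes is visible in the split: `boundary_vertices` never looks at weights, so a new cost function only requires rerunning `customize`.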
Shortest Paths and Routing
Routing in public transport networks is a much harder problem
◦ Inherent time-dependence
◦ Solved using (potentially huge!) event-activity networks
Interest for CERN:
◦ Grid tiers already define a contraction hierarchy → examine actual data flows for missing/misplaced hubs

Data Compression
Problem: compress once, decompress many times
Requirements:
◦ Compressed space
◦ Decompression time
  (trade-off between these two)
◦ Compression time is not much of an issue

Dataset: MINGW (1 GB)
Compressor   Compressed space (MB)   Decompression time (secs)
Gzip         344                     5.5
Lzma         188                     8.3
Snappy       461                     0.9

“Snappy is widely used inside Google, in everything from BigTable and MapReduce …”

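A trade-off of this kind can be measured on one's own data with a few lines of Python. The sketch below uses only the standard-library `zlib` (DEFLATE, the algorithm behind gzip) and `lzma` modules; Snappy is left out because it is not part of the standard library. The helper names `measure` and `compare` are illustrative, and the numbers it produces will not match the table above.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data, repeats=5):
    """Return (name, compressed size in bytes, best decompression time in seconds)."""
    blob = compress(data)                     # compress once ...
    best = float("inf")
    for _ in range(repeats):                  # ... decompress many times
        start = time.perf_counter()
        restored = decompress(blob)
        best = min(best, time.perf_counter() - start)
    assert restored == data                   # round trip must be lossless
    return name, len(blob), best


def compare(data):
    """Measure the space / decompression-time trade-off for two codecs."""
    codecs = [
        ("zlib", zlib.compress, zlib.decompress),
        ("lzma", lzma.compress, lzma.decompress),
    ]
    return [measure(name, c, d, data) for name, c, d in codecs]


if __name__ == "__main__":
    sample = b"trip report: algorithm engineering " * 50000
    for name, size, seconds in compare(sample):
        print(f"{name:5s} {size:10d} bytes {seconds * 1e3:8.3f} ms")
```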
Data Compression
Reminder: Lempel-Ziv compression
a a c a a c a b c a a d a a a a c
<6,3> <0,d> <3,2> <11,3>
(figure annotation: “This part has been already compressed”)
Greedy approach only optimal if every pair takes constant space
◦ but variable number of bits required for distances → non-optimal
Bit-optimal LZ parsing [Ferragina et al. 2013]
◦ Solve shortest path problem on DAG describing possible compression pairs

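The difference between greedy and bit-optimal parsing can be sketched on a toy scale. The code below uses <distance,length> copy pairs and <0,char> literals as on the slide, but the bit-cost model (a flag bit plus Elias-gamma codes) is a hypothetical stand-in, and the cubic-time search over all copy candidates is only suitable for short strings; it is not the algorithm of [Ferragina et al. 2013].

```python
def decode(pairs):
    """Expand <distance, length> copies and <0, char> literals into text."""
    out = []
    for first, second in pairs:
        if first == 0:
            out.append(second)                # literal
        else:
            for _ in range(second):           # char by char: overlapping copies work
                out.append(out[-first])
    return "".join(out)


def bits(pair):
    """Hypothetical cost model: flag bit + 8-bit literal or two Elias-gamma codes."""
    first, second = pair
    if first == 0:
        return 1 + 8
    gamma_bits = lambda x: 2 * x.bit_length() - 1
    return 1 + gamma_bits(first) + gamma_bits(second)


def copies(text, i):
    """All valid (end position, (distance, length)) copy edges leaving position i."""
    for start in range(i):
        length = 0
        while i + length < len(text) and text[start + length] == text[i + length]:
            length += 1
            yield i + length, (i - start, length)


def greedy_parse(text):
    """Always take the longest available copy, else a literal."""
    pairs, i = [], 0
    while i < len(text):
        end, pair = max(copies(text, i), key=lambda e: e[0], default=(i + 1, (0, text[i])))
        pairs.append(pair)
        i = end
    return pairs


def optimal_parse(text):
    """Shortest path in the DAG of text positions, with edge weights given by bits()."""
    n = len(text)
    cost = [0] + [float("inf")] * n
    back = [None] * (n + 1)
    for i in range(n):                        # positions are already in topological order
        edges = [(i + 1, (0, text[i]))] + list(copies(text, i))
        for j, pair in edges:
            if cost[i] + bits(pair) < cost[j]:
                cost[j] = cost[i] + bits(pair)
                back[j] = (i, pair)
    pairs, j = [], n
    while j > 0:                              # walk the shortest path backwards
        j, pair = back[j]
        pairs.append(pair)
    return pairs[::-1]
```

Every greedy parse is one path in the same DAG, so the shortest path can never cost more bits than the greedy result under the chosen cost model.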
Data Compression
Bi-criteria Compression [Farruggia et al. 2014]:
◦ Space and decompression time as edge weights in the DAG
◦ Fix space constraint, search for lowest decompression time, and vice versa

Data Compression
Different approach to compression: Burrows-Wheeler Transform [introduction]

Data Compression
Different approach to compression: Burrows-Wheeler Transform
◦ Yields smaller compression size but longer decompression time
◦ Construction of BWT closely related to suffix-array construction
◦ Allows decompression of any substring
FM index [Ferragina and Manzini 2000]
◦ Uses BWT and auxiliary data structures to answer count and locate queries on compressed text
Interest for CERN:
◦ Compression of ROOT files + access of individual entries
◦ Compression of and search in dictionaries

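How the BWT supports count queries can be shown with a small, uncompressed sketch. It builds the transform by sorting rotations (quadratic, whereas practical tools go through a suffix array), stores full occurrence tables instead of the sampled and compressed structures of a real FM index, and assumes the text does not contain the sentinel character. The class and method names are illustrative.

```python
def bwt(text, sentinel="\0"):
    """Burrows-Wheeler Transform via sorted rotations of text + sentinel."""
    s = text + sentinel                       # sentinel sorts before all other characters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


class FMIndex:
    """Toy FM index: BWT plus uncompressed C and occurrence tables."""

    def __init__(self, text):
        self.last = bwt(text)
        # C[c]: number of characters in the text strictly smaller than c.
        self.C, total = {}, 0
        for c in sorted(set(self.last)):
            self.C[c] = total
            total += self.last.count(c)
        # occ[c][i]: occurrences of c in last[:i].
        self.occ = {c: [0] for c in self.C}
        for ch in self.last:
            for c, table in self.occ.items():
                table.append(table[-1] + (ch == c))

    def count(self, pattern):
        """Number of occurrences of pattern, found by backward search."""
        lo, hi = 0, len(self.last)            # half-open range of matching rows
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo

    def text(self):
        """Recover the original text by following the last-to-first mapping."""
        out, i = [], 0                        # row 0 is the rotation starting with the sentinel
        for _ in range(len(self.last) - 1):
            c = self.last[i]
            out.append(c)
            i = self.C[c] + self.occ[c][i]
        return "".join(reversed(out))
```

For example, `FMIndex("banana").count("ana")` returns 2, and `text()` reconstructs the input from the transform alone. Locate queries would additionally need sampled suffix-array positions, which this sketch omits.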
Miscellaneous
Linear programming
◦ Disproof of the Hirsch conjecture poses a threat to the simplex method → still performs well in practice
◦ Anecdote: interior point method patented by AT&T → patent circumvented by polar transformation of the problem and use of a barrier method
SeqAn
◦ Package for analysis of (genome) sequences
◦ Developers face similar problems as HEP: bridging the gap between computer science and real-world problems
External memory algorithms
◦ Flow computations for massive LiDAR terrain data sets
◦ General trick of time-forward processing to reduce I/O

Conclusions
◦ Final meeting gave a good overview of the broad activity in DFG PP 1307 “Algorithm Engineering”
◦ Summer school expanded on four focus topics of the PP
◦ Similar research continues in DFG PP 1736 “Algorithms for Big Data”
  ◦ Funding period 2013-2019
  ◦ Currently 16 projects covering graph analysis, energy efficient scheduling, search and text indexing, genome assembly, …
◦ Most projects concerned with computer science problems
◦ Computational biology problems present in both PPs
HEP community needs to explore how to exploit this resource of expertise and funding