OS Support for a Commodity Database on PC Clusters:
Distributed Devices vs. Distributed File Systems

Felix Rauch (National ICT Australia)
Thomas M. Stricker (Google Inc., USA)

Laboratory for Computer Systems, ETH Zurich, Switzerland
Commodity Solutions for OLAP Workloads

TPC-D data model: Customer, Nation, Region, Supplier, Part, PartSupp, Order, LineItem.
Database size: 10-100 GByte.

What kind of system architectures are suitable for this type of workload?
Platforms

Traditionally: Symmetric Multiprocessor (SMP), e.g. DEC 8400.
More recently: Cluster of commodity PCs, e.g. the Patagonia multi-use cluster at ETH Zurich.
Killer SMPs vs. Clusters of PCs

[Diagram: an SMP with processors and caches sharing memory and disks over one bus, next to a cluster of commodity PCs with per-node memory and disks joined by a network.]

Killer SMP: killer performance, but a killing price...
Cluster of commodity PCs: killer price, but killing performance?
Overview

• Introduction
• Motivation
• Distributed storage architectures
• Evaluation
• Analysis of results
• Alternative: Middleware
• Conclusion
Research Goal

Turn PC clusters into "killer SMPs" for OLAP. Combine the excess storage and high-speed network already available on cluster nodes. Provide a transparent distributed storage architecture as the database's storage backend for OLAP applications.

System architect's point of view: focus on performance and understanding.
Storage Architectures for Clusters of PCs

Traditional:
• Big server with RAID
• Storage-area networks (SAN)
• Network-attached storage (NAS)
→ Additional hardware and costs

Our proposed alternative: use the available commodity hardware and distribute the data in software layers.
Why Should Such an Architecture Work?

Commodity hardware and software (OS) allow high cost effectiveness.

Trends:
• Disks becoming larger and cheaper
• Built-in high-speed networks
Large Hard-Disk Drives

[Chart: median disk size vs. full OS size (incl. applications), in GByte, over the survey years 1998-2004; disk sizes climb towards 120-140 GByte while installed OS sizes stay far below, leaving ever more excess capacity on each node.]
High-Speed Network

[Chart: throughput in MByte/s, log scale, over the years 1995-2005, comparing Ethernet generations (Fast Ethernet, Gigabit Ethernet, 10 Gigabit Ethernet) with the maximum disk throughput.]

→ Enough bandwidth to support distributed storage.
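A rough sanity check behind that claim (link and disk rates assumed from typical hardware of the time, not read off the chart):

\[
b_{\mathrm{GbE}} \approx \frac{1000\ \mathrm{Mbit/s}}{8} \approx 125\ \mathrm{MByte/s} \;\gg\; b_{\mathrm{disk}} \approx 40\ \mathrm{MByte/s},
\]

so a single Gigabit Ethernet link can absorb the sequential streams of roughly three commodity disks.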
Our Scenario

[Diagram: on the left, a parallel file system for high-performance computing: many compute nodes served by I/O nodes; scalable (Lustre, PVFS). On the right, our scenario: a single DB node served by I/O nodes through a distributed file system (network RAID0) to boost DB performance.]
Alternative Systems

• Petal [Lee & Thekkath, 1996]: Distributed virtual disks with special emphasis on dynamic reconfiguration and load balancing.
• Frangipani [Thekkath, Mann & Lee, 1997]: Distributed file system that builds on Petal.
• Lustre [Cluster File Systems, Inc.]: Object-oriented file system for large clusters.
Investigated Architectures

Fast Network Block Device (FNBD):
• Maps a hard-disk device over the network
• No intelligence, but highly optimised

Parallel Virtual File System (PVFS):
• Integrates the nodes' disks into a parallel FS
• Fully featured file system
Fast Network Block Device (FNBD)

• Loosely based on the Linux network block device
• Implemented as kernel modules
• Maps remote disk blocks over Gigabit Ethernet (from 3 servers)
• Uses hardware features of the commodity network interface to implement zero copy
• Multiple instances combine into a RAID0-like array of networked disks (sketched below)
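A minimal user-space sketch of the block-mapping idea (names, port and message format are ours for illustration; the real FNBD is a set of in-kernel modules that exploit NIC hardware for zero copy):

```python
import socket, struct

BLOCK = 4096
REQ = struct.Struct("!QI")  # request header: byte offset, length

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

def serve(backing_path, port=9010):
    # Server node: export a backing file (or raw device) as a "disk".
    srv = socket.create_server(("", port))
    with open(backing_path, "rb") as disk:
        while True:
            conn, _ = srv.accept()
            with conn:
                offset, length = REQ.unpack(recv_exact(conn, REQ.size))
                disk.seek(offset)
                conn.sendall(disk.read(length))

def read_block(servers, blockno, port=9010):
    # Client node: RAID0-like striping, block i lives on server i % N.
    host = servers[blockno % len(servers)]
    with socket.create_connection((host, port)) as s:
        s.sendall(REQ.pack((blockno // len(servers)) * BLOCK, BLOCK))
        return recv_exact(s, BLOCK)
```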
Parallel Virtual File System (PVFS)

• Widely used on PC clusters
• Implemented as a dynamically linked library
• Fully featured distributed file system
• Can be accessed by any participating node
• Combines special directories on the server nodes into one large file system (striping sketched below)
• 6 servers in our setup, due to disk-space limitations
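How a RAID0-style striped read splits across I/O servers, as a hedged sketch (stripe size and round-robin layout are our assumptions; PVFS's real layout metadata is more general):

```python
STRIPE = 64 * 1024  # bytes per stripe unit (illustrative)

def split_read(offset, length, n_servers):
    """Yield (server, server_local_offset, chunk_length) for one logical read."""
    end = offset + length
    while offset < end:
        stripe_no = offset // STRIPE
        server = stripe_no % n_servers            # round-robin placement
        local = (stripe_no // n_servers) * STRIPE + offset % STRIPE
        chunk = min(STRIPE - offset % STRIPE, end - offset)
        yield server, local, chunk
        offset += chunk

# Example: a 256-KByte read starting at offset 32 KByte, striped over 6 servers.
for srv, loc, chk in split_read(32 * 1024, 256 * 1024, 6):
    print(f"server {srv}: read {chk} bytes at local offset {loc}")
```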
Architecture of Reference Case

[Diagram: a single node; the application(s) sit on the OS kernel's file system and disk driver, with local disk access.]
Architecture of FNBD

[Diagram: on the client node, the application(s) run on the OS kernel's file system, which sits on the distributed device driver (client part); on each server node, the distributed device driver (server part) sits on the local disk driver. Together they form the Fast Network Block Device.]
Architecture of PVFS

[Diagram: on the client node, the application(s) access files through the user-space PVFS library above the OS kernel; on each server node, a user-space PVFS server daemon runs on top of the local file system and disk driver. Together they form the Parallel Virtual File System.]
A Stream-Based Analytic Model

Presented at the EuroPar 2000 conference. Considers the flow of the data stream and the limits of the building blocks it passes through.

→ A set of (in)equations; solve to find the maximal throughput of the stream.

Simple, and works well for large data streams.
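The flavour of the model, in our own notation (an illustration, not the exact equations of the paper): a stream passing through stages with bandwidth limits \(b_i\) satisfies

\[
T \le b_i \ \text{for every stage } i \qquad\Rightarrow\qquad T_{\max} = \min_i b_i ,
\]

and stages contending for one shared resource (e.g. several memory copies on the same bus) are combined into a single constraint such as \(\sum_{i \in \text{shared}} T / b_i \le 1\).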
Modelling the Workload

Need to know the performance characteristics of all involved building blocks.

• Easy for small and simple parts (HW, OS functionality): measurements or data sheets.
• Very difficult for complex, closed software (RDBMS): black box.

→ Calibrate the model with known queries.
Calibration of Database Performance

Two cases:
• "Simple" case: full table scan (find max.)
• "Complex" case: scan including CPU work (sort)

Experimental calibration with data in RAM (used in the worked example below):
• 140 MByte/s throughput for the simple case
• 7.75 MByte/s throughput for the complex case
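Plugging these numbers into the min-of-stages model shows which queries can benefit from faster storage at all (the disk and network figures below are assumed round values; only the two DB throughputs are calibrated):

```python
def bottleneck(stages):
    """Return the (name, bandwidth) of the slowest stage."""
    return min(stages.items(), key=lambda kv: kv[1])

simple   = {"disk": 40.0, "network": 100.0, "db_scan": 140.0}  # MByte/s
complex_ = {"disk": 40.0, "network": 100.0, "db_sort": 7.75}

print(bottleneck(simple))    # ('disk', 40.0): faster storage should help
print(bottleneck(complex_))  # ('db_sort', 7.75): faster storage cannot help
```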
Modelling OLAP on FNBD

[Diagram: the data stream flows from the server-side disk driver through the FNBD driver (server part) and NIC driver, across the Gigabit/s network by DMA, into the client-side FNBD driver (client part), the file system and one kernel copy, then via a reduced copy over the application pipe into the RDBMS.]
Modelling OLAP on PVFS

[Diagram: the data stream flows from the server-side disk driver through the local file system into the user-space PVFS daemon, down through TCP/IP and the NIC, across the Gigabit/s network by DMA, up through the client's TCP/IP stack into the PVFS library and file system, and finally via a reduced copy over the application pipe into the RDBMS; the path contains several full memory copies.]
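The copy count is what separates the two diagrams: each full memory copy reads and writes the whole stream once, consuming memory-system bandwidth. A simplified shared-resource bound (our assumption, with an assumed memory-copy bandwidth; not the paper's exact equations):

```python
B_MEM = 400.0  # MByte/s, assumed memory-copy bandwidth of a PC of that era

def copy_limited(b_mem, n_copies):
    # Each copy moves the stream through the memory system twice (read + write).
    return b_mem if n_copies == 0 else b_mem / (2 * n_copies)

print(copy_limited(B_MEM, 1))  # one full copy (FNBD-like client path): 200.0
print(copy_limited(B_MEM, 3))  # three full copies (PVFS-like client path): ~66.7
```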
Evaluation Criteria

Small microbenchmark "speed":
• Throughput for large contiguous I/O operations with varying user-level block sizes.

Application benchmark TPC-D:
• Broad range of decision-support applications; long-running, complex ad-hoc queries.
• The newer TPC-H and TPC-R include updates.
Experimental Testbed

Multi-use cluster with 16 nodes, each with:
• Two 1-GHz Pentium III CPUs
• 512 MByte ECC SDRAM
• 2 x 9 GByte disk space
• 2 Gigabit Ethernet adapters
• Linux kernel 2.4.3
Microbenchmarks

[Chart: throughput in MByte/s (up to 45) over user-level block sizes of 4-256 KByte, for the reference case (single local disk), the distributed devices (FNBD, 3 servers) and the distributed file system (PVFS, 6 servers).]
Experimental Evaluation with OLAP

TPC-D decision support benchmark on ORACLE.

[Chart: speedup over the single-local-disk reference case (0-1.2) for TPC-D queries 1, 2, 3, 4, 6, 9, 10, 12, 13 and 17, for the distributed devices (FNBD, 3 servers) and the distributed file system (PVFS, 6 servers); one disk-limited query is highlighted.]
Quantitative Performance: Model vs. Measurements

[Chart: measured vs. modelled speedup over the single-local-disk reference case (0-1.6) for the simple query, the complex query and TPC-D query 4, for both the distributed devices (FNBD) and the distributed file system (PVFS).]
Analysis of Results

Performance was lower than expected:
• Aggregation of distributed disks did not increase application performance.
• The fully featured distributed file system failed to deliver decent performance.
• The stream-based analytic model is too simple for such a complex workload.
Alternative: Performance with TP-Lite Middleware

Data distribution in a middleware layer: TP-Lite by [Böhm et al., 2000] (sketched below)
• Distributes queries to multiple database servers in parallel
• Needs multiple servers (costs)
• Requires small changes to the application (not always possible)
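The core idea in a few lines (our sketch, not the actual TP-Lite code; `server.execute` stands in for an RDBMS call, and the column name is illustrative): run the same aggregate query on every server's partition in parallel and merge the partial results. Decomposable aggregates such as MAX merge trivially; others (e.g. AVG) need rewriting.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_partition(server, query):
    # Placeholder for a real RDBMS call, e.g. over a DB-API connection.
    return server.execute(query)

def parallel_max(servers, query="SELECT MAX(l_extendedprice) FROM lineitem"):
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partials = pool.map(lambda s: scan_partition(s, query), servers)
    return max(partials)  # MAX of per-partition maxima = global MAX
```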
Modelling OLAP with TP-Lite

[Diagram: the data stream flows from the server-side disk driver through the local file system into a server-side RDBMS in user space, down through TCP/IP and the NIC with reduced copies and DMA, across the Gigabit/s network, and up through the client's TCP/IP stack into the client-side RDBMS via a reduced copy over the application pipe; most copies on this path are reduced.]