Sector/Sphere Tutorial
Yunhong Gu
CloudCom 2010, Nov. 30, Indianapolis, IN
Outline
Introduction to Sector/Sphere
Major Features
Installation and Configuration
Use Cases
The Sector/Sphere Software
Includes two components:
Sector distributed file system
Sphere parallel data processing framework
Open source, developed in C++, Apache 2.0 license, available from http://sector.sf.net
In development since 2006; current version is 2.5
Motivation: Data Locality
Traditional systems: separate storage and computing sub-systems
Expensive; data I/O bandwidth is the bottleneck
Sector/Sphere model: in-storage processing
Inexpensive; parallel data I/O; data locality
Motivation: Simplified Programming
Parallel/distributed programming with MPI, etc.: flexible and powerful, but application development is very complicated
Sector/Sphere: the cluster appears as a single entity to the developer, with a simplified programming interface; limited to certain data-parallel applications
Motivation: Global-scale System
Traditional systems: require additional effort to locate and move data; data providers upload to a data center and readers download from it, regardless of their locations
Sector/Sphere: supports wide-area data collection, processing, and distribution (e.g., data providers and readers in the US, Asia, and Europe)
Sector Distributed File System
Security server: user accounts, data protection, system security
Masters: metadata, scheduling, service provider
Clients: system access tools, application programming interfaces
Slaves: storage and processing
Masters and clients connect to the security server over SSL; data moves between clients and slaves over UDT, with optional encryption
Security Server
User account authentication: password and IP address
Sector uses its own account source, but can be extended to connect to LDAP or local system accounts
Authenticates masters and slaves with certificates and IP addresses
Master Server
Maintains file system metadata
Multiple active masters: high availability and load balancing
Masters can join and leave at run time
All masters respond to users' requests
Masters synchronize system metadata
Maintains status of slave nodes and other master nodes
Slave Nodes
Store Sector files
Sector is a user-space file system; each Sector file is stored on the local file system (e.g., EXT, XFS) of one or more slave nodes
A Sector file is not split into blocks
Process Sector data
Data is processed on the same storage node, or the nearest storage node possible
Input and output are Sector files
Clients
Sector file system client API
Access Sector files in applications using the C++ API
Sector system tools
File system access tools
FUSE
Mount the Sector file system as a local directory
Sphere programming API
Develop parallel data processing applications to process Sector data with a set of simple APIs
Topology Aware and Application Aware
Sector considers network topology when managing files and scheduling jobs
Users can specify file locations when necessary, e.g., to improve application performance or comply with a security requirement
Replication
Sector uses replication to provide software-level fault tolerance
No hardware RAID is required
Replication number
All files are replicated to a specific number by default; no under-replication or over-replication is allowed
A per-file replication value can be specified
Replication distance
By default, a replica is created on the furthest node
A per-file distance can be specified, e.g., replicas are created on the local rack only
Restricted location
Files/directories can be limited to certain locations (e.g., a rack) only
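Such per-file rules might be expressed along the lines of the following configuration fragment. The file layout and key names here are illustrative assumptions, not Sector's actual syntax; see the sample configuration files shipped with the distribution for the real format:

```
# Illustrative per-path replication rules (hypothetical syntax)
/data/hot/*
        REPLICATION_NUMBER    3
        REPLICATION_DISTANCE  1     # replicas stay on the local rack
/data/archive/*
        REPLICATION_NUMBER    2     # default distance: furthest node
/data/restricted/*
        RESTRICT_LOCATION     /dc1/rack3
```

The intent matches the bullets above: a fixed replica count per path, an optional distance limit, and an optional location restriction.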
Fault Tolerance (Data)
Sector guarantees data consistency between replicas
Data is replicated to remote racks and data centers
Can survive loss of data center connectivity
Existing nodes can continue to serve data no matter how many nodes are down
Sector does not require permanent metadata; the file system can be rebuilt from the real data alone
Fault Tolerance (System)
All Sector master and slave nodes can join and leave at run time
The master monitors slave nodes and can automatically restart a node if it is down, or remove a node if it appears to be problematic
Clients automatically switch to a good master/slave node if the currently connected one is down
Transparent to users
UDT: UDP-based Data Transfer
http://udt.sf.net
Open-source UDP-based data transfer protocol
With reliability control and congestion control
Fast, firewall friendly, easy to use
Already used in many commercial and research systems for large data transfer
Supports firewall traversal via UDP hole punching
Wide Area Deployment
Sector can be deployed across multiple data centers
Sector uses UDT for data transfer
Data is replicated to different data centers (configurable)
A client can choose a nearby replica
All data can survive even when connectivity to a data center is lost
Rule-based Data Management
Replication factor, replication distance, and restricted locations can be configured at the per-file level and changed dynamically at run time
Data I/O can be balanced between throughput and fault tolerance at the per-client/per-file level
In-Storage Data Processing
Every storage node is also a compute node
Data is processed at the local node or the nearest available node
Certain file operations such as md5sum and grep can run significantly faster in Sector
In-storage processing + parallel processing: no data I/O over the network is required
Large data analytics with the Sphere and MapReduce APIs
Summary of Sector's Unique Features
Scales up to 1,000s of nodes and petabytes of storage
Software-level fault tolerance (no hardware RAID is required)
Works both within a single data center and across distributed data centers, with topology awareness
In-storage massively parallel data processing via the Sphere and MapReduce APIs
Flexible rule-based data management
Integrated WAN acceleration
Integrated security and firewall traversal features
Integrated system monitoring
Limitations
File size is limited by the available space of individual storage nodes; users may need to split their datasets into proper sizes
Sector is designed to provide high throughput on large datasets, rather than extremely low latency on small files
Sphere: Simplified Data Processing
Data-parallel applications
Data is processed where it resides, or on the nearest possible node (locality)
The same user-defined functions (UDFs) are applied to all elements (records, blocks, files, or directories)
Processing output can be written to Sector files or sent back to the client
Transparent load balancing and fault tolerance
Sphere: Simplified Data Processing
The application logic:

for each file F in (SDSS datasets)
    for each image I in F
        findBrownDwarf(I, …);

expressed with the Sphere client API:

SphereStream sdss;
sdss.init("sdss files");
SphereProcess myproc;
myproc->run(sdss, "findBrownDwarf", …);

where the UDF has the signature:

findBrownDwarf(char* image, int isize, char* result, int rsize);

The Sphere client splits the input stream into segments (n, n+1, n+2, …), locates and schedules the Sphere Processing Engines (SPEs), and collects the results into the output stream.