Analytics for Object Storage Simplified - Unified File and Object for Hadoop
Sandeep R Patil, STSM, Master Inventor, IBM Spectrum Scale
Smita Raut, Object Development Lead, IBM Spectrum Scale
Acknowledgement: Bill Owen, Tomer Perry, Dean Hildebrand, Piyush Chaudhary, Yong Zeng, Wei Gong, Theodore Hoover Jr, Muthuannamalai Muthiah.
Agenda
• Part 1: Need and Design Points for Unified File and Object
  - Introduction to Object Storage
  - Unified File & Object Access
  - Use Cases Enabled by UFO
• Part 2: Analytics with Unified File and Object
  - Big Data Analytics and Challenges
  - Design Points, Approach and Solution
  - Unified File & Object Store and Cognitive Computing
Part 1: Need and Design Points for Unified File and Object
Object Storage Introduction
Introduction to Object Store
• Object storage is highly available, distributed, eventually consistent storage.
• Data is stored as individual objects, each with a unique identifier.
  - A flat addressing scheme allows for greater scalability.
• Simpler data management and access.
• REST-based data access with simple atomic operations: PUT, POST, GET, DELETE.
• Usually software based, running on commodity hardware.
• Capable of scaling to 100s of petabytes.
• Uses replication and/or erasure coding for availability instead of RAID.
• Access over a RESTful API over HTTP, which is a great fit for cloud and mobile applications.
  - Amazon S3, Swift, CDMI API
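As a concrete illustration of the REST-based access model, here is a minimal Python sketch of Swift-style PUT/GET/DELETE calls; the endpoint URL, account, container, and token are all hypothetical placeholders, not values from this deck.

```python
# Minimal sketch of Swift-style object operations over HTTP.
# Endpoint and token are hypothetical; a real client would first
# authenticate (e.g. against Keystone) to obtain the token.
import requests

SWIFT_URL = "https://objstore.example.com/v1/AUTH_demo"  # hypothetical endpoint
HEADERS = {"X-Auth-Token": "<token>"}

# PUT: create (or atomically overwrite) an object in a flat namespace.
with open("clip001.mp4", "rb") as f:
    requests.put(f"{SWIFT_URL}/videos/clip001.mp4", headers=HEADERS, data=f)

# GET: retrieve the object by its identifier (account/container/object).
clip = requests.get(f"{SWIFT_URL}/videos/clip001.mp4", headers=HEADERS).content

# DELETE: remove the object.
requests.delete(f"{SWIFT_URL}/videos/clip001.mp4", headers=HEADERS)
```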
Object Storage Enables the Next Generation of Data Management
• Simple APIs/semantics (Swift/S3, versioning, whole-file updates)
• Scalable metadata
• Ubiquitous access
• Multi-tenancy
• Simpler management and a flatter namespace
• Scalable and highly available
• Multi-site cloud storage
• Cost savings
But Does it Create Yet Another Storage Island in Your Data Center?
Unified File and Object Access
What is Unified File and Object Access?
• Accessing objects using file interfaces (SMB/NFS/POSIX) and accessing files using object interfaces (REST) helps legacy applications designed for files to seamlessly start integrating into the object world.
• It allows object data to be accessed using applications designed to process files, and it allows file data to be published as objects (see the sketch below).
• Multi-protocol access for file and object in the same namespace (with common user ID management capability) allows supporting and hosting data oceans of different types of data with multiple access options.
• Optimizes various use cases and solution architectures, resulting in better efficiency as well as cost savings.
[Figure: data ingested as files is accessed as objects, and data ingested as objects is accessed as files, via Swift (with Swift on File) over a clustered file system; file exports are created at the container level, or POSIX access is given from the container path.]
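The following sketch shows the core idea: the same bytes are reachable both as an object over REST and as a file over a POSIX mount, with no copy in between. The endpoint, token, and on-disk path layout are hypothetical; the actual container-to-path mapping depends on the cluster configuration.

```python
# Sketch of unified file and object access: one object, two interfaces.
import requests

SWIFT_URL = "https://objstore.example.com/v1/AUTH_media"  # hypothetical
HEADERS = {"X-Auth-Token": "<token>"}

# Ingest over the object interface...
requests.put(f"{SWIFT_URL}/videos/clip001.mp4",
             headers=HEADERS, data=b"raw video bytes")

# ...then read the very same data over the file interface, e.g. a POSIX
# or NFS mount of the container path (path layout is illustrative only).
posix_path = "/mnt/gpfs0/obj_fileset/AUTH_media/videos/clip001.mp4"
with open(posix_path, "rb") as f:
    assert f.read() == b"raw video bytes"   # same data, no copy made
```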
Flexible Identity Management Modes
• Two identity management modes; administrators can choose based on their need and use case.

Local_Mode
• Suitable when the auth schemes for file and object are different and unified access to the data is for applications.
• An object created via the object interface is owned by the internal "swift" user.
• An application processing the object data from the file interface needs the required file ACL to access the data.
• Object authentication setup is independent of the file authentication setup.

Unified_Mode
• Suitable for unified file and object access for end users; leverages common ILM policies for file and object data based on data ownership.
• An object created from the object interface is owned by the user doing the object PUT (i.e., the file is owned by that user's UID/GID).
• The owner of the object owns and has access to the data from the file interface.
• Users from object and file are expected to come from the same directory service (only AD+RFC 2307, or LDAP).
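A small sketch (not IBM's implementation) of what the two modes mean on the file side: after an object PUT, the backing file's owner differs by mode. The path is hypothetical.

```python
# Inspect who owns the file that backs an object after a PUT.
import os
import pwd

def file_owner(path: str) -> str:
    """Return the user name owning the file behind an object."""
    return pwd.getpwuid(os.stat(path).st_uid).pw_name

path = "/mnt/gpfs0/obj_fileset/AUTH_media/videos/clip001.mp4"  # hypothetical

# local_mode:   prints 'swift' -> file apps need an explicit ACL to read it.
# unified_mode: prints the PUT user -> that user also owns it via POSIX/NFS.
print(file_owner(path))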
Use Cases Enabled by Unified File and Object
Use Case: Process Object Data with File-Oriented Applications and Publish Outcomes as Objects
• A media house runs an OpenStack cloud platform (tenant = media house subsidiaries), with a VM farm per subsidiary for video processing.
• Raw media content is ingested as objects into per-subsidiary containers.
• The raw content is sent for media processing, which happens over files (object-to-file access): Manila shares (NFS exports at the container level) are exported only to the owning subsidiary's VM farm.
• Processed files are converted into objects for publishing (file-to-object access): the final videos land in a container used for external publishing, and the media objects are available for streaming by the publishing channels.
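An end-to-end sketch of this flow, under assumed names: the endpoint, token, container names, and Manila mount paths below are all hypothetical, and the file copy stands in for a real transcoding step.

```python
# Object -> file -> object workflow for the media-house use case.
import shutil
import requests

SWIFT_URL = "https://objstore.example.com/v1/AUTH_mediahouse"  # hypothetical
HEADERS = {"X-Auth-Token": "<token>"}
RAW_MOUNT = "/mnt/manila/container1"        # NFS export of the raw container
PUB_MOUNT = "/mnt/manila/container1_pub"    # NFS export of the publish container

# 1. Ingest raw media over the object interface.
with open("raw.mp4", "rb") as f:
    requests.put(f"{SWIFT_URL}/container1/raw.mp4", headers=HEADERS, data=f)

# 2. File-oriented tooling on the VM farm processes it via NFS/POSIX
#    (a plain copy here stands in for the transcoder).
shutil.copyfile(f"{RAW_MOUNT}/raw.mp4", f"{PUB_MOUNT}/final.mp4")

# 3. The processed file is immediately visible as an object for publishing.
final = requests.get(f"{SWIFT_URL}/container1_pub/final.mp4", headers=HEADERS)
```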
We have now covered Part 1: Need and Design Points for Unified File and Object. Let us now deep dive into Part 2: Analytics with Unified File and Object.
Big Data Analytics and Challenges
Analytics – Broadly Categorized Into Two Sets: Traditional Analytics and Cognitive Analytics
Big Data
Big data is a term for data sets that are so large or complex that traditional data processing applications (e.g., database management tools) are inadequate. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
Characteristics
• Volume ‒ The quantity of generated and stored data.
• Variety ‒ The type and nature of the data.
• Velocity ‒ The speed at which the data is generated and processed.
• Variability ‒ Inconsistency of data sets can hamper the processes that manage them.
• Veracity ‒ The quality of captured data can vary greatly, affecting accuracy.
Challenges with the Early Big Data Storage Models
• It's not just one type of analytics, and key business processes now depend on the analytics.
• The typical cycle: ingest data at various end points, move the data to the analytics engine, perform analytics, repeat!
• More data sources than ever before: not just data you own, but public or rented data.
• You can't just throw away data, due to regulations or business requirements.
• It takes hours or days to move the data!
Design Points, Approach & Solution
What are the Solution Design Points that we came across?
1. A single namespace to house all your data (files and objects)
2. Unified data access with file & object
3. Encryption for protection of your data
4. Optimize economics based on the value of the data
5. Geographically dispersed management of data, including disaster recovery
6. Bring analytics to the data
These points serve traditional applications, new-gen applications, and the compute farm over a common namespace spanning flash, disk, tape, shared-nothing clusters, and off-premise storage.
How Did We Approach the Solution and Address the Design Points? We took the Data Ocean approach.
• Design point 1: a global namespace powered by a clustered file system (IBM Spectrum Scale, 4000+ customers), spanning Site A, Site B, and Site C with automated data placement and data migration.
• Design point 2: unified file and object access, as explained previously; users, applications, and client workstations connect over File (POSIX, NFS, SMB), Object (Swift, S3), Block (iSCSI), and OpenStack (Cinder, Manila, Glance, Swift) protocols.
• Design point 3: transparent encryption.
• Design point 4: tiering across RAID, flash, disk, tape, shared-nothing clusters, and JBOD/JBOF, plus a transparent cloud tier.
• Design point 5: a DR site and worldwide data distribution.
• Design point 6: analytics (HDFS, Spark) served from the same namespace for new-gen applications, traditional applications, and the compute farm.
Meeting Design Point 6 – Bring Analytics to Data
Apache Hadoop – Key Platform for Big Data and Analytics
• An open-source software framework and the most popular BD&A platform.
• Designed for distributed storage and processing of very large data sets on computer clusters built from commodity hardware.
• The core of Hadoop consists of:
  - A processing part called MapReduce (illustrated in the sketch below)
  - A storage part, known as the Hadoop Distributed File System (HDFS)
  - Hadoop common libraries and components
• Leading Hadoop distros: Hortonworks, Cloudera, MapR, IBM IOP/BigInsights.
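To make the MapReduce model concrete, here is a minimal PySpark word count; the input and output paths are hypothetical, and any Hadoop-compatible URI would work in their place.

```python
# Classic word count: the map phase emits (word, 1) pairs, the reduce
# phase sums counts per word across the cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

counts = (spark.sparkContext.textFile("hdfs:///data/logs/*.txt")
          .flatMap(lambda line: line.split())   # map: one record per word
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))     # reduce: sum per word

counts.saveAsTextFile("hdfs:///data/wordcounts")
```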
Meeting Design Point 6 – Bring Analytics to Data
HDFS Shortcomings
• HDFS is a shared-nothing architecture, which is very inefficient for high-throughput jobs (disks and cores grow in the same ratio).
• Costly data protection: it uses 3-way replication, with limited RAID/erasure coding.
• It works only with Hadoop, i.e., weak support for file or object protocols.
• Clients have to copy data from enterprise storage into HDFS in order to run Hadoop jobs, which can result in running on stale data.
Meeting Design Point 6 – How to Bring Analytics to Data?
Desired solution: in-place analytics (no copies required). The clustered file system should support HDFS connectors.
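A sketch of what in-place analytics looks like from the application side, assuming an HDFS connector exposes the clustered file system through the Hadoop API: data ingested as objects is analyzed directly, with no copy into a separate HDFS silo. The fileset path and the `device_id` field are illustrative assumptions.

```python
# In-place analytics over data that arrived via the object interface.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inplace-analytics").getOrCreate()

# The path points at the object fileset inside the clustered file system
# (layout hypothetical); the HDFS connector serves it to Spark unchanged.
df = spark.read.json("hdfs:///obj_fileset/AUTH_media/sensor_data/")
df.groupBy("device_id").count().show()
```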