The Apache Hadoop Ecosystem
Doug Cutting, Cloudera & Apache
Context: exponential for decades!
● abundance of
○ computing & storage
○ generated data (8 ZB in '15)
● peta-scale is now affordable (kMGTPEZY)
○ petabytes
○ petahertz
● traditional data tech doesn't scale well
● more data provides greater value
● time for a new approach
New Hardware Approach
Traditional:
● exotic hardware
○ big central servers
○ SAN
○ RAID
● hardware reliability
● expensive
● limited scalability
Big Data:
● commodity HW
○ racks of pizza boxes
○ Ethernet
○ JBOD
● unreliable HW
● cost effective
● scales further
New Software Approach
Traditional:
● monolithic
○ centralized storage
○ RDBMS
● schema first
● proprietary
Big Data:
● distributed
○ storage & compute nodes
● raw data
● open source
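The "schema first" vs "raw data" contrast above is often described as schema-on-write vs schema-on-read: instead of forcing data into a fixed table up front, Hadoop-style systems keep raw records and apply structure at query time. A minimal illustrative sketch (not from the talk; the log format and field names are hypothetical):

```python
# Schema-on-read sketch: store raw lines, apply a schema only when querying.
# The log format and field names below are invented for illustration.

RAW_LOGS = [
    "2015-03-01 GET /index.html 200",
    "2015-03-01 GET /missing 404",
    "2015-03-02 POST /login 200",
]

def read_with_schema(line):
    """Apply a schema at read time, not at load time."""
    date, method, path, status = line.split()
    return {"date": date, "method": method, "path": path, "status": int(status)}

# Query: count requests per status code, parsing raw lines on the fly.
counts = {}
for record in (read_with_schema(l) for l in RAW_LOGS):
    counts[record["status"]] = counts.get(record["status"], 0) + 1

print(counts)  # {200: 2, 404: 1}
```

The trade-off: loading is cheap and nothing is discarded, but every query pays the parsing cost, and a later schema change only requires a new `read_with_schema`.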
The Ecosystem is the System
● Hadoop has become the kernel
○ of the distributed operating system for Big Data
○ a de facto industry standard
● No one uses the kernel alone
● A collection of projects at Apache
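The "kernel" referred to above is Hadoop's HDFS-plus-MapReduce core. As a rough sketch of the programming model it provides, here is a word count in the style of Hadoop Streaming, where mapper and reducer are plain functions exchanging key/value pairs; the shuffle/sort step that Hadoop performs between them is simulated in-process for illustration:

```python
# Word-count sketch of the MapReduce model (not a real Hadoop job).
import itertools

def mapper(lines):
    """Map phase: emit (word, 1) for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reducer(pairs):
    """Reduce phase: Hadoop delivers pairs grouped (sorted) by key."""
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield (word, sum(v for _, v in group))

data = ["big data big ideas", "data at scale"]
shuffled = sorted(mapper(data))   # stand-in for Hadoop's shuffle/sort
result = dict(reducer(shuffled))
print(result)  # {'at': 1, 'big': 2, 'data': 2, 'ideas': 1, 'scale': 1}
```

In a real cluster the mapper and reducer run on many nodes, with HDFS holding the input and output and the framework handling shuffle, retries, and data locality, which is exactly why the "kernel" is rarely used alone.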
Open Source at Apache
● no strategic agenda
○ quality is emergent
● community based
○ diverse organizations collaborating voluntarily
○ decisions by consensus
○ transparent
● allows competing projects
○ survival of the fittest
● a loose federation of projects
○ permits evolution
● insures against vendor lock-in
○ can't buy Apache
Typical adoption pattern
● Idea that's impractical without Hadoop.
● Build Hadoop-based proof of concept.
● Move initial application to production.
● Add more datasets and users.
○ removing silos in organizations
○ permitting easy experiments on real data
Snowballs into the institution's central repository for
● analysis
● data processing
How can you use Hadoop?
● What data are you ignoring?
○ How can you use it?
● How can you combine your data with others?
Thanks! Questions? Visit Cloudera at booth 700.