
The Apache Hadoop Ecosystem
Doug Cutting, Cloudera & Apache


  1. The Apache Hadoop Ecosystem. Doug Cutting, Cloudera & Apache

  2. Context: exponential for decades!
     ● abundance of
       ○ computing & storage
       ○ generated data (8ZB in '15)
     ● peta-scale is now affordable (kMGTPEZY)
       ○ petabytes
       ○ petahertz
     ● traditional data tech doesn't scale well
     ● more data provides greater value
     ● time for a new approach

  3. New Hardware Approach
     Traditional                 Big Data
     ● exotic hardware           ● commodity HW
       ○ big central servers       ○ racks of pizza boxes
       ○ SAN                       ○ Ethernet
       ○ RAID                      ○ JBOD
     ● hardware reliability      ● unreliable HW
     ● expensive                 ● cost effective
     ● limited scalability       ● scales further

  4. New Software Approach
     Traditional                 Big Data
     ● monolithic                ● distributed
       ○ centralized storage       ○ storage & compute nodes
       ○ RDBMS
     ● schema first              ● raw data
     ● proprietary               ● open source

  5. The Ecosystem is the System
     ● Hadoop has become the kernel
       ○ of the distributed operating system for Big Data
       ○ a de facto industry standard
     ● No one uses the kernel alone
     ● A collection of projects at Apache

  6. Open Source at Apache
     ● no strategic agenda
       ○ quality is emergent
     ● community based
       ○ diverse organizations collaborating voluntarily
       ○ decisions by consensus
       ○ transparent
     ● allows competing projects
       ○ survival of fittest
     ● a loose federation of projects
       ○ permits evolution
     ● insures against vendor lock-in
       ○ can't buy Apache

  7. Typical adoption pattern
     ● Idea that's impractical without Hadoop.
     ● Build Hadoop-based proof of concept.
     ● Move initial application to production.
     ● Add more datasets and users.
       ○ removing silos in organizations
       ○ permitting easy experiments on real data
     Snowballs into institution's central repository for
     ● analysis
     ● data processing

  8. How can you use Hadoop?
     ● What data are you ignoring?
       ○ How can you use it?
     ● How can you combine your data with others?

  9. Thanks! Questions? Visit Cloudera at booth 700.
