
The Nexus of Open Source Innovation Eric Baldeschwieler, CTO, - PowerPoint PPT Presentation



  1. Apache Hadoop Framework: The Nexus of Open Source Innovation
     • Eric Baldeschwieler, CTO, Hortonworks
     • Avik Dey, Director, Hadoop Services, Intel
     • Moderator: Todd Cramer, Director Product Marketing, Intel

  2. Evolution to Open Source Data Management with Scale-out Storage & Processing
     • 90s: RDBMS
       • Reporting / data mining
       • Batch: "sales reports"
       • High cost / isolated use
       • Sequential SQL queries
     • 2000s: Proprietary MPP / DW appliance
       • Model-based discovery
       • Batch, i.e. correlated buying pattern
       • High cost / dept use
       • No SQL, parallel analysis
       • Shared disk/memory
     • Today: Open source SW coupled to commodity HW
       • Real-time, i.e. recommendation engine
       • Unbounded MapReduce query
       • Process @ storage node
       • Built-in data replication/reliability
       • Low cost / enterprise use
       • Shared nothing, in memory
       • Arrival of vast amounts of unstructured data
       • Unlimited linear scale, distributed node addition
     • Table columns: Date, Processing Style / Paradigm, Form Factor, Scale. Remaining diagram labels: Scale Out, Multi-core, No SQL RDBMS, Node
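The "MapReduce" and "process @ storage node" entries on this slide can be illustrated with a minimal sketch in plain Python. This is not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration, and each inner list merely stands in for the records held on one storage node.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def run_job(partitions):
    """Run map over every partition, then shuffle and reduce the results."""
    mapped = [
        pair
        for partition in partitions      # one partition per (hypothetical) node
        for record in partition
        for pair in map_phase(record)
    ]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Two partitions stand in for data stored on two different nodes.
partitions = [["sales report sales"], ["report archive"]]
counts = run_job(partitions)
print(counts)
```

In a real cluster the map calls would run on the nodes that store each partition and only the grouped intermediate pairs would cross the network; here everything runs in one process to keep the sketch self-contained.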

  3. Apache Hadoop Evolution (Source: Steven Nimmons, 2/24-12)
     • 2006: HDFS, MapReduce
     • 2008: HBase, ZooKeeper, Pig, Hive
     • 2009-10: Flume, Avro, Whirr, Sqoop, Mahout, Oozie
     • 2011-12: HCatalog, Bigtop, Ambari, Yarn

  4. Hadoop: What will it take to cross The Chasm?
     • Orgs looking for use cases & ref arch
     • Ecosystem evolving to create a pull market
     • Enterprises endure 1-3 year adoption cycle
     • Adoption curve (relative % customers over time): innovators, technology enthusiasts; early adopters, visionaries; The CHASM; early majority, pragmatists; late majority, conservatives; laggards, skeptics
     • Early market (left of the chasm): customers want technology & performance
     • Mainstream market (right of the chasm): customers want solutions & convenience & reliability
     • Source: Geoffrey Moore, Crossing the Chasm

  5. Enterprise Big Data Flows
     • Business transactions & interactions: CRM, ERP, web, mobile, point of sale
     • Unstructured data: log files, exhaust data, social media, sensors, devices, DB data
     • Big Data Platform
     • Classic data integration & ETL
     • Business intelligence & analytics: dashboards, reports, visualization, …
     • Step 1, Capture big data: collect data from all sources, structured & unstructured
     • Step 2, Process: transform, refine, aggregate, analyze, report
     • Step 3, Exchange: interoperate and share data with applications/analytics

  6. What changes from POC to large clusters?
     • "Small cluster" (5-100 nodes)
       • Staff & consultants are dominant costs
       • Redundant networks, hardware reliability features save human capital & support
       • Need to focus on simplicity
     • "Hadoop at Scale" (4000 node)
       • Hardware + power + hosting are dominant costs
       • Hardware optimization
       • Failures are inevitable, Hadoop software handles this
       • Hadoop operations expertise
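The point that "failures are inevitable, Hadoop software handles this" rests on block replication. The sketch below is a toy model under stated assumptions, not HDFS code: `place_blocks` and `readable_blocks` are hypothetical helpers, and the round-robin placement is a simplification (real HDFS placement is rack-aware). The replication factor of 3 matches the HDFS default.

```python
def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for a placement policy; not how HDFS chooses nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for index, block in enumerate(block_ids):
        placement[block] = [
            nodes[(index + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    }


nodes = [f"node{i}" for i in range(5)]
placement = place_blocks(range(10), nodes)

# With 3 replicas per block, losing any 2 nodes leaves every block readable.
survivors = readable_blocks(placement, {"node1", "node4"})
print(len(survivors), "of", len(placement), "blocks readable")
```

Losing as many nodes as the replication factor can still lose data if those nodes hold all replicas of one block, which is why a real system re-replicates blocks after a failure rather than only tolerating it.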

  7. Optimizing Hadoop Deployments: Address Potential Deployment Bottlenecks
     • NETWORK: Fast Fabric, 10GbE
     • STORAGE: SSDs, non-volatile memory, disk write/memory APIs
     • COMPUTE: security & encryption instruction sets
     • Benchmark & tuning: Hi-tune, Hi-Bench

  8. Talk to an Expert: Question & Answer
     • Today's experts:
       • Eric Baldeschwieler, CTO, Hortonworks - @JERIC14
       • Avik Dey, Director, Hadoop Services, Intel - @AvikonHadoop
     • Submit your questions: ask questions at any time by pressing the Question tab at the top of the player.
     • Download today's content: located under the attachment tab at the top of the player.
     • More information: www.intel.com/bigdata
