
Hadoop: Scalable Infrastructure for Big Data (QCon London 2012)



  1. Hadoop: Scalable Infrastructure for Big Data. QCon London 2012. Parand Tony Darugar, Founder and CEO, Xpenser, parand@xpenser.com

  2. What is Hadoop?

  3. Hadoop is the Linux of Big Data Processing

  4. Infrastructure for Large Scale Computation & Data Processing on a network of Commodity Hardware.

  5. Why Hadoop?

  6. Scale

  7. Cost

  8. Freedom

  9. Does Anyone Use Hadoop?

  10. eHarmony, IBM, Zions Bank, VISA, NY Times, Microsoft, Twitter, Facebook, eBay, Yahoo, LinkedIn, AOL, ...

  11. Alternatives
      ● Build your own
      ● Get creative with RDBMS architecture

  12. What's the idea?

  13. Commodity Hardware, Distributed Operation

  14. Wisdom: Embrace Failure (hardware); Be Resilient (software)

  15. What's in the box?

  16. Hadoop Distributed File System

  17. Distributed Computation Framework

  18. Map-Reduce Programming Model

  19. HDFS
      ● Your data in triplicate
      ● Built-in resiliency to large scale failures
      ● Intelligent Data Distribution
      ● Very large data sizes

  20. Distributed Computation
      ● Built-in resiliency to large scale failures
      ● Distribute work to workers, collect results from fastest
      ● Move computation to data (not data to computation)

  21. Map Reduce
      Very simple programming model:
      ● Map(anything) -> key, value
      ● Sort, partition on key
      ● Reduce(key, value) -> key, value
      No parallel processing or message passing semantics exposed to the programmer
      Programmable in Java or any other language (streaming)

  22. Ecosystem
      ● HBase: NoSQL BigTable clone
      ● Hive: Somewhat-SQL data store
      ● Pig: SQL-like programming model
      ● Chukwa, Scribe, Mahout, Cassandra, Oozie, Sqoop, ...

  23. Commercial Support
      ● Cloudera
      ● Hortonworks
      ● IBM
      ● ...

  24. How?
      ● Try it in non-distributed mode
      ● Try it on a few spare machines
      ● Try it on EC2
      ● Try it! http://hadoop.apache.org/

  25. Case Studies

  26. eHarmony

  27. Biz360 (Attensity)

  28. Yahoo!

  29. You!

  30. Start with ETL

  31. Start with batch, non time-critical tasks

  32. Start with storing your large data on HDFS

  33. Move batch processing to Hadoop; serve from RDBMS

  34. Embrace. Be One With The Hadoop.

  35. Questions? Parand Tony Darugar, parand@xpenser.com
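
The programming model on slide 21 (map, then sort and partition on key, then reduce) can be illustrated with a small word-count sketch. This is not Hadoop code and does not use any Hadoop API: it runs in a single Python process, and the names `map_words`, `partition`, `reduce_counts`, and `run_job` are hypothetical, chosen only for this illustration. The reduce step here receives all values grouped under one key, which is an assumption about how the grouped data is handed over after the sort.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map step: turn one input record into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def partition(pairs, num_reducers):
    """Partition step: route each pair to a bucket by hashing its key."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[hash(key) % num_reducers].append((key, value))
    return buckets


def reduce_counts(key, values):
    """Reduce step: collapse all values seen for one key into one pair."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    """Run map, partition, sort, and reduce in sequence, in one process."""
    mapped = [pair for line in lines for pair in map_words(line)]
    results = {}
    for bucket in partition(mapped, num_reducers).values():
        # Sorting on key brings equal keys together so groupby can batch them.
        bucket.sort(key=itemgetter(0))
        for key, group in groupby(bucket, key=itemgetter(0)):
            out_key, out_value = reduce_counts(key, (v for _, v in group))
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    counts = run_job(["the quick brown fox", "the lazy dog", "the fox"])
    print(sorted(counts.items()))
```

Note that the map and reduce functions contain no threading or messaging code. In this sketch the buckets are processed one after another, but because each bucket holds a disjoint set of keys, they could in principle be handed to separate workers without changing either function.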
