Hadoop: Scalable Infrastructure for Big Data
QCon London 2012
Parand Tony Darugar
Founder and CEO, Xpenser
parand@xpenser.com
What is Hadoop?
Hadoop is the Linux of Big Data Processing
Infrastructure for Large Scale Computation & Data Processing on a network of Commodity Hardware.
Why Hadoop?
Scale
Cost
Freedom
Does Anyone Use Hadoop?
eHarmony, IBM, Zions Bank, VISA, NY Times, Microsoft, Twitter, Facebook, eBay, Yahoo, LinkedIn, AOL, ...
Alternatives
● Build your own
● Get creative with RDBMS architecture
What's the idea?
Commodity Hardware
Distributed Operation
Wisdom:
● Embrace Failure (hardware)
● Be Resilient (software)
What's in the box?
Hadoop Distributed File System
Distributed Computation Framework
Map-Reduce Programming Model
HDFS
● Your data in triplicate
● Built-in resiliency to large scale failures
● Intelligent Data Distribution
● Very large data sizes
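To make "your data in triplicate" concrete, here is a toy sketch of block replication. It assumes a simple round-robin placement across hypothetical node names (`n1` to `n5`); the function names are invented for illustration, and the actual HDFS placement policy is more involved than this.

```python
REPLICATION = 3  # assumption: three copies of every block, per the slide


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes (wrapping around) hold the copies of this block.
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return [b for b, holders in placement.items() if set(holders) - failed]


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4", "n5"]  # hypothetical cluster
    placement = place_blocks(4, nodes)
    # Two nodes fail; every block should still be readable somewhere.
    print(readable_blocks(placement, {"n1", "n2"}))
```

With three copies per block, any two simultaneous node failures still leave at least one copy of every block in this sketch.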
Distributed Computation
● Built-in resiliency to large scale failures
● Distribute work to workers, collect results from fastest
● Move computation to data (not data to computation)
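The "collect results from fastest" idea can be sketched in a few lines. This is a toy illustration using threads on one machine, not Hadoop's scheduler: `run_task`, `first_result`, the worker names, and the delays are all hypothetical, and the sleep stands in for a slow or fast machine.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task(worker_name, delay_seconds, data):
    """Hypothetical worker: sleeps to simulate a slow or fast machine."""
    time.sleep(delay_seconds)
    return worker_name, sum(data)


def first_result(data, workers):
    """Run the same task on every worker and return whichever finishes first."""
    pool = ThreadPoolExecutor(max_workers=len(workers))
    futures = [
        pool.submit(run_task, name, delay, data) for name, delay in workers
    ]
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    # Do not block on the stragglers; their results are simply ignored.
    pool.shutdown(wait=False)
    return next(iter(done)).result()


if __name__ == "__main__":
    workers = [("slow-node", 0.5), ("fast-node", 0.05)]  # assumed delays
    print(first_result([1, 2, 3], workers))
```

Running the same work twice costs extra capacity, but a single slow or failing machine no longer delays the whole job.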
Map Reduce
Very simple programming model:
● Map(anything) -> key, value
● Sort, partition on key
● Reduce(key, value) -> key, value
● No parallel processing or message passing semantics
● Programmable in Java or any other language (streaming)
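The three steps above can be sketched as a single-process word count. This is a minimal illustration of the programming model only, not Hadoop's API: `map_words`, `reduce_counts`, and `run_job` are hypothetical names, and the toy driver runs everything in one process where Hadoop would spread partitions across machines.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map step: turn one input record into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce step: combine all values seen for one key."""
    yield key, sum(values)


def run_job(records, mapper, reducer, num_partitions=2):
    """Toy driver: map, partition on key, sort, then reduce."""
    # Partition: every pair with the same key lands in the same partition.
    partitions = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            partitions[hash(key) % num_partitions].append((key, value))

    results = []
    for pairs in partitions.values():
        # Sort so equal keys are adjacent, then hand each group to reduce.
        pairs.sort(key=itemgetter(0))
        for key, group in groupby(pairs, key=itemgetter(0)):
            results.extend(reducer(key, (value for _, value in group)))
    return sorted(results)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the fox"]
    for word, count in run_job(lines, map_words, reduce_counts):
        print(word, count)
```

Note that the mapper and reducer contain no threading or messaging code; the driver owns all of the distribution logic, which is the point of the model.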
Ecosystem
● HBase: NoSQL BigTable clone
● Hive: Somewhat-SQL data store
● Pig: SQL-like programming model
● Chukwa, Scribe, Mahout, Cassandra, Oozie, Sqoop, ...
Commercial Support
● Cloudera
● Hortonworks
● IBM
● ...
How?
● Try it in non-distributed mode
● Try it on a few spare machines
● Try it on EC2
● Try it! http://hadoop.apache.org/
Case Studies
eHarmony
Biz360 (Attensity)
Yahoo!
You!
Start with ETL
Start with batch, non time-critical tasks
Start with storing your large data on HDFS
Move batch processing to Hadoop
Serve from RDBMS
Embrace. Be One With The Hadoop.
Questions?
Parand Tony Darugar
parand@xpenser.com