
  1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3

  2. Agenda (August 31, 2012)
     • Why Hadoop and HBase?
     • Social Media Monitoring
     • Prospective Search and Coprocessors
     • Challenges & Lessons Learned
     • Resources to get started

  3. About Sentric
     • Spin-off of MeMo News AG, the leading provider of Social Media Monitoring & Analytics in Switzerland
     • Big Data expert, focused on Hadoop, HBase and Solr
     • Objective: transforming data into insights

  4. CC 2.0 by Editor B | http://flic.kr/p/bcU5aD1

  5. Why Hadoop and HBase? Social Media Monitoring Process
     • Information Gathering
     • Information Processing
     • Analysis & Interpretation
     • Insight Presentation

  6. Why Hadoop and HBase? Requirements
     Requirements for SMM (Social Media Monitoring):
     • Cost effective
     • Highly scalable
     • Freshness
     • Reliable
     • RT alerting
     • Analytical capabilities

  7. Why Hadoop and HBase? Hadoop
     • HDFS + MapReduce
     • Based on Google papers
     • Distributed storage and computation framework
     • Affordable hardware, free software
     • Significant adoption

  8. Why Hadoop and HBase? HBase
     • Non-relational, distributed database
     • Column-oriented
     • Multi-dimensional
     • High availability
     • High performance
     • Built on top of HDFS as storage layer

  9. Why Hadoop and HBase? Technology Stack
     • Storage: HBase/HDFS
     • Search: Solr
     • Analytics: Hadoop, Mahout
     • Event mechanism (MQ): HBase RowLog
     • Real-time alerting: Prospective search

  10. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf

  11. Social Media Monitoring: Overview
     • Downloaded articles are matched against search agents ("match?")
     • Output: Web-UI, Reports, RT Alerts
     Icons by http://dryicons.com

  12. Social Media Monitoring: Solution Architecture
     Diagram components: n News Agents, REST, HBase, Coprocessor, Web-UI, MySQL, Solr, RT Alerts
     Icons by http://dryicons.com

  13. Social Media Monitoring: Prospective Search with Coprocessors
     Diagram labels: Processing, Put operations, Prospective Search, HRegion, RT Alerts, HRegionServer
     Icons by http://dryicons.com
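The idea behind the prospective-search slide can be sketched in a few lines: instead of running a query against stored documents, the queries (search agents) are stored and every newly written document is checked against them. The sketch below is plain Python and does not use the HBase coprocessor API; `SearchAgent`, `ProspectiveIndex` and `on_put` are hypothetical names chosen for illustration, and the all-terms-must-occur matching rule is an assumption, not the matching logic used in the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SearchAgent:
    """A stored query: it matches when every term occurs in the document."""
    name: str
    terms: frozenset


@dataclass
class ProspectiveIndex:
    """Holds the registered agents and checks each new document against them."""
    agents: list = field(default_factory=list)

    def register(self, name, *terms):
        # Normalize terms to lower case so matching is case-insensitive.
        self.agents.append(SearchAgent(name, frozenset(t.lower() for t in terms)))

    def on_put(self, text):
        """Stand-in for a write hook: return the names of agents matching the text."""
        words = set(text.lower().split())
        return [agent.name for agent in self.agents if agent.terms <= words]


index = ProspectiveIndex()
index.register("hbase-news", "hbase", "release")
index.register("solr-news", "solr")

# Each incoming article is matched at write time; hits would feed the RT alerts.
alerts = index.on_put("Apache HBase announces a new release")
print(alerts)
```

In the architecture on the slide, the equivalent of `on_put` would run inside the region server on each Put operation, which is why the performance-tuning slide warns about expensive operations in coprocessors.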

  14. Social Media Monitoring: Key Figures
     • Monthly growth
     • Index: 200 GB
     • 50 million docs/month
     • HBase: 600 GB
     • Raw data, metadata and extracted data
     • A few thousand MapReduce jobs/month

  15. CC 2.0 by saebaryo | http://flic.kr/p/5T4t5L

  16. Challenges & Lessons Learned
     1. Benchmarks: workloads
     2. Supervision
     3. Keys and shards: schema design /LG
     4. Timestamps, the 4th dimension
     5. Short ColumnFamily names
     6. File handles, OS
     7. JVM tuning, GC!!!
     8. Scaling region servers, data locality!
     9. Automatic vs. manual splits, compaction
     10. Do not rely on HBase being rock solid in production
     11. Forget firefighting actions, it takes some time
     12. Use HBase for an appropriate use case
     13. Tune and tweak: it's not a project, it's a process
     14. You need DevOps in production
     15. Huge know-how curve, you need to know the whole ecosystem
     16. Use a distribution: it's packaged, tested, supports migration and is enterprise grade
     17. Virtualization, hardware
     18. Don't struggle too much, there is a good community
     19. Share your knowledge
     20. It's an early stage: many tools around, a few still missing

  17. Challenges & Lessons Learned: Challenges
     • Everyone is still learning
     • Some issues only appear at scale
     • At scale, nothing works as advertised
     • Production cluster configuration
     • Hardware issues
     • Tuning cluster configuration to our workloads
     • HBase stability
     • Monitoring health of HBase

  18. Challenges & Lessons Learned: Lessons - General
     • Do not rely on HBase as frontend storage layer. It's not going to be rock solid
     • Don't struggle too much, there is a good community
     • Share your knowledge
     • It's an early stage: many tools around, a few still missing

  19. Challenges & Lessons Learned: Lessons - Planning
     • Use HBase for an appropriate use case
     • Use a distribution: it's packaged, tested, supports migration and is enterprise grade
     • Benchmarks: know your workloads & query patterns
     • YCSB
     • Schema & key design
     • What's queried together should be stored together
     • Scaling region servers, data locality!
     • Virtualization vs. real hardware
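One way to read "what's queried together should be stored together" is through row key design, since HBase stores rows sorted by key. The sketch below is a hedged illustration, not the schema used in the talk: the key layout, `NUM_SHARDS`, `MAX_TS` and the `row_key` helper are all assumptions. It derives a shard prefix from the source so that all articles of one source sit next to each other while different sources spread across regions, and it reverses the timestamp so that newer articles sort first.

```python
import hashlib

NUM_SHARDS = 16   # hypothetical number of key prefixes (shards)
MAX_TS = 10**13   # upper bound for millisecond timestamps, used for reversal


def row_key(source: str, article_id: str, ts_millis: int) -> str:
    """Build a sortable row key: shard | source | reversed timestamp | id."""
    # The shard depends only on the source, so one source stays contiguous.
    shard = int(hashlib.md5(source.encode()).hexdigest(), 16) % NUM_SHARDS
    # Reversing the timestamp makes newer rows sort before older ones.
    reversed_ts = MAX_TS - ts_millis
    return f"{shard:02d}|{source}|{reversed_ts:013d}|{article_id}"


older = row_key("blog.example.org", "a1", 1346300000000)
newer = row_key("blog.example.org", "a2", 1346400000000)
print(sorted([older, newer]))
```

A scan over the prefix `shard|source|` then returns the latest articles of that source first. Whether such a layout fits depends on the query patterns, which is the point of benchmarking before committing to a schema.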

  20. Challenges & Lessons Learned: Lessons - Performance Tuning
     • Number of CF < 10
     • Compaction + flushing are I/O intensive
     • Short ColumnFamily names
     • HFile index size takes up RAM (storefileindexSize)
     • OS file handles
     • ulimit -n 32768
     • JVM tuning, GC!!!
     • HMaster 1024 MB
     • RegionServer 8192 MB
     • -XX:+UseConcMarkSweepGC
     • -XX:+CMSIncrementalMode
     • Automatic vs. manual splits
     • Be careful with expensive operations in coprocessors
     • Play with all the configurations and benchmark for tuning
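The file handle recommendation can be checked from inside a process. The following is a minimal sketch using Python's standard `resource` module (Unix only); the `nofile_ok` helper and the `RECOMMENDED_NOFILE` constant are hypothetical names, and the threshold is simply the `ulimit -n 32768` value from the slide.

```python
import resource

RECOMMENDED_NOFILE = 32768  # value taken from the slide: ulimit -n 32768


def nofile_ok(soft_limit: int, recommended: int = RECOMMENDED_NOFILE) -> bool:
    """Return True if the open-file soft limit meets the recommendation."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # an unlimited soft limit always satisfies the check
    return soft_limit >= recommended


# Query the limits of the current process (soft limit, hard limit).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft} hard={hard} ok={nofile_ok(soft)}")
```

The check only reflects the limits of the process that runs it, so it would need to run as the same user and in the same environment as the HBase daemons to be meaningful.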

  21. Challenges & Lessons Learned: Lessons - Operation
     • Monitoring/operational tooling is most important
     • Forget "emergency actions", it takes some time
     • Tune and tweak: it's not a project, it's a process
     • You need DevOps in production
     • Huge know-how curve, you need to know the whole ecosystem
     • Hadoop, HDFS, MapReduce

  22. Resources to get started
     • http://hbase.apache.org/book.html
     • http://www.sentric.ch/blog/best-practice-why-monitoring-hbase-is-important
     • http://www.sentric.ch/blog/hadoop-overview-of-top-3-distributions
     • http://www.sentric.ch/blog/hadoop-best-practice-cluster-checklist
     • http://outerthought.org/blog/465-ot.html

  23. Questions? Thank you!
     • Christian Gügi, christian.guegi@sentric.ch
     • Jean-Pierre König, jean-pierre.koenig@sentric.ch
     NoSQL Roadshow Basel

  24. Masters Cluster

  25. Worker Cluster
