
Unified Benchmarking of Big Data Platforms: The HOBBIT Platform



  1. Unified Benchmarking of Big Data Platforms: The HOBBIT Platform. Axel-Cyrille Ngonga Ngomo (InfAI). Horizon 2020 GA No 688227, 01/12/2016–30/11/2018. Apache Big Data, Sevilla, Spain, November 11, 2016. Slide footer: Ngonga Ngomo (InfAI), Benchmarking Big Data, November 15th, 2016.

  2. A Lot of Data. (Image source: http://www.ibmbigdatahub.com/infographic/four-vs-big-data)

  3. A Lot of Tools. (Image source: https://cloudramblings.me/)

  4. A Lot ... of Tools

  5. A Lot of Views. (Image source: https://steemit.com/philosophy/@l0k1/subjectivity-and-truth-how-blockchains-model-consensus-building)

  6. Core Questions. Developers: How good is my tool? Vendors: Who is my tool good for? Users: Which tool(s) should I use for my application?

  7. Many Questions. Where are the current bottlenecks? Which steps of the data lifecycle are critical? Which solutions are available? Which key performance indicators are relevant? How well do or should tools perform? How do existing solutions perform w.r.t. relevant indicators?

  8. Solution: Benchmark

  9. Solution: Benchmark. Components: Dataset(s), e.g., Twitter stream, sensor data; Task(s), e.g., NER, NEL, ingestion; Key Performance Indicators, e.g., precision, recall.
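The two KPIs named on this slide, precision and recall, can be sketched as below. This is a minimal illustration, not code from HOBBIT or GERBIL: the function name `precision_recall_f1` and the representation of an annotation as a `(surface form, start, end)` tuple are assumptions made for the example, and F1 is added only because it is the usual summary of the two.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of annotations.

    Annotations only need to be hashable; here they are hypothetical
    (surface_form, start, end) tuples.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty inputs so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical example: one of two predictions matches one of two gold entries.
gold = {("Berlin", 0, 6), ("Germany", 24, 31)}
predicted = {("Berlin", 0, 6), ("capital", 14, 21)}
p, r, f = precision_recall_f1(predicted, gold)
print(p, r, f)
```

Which predictions count as true positives depends on the matching rule in use, which is one of the ambiguities the later "Unclear KPI Semantics" slides raise.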

  10. Challenges: Dataset Mismatch. [Table: which evaluation datasets each annotation system or framework was evaluated on; the rotated column headers (dataset names) and the cell positions are not recoverable from the extraction.] Rows: Cucerzan (2007), Wikipedia Miner (2008), Illinois Wikifier (2011), Spotlight (2011), AIDA (2011), TagMe 2 (2012), Dexter (2013), KEA (2013), WAT (2013), AGDISTIS (2014), Babelfy (2014), NERD-ML (2014), BAT-Framework (2013), NERD Framework (2014), GERBIL (2014).

  11. Challenges: Unclear KPI Semantics

  12. Challenges: Unclear KPI Semantics. Example: Which time do we measure? First or last result? With or without network delay?
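The ambiguity in "which time do we measure" can be made concrete with a small sketch. Everything here is hypothetical: `measure_latencies` and `fake_system` are illustrative names, and the simulated system simply sleeps before each result. Because the clock is read on the consumer side, any transport delay between the system and the consumer would be included in both numbers; excluding network delay would require timestamps taken inside the system itself.

```python
import time


def measure_latencies(result_stream, clock=time.perf_counter):
    """Return (time_to_first_result, time_to_last_result) in clock units.

    Both values are None if the stream yields nothing.
    """
    start = clock()
    first = last = None
    for _ in result_stream:
        elapsed = clock() - start
        if first is None:
            first = elapsed  # latency until the first result arrives
        last = elapsed       # overwritten until the final result
    return first, last


def fake_system(n_results, delay_s):
    """Hypothetical system that needs delay_s seconds per result."""
    for i in range(n_results):
        time.sleep(delay_s)
        yield i


first, last = measure_latencies(fake_system(3, 0.01))
print(f"first result after {first:.3f}s, last result after {last:.3f}s")
```

Two benchmarks that both report "response time" can therefore differ by a large factor depending on which of the two values they mean.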

  13. Challenges: Unclear KPI Semantics

  14. Challenges: Unclear KPI Semantics. Example: When is an annotation correct? Weak or strong annotation? Semantically equivalent or exact URI?
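One common reading of the weak/strong distinction is that a strong match requires identical span boundaries, while a weak match only requires the spans to overlap. The sketch below assumes that reading; the `Annotation` type and both function names are hypothetical, and URIs are compared as exact strings, so semantically equivalent URIs (for example, ones connected by a sameAs link) would not match without an extra mapping step.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    uri: str


def strong_match(a: Annotation, b: Annotation) -> bool:
    """Exact same span and exact same URI."""
    return a.start == b.start and a.end == b.end and a.uri == b.uri


def weak_match(a: Annotation, b: Annotation) -> bool:
    """Overlapping spans and exact same URI."""
    overlaps = a.start < b.end and b.start < a.end
    return overlaps and a.uri == b.uri


# Text: "President Obama visited Berlin"
obama = "http://dbpedia.org/resource/Barack_Obama"
gold = Annotation(0, 15, obama)        # "President Obama"
predicted = Annotation(10, 15, obama)  # "Obama"

print(strong_match(gold, predicted))  # False: boundaries differ
print(weak_match(gold, predicted))    # True: spans overlap, same URI
```

The same system output thus scores as an error under one rule and as a hit under the other, which is why results computed with different matching rules are not comparable.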

  15. Solution: Unified Benchmarking Framework. [Figure: GERBIL architecture diagram. Labels: Annotator ... Annotator; Your Annotator (via Wrapper); Web service calls; GERBIL Interface; View; Controller; Configuration (Model); Benchmark Core; Persistent Experiment Database; Dataset Wrapper; Your Dataset; Open Datasets; DataHub.io.]

  16. GERBIL: Overview

  17. GERBIL: Overview. Evaluation platform for NER/NEL; 18 reference annotation systems; 32 reference datasets; benchmarking 10× faster.

  18. GERBIL: Overview. Evaluation platform for NER/NEL; 18 reference annotation systems; 32 reference datasets; benchmarking 10× faster; archiving of results; citeable URIs; additional analysis.

  19. GERBIL: Overview. Evaluation platform for NER/NEL; 18 reference annotation systems; 32 reference datasets; benchmarking 10× faster; archiving of results; citeable URIs; additional analysis; open-source project; local deployment; normalized implementation of KPIs; online instance; feedback for developers and users.

  20. GERBIL: Annotator Tasks. [Chart: number of tasks per annotator. Labels and values in extraction order: NIF-based Annotators 2519 Babelfy 958 DBpedia Spotlight 922 TagMe 2 811 WAT 787 Kea 763 Wikipedia Miner 714 NERD-ML 639 Dexter 587 AGDISTIS 443 Entityclassifier.eu NER 410 FOX 352 Cetus 1.] Overall: 24.3K experiments, 50+ papers.

  21. HOBBIT: Rationale. A community-driven benchmarking framework for the community.

  22. HOBBIT: Rationale. A community-driven benchmarking framework for the community. Focus on Big (Linked) Data; build upon 24.3K experiments performed with GERBIL; cover all steps of the Linked Data lifecycle; used by a growing number of companies; mature and maturing technologies; open benchmarks based on industrial data and use cases.

  23. Aims. (1) Gather real requirements: performance indicators, performance thresholds. (2) Develop benchmarks based on real data. (3) Provide a universal benchmarking platform: standardized hardware, comparable results. (4) Periodic benchmarking challenges. (5) Periodic reporting. (6) Found an independent HOBBIT association.

  24. Overview. [Figure: HOBBIT workflow diagram. Labels: Participants/Community; Industry data; Data Collection; Benchmark Creation; Challenges; HOBBIT Platform; Solution 1, Solution 2, ..., Solution k; Benchmark 1, Benchmark 2, ..., Benchmark n (each with Tasks and KPIs); Measure Collection; Reports.]

  25. Survey: Questions. In what areas are organizations active? What do people expect from benchmarks? How are benchmarks being used? Respondents by profile: solution providers 56, technology users 67, scientific community 65.

  26. Survey: Can your solution be benchmarked?

  27. Survey: Do you benchmark your solution? Own datasets and settings in many cases; own implementations of measures; results not comparable.

  28. Survey: Application Areas

  29. HOBBIT Platform: Features. Uses established deployment technologies (Docker); decoupled components; benchmarks and systems can be written in different languages; uses scalable message queues for communication; open-source implementation; supports distributed benchmarks and systems; online instance on a server cluster.

  30. HOBBIT Benchmarks: Features. Address all steps of the Linked Data lifecycle; benchmarks derived from industry use cases; real data underlying the benchmarks; scalable size of benchmarks; open-source implementation.

  31. HOBBIT Platform: Benchmarks. Streaming and static deterministic benchmarks; realistic benchmarks; controlled volume and velocity. Generation and Acquisition: conversion of XML into RDF, entity recognition and linking, relation extraction. Analysis and Processing: link discovery, machine learning (supervised and unsupervised). Storage and Curation: triple stores, versioning, incl. updates. Visualization and Services: question answering, faceted browsing, usage-based benchmarks.

  32. HOBBIT Platform: Architecture. [Figure: component diagram. Labels: Front End; Platform Controller; Benchmark Controller; Data Generator (×3); Task Generator (×3); Benchmarked System; Eval. Storage; Evaluation Module; Storage; Analysis; Logging. Legend: data flow; creates component.]

  33. HOBBIT Platform: Benchmark Initialization. [Figure: component diagram. Labels: Platform Controller; Benchmark Controller; Data Generator (×3); Task Generator (×3); Benchmarked System; Eval. Storage; Storage. Legend: data flow; creates component.]
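The roles of the components named on the architecture slides can be illustrated with a toy, single-process sketch. It is not the HOBBIT implementation or API: the function names, the message shapes, the use of Python's `queue.Queue` as a stand-in for the message queues mentioned under Features, and the exact routing between components are all assumptions made for the example. The toy task is a lookup of previously loaded data.

```python
import queue


def data_generator(n, data_q):
    """Send n data items to the benchmarked system, then an end marker."""
    for i in range(n):
        data_q.put({"id": i, "value": i * i})
    data_q.put(None)


def task_generator(n, task_q, eval_storage):
    """Send n tasks to the system and the expected answers to eval storage."""
    for i in range(n):
        task_q.put({"task_id": i, "lookup_id": i})
        eval_storage.setdefault(i, {})["expected"] = i * i
    task_q.put(None)


def benchmarked_system(data_q, task_q, eval_storage):
    """Hypothetical system: load all data, then answer each lookup task."""
    store = {}
    for item in iter(data_q.get, None):  # read until the end marker
        store[item["id"]] = item["value"]
    for task in iter(task_q.get, None):
        answer = store.get(task["lookup_id"])
        eval_storage.setdefault(task["task_id"], {})["actual"] = answer


def evaluation_module(eval_storage):
    """Compare expected and actual answers; return accuracy as the KPI."""
    if not eval_storage:
        return 0.0
    correct = sum(
        1 for r in eval_storage.values() if r.get("expected") == r.get("actual")
    )
    return correct / len(eval_storage)


# A plain dict stands in for the evaluation storage component.
data_q, task_q, eval_storage = queue.Queue(), queue.Queue(), {}
data_generator(5, data_q)
task_generator(5, task_q, eval_storage)
benchmarked_system(data_q, task_q, eval_storage)
accuracy = evaluation_module(eval_storage)
print(accuracy)
```

Because the system only ever sees the two queues, it could be replaced by a component written in another language or running in another container, which is the point of the decoupling listed among the platform features.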
