
Accumulo: Extensions to Google's Bigtable Design



  1. Accumulo: Extensions to Google's Bigtable Design
     Adam Fuchs, National Security Agency, Computer and Information Sciences Research Group
     March 29, 2012

  2. Contents: 1 Design Drivers; 2 Apache Accumulo (Intro to Bigtable, Iterators, FATE, Major Compaction); 3 Design Patterns; 4 Fìn

  3. Progress: 1 Design Drivers; 2 Apache Accumulo (Intro to Bigtable, Iterators, FATE, Major Compaction); 3 Design Patterns; 4 Fìn

  4. Design Drivers
     Analysis of big data is central to our customers' requirements, in which the strongest drivers are:
     - Scalability: The ability to do twice the work at only (about) twice the cost.
     - Adaptability: The ability to rapidly evolve the analytical tools available in an operational environment, building upon and enhancing existing capabilities.
     From these directives we can derive the following requirements:
     - Simplicity in the overall architecture, to encourage collaboration and ease the learning curve.
     - Generic design patterns to store and organize data whose format we don't control.
     - Generic discovery analytics to retrieve and visualize generic data.
     - Solutions for common sub-problems, such as multi-level security and enforcement of legal restrictions, built into the infrastructure.

  5. Optimization
     ... is a secondary concern, given:
     - hundreds of evolving applications,
     - hundreds of changing data sources,
     - non-trivial data volumes,
     - many complicated interactions.
     Instead, we need a generic platform that is cheap, simple, scalable, secure, and adaptable, with pretty good performance.

  6. Progress: 1 Design Drivers; 2 Apache Accumulo (Intro to Bigtable, Iterators, FATE, Major Compaction); 3 Design Patterns; 4 Fìn

  7. Apache Accumulo
     - First code written in Spring of 2008
     - Open-sourced as an Apache Software Foundation incubator podling in September, 2011
     - Graduated to Top-Level Project in March, 2012
     Mostly a clone of Bigtable, but includes several notable features:
     - Iterators: a framework for processing sorted streams of key/value entries
     - Cell-level Security: mandatory, attribute-based access control with key/value granularity
     - Fault-Tolerant Execution Framework (FATE)
     - A compaction scheduler with nice properties

  8. Progress: 1 Design Drivers; 2 Apache Accumulo (Intro to Bigtable, Iterators, FATE, Major Compaction); 3 Design Patterns; 4 Fìn

  9. Basic Data Type
     An Accumulo Key is a 5-tuple, including:
     - Row: controls Atomicity
     - Column Family: controls Locality
     - Column Qualifier: controls Uniqueness
     - Visibility: controls Access (unique to Accumulo)
     - Timestamp: controls Versioning
     Sample Entries (the two Friends entries have empty values):
     Row : Col. Fam. : Col. Qual. : Visibility : Timestamp ⇒ Value
     Adam : Favorites : Food : (Public) : 20090801 ⇒ Sushi
     Adam : Favorites : Programming Language : (Private) : 20090830 ⇒ Java
     Adam : Favorites : Programming Language : (Private) : 20070725 ⇒ C++
     Adam : Friends : Bob : (Public) : 20110601 ⇒
     Adam : Friends : Joe : (Private) : 20110601 ⇒
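
The sample entries above appear in sorted order, which can be illustrated with a minimal Python sketch. The slide does not state the ordering rule; the sketch assumes Accumulo's documented key ordering (ascending on row, column family, column qualifier and visibility, then descending on timestamp so the newest version comes first). The `Key` class and `sort_key` helper are hypothetical names for illustration, not Accumulo's API.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for the 5-tuple key described on the slide."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(k: Key):
    # Assumption: ascending on the first four fields, descending on timestamp,
    # so the newest version of a cell sorts first.
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)


# The slide's sample entries, deliberately shuffled.
entries = [
    (Key("Adam", "Friends", "Joe", "Private", 20110601), ""),
    (Key("Adam", "Favorites", "Programming Language", "Private", 20070725), "C++"),
    (Key("Adam", "Favorites", "Food", "Public", 20090801), "Sushi"),
    (Key("Adam", "Friends", "Bob", "Public", 20110601), ""),
    (Key("Adam", "Favorites", "Programming Language", "Private", 20090830), "Java"),
]

table = sorted(entries, key=lambda e: sort_key(e[0]))
for key, value in table:
    print(" : ".join(map(str, key)), "=>", value)
```

Sorting restores the order shown on the slide: the 2009 "Java" entry precedes the 2007 "C++" entry because newer timestamps sort first.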

  10. Tablets
      - Collections of key/value pairs form Tables
      - Tables are partitioned into Tablets
      - Metadata tablets hold info about other tablets, forming a three-level hierarchy
      - A Tablet is a unit of work for a Tablet Server
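
To make the partitioning concrete, here is a small Python sketch of locating the tablet that holds a given row. The split points are made up, and the range convention (previous split exclusive, own split inclusive) is an assumption based on how Accumulo describes tablet extents, not something the slide states. The metadata hierarchy is not modeled.

```python
from bisect import bisect_left


def tablet_index(split_points, row):
    """Return the index of the tablet whose row range contains `row`.

    Assumption: tablet i covers (split_points[i-1], split_points[i]],
    with the first and last tablets unbounded on one side.
    """
    return bisect_left(split_points, row)


def tablet_range(split_points, index):
    """Return (exclusive_start, inclusive_end); None means unbounded."""
    lo = split_points[index - 1] if index > 0 else None
    hi = split_points[index] if index < len(split_points) else None
    return lo, hi


# Hypothetical split points: three splits give four tablets.
splits = ["g", "n", "t"]

for row in ["Adam", "g", "h", "zed"]:
    i = tablet_index(splits, row)
    print(row, "-> tablet", i, tablet_range(splits, i))
```

Because each tablet is a contiguous row range of a globally sorted table, a binary search over the split points is enough to route a read or write.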

  11. Distributed Processes (diagram slide; no text content)

  12. Progress: 1 Design Drivers; 2 Apache Accumulo (Intro to Bigtable, Iterators, FATE, Major Compaction); 3 Design Patterns; 4 Fìn

  13. Tablet Server Composition
      Quick and loose definitions:
      - Table: A map of keys to values with one global sort order among keys.
      - Tablet: A row range within a Table.
      - Tablet Server: The mechanism that hosts Tablets, providing the primary functionality of Bigtable or Accumulo.
      Tablet servers have several primary functions:
      1. Hosting RPCs (read, write, etc.)
      2. Managing resources (RAM, CPU, File I/O, etc.)
      3. Scheduling background tasks (compactions, caching, etc.)
      4. Handling key/value pairs
      Category 4 is almost entirely accomplished through the Iterator framework.

  14. Tablet Server Data Flow
      Iterator Uses: File Reads, Block Caching, Merging, Deletion, Isolation, Locality Groups, Range Selection, Column Selection, Cell-level Security, Versioning, Filtering, Aggregation, Partitioned Joins

  15. Iterators
      An Iterator is an object that provides an ordered stream of entries (key/value pairs), and supports basic selection and filtering methods.
      Core Iterators provide a basic view of a tablet's entries, implementing: File Reads, Block Caching, Merging, Deletion, Isolation, Locality Groups, Range Selection, Column Selection, Cell-level Security.
      Application-level Iterators modify table semantics to provide custom views, persisted or otherwise: Versioning, Filtering, Aggregation, Partitioned Joins.
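
The idea of stacking iterators over sorted streams can be sketched with Python generators. This is an illustration of the pattern only, not Accumulo's iterator interface: the entry layout, the function names `merging`, `versioning` and `filtering`, and the two sample sources are all assumptions made for this example.

```python
import heapq
from itertools import groupby

# Hypothetical entry layout: ((row, family, qualifier, -timestamp), value).
# Negating the timestamp makes newer versions sort first.


def merging(*sources):
    """Merge several individually sorted sources into one sorted stream."""
    return heapq.merge(*sources)


def versioning(stream, max_versions=1):
    """Keep only the newest `max_versions` entries per (row, family, qualifier)."""
    for _, group in groupby(stream, key=lambda e: e[0][:3]):
        for n, entry in enumerate(group):
            if n < max_versions:
                yield entry


def filtering(stream, predicate):
    """Pass through only the entries accepted by `predicate`."""
    return (e for e in stream if predicate(e))


# Two sorted sources, standing in for an in-memory map and a file.
memory = [
    (("Adam", "Favorites", "Programming Language", -20090830), "Java"),
]
file1 = [
    (("Adam", "Favorites", "Food", -20090801), "Sushi"),
    (("Adam", "Favorites", "Programming Language", -20070725), "C++"),
    (("Adam", "Friends", "Bob", -20110601), ""),
]

stack = filtering(
    versioning(merging(memory, file1)),
    lambda e: e[0][1] == "Favorites",
)
result = list(stack)
for key, value in result:
    print(key[:3], "=>", value)
```

Each layer consumes a sorted stream and yields a sorted stream, so layers compose freely. Here the versioning layer drops the older "C++" entry and the filter drops the Friends column family.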

  16. Modified Key/Value Pair Definition
      An Accumulo Key is a 5-tuple, including:
      - Row: controls Atomicity
      - Column Family: controls Locality
      - Column Qualifier: controls Uniqueness
      - Visibility: controls Access (unique to Accumulo)
      - Timestamp: controls Versioning
      Sample Entries (the two Friends entries have empty values):
      Row : Col. Fam. : Col. Qual. : Visibility : Timestamp ⇒ Value
      Adam : Favorites : Food : (Public) : 20090801 ⇒ Sushi
      Adam : Favorites : Programming Language : (Private) : 20090830 ⇒ Java
      Adam : Favorites : Programming Language : (Private) : 20070725 ⇒ C++
      Adam : Friends : Bob : (Public) : 20110601 ⇒
      Adam : Friends : Joe : (Private) : 20110601 ⇒

  17. Visibility Label Syntax and Semantics
      Document Labels:
      - Doc 1 : (Federation)
      - Doc 2 : (Klingon|Vulcan)
      - Doc 3 : (Federation & Human & Vulcan)
      - Doc 4 : (Federation & (Human|Vulcan))
      User Authorization Sets:
      - CptKirk : { Federation, Human }
      - MrSpock : { Federation, Human, Vulcan }
      Syntax (one production per line):
      - WORD ⇒ [a-zA-Z0-9_]+
      - CLAUSE ⇒ AND
      - CLAUSE ⇒ OR
      - AND ⇒ AND & AND
      - AND ⇒ ( CLAUSE )
      - AND ⇒ WORD
      - OR ⇒ OR | OR
      - OR ⇒ ( CLAUSE )
      - OR ⇒ WORD
      Semantics (T is a label expression, A is an authorization set; each rule concludes (T, A) ⊨ true):
      - term: if (T ⇒ τ) ∧ (τ ∈ A), then (T, A) ⊨ true
      - and: if (T ⇒ T1 & T2) ∧ ((T1, A) ⊨ true) ∧ ((T2, A) ⊨ true), then (T, A) ⊨ true
      - or: if (T ⇒ T1 | T2) ∧ (((T1, A) ⊨ true) ∨ ((T2, A) ⊨ true)), then (T, A) ⊨ true
      - paren: if (T ⇒ (T1)) ∧ ((T1, A) ⊨ true), then (T, A) ⊨ true
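
The grammar and the four semantic rules can be exercised with a small recursive-descent evaluator. This is a sketch written for this transcript, not Accumulo's own parser. The function names `tokenize` and `visible` are hypothetical, and the token pattern assumes words are made of letters, digits and underscores.

```python
import re

# Assumption: WORD tokens are letters, digits and underscores.
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a label expression into WORD, '&', '|', '(' and ')' tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def visible(expr, auths):
    """Return True if authorization set `auths` satisfies label `expr`."""
    tokens = tokenize(expr)
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":                      # paren rule
            inner = clause()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return inner
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths                 # term rule

    def clause():
        nonlocal pos
        result = term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                # The grammar has no production mixing & and | at one level.
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = term()                    # always parse the right-hand side
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    result = clause()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result


docs = {
    "Doc 1": "(Federation)",
    "Doc 2": "(Klingon|Vulcan)",
    "Doc 3": "(Federation & Human & Vulcan)",
    "Doc 4": "(Federation & (Human|Vulcan))",
}
users = {
    "CptKirk": {"Federation", "Human"},
    "MrSpock": {"Federation", "Human", "Vulcan"},
}

for user, auths in users.items():
    print(user, [doc for doc, label in docs.items() if visible(label, auths)])
```

With the slide's example data, CptKirk can read Doc 1 and Doc 4, while MrSpock can read all four documents.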

  18. Cell-Level Security Iterator (diagram slide; no text content)

  19. Aggregation
      Goals: Count the number of times a word appears in a dynamic corpus, and count the number of documents that contain a given word.
      Sample Corpus:
      - Doc 1 : "foo and bar are common variable names"
      - Doc 2 : "one cannot live on bar food alone"
      - Doc 3 : "Mr.T pities the fool at the bar"
      - Doc 4 : "someone should invent the kung foo bar"
      Input Key/Value Pairs:
      Row       Column  Value
      alone     Doc 2   1
      and       Doc 1   1
      are       Doc 1   1
      at        Doc 3   1
      bar       Doc 1   1
      bar       Doc 2   1
      bar       Doc 3   1
      bar       Doc 4   1
      cannot    Doc 2   1
      common    Doc 1   1
      foo       Doc 1   1
      foo       Doc 4   1
      food      Doc 2   1
      fool      Doc 3   1
      invent    Doc 4   1
      kung      Doc 4   1
      live      Doc 2   1
      Mr.T      Doc 3   1
      names     Doc 1   1
      on        Doc 2   1
      one       Doc 2   1
      should    Doc 4   1
      someone   Doc 4   1
      pities    Doc 3   1
      the       Doc 3   1
      the       Doc 3   1
      the       Doc 4   1
      variable  Doc 1   1
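
One way to read the two goals is as sums over adjacent entries in the sorted stream. The Python sketch below builds the input pairs from the sample corpus and folds them with a summing step. The slide states only the goals, so the approach and the helper name `sum_by` are assumptions made for illustration.

```python
from itertools import groupby

corpus = {
    "Doc 1": "foo and bar are common variable names",
    "Doc 2": "one cannot live on bar food alone",
    "Doc 3": "Mr.T pities the fool at the bar",
    "Doc 4": "someone should invent the kung foo bar",
}

# One (row=word, column=doc, value=1) entry per word occurrence, sorted
# so that equal rows (and equal row/column pairs) are adjacent.
entries = sorted(
    (word, doc, 1) for doc, text in corpus.items() for word in text.split()
)


def sum_by(stream, key):
    """Sum the values of adjacent entries that share the same key."""
    for k, group in groupby(stream, key=key):
        yield k, sum(value for *_, value in group)


# Goal 1: total occurrences of each word across the corpus.
word_counts = dict(sum_by(entries, key=lambda e: e[0]))

# Goal 2: number of documents containing each word. First collapse
# duplicates within a document, then count one per (word, doc) pair.
per_doc = [(w, d, n) for (w, d), n in sum_by(entries, key=lambda e: e[:2])]
doc_counts = dict(sum_by([(w, d, 1) for w, d, _ in per_doc], key=lambda e: e[0]))

for word in ["bar", "foo", "the"]:
    print(word, word_counts[word], doc_counts[word])
```

The word "the" shows why the two counts differ: it occurs three times in the corpus but in only two documents, since Doc 3 contains it twice.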
