
Apache Accumulo



  1. Apache Accumulo
     Adam Fuchs, National Security Agency, Computer and Information Sciences Research Group
     July 17, 2012
     Outline: What is Accumulo? How can I use Accumulo? Who is involved in the Accumulo community? Where is Accumulo going?

  2. Design Drivers
     Analysis of big data is central to our customers’ requirements, in which the strongest drivers are:
     Scalability: The ability to do twice the work at only (about) twice the cost.
     Adaptability: The ability to rapidly evolve the analytical tools available in an operational environment, building upon and enhancing existing capabilities.
     From these directives we can derive the following requirements:
     Simplicity in the overall architecture to encourage collaboration and ameliorate learning curve.
     Generic design patterns to store and organize data whose format we don’t control.
     Generic discovery analytics to retrieve and visualize generic data.
     Solutions for common sub-problems, such as multi-level security and enforcement of legal restrictions, built into the infrastructure.

  3. Optimization
     ... is a secondary concern, given:
     hundreds of evolving applications,
     hundreds of changing data sources,
     non-trivial data volumes,
     many complicated interactions.
     Instead, we need a generic platform that is cheap, simple, scalable, secure, and adaptable, with pretty good performance.

  4. Growth of Accumulo

  5. Key/Value Structure
     An Accumulo Key is a 5-tuple, including:
     Row: controls Atomicity
     Column Family: controls Locality
     Column Qualifier: controls Uniqueness
     Visibility: controls Access (unique to Accumulo)
     Timestamp: controls Versioning

     Sample Entries
     Row : Col. Fam. : Col. Qual. : Visibility : Timestamp ⇒ Value
     Adam : Favorites : Food : (Public) : 20090801 ⇒ Sushi
     Adam : Favorites : Programming Language : (Private) : 20090830 ⇒ Java
     Adam : Favorites : Programming Language : (Private) : 20070725 ⇒ C++
     Adam : Friends : Bob : (Public) : 20110601 ⇒
     Adam : Friends : Joe : (Private) : 20110601 ⇒
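
The 5-tuple key and the sample entries above can be modeled in a few lines of Python. This is a toy sketch and not the Accumulo client API: the `Key` class, `sort_order`, `scan`, and `latest_versions` are names made up for illustration. It assumes that keys sort ascending by row, column family, column qualifier, and visibility, and descending by timestamp, so that the newest version of a cell is read first.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Toy stand-in for an Accumulo key (hypothetical class, not the real API)."""
    row: str
    column_family: str
    column_qualifier: str
    visibility: str
    timestamp: int


def sort_order(key: Key):
    # Assumption: ascending on the first four fields, descending on timestamp,
    # so the most recent version of a cell sorts first.
    return (key.row, key.column_family, key.column_qualifier,
            key.visibility, -key.timestamp)


# The sample entries from the slide.
ENTRIES = {
    Key("Adam", "Favorites", "Food", "Public", 20090801): "Sushi",
    Key("Adam", "Favorites", "Programming Language", "Private", 20090830): "Java",
    Key("Adam", "Favorites", "Programming Language", "Private", 20070725): "C++",
    Key("Adam", "Friends", "Bob", "Public", 20110601): "",
    Key("Adam", "Friends", "Joe", "Private", 20110601): "",
}


def scan(entries):
    """Return (key, value) pairs in table sort order."""
    return [(key, entries[key]) for key in sorted(entries, key=sort_order)]


def latest_versions(entries):
    """Keep only the newest version of each cell (same first four key fields)."""
    seen, result = set(), []
    for key, value in scan(entries):
        cell = key[:4]
        if cell not in seen:
            seen.add(cell)
            result.append((key, value))
    return result
```

With this ordering, the two "Programming Language" entries share a cell and differ only by timestamp, so `latest_versions(ENTRIES)` keeps Java and drops C++.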

  6. Visibility Label Syntax and Semantics
     Document Labels
     Doc 1 : (Federation)
     Doc 2 : (Klingon|Vulcan)
     Doc 3 : (Federation & Human & Vulcan)
     Doc 4 : (Federation & (Human|Vulcan))

     User Authorization Sets
     CptKirk : { Federation,Human }
     MrSpock : { Federation,Human,Vulcan }

     Syntax
     WORD   ⇒ [a-zA-Z0-9 ]+
     CLAUSE ⇒ AND
            ⇒ OR
     AND    ⇒ AND & AND
            ⇒ ( CLAUSE )
            ⇒ WORD
     OR     ⇒ OR | OR
            ⇒ ( CLAUSE )
            ⇒ WORD

     Semantics (each rule: if the premises hold, then (T, A) ⊨ true)
     term  : (T ⇒ τ) ∧ (τ ∈ A)
     and   : (T ⇒ T1 & T2) ∧ ((T1, A) ⊨ true) ∧ ((T2, A) ⊨ true)
     or    : (T ⇒ T1 | T2) ∧ (((T1, A) ⊨ true) ∨ ((T2, A) ⊨ true))
     paren : (T ⇒ (T1)) ∧ ((T1, A) ⊨ true)
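
The grammar and rules above can be exercised with a small recursive-descent evaluator. This is a sketch written for illustration, not Accumulo's own parser; `tokenize` and `evaluate` are hypothetical names, and the word pattern `[A-Za-z0-9_]+` (with an underscore) is an assumption. Following the grammar, `&` and `|` may not be mixed at the same level without parentheses.

```python
import re

# Assumption: words are alphanumerics plus underscore; operators are & | ( ).
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expression):
    """Split a visibility label into words and operator tokens."""
    expression = expression.strip()
    tokens, pos = [], 0
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expression, authorizations):
    """Return True if the authorization set A satisfies the label T."""
    tokens = tokenize(expression)
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":                       # paren rule
            result = clause()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token in authorizations         # term rule: word must be in A

    def clause():
        nonlocal pos
        result = term()
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is None:
                operator = tokens[pos]
            elif tokens[pos] != operator:
                raise ValueError("cannot mix & and | without parentheses")
            pos += 1
            right = term()                     # parse before combining
            result = (result and right) if operator == "&" else (result or right)
        return result

    result = clause()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result
```

Applied to the slide's example, CptKirk's authorizations satisfy Doc 1 and Doc 4 only, while MrSpock's satisfy all four documents.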

  7. Tablets
     Collections of key/value pairs form Tables.
     Tables are partitioned into Tablets.
     Metadata tablets hold info about other tablets, forming a three-level hierarchy.
     A Tablet is a unit of work for a Tablet Server.
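
Partitioning a sorted table into tablets amounts to a binary search over split points. The sketch below is a simplified model, not Accumulo code: `tablet_ranges` and `tablet_for_row` are hypothetical helpers, and it assumes each tablet covers the rows after the previous split point up to and including its own split point. The metadata hierarchy is not modeled.

```python
from bisect import bisect_left


def tablet_ranges(split_points):
    """Return (previous_end_row, end_row) pairs; None means unbounded.

    Assumption: a tablet covers rows greater than previous_end_row and
    less than or equal to end_row.
    """
    bounds = [None] + list(split_points) + [None]
    return list(zip(bounds[:-1], bounds[1:]))


def tablet_for_row(split_points, row):
    """Index of the tablet that holds `row`, given sorted split points."""
    # bisect_left finds the first split point >= row, i.e. the tablet whose
    # end row is the smallest one not less than the row.
    return bisect_left(split_points, row)
```

For example, with split points `["g", "n", "t"]` a table has four tablets; row `"g"` lands in the first tablet and row `"h"` in the second.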

  8. Distributed Processes

  9. Tablet Server Composition
     Quick and loose definitions:
     Table: A map of keys to values with one global sort order among keys.
     Tablet: A row range within a Table.
     Tablet Server: The mechanism that hosts Tablets, providing the primary functionality of Bigtable or Accumulo.
     Tablet servers have several primary functions:
     1 Hosting RPCs (read, write, etc.)
     2 Managing resources (RAM, CPU, File I/O, etc.)
     3 Scheduling background tasks (compactions, caching, etc.)
     4 Handling key/value pairs
     Category 4 is almost entirely accomplished through the Iterator framework.

  10. Tablet Server Data Flow
      Iterator Uses: File Reads, Block Caching, Merging, Deletion, Isolation, Locality Groups, Range Selection, Column Selection, Cell-level Security, Versioning, Filtering, Aggregation, Partitioned Joins
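
Three of the iterator uses listed above (merging, versioning, and filtering) can be illustrated as a stack of Python generators. This is only an analogy for the idea of composable iterators over sorted key/value streams; the function names and the `(row, column, timestamp, value)` entry layout are assumptions, not the Accumulo Iterator framework's interface.

```python
import heapq
from itertools import groupby, islice


def order(entry):
    """Sort order for (row, column, timestamp, value): newest version first."""
    row, column, timestamp, _value = entry
    return (row, column, -timestamp)


def merging_iterator(*sources):
    """Merge several already-sorted sources into one sorted stream."""
    return heapq.merge(*sources, key=order)


def versioning_iterator(source, max_versions=1):
    """Keep at most max_versions of the newest entries for each cell."""
    for _cell, versions in groupby(source, key=lambda e: (e[0], e[1])):
        yield from islice(versions, max_versions)


def filtering_iterator(source, predicate):
    """Pass through only the entries accepted by the predicate."""
    return (entry for entry in source if predicate(entry))


def scan(sources, row=None, max_versions=1):
    """Compose the stack: merge, then version, then optionally filter by row."""
    stream = merging_iterator(*sources)
    stream = versioning_iterator(stream, max_versions)
    if row is not None:
        stream = filtering_iterator(stream, lambda e: e[0] == row)
    return list(stream)
```

Each stage consumes a sorted stream and yields a sorted stream, which is what lets stages be stacked in any sensible order.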

  11. The Perils of Distributed Computing
      Dealing with failures is hard!
      Operations like table creation are logically atomic, but consist of multiple operations on distributed systems.
      Resource locking (via mutex, semaphores, etc.) provides some sanity.
      Distributed systems have many complicated failure modes: clients, master, tablet servers, and dependent systems can all go offline periodically.
      Who is responsible for unlocking locks when any component can fail?
      How do we know it’s safe to unlock a lock?

  12. Accumulo Testing Procedures
      Testing Frameworks
      Unit: Verify correct functioning of each module separately
      System: Perform correctness and performance tests on a small running instance
      Load/Scale: Generate high loads at scale and measure performance and correctness
      Random Walk: Randomly, repeatedly, and concurrently execute a variety of test modules representative of user activity on an instance at scale
      Simulation: Evaluate the model to gauge expected performance

      Other Considerations
      Scoping tests to include server-side code, client-side code, dependent processes, etc.
      Code coverage vs. path coverage
      Static vs. dynamic analysis
      Simulating failures of distributed components
      Strange failure modes (often hardware/physics-related)

  13. Fault-Tolerant Executor
      If a process dies, previously submitted operations continue to execute on restart.
      FATE serializes every task in ZooKeeper before execution.
      The Master process uses FATE to execute table operations and administrative actions.
      FATE eliminates the single point of failure.
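
The idea of persisting an operation before running it, so that a restarted process can pick up where the old one stopped, can be sketched as follows. This is a toy model and not the FATE implementation: `DurableStore` is an in-memory stand-in for ZooKeeper, and `submit`, `run_pending`, `Crash`, and the step names used with them are hypothetical. The sketch assumes every step is idempotent, since a step interrupted midway would be run again.

```python
import json


class DurableStore:
    """In-memory stand-in for ZooKeeper: holds serialized operation records."""

    def __init__(self):
        self._data = {}

    def write(self, txid, record):
        self._data[txid] = json.dumps(record)

    def read(self, txid):
        return json.loads(self._data[txid])

    def pending(self):
        return [txid for txid in sorted(self._data) if not self.read(txid)["done"]]


class Crash(Exception):
    """Simulates the executing process dying partway through an operation."""


def submit(store, txid, steps):
    """Serialize the whole operation before any step is executed."""
    store.write(txid, {"steps": steps, "next": 0, "done": False})


def run_pending(store, actions, fail_at=None):
    """Run (or resume) every unfinished operation recorded in the store."""
    for txid in store.pending():
        record = store.read(txid)
        while record["next"] < len(record["steps"]):
            step = record["steps"][record["next"]]
            if step == fail_at:
                raise Crash(step)
            actions[step]()            # assumed idempotent
            record["next"] += 1
            store.write(txid, record)  # persist progress after each step
        record["done"] = True
        store.write(txid, record)
```

Calling `run_pending` again after a simulated crash resumes at the first step that was not recorded as complete, because progress lives in the store and not in the process.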

  14. Verified State Models
      State models used for many internal functions
      Explicit-state model checking proves correctness
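
Explicit-state model checking enumerates every reachable state of a model and tests an invariant in each one. The sketch below shows the general technique with a breadth-first search; the tablet-hosting model in it (servers that are `idle`, `loading`, or `hosted`, with an optional guard on assignment) is invented for illustration and is not one of Accumulo's actual state models.

```python
from collections import deque


def check(initial, transitions, invariant):
    """Explore all reachable states breadth-first.

    Returns None if the invariant holds in every reachable state, otherwise
    the shortest path of states leading to a violation.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for successor in transitions(state):
            if successor not in parent:
                parent[successor] = state
                queue.append(successor)
    return None


def transitions(state, guarded=True):
    """Hypothetical model: each server is idle, loading, or hosting one tablet."""
    for i, status in enumerate(state):
        others_idle = all(s == "idle" for j, s in enumerate(state) if j != i)
        if status == "idle" and (others_idle or not guarded):
            yield state[:i] + ("loading",) + state[i + 1:]
        elif status == "loading":
            yield state[:i] + ("hosted",) + state[i + 1:]
        elif status == "hosted":
            yield state[:i] + ("idle",) + state[i + 1:]


def at_most_one_host(state):
    """Invariant: the tablet is never hosted by two servers at once."""
    return sum(status == "hosted" for status in state) <= 1
```

With the guard in place the search finds no violation; with `guarded=False` it returns a counterexample trace ending in a state where both servers host the tablet.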

  15. (no text on this slide)

  16. (no text on this slide)

  17. Event Table with Inverted Index

  18. Inverted Index Flow

  19. Multidimensional Index
      See also: http://en.wikipedia.org/wiki/Geohash
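
The Geohash reference suggests how two dimensions can be folded into one sortable string, which is what a row key in a sorted table needs. The sketch below implements standard Geohash encoding (alternating longitude and latitude bits, base-32 output); how the slide's index actually builds its row keys is not stated in the transcript, so treating a geohash as the row key is an assumption.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet


def geohash(latitude, longitude, length=8):
    """Encode a point as a Geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_longitude = True  # bits alternate, starting with longitude
    while len(bits) < 5 * length:
        bounds, value = (lon_range, longitude) if use_longitude else (lat_range, latitude)
        middle = (bounds[0] + bounds[1]) / 2
        if value >= middle:
            bits.append(1)
            bounds[0] = middle   # keep the upper half
        else:
            bits.append(0)
            bounds[1] = middle   # keep the lower half
        use_longitude = not use_longitude

    characters = []
    for start in range(0, len(bits), 5):
        index = 0
        for bit in bits[start:start + 5]:
            index = index * 2 + bit
        characters.append(BASE32[index])
    return "".join(characters)
```

Because a longer hash always extends a shorter hash of the same point, all points inside one cell share a prefix, and a bounding cell can be fetched with a single range scan over that prefix.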

  20. Graph Table

  21. The “shard” Table

  22. Committers, Contributors, and Community
      Accumulo-Related Companies: 42six, Accumulo Data, Berico, Booz Allen Hamilton, CyberPoint, Data Tactics, Eclectic Consulting, Invertix, KEYW, PDI, Peterson Technologies, Potomac Fusion, Praxis, SAIC, sqrrl, SRA, SW Complete, Tetra Concepts, TexelTek, Your name here!

  23. User Base

  24. Features in the Pipeline
      Block stats indexing
      Transient block indexing
      Pluggable Authentication and Authorization
      HDFS-based write-ahead log
      Multiple namenode/volume support
      Integration with cluster management systems
      Web-integrated shell
