Accumulo Table Design
Adam Fuchs
May 9, 2012

Outline:
  The Basics
  Types of Indexing
  Iterators Can Help
  Information Retrieval
  Wikisearch Example
  Conclusions

Key/Value Structure

An Accumulo Key is a 5-tuple, including:
  Row : controls Atomicity
  Column Family : controls Locality
  Column Qualifier : controls Uniqueness
  Visibility : controls Access (unique to Accumulo)
  Timestamp : controls Versioning

Sample Entries
  Row : Col. Fam. : Col. Qual. : Visibility : Timestamp ⇒ Value
  Adam : Favorites : Food : (Public) : 20090801 ⇒ Sushi
  Adam : Favorites : Programming Language : (Private) : 20090830 ⇒ Java
  Adam : Favorites : Programming Language : (Private) : 20070725 ⇒ C++
  Adam : Friends : Bob : (Public) : 20110601 ⇒
  Adam : Friends : Joe : (Private) : 20110601 ⇒
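
As a rough illustration of how the five key fields interact, the sketch below models a key as a Python tuple and sorts the sample entries the way Accumulo orders keys: ascending by row, column family, column qualifier, and visibility, then newest timestamp first. The `Key` class and `sort_key` helper are hypothetical names for this illustration and are not part of the Accumulo API; Python string comparison stands in for Accumulo's byte-wise comparison, which agrees for the ASCII data used here.

```python
from typing import NamedTuple


class Key(NamedTuple):
    # Hypothetical stand-in for an Accumulo key (not the real API).
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(entry):
    key, _value = entry
    # Ascending on the first four fields, descending on timestamp so the
    # newest version of a cell comes first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


# The sample entries from the slide, deliberately shuffled.
entries = [
    (Key("Adam", "Friends", "Joe", "Private", 20110601), ""),
    (Key("Adam", "Favorites", "Programming Language", "Private", 20070725), "C++"),
    (Key("Adam", "Favorites", "Food", "Public", 20090801), "Sushi"),
    (Key("Adam", "Favorites", "Programming Language", "Private", 20090830), "Java"),
    (Key("Adam", "Friends", "Bob", "Public", 20110601), ""),
]

sorted_entries = sorted(entries, key=sort_key)
for key, value in sorted_entries:
    print(key, "=>", value)
```

After sorting, the two "Programming Language" versions sit next to each other with the 2009 entry (Java) ahead of the 2007 entry (C++), which is what lets versioning keep only the newest value.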

Client Mechanisms
  BatchWriter: Group mutations and apply across the cluster in batches.
  Scanner: Define a range of keys and scan sequentially through them.
  BatchScanner: Execute a scan over multiple ranges in parallel.

Relation Roots [figure]

Event Table with Inverted Index [figure]

Inverted Index Flow [figure]

Type-agnostic Indexing [figure]

Order-preserving Encodings

Fixed precision exponent, unbounded precision significand floating point encoding:
  -1.23 E+45 ⇒ --54 876
Variable precision integer:
  12345 ⇒ 11111012345
Tuple Encoding (no binary zero in elements):
  foo,bar ⇒ foo\0bar

Bag-O-Tricks
  Subtract byte from 255 (or digit from 9) when negative
  Flip signs of exponents for negative numbers
  Re-order bytes based on importance
  Prefix with magnitude or use fixed-precision
  Unary encoding of magnitudes
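
Two of the encodings above can be sketched in a few lines. This is a minimal illustration that assumes non-negative integers and string elements containing no zero byte; `encode_uint` and `encode_tuple` are hypothetical helper names. The integer encoding uses the unary magnitude prefix shown on the slide (one `1` per digit, then a `0`), and the tuple encoding joins elements with a binary zero.

```python
def encode_uint(n: int) -> str:
    """Unary length prefix, a '0' terminator, then the decimal digits."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = str(n)
    return "1" * len(digits) + "0" + digits


def encode_tuple(*elements: str) -> str:
    """Join elements with a zero byte; elements must not contain one."""
    for element in elements:
        if "\0" in element:
            raise ValueError("elements must not contain a binary zero")
    return "\0".join(elements)


numbers = [12345, 7, 100, 99, 0, 123456]
# Sorting by the encoded string gives the same order as sorting numerically.
by_encoding = sorted(numbers, key=encode_uint)

pairs = [("ab", ""), ("a", "b"), ("b", "a"), ("a", "c")]
# Sorting by the encoded string gives the same order as sorting the tuples.
pairs_by_encoding = sorted(pairs, key=lambda p: encode_tuple(*p))

print(encode_uint(12345))
print(by_encoding)
print(pairs_by_encoding)
```

The unary prefix makes shorter numbers sort before longer ones, so plain lexicographic comparison of the encoded strings agrees with numeric order without padding to a fixed width.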

Multidimensional Index [figure]
See also: http://en.wikipedia.org/wiki/Geohash
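
The slide's figure is not recoverable here, so the following is only a guess at the idea it points to: Geohash interleaves the bits of two coordinates so that points close together in two dimensions tend to share a key prefix. The sketch below shows plain bit interleaving (a Z-order curve) on small integer grid coordinates. It is not Geohash's base-32 encoding, and `interleave_bits` and `z_key` are hypothetical names.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)  # x takes the odd bit positions
        z |= ((y >> i) & 1) << (2 * i)      # y takes the even bit positions
    return z


def z_key(x: int, y: int, bits: int = 16) -> str:
    """Fixed-width hex string, so lexicographic order matches numeric order."""
    width = (2 * bits + 3) // 4
    return format(interleave_bits(x, y, bits), "0{}x".format(width))


print(z_key(3, 5, bits=4))
print(z_key(2, 2, bits=4), z_key(3, 3, bits=4))
```

Used as a row ID, such a key turns a rectangular query into a small set of key ranges that a BatchScanner can cover.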

Graph Table [figure]

Tablet Server Composition

Quick and loose definitions:
  Table : A map of keys to values with one global sort order among keys.
  Tablet : A row range within a Table.
  Tablet Server : The mechanism that hosts Tablets, providing the primary functionality of Bigtable or Accumulo.

Tablet servers have several primary functions:
  1 Hosting RPCs (read, write, etc.)
  2 Managing resources (RAM, CPU, File I/O, etc.)
  3 Scheduling background tasks (compactions, caching, etc.)
  4 Handling key/value pairs

Category 4 is almost entirely accomplished through the Iterator framework.

Tablet Server Data Flow [figure]

Iterator Uses
  File Reads
  Block Caching
  Merging
  Deletion
  Isolation
  Locality Groups
  Range Selection
  Column Selection
  Cell-level Security
  Versioning
  Filtering
  Aggregation
  Partitioned Joins

Iterators

An Iterator is an object that provides an ordered stream of entries (key/value pairs), and supports basic selection and filtering methods.

Core Iterators provide a basic view of a tablet's entries, implementing:
  File Reads
  Block Caching
  Merging
  Deletion
  Isolation
  Locality Groups
  Range Selection
  Column Selection
  Cell-level Security

Application-level Iterators modify table semantics to provide custom views, persisted or otherwise:
  Versioning
  Filtering
  Aggregation
  Partitioned Joins
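
The stacking idea can be sketched with plain Python generators. This is a toy model and not Accumulo's iterator interface: entries are `(row, column, timestamp, value)` tuples, each source is already sorted, a merging step combines the sources into one ordered stream, and a versioning step on top keeps only the newest entry per cell. All function names are hypothetical, and the split of the sample entries across an in-memory map and two files is made up for illustration.

```python
import heapq
from itertools import groupby, islice


def order(entry):
    row, column, timestamp, _value = entry
    # Ascending by row and column, newest timestamp first.
    return (row, column, -timestamp)


def merging_iterator(*sources):
    """Merge several sorted entry streams into one sorted stream."""
    return heapq.merge(*sources, key=order)


def versioning_iterator(source, max_versions=1):
    """Keep at most `max_versions` entries per (row, column) cell."""
    for _cell, versions in groupby(source, key=lambda e: (e[0], e[1])):
        yield from islice(versions, max_versions)


# Hypothetical placement of the sample entries across three sorted sources.
memory = [
    ("Adam", "Favorites:Programming Language", 20090830, "Java"),
    ("Adam", "Friends:Joe", 20110601, ""),
]
file_a = [
    ("Adam", "Favorites:Food", 20090801, "Sushi"),
    ("Adam", "Favorites:Programming Language", 20070725, "C++"),
]
file_b = [
    ("Adam", "Friends:Bob", 20110601, ""),
]

stack = versioning_iterator(merging_iterator(memory, file_a, file_b))
result = list(stack)
for entry in result:
    print(entry)
```

Each layer only consumes the ordered stream below it, so further layers (a filter, an aggregator) can be added without changing the ones underneath.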

Aggregation

Goals: Count the number of times a word appears in a dynamic corpus, and count the number of documents that contain a given word.

Sample Corpus
  Doc 1 : "foo and bar are common variable names"
  Doc 2 : "one cannot live on bar food alone"
  Doc 3 : "Mr.T pities the fool at the bar"
  Doc 4 : "someone should invent the kung foo bar"

Input Key/Value Pairs:
  Row       Column  Value
  alone     Doc 2   1
  and       Doc 1   1
  are       Doc 1   1
  at        Doc 3   1
  bar       Doc 1   1
  bar       Doc 2   1
  bar       Doc 3   1
  bar       Doc 4   1
  cannot    Doc 2   1
  common    Doc 1   1
  foo       Doc 1   1
  foo       Doc 4   1
  food      Doc 2   1
  fool      Doc 3   1
  invent    Doc 4   1
  kung      Doc 4   1
  live      Doc 2   1
  Mr.T      Doc 3   1
  names     Doc 1   1
  on        Doc 2   1
  one       Doc 2   1
  should    Doc 4   1
  someone   Doc 4   1
  pities    Doc 3   1
  the       Doc 3   1
  the       Doc 3   1
  the       Doc 4   1
  variable  Doc 1   1

A Simple Aggregator
  Aggregators replace the "versioning" functionality of a table
  Any associative, commutative operations on the values for a given key can be encoded in an aggregator
  Aggregators can persist an aggregation of the entries written to the table
  Aggregators are significantly more efficient than a read-modify-write loop due to "lazy" aggregation
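
A summing aggregator over the sample corpus might look like the sketch below. It is a plain-Python model rather than Accumulo's aggregator API: every word occurrence is written as a `(word, doc, 1)` entry with no read beforehand, and the sum is taken only when entries sharing a key are seen together in the sorted stream. The per-word totals at the end are derived on the client side purely for illustration; all names are hypothetical.

```python
from itertools import groupby

CORPUS = {
    "Doc 1": "foo and bar are common variable names",
    "Doc 2": "one cannot live on bar food alone",
    "Doc 3": "Mr.T pities the fool at the bar",
    "Doc 4": "someone should invent the kung foo bar",
}


def mutations(corpus):
    """One (row=word, column=doc, value=1) entry per word occurrence."""
    for doc, text in corpus.items():
        for word in text.split():
            yield (word, doc, 1)


def summing_aggregator(sorted_entries):
    """Collapse entries that share a (row, column) key into their sum."""
    for key, group in groupby(sorted_entries, key=lambda e: (e[0], e[1])):
        yield key + (sum(value for _row, _col, value in group),)


# Writes are blind inserts; the table's sort order brings equal keys together.
entries = sorted(mutations(CORPUS))
aggregated = list(summing_aggregator(entries))

word_totals = {}  # how many times each word appears in the corpus
doc_counts = {}   # how many documents contain each word
for word, _doc, count in aggregated:
    word_totals[word] = word_totals.get(word, 0) + count
    doc_counts[word] = doc_counts.get(word, 0) + 1

print(word_totals["the"], doc_counts["the"])
print(word_totals["bar"], doc_counts["bar"])
```

Because addition is associative and commutative, partial sums produced at different times can be merged in any order and still give the same total, which is what makes the lazy approach safe.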

Accumulo vs. HBase Atomic Increment
  HBase performs a server-side upsert (read-modify-write), taking advantage of previous value being resident in write-cache
  Accumulo buffers inserts and aggregates lazily but consistently, taking advantage of merge-tree data streams
  Both methods implement the same atomic increment semantics
  Performance varies wildly...

Increment Performance Comparison
  [figure: two panels, Write Performance and Read Performance]
  Aggregator wins for write performance with many different keys
  Upsert wins for read performance with a small number of keys
  Can we use both approaches?

Multi-Term Query with Document Partitioning

Goal: Find all of the documents that contain the words "foo" and "bar".

Partitioned Corpus
  Partition 1
    Doc 1 : "foo and bar are common variable names"
    Doc 2 : "one cannot live on bar food alone"
    Doc 3 : "Mr.T pities the fool at the bar"
  Partition 2
    Doc 4 : "someone should invent the kung foo bar"

Document Partitioning

Divide and Conquer:

  Row     ColFam    ColQual
  Part 1  alone     Doc 2
  Part 1  and       Doc 1
  Part 1  are       Doc 1
  Part 1  at        Doc 3
  Part 1  bar       Doc 1
  Part 1  bar       Doc 2
  Part 1  bar       Doc 3
  Part 1  cannot    Doc 2
  Part 1  common    Doc 1
  Part 1  foo       Doc 1
  Part 1  food      Doc 2
  Part 1  fool      Doc 3
  Part 1  live      Doc 2
  Part 1  Mr.T      Doc 3
  Part 1  names     Doc 1
  Part 1  on        Doc 2
  Part 1  one       Doc 2
  Part 1  pities    Doc 3
  Part 1  the       Doc 3
  Part 1  variable  Doc 1

  Row     ColFam    ColQual
  Part 2  bar       Doc 4
  Part 2  foo       Doc 4
  Part 2  invent    Doc 4
  Part 2  kung      Doc 4
  Part 2  should    Doc 4
  Part 2  someone   Doc 4
  Part 2  the       Doc 4

Partitioned Join Iterator [figure]
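
Since the figure is not recoverable, here is a minimal sketch of what a partitioned join could look like for the "foo" and "bar" query, assuming the partition layout from the earlier slides (row = partition, column family = word, column qualifier = document). Within each partition the sorted document lists for the two terms are intersected with a merge-style walk, and the per-partition results are concatenated. The function names are hypothetical and this is a plain-Python model, not an Accumulo iterator.

```python
PARTITIONS = {
    "Part 1": {
        "Doc 1": "foo and bar are common variable names",
        "Doc 2": "one cannot live on bar food alone",
        "Doc 3": "Mr.T pities the fool at the bar",
    },
    "Part 2": {
        "Doc 4": "someone should invent the kung foo bar",
    },
}


def build_index(partitions):
    """Return sorted (partition, word, doc) keys, one per distinct pair."""
    keys = set()
    for part, docs in partitions.items():
        for doc, text in docs.items():
            for word in text.split():
                keys.add((part, word, doc))
    return sorted(keys)


def docs_for_term(index, part, term):
    """Sorted document IDs for one term within one partition."""
    return [doc for p, word, doc in index if p == part and word == term]


def intersect_sorted(left, right):
    """Merge-style intersection of two sorted lists of document IDs."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


def partitioned_and(index, terms):
    """Intersect term postings inside each partition, then combine results."""
    results = []
    for part in sorted({p for p, _word, _doc in index}):
        matches = docs_for_term(index, part, terms[0])
        for term in terms[1:]:
            matches = intersect_sorted(matches, docs_for_term(index, part, term))
        results.extend(matches)
    return results


index = build_index(PARTITIONS)
hits = partitioned_and(index, ["foo", "bar"])
print(hits)
```

Because every document lives in exactly one partition, each partition can answer the query independently and no document IDs need to be shipped between servers.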

The "shard" Table [figure]

Document Partitioned Flow [figure]

Query == Iterator Tree
  foo AND (bar OR baz)
  [figure]
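
One way to read the slide title is that each boolean operator becomes an iterator over sorted document IDs and each term becomes a leaf. The sketch below builds that tree for `foo AND (bar OR baz)` using plain Python generators. The node functions are hypothetical, the postings come from the sample corpus used earlier (where "baz" never appears), and the mapping to the slide's figure is an assumption.

```python
import heapq


def term(postings, word):
    """Leaf: sorted stream of document IDs containing `word`."""
    return iter(sorted(postings.get(word, ())))


def or_node(*children):
    """Union of sorted streams, with duplicates removed."""
    previous = None
    for doc in heapq.merge(*children):
        if doc != previous:
            yield doc
            previous = doc


def and_node(left, right):
    """Intersection of two sorted streams, advancing whichever is behind."""
    done = object()
    a = next(left, done)
    b = next(right, done)
    while a is not done and b is not done:
        if a == b:
            yield a
            a = next(left, done)
            b = next(right, done)
        elif a < b:
            a = next(left, done)
        else:
            b = next(right, done)


# Postings taken from the sample corpus; "baz" is absent from it.
postings = {
    "foo": ["Doc 1", "Doc 4"],
    "bar": ["Doc 1", "Doc 2", "Doc 3", "Doc 4"],
}

# foo AND (bar OR baz)
tree = and_node(term(postings, "foo"),
                or_node(term(postings, "bar"), term(postings, "baz")))
matches = list(tree)
print(matches)

union = list(or_node(iter(["Doc 1", "Doc 3"]), iter(["Doc 2", "Doc 3"])))
print(union)
```

Every node consumes and produces a sorted stream, so nodes compose freely and the whole tree can be evaluated in a single pass over the underlying entries.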