
Review: summary of the performance of symbol-table implementations

BBM 202 - ALGORITHMS, DEPT. OF COMPUTER ENGINEERING, ERKUT ERDEM. TRIES. Apr. 21, 2015. Acknowledgement: The course slides are adapted from the slides prepared by R. Sedgewick and K. Wayne of Princeton University.


  1. BBM 202 - ALGORITHMS
     DEPT. OF COMPUTER ENGINEERING
     ERKUT ERDEM
     TRIES
     Apr. 21, 2015
     Acknowledgement: The course slides are adapted from the slides prepared by R. Sedgewick and K. Wayne of Princeton University.

  2. TODAY
     ‣ Tries
     ‣ R-way tries
     ‣ Ternary search tries
     ‣ Character-based operations

  3. Review: summary of the performance of symbol-table implementations

     Order of growth of the frequency of operations (typical case).

     implementation   search   insert   delete   ordered operations   operations on keys
     red-black BST    log N    log N    log N    yes                  compareTo()
     hash table       1 †      1 †      1 †      no                   equals(), hashCode()

     † under uniform hashing assumption

     Q. Can we do better?
     A. Yes, if we can avoid examining the entire key, as with string sorting.

  4. String symbol table basic API

     String symbol table. Symbol table specialized to string keys.

     public class StringST<Value>
                StringST()                    create an empty symbol table
          void  put(String key, Value val)    put key-value pair into the symbol table
         Value  get(String key)               return value paired with given key
          void  delete(String key)            delete key and corresponding value
                ⋮

     Goal. Faster than hashing, more flexible than BSTs.

  5. String symbol table implementations cost summary

     Character accesses (typical case) and dedup results:

     implementation             search hit     search miss   insert     space (references)   dedup moby.txt   dedup actors.txt
     red-black BST              L + c lg² N    c lg² N       c lg² N    4N                   1.4              97.4
     hashing (linear probing)   L              L             L          4N to 16N            0.76             40.6

     Parameters
     • N = number of strings
     • L = length of string
     • R = radix

     file         size     words    distinct
     moby.txt     1.2 MB   210 K    32 K
     actors.txt   82 MB    11.4 M   900 K

     Challenge. Efficient performance for string keys.

  6. TRIES
     ‣ R-way tries
     ‣ Ternary search tries
     ‣ Character-based operations

  7. Tries

     Tries. [from retrieval, but pronounced "try"]
     • Store characters in nodes (not keys).
     • Each node has R children, one for each possible character.
     • Store values in nodes corresponding to last characters in keys.

     key      value
     by       4
     sea      6
     sells    1
     she      0
     shells   3
     shore    7
     the      5

     [Figure: trie for the keys above. Annotations: for now, we do not draw null links; the root's s link leads to the trie for all keys that start with s; the link below s-h-e leads to the trie for all keys that start with she; the value for she is in the node corresponding to its last key character.]

  8. Search in a trie

     Follow links corresponding to each character in the key.
     • Search hit: node where search ends has a non-null value.
     • Search miss: reach a null link or node where search ends has null value.

     get("shells"): return the value associated with the last key character (return 3).
     [Figure: search path s-h-e-l-l-s in the example trie]

  9. Search in a trie

     Follow links corresponding to each character in the key.
     • Search hit: node where search ends has a non-null value.
     • Search miss: reach a null link or node where search ends has null value.

     get("she"): the search may terminate at an intermediate node (return 0).
     [Figure: search path s-h-e in the example trie]

  10. Search in a trie

      Follow links corresponding to each character in the key.
      • Search hit: node where search ends has a non-null value.
      • Search miss: reach a null link or node where search ends has null value.

      get("shell"): no value associated with the last key character (return null).
      [Figure: search path s-h-e-l-l in the example trie]

  11. Search in a trie

      Follow links corresponding to each character in the key.
      • Search hit: node where search ends has a non-null value.
      • Search miss: reach a null link or node where search ends has null value.

      get("shelter"): no link to 't' (return null).
      [Figure: search path s-h-e-l in the example trie, ending at a null link]
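The four searches on the preceding slides can be reproduced with a small sketch. This is not the course's TrieST class: the class name `TrieSearchDemo`, the helper `sample()`, and the use of `Integer` values are assumptions made for illustration, and keys are assumed to contain only extended-ASCII characters (codes below 256).

```java
// Minimal R-way trie sketch; names are illustrative, not from the slides.
class TrieSearchDemo {
    private static final int R = 256;        // extended ASCII, as on the slides

    private static class Node {
        Object val;                          // null means no key ends at this node
        Node[] next = new Node[R];           // one link per possible character
    }

    private Node root;

    void put(String key, Integer val) { root = put(root, key, val, 0); }

    private Node put(Node x, String key, Integer val, int d) {
        if (x == null) x = new Node();       // null link: create a new node
        if (d == key.length()) { x.val = val; return x; }
        char c = key.charAt(d);
        x.next[c] = put(x.next[c], key, val, d + 1);
        return x;
    }

    // Follow one link per key character; stop early on a null link.
    Integer get(String key) {
        Node x = root;
        for (int d = 0; d < key.length() && x != null; d++)
            x = x.next[key.charAt(d)];
        return (x == null) ? null : (Integer) x.val;
    }

    // Builds the example trie from the slides (hypothetical helper).
    static TrieSearchDemo sample() {
        TrieSearchDemo st = new TrieSearchDemo();
        st.put("she", 0);  st.put("sells", 1); st.put("sea", 6);
        st.put("shells", 3); st.put("by", 4);  st.put("the", 5);
        st.put("shore", 7);
        return st;
    }

    public static void main(String[] args) {
        TrieSearchDemo st = sample();
        System.out.println(st.get("shells"));   // 3    (hit at the last character)
        System.out.println(st.get("she"));      // 0    (hit at an intermediate node)
        System.out.println(st.get("shell"));    // null (node exists but has no value)
        System.out.println(st.get("shelter"));  // null (no link to 't')
    }
}
```

The two misses differ in how they fail: "shell" reaches a real node whose value is null, while "shelter" runs into a null link before the key is exhausted.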

  12. Insertion into a trie

      Follow links corresponding to each character in the key.
      • Encounter a null link: create new node.
      • Encounter the last character of the key: set value in that node.

      put("shore", 7)
      [Figure: new path o-r-e added below s-h in the example trie, with value 7 at the final e]

  13. Trie construction demo
      [Figure: empty trie]

  14. Trie construction demo: put("she", 0)
      The key is the sequence of characters from the root to the value; the value is in the node corresponding to the last character.
      [Figure: path s-h-e with value 0 at e]

  15. Trie construction demo
      [Figure: trie after put("she", 0)]

  16. Trie construction demo
      [Figure: trie after put("she", 0)]

  17. Trie construction demo: put("sells", 1)
      [Figure: new path e-l-l-s added below s, with value 1 at the final s]

  18. Trie construction demo
      [Figure: trie after put("sells", 1)]

  19. Trie construction demo
      [Figure: trie after put("sells", 1)]

  20. Trie construction demo: put("sea", 2)
      [Figure: new node a added below s-e, with value 2]

  21. Trie construction demo
      [Figure: trie after put("sea", 2)]

  22. Trie construction demo: put("shells", 3)
      [Figure: new path l-l-s added below s-h-e, with value 3 at the final s]

  23. Trie construction demo
      [Figure: trie after put("shells", 3)]

  24. Trie construction demo: put("by", 4)
      [Figure: new path b-y added below the root, with value 4 at y]

  25. Trie construction demo
      [Figure: trie after put("by", 4)]

  26. Trie construction demo: put("the", 5)
      [Figure: new path t-h-e added below the root, with value 5 at e]

  27. Trie construction demo
      [Figure: trie after put("the", 5)]

  28. Trie construction demo: put("sea", 6)
      Overwrite old value with new value (2 becomes 6).

  29. Trie construction demo
      [Figure: trie after put("sea", 6)]

  30. Trie construction demo
      [Figure: trie after put("sea", 6)]

  31. Trie construction demo: put("shore", 7)
      [Figure: new path o-r-e added below s-h, with value 7 at the final e]

  32. Trie construction demo
      [Figure: final trie holding by, sea, sells, she, shells, shore, the]
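One way to follow the construction demo without the figures is to count how many nodes each put creates. The sketch below is an assumption-laden illustration, not the slides' code: the class `TrieGrowthDemo` and its iterative `put` (which returns the number of new nodes) are hypothetical, the root is allocated up front and not counted, and keys are assumed to be extended ASCII.

```java
// Counts the nodes created by each insertion in the demo sequence.
// Class and method names are illustrative, not from the slides.
class TrieGrowthDemo {
    private static final int R = 256;            // extended ASCII

    private static class Node {
        Object val;
        Node[] next = new Node[R];
    }

    private final Node root = new Node();        // root is not counted below

    // Inserts the key and returns how many new nodes the insertion created.
    int put(String key, int val) {
        int created = 0;
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d);
            if (x.next[c] == null) {             // null link: create a new node
                x.next[c] = new Node();
                created++;
            }
            x = x.next[c];
        }
        x.val = val;                             // last character: set (or overwrite) the value
        return created;
    }

    public static void main(String[] args) {
        String[] keys = { "she", "sells", "sea", "shells", "by", "the", "sea", "shore" };
        int[] vals    = {  0,     1,       2,     3,        4,    5,     6,     7 };
        TrieGrowthDemo trie = new TrieGrowthDemo();
        int total = 0;
        for (int i = 0; i < keys.length; i++) {
            int created = trie.put(keys[i], vals[i]);
            total += created;
            System.out.println("put(\"" + keys[i] + "\", " + vals[i] + ") created "
                    + created + " node(s), total " + total);
        }
    }
}
```

The second put of "sea" creates no nodes because it only overwrites a value, and keys that share a prefix (for example "she", "shells", "shore") reuse the nodes on that prefix, so the seven distinct keys need 19 nodes below the root rather than one node per character.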

  33. Trie representation: Java implementation

      Node. A value, plus references to R nodes.

          private static class Node
          {
              private Object val;                   // use Object instead of Value since no generic array creation in Java
              private Node[] next = new Node[R];
          }

      • Neither keys nor characters are explicitly stored.
      • Characters are implicitly defined by link index.
      • Each node has an array of links and a value.

      [Figure: trie representation for the keys she, sells, sea, shown both as a drawing with characters in nodes and as nodes holding link arrays]

  34. R-way trie: Java implementation

          public class TrieST<Value>
          {
              private static final int R = 256;     // extended ASCII
              private Node root;

              private static class Node
              { /* see previous slide */ }

              public void put(String key, Value val)
              { root = put(root, key, val, 0); }

              private Node put(Node x, String key, Value val, int d)
              {
                  if (x == null) x = new Node();
                  if (d == key.length()) { x.val = val; return x; }
                  char c = key.charAt(d);
                  x.next[c] = put(x.next[c], key, val, d+1);
                  return x;
              }
              ⋮

  35. R-way trie: Java implementation (continued)

              ⋮
              public boolean contains(String key)
              { return get(key) != null; }

              public Value get(String key)
              {
                  Node x = get(root, key, 0);
                  if (x == null) return null;
                  return (Value) x.val;             // cast needed
              }

              private Node get(Node x, String key, int d)
              {
                  if (x == null) return null;
                  if (d == key.length()) return x;
                  char c = key.charAt(d);
                  return get(x.next[c], key, d+1);
              }
          }

  36. Trie performance

      Search hit. Need to examine all L characters for equality.

      Search miss.
      • Could have mismatch on first character.
      • Typical case: examine only a few characters (sublinear).

      Space. R null links at each leaf (but sublinear space possible if many short strings share common prefixes).

      Bottom line. Fast search hit and even faster search miss, but wastes space.
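Both claims, cheap misses and wasted space, can be checked on the seven-key example. The sketch below is illustrative only: `TrieCostDemo`, `charsExamined`, and `nullLinks` are hypothetical names, "characters examined" is taken to mean the number of `charAt` calls a search makes, and keys are assumed to be extended ASCII.

```java
// Measures characters examined per search and null links in a small R-way trie.
// Names are illustrative, not from the slides.
class TrieCostDemo {
    static final int R = 256;                    // extended ASCII

    static class Node {
        Object val;
        Node[] next = new Node[R];
    }

    final Node root = new Node();
    int nodeCount = 1;                           // counts the root

    void put(String key, int val) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d);
            if (x.next[c] == null) { x.next[c] = new Node(); nodeCount++; }
            x = x.next[c];
        }
        x.val = val;
    }

    // Number of key characters a search looks at before it ends.
    int charsExamined(String key) {
        Node x = root;
        int examined = 0;
        for (int d = 0; d < key.length(); d++) {
            examined++;
            x = x.next[key.charAt(d)];
            if (x == null) break;                // null link: search miss, stop early
        }
        return examined;
    }

    // Every node except the root is the target of exactly one link,
    // so the non-null links number nodeCount - 1.
    long nullLinks() {
        return (long) nodeCount * R - (nodeCount - 1);
    }

    // Builds the example trie from the slides (hypothetical helper).
    static TrieCostDemo sample() {
        TrieCostDemo t = new TrieCostDemo();
        String[] keys = { "she", "sells", "sea", "shells", "by", "the", "shore" };
        int[] vals    = {  0,     1,       6,     3,        4,    5,     7 };
        for (int i = 0; i < keys.length; i++) t.put(keys[i], vals[i]);
        return t;
    }

    public static void main(String[] args) {
        TrieCostDemo t = sample();
        System.out.println(t.charsExamined("shells"));     // 6: a hit examines all L characters
        System.out.println(t.charsExamined("shelter"));    // 5: miss at the link for 't'
        System.out.println(t.charsExamined("xylophone"));  // 1: mismatch on the first character
        System.out.println(t.nodeCount);                   // 20 nodes, including the root
        System.out.println(t.nullLinks());                 // 5101 of 5120 links are null
    }
}
```

In this tiny example 20 nodes carry 5,120 links, of which only 19 point anywhere, which is the sense in which the structure "wastes space".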

  37. Deletion in an R-way trie

      To delete a key-value pair:
      • Find the node corresponding to key and set value to null.
      • If that node has all null links, remove that node (and recur).

      delete("shells")
      [Figure: example trie with the final s of shells marked. Annotations: set value to null; null value and links (delete node).]
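The two deletion steps above can be sketched recursively. The slides do not show deletion code, so everything here is an assumption for illustration: the class `TrieDeleteDemo`, the `nodes()` counter, `Integer` values, and extended-ASCII keys. The recursive `delete` returns null for a node that has neither a value nor any non-null link, which removes it from its parent on the way back up.

```java
// Sketch of deletion in an R-way trie; names are illustrative, not from the slides.
class TrieDeleteDemo {
    static final int R = 256;                    // extended ASCII

    static class Node {
        Object val;
        Node[] next = new Node[R];
    }

    Node root;

    void put(String key, Integer val) { root = put(root, key, val, 0); }

    private Node put(Node x, String key, Integer val, int d) {
        if (x == null) x = new Node();
        if (d == key.length()) { x.val = val; return x; }
        char c = key.charAt(d);
        x.next[c] = put(x.next[c], key, val, d + 1);
        return x;
    }

    Integer get(String key) {
        Node x = root;
        for (int d = 0; d < key.length() && x != null; d++)
            x = x.next[key.charAt(d)];
        return (x == null) ? null : (Integer) x.val;
    }

    void delete(String key) { root = delete(root, key, 0); }

    private Node delete(Node x, String key, int d) {
        if (x == null) return null;              // key is not in the trie
        if (d == key.length()) {
            x.val = null;                        // step 1: set the value to null
        } else {
            char c = key.charAt(d);
            x.next[c] = delete(x.next[c], key, d + 1);
        }
        if (x.val != null) return x;             // node still ends another key: keep it
        for (Node child : x.next)
            if (child != null) return x;         // node is still on a path to a key: keep it
        return null;                             // step 2: no value, all links null: remove
    }

    // Total number of nodes, including the root (hypothetical helper).
    int nodes() { return nodes(root); }

    private int nodes(Node x) {
        if (x == null) return 0;
        int n = 1;
        for (Node child : x.next) n += nodes(child);
        return n;
    }

    // Builds the example trie from the slides (hypothetical helper).
    static TrieDeleteDemo sample() {
        TrieDeleteDemo t = new TrieDeleteDemo();
        t.put("she", 0);  t.put("sells", 1); t.put("sea", 6);
        t.put("shells", 3); t.put("by", 4);  t.put("the", 5);
        t.put("shore", 7);
        return t;
    }

    public static void main(String[] args) {
        TrieDeleteDemo t = sample();
        System.out.println(t.nodes());           // 20
        t.delete("shells");
        System.out.println(t.nodes());           // 17: the l-l-s nodes below "she" are gone
        System.out.println(t.get("shells"));     // null
        System.out.println(t.get("she"));        // 0: removal stops at a node with a value
    }
}
```

Removal stops at the node for "she" because it still holds a value; deleting "she" instead would only clear the value, since that node still has links leading to "shells".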

  38. String symbol table implementations cost summary

      Character accesses (typical case) and dedup results:

      implementation             search hit     search miss   insert     space (references)   dedup moby.txt   dedup actors.txt
      red-black BST              L + c lg² N    c lg² N       c lg² N    4N                   1.4              97.4
      hashing (linear probing)   L              L             L          4N to 16N            0.76             40.6
      R-way trie                 L              log_R N       L          (R+1) N              1.12             out of memory

      R-way trie.
      • Method of choice for small R.
      • Too much memory for large R.

      Challenge. Use less memory, e.g., 65,536-way trie for Unicode!
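A rough calculation shows why the space column matters. The sketch below simply evaluates the (R+1) N estimate from the table for the two test files. The class and method names are hypothetical, "32 K" and "900 K" are read as 32,000 and 900,000 distinct keys, and the 8 bytes per reference is an assumption about the JVM, not a figure from the slides.

```java
// Evaluates the (R + 1) * N space estimate from the cost summary.
// Names and the bytes-per-reference figure are assumptions for illustration.
class TrieSpaceEstimate {
    static final long BYTES_PER_REFERENCE = 8;   // assumed; depends on the JVM

    // Approximate number of references used by an R-way trie with N keys.
    static long references(long R, long N) {
        return (R + 1) * N;
    }

    static long bytes(long R, long N) {
        return references(R, N) * BYTES_PER_REFERENCE;
    }

    public static void main(String[] args) {
        long mobyDistinct   = 32_000L;           // "32 K" distinct words, taking K = 1000
        long actorsDistinct = 900_000L;          // "900 K" distinct strings

        System.out.println(references(256, mobyDistinct));      // 8224000
        System.out.println(references(256, actorsDistinct));    // 231300000
        System.out.println(bytes(256, actorsDistinct));         // 1850400000 (about 1.85 GB)
        System.out.println(references(65_536, actorsDistinct)); // 58983300000
    }
}
```

Under these assumptions the 256-way trie for actors.txt already needs on the order of gigabytes for links alone, and a 65,536-way trie multiplies that by roughly 256.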
