working with hash tables
play

Working with Hash Tables Daniel Petrolito (ANZ Bank) Working With - PowerPoint PPT Presentation

Working with Hash Tables Daniel Petrolito (ANZ Bank) Working With Hash Tables Daniel SAS 3+ years Programming 6 years ANZ Data Scientist Combining Data Combining tables Base table Lookup table(s) Want


  1. Working with Hash Tables Daniel Petrolito (ANZ Bank)

  2. Working With Hash Tables • Daniel • SAS 3+ years • Programming 6 years • ANZ – Data Scientist

  3. Combining Data • Combining tables • Base table • Lookup table(s) • Want flexibility without too much complexity • Options • Data Step ✓ Flexible ❖ Must pre-sort data • SQL ✓ In-database ❖ Cartesian joins • Others ✓ Macros ✓ Arrays ✓ User-defined formats ✓ Hash tables

  4. Combining Data – techniques You may have tried the following: • Index the fields you are joining on • Reduce number of rows and columns as early in your process as possible • Do joins where the data originates • SAS dataset compression (compress=yes) • Column lengths • Hash Tables

  5. Hash Tables - Introduction • Optimized for speed through complex hashing algorithms • Think of it as a SAS indexed data table loaded in memory

  6. Benefits & Consequences of Hash Tables Pros Cons • In-memory, faster lookups than • Object-oriented syntax - not any other technique intuitive to code • Can join multiple tables, using • Large memory requirement different keys, in a single data • Data step only step • No sorting required

  7. Simple Example Once only, define & populate the hash table People (Base Table) Reference (Lookup Table) name age Name age state John 41 John 50VIC Sally 33 Sally 35NSW Rebecca 22 Hashtest (Results) name age state rc John 50VIC 0 Sally 35NSW 0 For each record in Lookup failed – Rebecca 22 160038 the base table, look program accordingly up the hash table

  8. Hash Table Methods • Find() • Add() • Replace() • Iterate() • Output()

  9. When Not To Use Hash Tables • Speed is not required • Team not technical savvy • Joins are more optimized to be done from external databases/sources • Not used to replace every left join • Look up table is too big (memory constraint)

  10. Real World Example • Build a fast SAS solution working with millions of rows of data that would join on multiple tables and calculate complicated metrics. • Run thousands of times (Monte Carlo simulation) Issues • First SQL solution was too slow • Dynamic & complex joins

  11. Real World Example – Solution with Hash Tables • Hash tables flexible to only fetch data when needed. • Dynamic look up keys: • hh.find(key:some_variable) • Ranged look ups: • hh.find_next() key range value X123 60 0.1 X123 70 0.2 X123 80 0.3

  12. Further Reading • http://support.sas.com/resources/papers/proceedings17/0821-2017.pdf • http://www.lexjansen.com/nesug/nesug07/bb/bb16.pdf • https://support.sas.com/resources/papers/sgf2008/better-hashing- sas92.pdf

Recommend


More recommend