Getting the Most from Hash Objects Bharath Gowda Getting the most - PowerPoint PPT Presentation

Getting the Most from Hash Objects Bharath Gowda

Getting the most from Hash objects Techniques covered are: • SQL join • Data step merge using BASE engine • Data step merge using SPDE merge • Index Key lookup • Hash object lookup • Simple join, iteration and sorting with hash

Test setup • Only using local SAS datasets. • Datasets (including SPDE) reside in the local work library directory. • Storing data locally nullifies the network delays and I/O delays. • The environment has a super fast solid state hard disk which decreases I/O delay. • Same join logic is implemented over all the techniques.

PROC SQL • Requires no sorting prior to joins Rows in pred_cost 13,160,905 • Multi way joins can be performed. • SQL uses internal utility tables for joins Rows in speciality 16,170,805 Proc sql; Rows in acute_pred create table acute_pred as 12,001,616 (inner join) select a.*,b.nwau_sas from pred_cost as a inner join speciality real time 0:20:28.79 as b user cpu time 0:14:15.31 on a.yearid = b.yearid and a.facility_identifier= b.facility_identifier Memory 254512.00k ; quit;

Data step merge • It requires a sort first by the key Rows in pred_cost 13,160,905 variables in all the input datasets. Proc sort data=pred_cost out= srt_pred_cost; Rows in speciality 16,170,805 by yearid facility_identifier; Run; Rows in acute_pred 12,001,616 Proc sort speciality out= srt_spec (keep= yearid (inner join) facility_identifier nwau_sas); by yearid facility_identifier; real time Run; user cpu time (avg 0:30:28.79 combined sorting and 0:21:15.31 Data acute_pred; merging) Merge srt_pred_cost(in=a) srt_spec(in=b); by yearid facility_identifier; Memory(combined avg) 162393.59k If a and b; Run;

SPDE merge • SPDE stands for Scalable Performance data engine Rows in pred_cost 13,160,905 • SPDE combines software and hardware capabilities. Rows in speciality 16,170,805 libname workspde spde "%sysfunc(pathname(work))" temp=yes; Rows in acute_pred proc copy in=work out= workspde; 12,001,616 (simple join) select pred_cost specialty; run; data workspde.acute_pred; real time 0:16:15.31 merge workspde.Hiepred_acutecost_1718(in=a) user cpu time 0:12:28.79 workspde.nwau16 (in=b keep= nwau_sas yearid facility_identifier); by yearid facility_identifier; Memory 1261040.70k If a and b; run;

Index key lookup • Main dataset is not sorted. Rows in pred_cost 13,160,905 • Multiple set statements with key= option . • _IORC_ is one of the key automatic variable which needs attention. Rows in speciality 16,170,805 Proc datasets lib=work nolist; Rows in acute_pred Modify speciality; 12,001,616 (inner join) Index create myindex=(yearid facility_identifier); run; real time 0:12:15.31 user cpu time 0:06:28.79 Data acute_pred; set pred_cost; Set speciality key=myindex ; Memory 142279.71k If _IORC_ =0 ; run;

HASH Objects • One of the fastest approaches Data acute_pred; for the look up activities. length nwau_sas 8.; • In memory computation makes If _n_=1 then do; this technique the fastest. declare hash hn16(dataset: ‘speciality’,duplicate:’e’); hn16.definekey(‘yearid’,’facility_identifier’); • Dataset size is a key factor for hn16.definedata(‘nwau_sas’); hn16.definedone(); memory consumption. end; • Pre sorting is not required. set pred_cost; • Hash object needs to be defined Rc=hn16.find(key :yearid, key:facility_identifier); and instantiated. If rc=0; • Lookups are performed with the Run; find() function.

HASH Objects • Simple inner joins and left joins can be achieved by keeping a check on Rows in pred_cost 13,160,905 the return code variable. Rows in speciality 16,170,805 Rows in acute_pred if rc=0; 12,001,616 (inner join) real time 0:08:28.79 user cpu time 0:03:15.31 Memory 1465437.28k

HASH Iterators • Hash iterator objects need to be defined and instantiated. • Hash objects are assigned to hash iterator objects. Data top bottom; length nwau_sas 8.; If _n_=1 then do; declare hash hn16(dataset:’speciality’, ordered:’descending’); hn16.definekey(‘yearid’,’facility_identifier’); hn16.definedata(all:’Y’); hn16.definedone(); Declare hiter iter_ex(‘hn16’); End; /*hn16.output(dataset:’ sort_dec ’)*/

HASH Iterators • First() and next() functions to Declare hiter iter_ex(‘hn16’); iterate top to bottom and last() End; and prev() functions to iterate Iter_ex.First(); bottom to top do I = 1 to 10; output top; • Can be used to get the top and Iter_ex.Next(); end; bottom records. Iter_ex.last(); do I = 1 to 10; output bottom; Iter_ex.prev(); end; Run;

Avoid or use? • Where should you apply these techniques? Index Datastep SPDE Hash Hash Type Proc SQL lookup Merge Merge lookup Iterators (key=) Small to medium sized tables (< 10,000 ✓ ✓     rows) ✓  ✓ ✓ ✓ ✓ Huge table (> 10 million rows) ✓  ✓ ✓ ✓ ✓ Unsorted data(> 10 million rows) ✓  ✓    Datasets with multiple indexes  ✓ ✓    Less memory hungry techniques

Getting the most from Hash objects Bharath Gowda, SAS analyst Independent Contractor 0468304568 | bharathg1307@gmail.com

Getting the Most from Hash Objects Bharath Gowda Getting the most - PowerPoint PPT Presentation

Getting the Most from Hash Objects Bharath Gowda Getting the most from Hash objects Techniques covered are: SQL join Data step merge using BASE engine Data step merge using SPDE merge Index Key lookup Hash object lookup

Hash Functions in Action Hash Functions in Action Lecture 12 Hash Functions Hash Functions

Hash Functions in Action Hash Functions in Action Lecture 11 Hash Functions Hash Functions

Hash Functions Hash Functions 1 Cryptographic Hash Function Crypto hash function h(x) must

Hash Functions and Hash Tables (2.5.2) A hash function h maps keys of a given type to

Mutable Values Announcements Objects (Demo) Objects 4 Objects Objects represent

61A Lecture 12 Announcements Objects (Demo) Objects 4 Objects Objects represent

Generics Asumu Takikawa RacketCon 2012 1 What are generics? 2 What are generics? hash-ref

Hash Pile Ups: Using Collisions to Identify Unknown Hash Functions R. Joshua Tobin and David

Hash tables Hash functions Open addressing March 09, 2020 Cinda Heeren / Andy Roth / Geoffrey

HASH FUNCTIONS 1 / 62 What is a hash function? By a hash function we usually mean a map h : D

Security Proofs for the MD6 Hash Algorithm Ahmed Ezzat Outline Introduction to hash

LUX Hash Function Ivica Nikoli c, Alex Biryukov, Dmitry Khovratovich University of Luxembourg

HASH FUNCTIONS Mihir Bellare UCSD 1 Mihir Bellare UCSD 2 Hash functions Hash functions

Topic 22 Hash Tables " hash collision n. [from the techspeak] (var. `hash clash') When used

Hash Tables 1 Hash Table in Primary Storage Main parameter B = number of buckets Hash

Hash Functions Hash Functions Lecture 10 Hash Functions Lecture 10 Before we talk about

EBI and NBD Merge to Form Emirates NBD Creation of a UAE Champion Dubai, United Arab Emirates 12

TPG Telecom & Vodafone Hutchison Australia Merger of equals 30 August 2018 1) Transaction

Access Bank Diamond Bank Merger Update Creating Nigeria and Africas Largest Retail Bank

+ Inclusion in the Real World: Perspectives and Strategies for Early Childhood Supporting

NXP Semiconductors Investor Presentation | April 2016 LEGAL NOTICE This document has been

A Growth Story in the Eyewear Industry January 16, 2017 Disclaimer No Offer This

Pershing

2019 Maximum Tax Increases Why is a Tax Increase Needed? o Inflation keep up vs. large catch

Getting the Most from Hash Objects Bharath Gowda Getting the most - PowerPoint PPT Presentation

Getting the Most from Hash Objects Bharath Gowda Getting the most from Hash objects Techniques covered are: SQL join Data step merge using BASE engine Data step merge using SPDE merge Index Key lookup Hash object lookup

Hash Functions in Action Hash Functions in Action Lecture 12 Hash Functions Hash Functions

Hash Functions in Action Hash Functions in Action Lecture 11 Hash Functions Hash Functions

Hash Functions Hash Functions 1 Cryptographic Hash Function Crypto hash function h(x) must

Hash Functions and Hash Tables (2.5.2) A hash function h maps keys of a given type to

Mutable Values Announcements Objects (Demo) Objects 4 Objects Objects represent

61A Lecture 12 Announcements Objects (Demo) Objects 4 Objects Objects represent

Generics Asumu Takikawa RacketCon 2012 1 What are generics? 2 What are generics? hash-ref

Hash Pile Ups: Using Collisions to Identify Unknown Hash Functions R. Joshua Tobin and David

Hash tables Hash functions Open addressing March 09, 2020 Cinda Heeren / Andy Roth / Geoffrey

HASH FUNCTIONS 1 / 62 What is a hash function? By a hash function we usually mean a map h : D

Security Proofs for the MD6 Hash Algorithm Ahmed Ezzat Outline Introduction to hash

LUX Hash Function Ivica Nikoli c, Alex Biryukov, Dmitry Khovratovich University of Luxembourg

HASH FUNCTIONS Mihir Bellare UCSD 1 Mihir Bellare UCSD 2 Hash functions Hash functions

Topic 22 Hash Tables &quot; hash collision n. [from the techspeak] (var. `hash clash') When used

Hash Tables 1 Hash Table in Primary Storage Main parameter B = number of buckets Hash

Hash Functions Hash Functions Lecture 10 Hash Functions Lecture 10 Before we talk about

EBI and NBD Merge to Form Emirates NBD Creation of a UAE Champion Dubai, United Arab Emirates 12

TPG Telecom &amp; Vodafone Hutchison Australia Merger of equals 30 August 2018 1) Transaction

Access Bank Diamond Bank Merger Update Creating Nigeria and Africas Largest Retail Bank

+ Inclusion in the Real World: Perspectives and Strategies for Early Childhood Supporting

NXP Semiconductors Investor Presentation | April 2016 LEGAL NOTICE This document has been

A Growth Story in the Eyewear Industry January 16, 2017 Disclaimer No Offer This

Pershing

2019 Maximum Tax Increases Why is a Tax Increase Needed? o Inflation keep up vs. large catch

Topic 22 Hash Tables " hash collision n. [from the techspeak] (var. `hash clash') When used

TPG Telecom & Vodafone Hutchison Australia Merger of equals 30 August 2018 1) Transaction