Data Management Hints & Tips
By Clark Lawson, Nationwide Building Society (@thesasgeek)
Agenda
• Introduction
• Data Management
• Data Quality in SAS
Introduction
• SAS data warehouse manager at Nationwide Building Society.
• The data warehouse supplies data to over 200 SAS users.
• A SAS user since 2005.
• Recent focus: migrating from Base SAS to SAS Data Integration Studio.
Data Management
• As everyone moves into the digital age, data is recognised as a vital enterprise asset.
• Embedding data management principles into what we do, whether as data scientists or analysts, helps us make better-informed and more effective decisions.
• This means our role needs to include some minimum level of standards, governance and control.
• This lets the business analyst focus on insight rather than on doing data management at the start of each project.
• What might these look like?
Data Management Standards
Areas of data management:
• Master Data Management
• Data Governance
• Data Quality
• Data Modelling
• Information Management
• Reconciliation
• SAS Standards
• Process Models and Controls
Data Management Applied in SAS
SAS Standards
Across data domains:
• Consistent variable names
• Consistent variable values
• Consistent formats
SAS processing:
• Uniform processing
• Use SAS Data Integration Studio and its built-in transformations
Documentation:
• Interface contracts, both input & output
• Service level agreements
Data Quality
Data quality:
• Identify key fields and track data quality
Data validation:
• Check data against interface contracts
• Continually evaluate at every step
Business Validation Rules (BVRs):
• Defined data rules, constantly checked
Data Quality in SAS
• Here we discuss how to apply data quality standards in SAS.
• At this point, assume that we already have interface contracts and business validation rules (BVRs).
Data Quality Starting Point
Component: Description
• Name: the name of the variable in the dataset
• Description: the agreed definition of the variable
• Metadata: the type, length & format of the variable
• Nullable: whether missing values / nulls are allowed
• Acceptable Values: the agreed acceptable values, for both continuous variables (e.g. a range) and discrete variables (e.g. a list of codes)
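As a minimal sketch, the five components above can be held as data and checked record by record. This is standard-library Python, not SAS, and not the author's implementation; the column names, lengths and acceptable values in `CONTRACT` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnSpec:
    """One entry in a (hypothetical) interface contract."""
    name: str                                   # Name
    description: str                            # Description (agreed definition)
    dtype: type                                 # Metadata: expected type
    max_length: Optional[int] = None            # Metadata: maximum length (strings)
    nullable: bool = True                       # Nullable
    allowed_values: Optional[frozenset] = None  # Acceptable values (discrete)
    allowed_range: Optional[tuple] = None       # Acceptable values (continuous: low, high)


# Hypothetical contract for illustration only.
CONTRACT = [
    ColumnSpec("customer_id", "Unique customer reference", str, max_length=10, nullable=False),
    ColumnSpec("product_code", "Product held", str, allowed_values=frozenset({"SAV", "CUR", "MTG"})),
    ColumnSpec("age", "Age in whole years at extract date", int, allowed_range=(18, 120)),
]


def validate_row(row, contract):
    """Return a list of contract violations for one record (empty list = clean)."""
    problems = []
    for spec in contract:
        if spec.name not in row:
            problems.append(f"{spec.name}: column missing")
            continue
        value = row[spec.name]
        if value is None:
            # Missing values are only a problem if the contract says not nullable.
            if not spec.nullable:
                problems.append(f"{spec.name}: null not allowed")
            continue
        if not isinstance(value, spec.dtype):
            problems.append(f"{spec.name}: expected {spec.dtype.__name__}, got {type(value).__name__}")
            continue
        if spec.max_length is not None and len(value) > spec.max_length:
            problems.append(f"{spec.name}: length {len(value)} exceeds {spec.max_length}")
        if spec.allowed_values is not None and value not in spec.allowed_values:
            problems.append(f"{spec.name}: {value!r} is not an agreed value")
        if spec.allowed_range is not None:
            low, high = spec.allowed_range
            if not low <= value <= high:
                problems.append(f"{spec.name}: {value} outside agreed range {low}..{high}")
    return problems


print(validate_row({"customer_id": "C001", "product_code": "SAV", "age": 34}, CONTRACT))  # → []
```

Keeping the contract as data rather than as hard-coded IF statements means one routine can serve every interface; only the contract rows change.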
Data Quality Steps in SAS
Data Profiling
• Continuous variables: check ranges & outliers using PROC MEANS & PROC GCHART
• Discrete variables: check values using PROC FREQ
Data Validation
• Key variable integrity: validate schemas with primary & foreign keys using PROC DATASETS
• Business validation rules: use lookup tables and CALL EXECUTE to loop through the rules
• Duplicate values: check for duplicate values using PROC SQL
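The logic of these steps can also be sketched outside SAS. The standard-library Python below is not SAS and not the author's implementation; it mirrors each check: a PROC MEANS-style summary, a PROC FREQ-style frequency table, an SQL-style duplicate query, a primary/foreign key orphan check, and a rule table that is looped over. The 1.5 × IQR outlier fence, the rule IDs and the sample account data are assumptions for illustration.

```python
import statistics
from collections import Counter


def profile_continuous(values):
    """N, NMISS, MIN, MAX, MEAN (in the spirit of PROC MEANS), plus outliers
    flagged with Tukey's 1.5 x IQR fences (an assumed rule, not from the deck)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(present),
        "nmiss": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.fmean(present),
        "outliers": [v for v in present if v < low or v > high],
    }


def profile_discrete(values, allowed):
    """Frequency table including missing (None) as a level, plus any levels
    outside the agreed acceptable values."""
    counts = Counter(values)
    unexpected = {level: n for level, n in counts.items() if level not in allowed}
    return counts, unexpected


def duplicate_keys(rows, key_cols):
    """Keys occurring more than once, like GROUP BY ... HAVING COUNT(*) > 1."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def orphan_foreign_keys(child_rows, fk_col, parent_rows, pk_col):
    """Non-missing child foreign-key values with no matching parent primary key."""
    parent_keys = {row[pk_col] for row in parent_rows}
    child_keys = {row[fk_col] for row in child_rows if row[fk_col] is not None}
    return sorted(child_keys - parent_keys)


def run_bvrs(rows, rule_table):
    """Apply every rule in the lookup table to every row; return failing row
    indices per rule. (In SAS, CALL EXECUTE would instead generate and submit
    one check per rule row.)"""
    return {
        rule_id: [i for i, row in enumerate(rows) if not check(row)]
        for rule_id, _description, check in rule_table
    }


# Hypothetical sample data and rule table, for illustration only.
CUSTOMERS = [{"customer_id": "C1"}, {"customer_id": "C2"}]
ACCOUNTS = [
    {"account_id": "A1", "customer_id": "C1", "balance": 100.0, "status": "OPEN", "close_date": None},
    {"account_id": "A2", "customer_id": "C2", "balance": -5.0, "status": "CLOSED", "close_date": None},
    {"account_id": "A2", "customer_id": "C9", "balance": 0.0, "status": "CLOSED", "close_date": "2020-01-31"},
]
BVR_TABLE = [
    ("BVR01", "Balance must not be negative", lambda r: r["balance"] >= 0),
    ("BVR02", "Closed accounts must have a close date",
     lambda r: r["status"] != "CLOSED" or r["close_date"] is not None),
]

print(duplicate_keys(ACCOUNTS, ["account_id"]))                                # {('A2',): 2}
print(orphan_foreign_keys(ACCOUNTS, "customer_id", CUSTOMERS, "customer_id"))  # ['C9']
print(run_bvrs(ACCOUNTS, BVR_TABLE))                                           # {'BVR01': [1], 'BVR02': [1]}
```

Holding the rules as rows in a table means adding a BVR is a data change rather than a code change, which is the appeal of the lookup-table approach.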
www.SAS.com