data integrity verification
play

Data Integrity Verification IIA Orange County Chapter November 13, - PowerPoint PPT Presentation

Data Integrity Verification IIA Orange County Chapter November 13, 2015 1 Data Integrity Verification Excel Transformed My Data! BEFORE AFTER


  1. Data Integrity Verification IIA Orange County Chapter November 13, 2015 �������������� ���� 1 Data Integrity Verification

  2. Excel Transformed My Data! BEFORE AFTER ��������������������� �������������� 2 Data Integrity Verification

  3. Michael Kano Senior Manager, Data Analytics Sunera LLC Michael is a Senior Manager with Sunera’s national data analytics practice. Michael has 20 years of experience in data analytics and internal audit with organizations in the USA, Canada, and Kuwait. He has 20 years of experience with ACL software, including 8 years as the leader of ACL Services Ltd.’s global training team. During his tenure at ACL Services, Michael helped drive the training business to new levels of revenues and profits by actively supporting the Sales team in pre-sales discussions. Michael’s most recent experience consists of four years with eBay, Inc.’s internal audit team as Manager, Audit Analysis. He was tasked with integrating data analytics into the audit workflow on strategic and tactical levels. This included developing quality and documentation standards, training users, and providing analytics support on numerous audits in the IT, PayPal, and eBay marketplaces business areas. He also provided support to non-IA teams such as the Business Ethics Office and Enterprise Risk Management teams. During his years at eBay, Michael supported audits throughout the organization in the IT, compliance, operations, vendor management, revenue assurance, T&E, and human resources areas. Michael also has 7 years of experience with Arbutus Software, and has managed the transition to Arbutus from other data analysis tools. He is a proficient user of Tableau, Microsoft Access, and Teradata SQL Assistant. 3

  4. AGENDA � Defining data integrity verification (DIV) � Sources of integrity erosion � File-level testing � Field-level testing 4 Data Integrity Verification

  5. Defining Data Integrity Verification 5 Data Integrity Verification

  6. Data Integrity Verification (DIV) � The process by which the data analyst tests the data to determine whether it is acceptable for analysis � Tests should be carried out at both the file level and the field level before conducting any analytics. 6 Data Integrity Verification

  7. The Risks of Integrity Erosion � Lost time � Incorrect conclusions � Revenue/cost � Security � Professional standing 7 Data Integrity Verification

  8. Evidence of data integrity erosion � Blank/invalid entries � Missing records in key fields � Excess records � Incorrect/invalid � Duplicates formatting � Shifted fields � Invalid characters in � Skewed records data 8 Data Integrity Verification

  9. Shifted Fields 9 Data Integrity Verification

  10. Skewed Records 10 Data Integrity Verification

  11. Sources of Integrity Erosion 11 Data Integrity Verification

  12. Processing… 12 Data Integrity Verification

  13. The Process 13 Data Integrity Verification

  14. Sources of data integrity errors � Miscommunication of requirements � Extraction � Conversion � Transmission � Import � Manual edits � Data definition 14 Data Integrity Verification

  15. Miscommunication � "All AP transactions between April and June, including all important fields." � "All AP payments and reversals between 4/1/2015 and 6/30/2015 (inclusive) including the following fields: <field list>. The output should be in a tab-delimited text file, and at no point should it pass through a spreadsheet or be opened in a spreadsheet application." 15 Data Integrity Verification

  16. Conversion � Dropping leading zeros (ID numbers) � Converting date to numeric � Removing alphas from alphanumeric field � Use of delimiter that is included within a text field � Insertion of blank lines in Excel 16 Data Integrity Verification

  17. Date Conversion 17 Data Integrity Verification

  18. Manual Edits � Inadvertent/deliberate editing � How does that happen? � Sorting � Formatting � Copy/pasting 18 Data Integrity Verification

  19. Data Definition � Record length � Field position � Formatting (date fields) 19 Data Integrity Verification

  20. File-Level Testing 20 Data Integrity Verification

  21. File-Level Testing � Structure � Content 21 Data Integrity Verification

  22. Structure � Review metadata � Send table layout to a table in Arbutus/ACL � Compare field type/length/format to metadata 22 Data Integrity Verification

  23. Content � Completeness � Run COUNT to document number of records � Run TOTAL on numeric fields for control totals � Uniqueness: Run DUPLICATES command selecting all fields to identify duplicate records � Validity: Run VERIFY against numeric and date fields 23 Data Integrity Verification

  24. Field-Level Testing: Numerics 24 Data Integrity Verification

  25. Numeric Fields: What to look for � Field total � Lowest value � Highest value � Average � Second-highest value � Range � Ratio of 2nd highest to highest � Absolute value � Median � Number of zeros � Number of positives � Number of negatives � Number of corrupt entries 25 Data Integrity Verification

  26. Testing Numeric Fields � Run STATISTICS against all numeric fields � Look for zeros, negatives, bounds, highest/second-highest � Recalculate computed value with computed fields (e.g, Total_Amount = Price * Quantity) 26 Data Integrity Verification

  27. Scripted Solution � Shows table/field names, and test date-time in a table � Provides comprehensive, standard test results � Faster and less error-prone than manual execution � 2 million records, 4 numeric fields in ~45 seconds � Also saves table layout for file with _TL suffix 27 Data Integrity Verification

  28. Script Results: Numerics 28 Data Integrity Verification

  29. Field-Level Testing: Dates 29 Data Integrity Verification

  30. Date Fields: What to look for � Oldest � Weekends � Most recent � Blanks � Span of valid dates � Invalid non-blank dates 30 Data Integrity Verification

  31. Testing Date Fields � Run STATISTICS against all date fields � Blanks/invalids/weekends � Bounds � Test related fields, e.g., PO_Date <= Invoice_Date � Test for completeness (24/7 data) with GAPS command 31 Data Integrity Verification

  32. Blank Dates & Formatting � Entire date column is blank = Incorrect format in field definition. � Edit >> Table Layout to review and correct format 32 Data Integrity Verification

  33. Formatting Date Fields 33 Data Integrity Verification

  34. Dates: Scripted Solution Data Integrity Verification 34

  35. Field-Level Testing: Characters 35 Data Integrity Verification

  36. Character Fields: What to look for Item Functionality Blanks ISBLANK(<key>) Invalid entries CLASSIFY ON <key> CLASSIFY ON FORMAT(<key>) Duplicates DUPLICATES ON <key> 36 Data Integrity Verification

  37. Character Fields: Formats � Verify that format is valid � May need to scrub � PO numbers, customer IDs, phone numbers, zip codes � Use FORMAT() function in CLASSIFY to display list of unique formats CLASSIFY ON FORMAT(<field name>) TO "<output file>" OPEN 37 Data Integrity Verification

  38. Output of CLASSIFY + Format() � 1 record per format � Shows frequency x= lower-case alpha X = upper-case alpha 9 = numeric Blanks/special characters 38 Data Integrity Verification

  39. Mitigating Integrity Risk 39 Data Integrity Verification

  40. Key Items � Know your data � Obtain data independently (SQL?) � Short chain from extraction to analysis � Automated DIV 40 Data Integrity Verification

  41. The Process 41 Data Integrity Verification

  42. The New Process 42 Data Integrity Verification

  43. Benefits � Independence � Confidence � Shorter time � Comprehensive DIV 43 Data Integrity Verification

  44. Any questions? Michael Kano, ACDA mkano@sunera.com 44 Data Integrity Verification

Recommend


More recommend