
Managing Data Quality at Scale: The National Health Services Directory

Health Data Analytics 2019, Sydney, 16-17 October. Allen Nugent, Senior Data Analyst, Healthdirect Australia. 17/10/2019. Classification: Unclassified


  2. Who is Healthdirect Australia? Trusted health information and advice: • national • government-owned • not-for-profit. Serving patients & carers, GPs & specialists, hospitals & clinics, pharmacies, and allied health services.

  3. The National Health Services Directory (NHSD) use cases: (1) Health service provision: consumer search; provider search / referral. (2) Policy development, planning, decision making: gap analysis; trusted data sources; geospatial applications. (3) Access & integration tooling: service process; secure electronic messaging (referrals, discharge summaries, event summaries, etc.).

  4. Data Value Augmentation. The goal: validated synthesis of rich, high-level entities from multiple, specialised sources. The challenge: inconsistent address formats & content; ambiguous names; abbreviations & misspellings; non-standard ontology & coding; omissions. Sources (primary and secondary): Telstra Health, AHPRA, RCH, Medicare, Argus, ABR, HotDoc, …
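The challenges listed on this slide (inconsistent address formats, abbreviations, misspellings) are usually tackled by normalising text before any matching is attempted. The sketch below is a minimal illustration, not Healthdirect's pipeline: the function name `normalise` and the abbreviation map are hypothetical, and a real map would be curated against an address standard.

```python
import re

# Hypothetical abbreviation map for illustration only; a production list
# would be much longer and curated against the relevant address standard.
ABBREVIATIONS = {
    "st": "street",
    "rd": "road",
    "ave": "avenue",
    "hosp": "hospital",
    "ctr": "centre",
    "med": "medical",
}


def normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace, expand abbreviations."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes a token separator
    tokens = text.split()                 # split() also collapses repeated whitespace
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    print(normalise("12 Smith St."))     # 12 smith street
    print(normalise("Royal Hosp, Ctr"))  # royal hospital centre
```

"St" can mean Street or Saint, so a naive lookup like this one can introduce exactly the ambiguity the slide warns about; context-aware rules or a reference address dataset would be needed in practice.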

  5. Data Value Augmentation. The solution: entity resolution / record matching, using • edit distance metrics • decision trees • clustering algorithms • generative models → machine learning
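Of the techniques listed, edit distance is the simplest to show. The sketch below computes Levenshtein distance with the standard dynamic-programming recurrence, scales it to a similarity in [0, 1], and pairs each name with its most similar candidate above a threshold. The 0.85 threshold and the helper names are assumptions for illustration; the deck does not say which metric or threshold the NHSD uses, and decision trees, clustering and generative models are not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1], where 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def match_records(primary, secondary, threshold=0.85):
    """Pair each primary name with its most similar secondary name, if above threshold."""
    matches = []
    for p in primary:
        best = max(secondary, key=lambda s: similarity(p, s), default=None)
        if best is not None and similarity(p, best) >= threshold:
            matches.append((p, best))
    return matches


if __name__ == "__main__":
    print(match_records(["smith medical centre"],
                        ["smith medical center", "jones pharmacy"]))
```

Matching on names alone will miss or mis-pair records; entity resolution in practice usually combines several field-level similarities (name, address, identifiers) before deciding.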

  6. Example of Value-Added Data: a composite entity for the Find a Service use case.

  7. Data Quality. A practical definition: the extent to which tolerance (1) is consistent with intended use (2). (1) Tolerance (→ analytics / data engineering): • maximum permissible error. (2) Intended end use (→ UX design / governance): • a reasonable presumption of what the end user will infer • effectively communicated usage guidelines.
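The definition can be read as a simple comparison: measure an error rate, then check it against the maximum permissible error for the intended use. In this sketch the use-case keys and tolerance values are invented for illustration only.

```python
# Hypothetical tolerances: maximum permissible error rate per intended use.
MAX_PERMISSIBLE_ERROR = {
    "consumer_search": 0.05,
    "secure_messaging": 0.001,
}


def error_rate(failures: int, total: int) -> float:
    """Observed proportion of records failing a quality rule."""
    if total <= 0:
        raise ValueError("no records were assessed")
    return failures / total


def fit_for_use(failures: int, total: int, use_case: str) -> bool:
    """True when the observed error is within the tolerance for this intended use."""
    return error_rate(failures, total) <= MAX_PERMISSIBLE_ERROR[use_case]


if __name__ == "__main__":
    print(fit_for_use(30, 1000, "consumer_search"))   # True
    print(fit_for_use(30, 1000, "secure_messaging"))  # False
```

The same 3% error rate is acceptable for one use and not for the other, which is the point of tying tolerance to intended use.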

  8. Why is data quality important? User expectations: • Value to the general public: convenient, reliable discovery of services; accurate location, opening hours and service offerings. • Value to professional healthcare infrastructure: verification of accreditation & Medicare processes; accurate and timely data for patient results, transfers and referrals; reduced risk of clerical errors & ambiguity.

  9. Problem Statement. How to develop & implement data quality metrics that: • are meaningful • drive action • serve identified objectives (1) → business performance KPIs; • are transparent • are reproducible • can be productionised → governance guidelines. (1) Defined by stakeholders.

  10. Data Quality. Working definition(s): • Timeliness: proportion of updates captured within a specified time lag • Coverage: proportion of the known universe represented • Consistency: proportion of data that agrees with the relevant Systems of Record • Completeness: proportion of records with no missing essential fields • Veracity: proportion of data that meets our criteria for conformity, uniqueness, and accuracy. A composite measure of data quality, driven by stakeholder feedback.
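Four of the working definitions are proportions that can be computed directly. The sketch below assumes records are plain dictionaries with an `id` key; the function names and signatures are hypothetical. Veracity is left out because its conformity, uniqueness and accuracy criteria are stakeholder-defined and not given on the slide.

```python
from datetime import timedelta


def _ratio(numerator: int, denominator: int) -> float:
    """Proportion in [0, 1]; an empty population is an error, not a score."""
    if denominator == 0:
        raise ValueError("proportion undefined for an empty population")
    return numerator / denominator


def completeness(records, essential_fields):
    """Proportion of records with no missing (None or empty) essential fields."""
    complete = sum(
        all(r.get(f) not in (None, "") for f in essential_fields) for r in records
    )
    return _ratio(complete, len(records))


def coverage(records, known_universe, id_field="id"):
    """Proportion of the known universe of entities represented in the directory."""
    universe = set(known_universe)
    present = {r[id_field] for r in records} & universe
    return _ratio(len(present), len(universe))


def consistency(records, system_of_record, field, id_field="id"):
    """Proportion of values agreeing with the system of record, among records it covers.

    system_of_record maps an entity id to the authoritative value of `field`.
    """
    compared = [r for r in records if r[id_field] in system_of_record]
    agree = sum(r.get(field) == system_of_record[r[id_field]] for r in compared)
    return _ratio(agree, len(compared))


def timeliness(update_lags, max_lag):
    """Proportion of updates captured within the specified time lag."""
    on_time = sum(lag <= max_lag for lag in update_lags)
    return _ratio(on_time, len(update_lags))


if __name__ == "__main__":
    lags = [timedelta(hours=6), timedelta(days=3)]
    print(timeliness(lags, max_lag=timedelta(days=1)))  # 0.5
```

Raising on an empty population, rather than returning 0 or 1, keeps a missing denominator from silently looking like perfect or failed quality.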

  11. Technical Challenges. Quis custodiet ipsos custodes? Quis regunt et duces? Who watches the watchers? Who governs the governors? How to implement data quality audits in a manner that does not involve complex code embedded in invisible software modules (applications, widgets, extracts, reports, FHIR)? How to create a paradigm for data quality auditing that is itself governable? The data: • 1 M documents • streaming data with infinite versioning • extensible data model

  12. Definition of an Audit Template. An audit should be: • based on a use case or a data cleaning requirement • composed of logical tests implemented using uncomplicated code • self-describing • self-qualifying (confers context) • self-normalising (mitigates concealed errors) • hierarchical
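One way to encode the template is as a small data structure that carries its own description and qualifier alongside the test logic. In this sketch (class and field names are hypothetical), `applies_to` plays the role of the self-qualifying filter, the pass rate is computed over that qualified population only (one possible reading of "self-normalising"), and child tests run on the parent's population to form the hierarchy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class AuditTest:
    name: str                             # short identifier for labelling results
    statement: str                        # plain-English rule: self-describing
    use_case: str                         # use case or cleaning requirement it serves
    applies_to: Callable[[Record], bool]  # qualifier: which records the rule is about
    check: Callable[[Record], bool]       # the logical test itself, kept simple
    children: List["AuditTest"] = field(default_factory=list)  # sub-tests

    def run(self, records: List[Record]) -> List[dict]:
        """Evaluate this test, then its children on the same qualified population."""
        population = [r for r in records if self.applies_to(r)]
        passed = sum(self.check(r) for r in population)
        rows = [{
            "test": self.name,
            "statement": self.statement,
            "use_case": self.use_case,
            "rows": len(population),
            "matches": passed,
            # Normalised by the qualified population, not by all records.
            "pass_rate": passed / len(population) if population else None,
        }]
        for child in self.children:
            rows.extend(child.run(population))
        return rows


if __name__ == "__main__":
    has_phone = AuditTest(
        name="pharmacy_has_phone",
        statement="Every pharmacy lists a phone number",
        use_case="Find a Service",
        applies_to=lambda r: r["type"] == "pharmacy",
        check=lambda r: bool(r["phone"]),
    )
    data = [{"type": "pharmacy", "phone": "02 1234"},
            {"type": "pharmacy", "phone": ""},
            {"type": "gp", "phone": "03 5678"}]
    for row in has_phone.run(data):
        print(row)
```

Because the rule text, qualifier and test travel together, each result row can explain itself without anyone reading the code that produced it.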

  13. Output: Audit Table. • Outcome: count of matches; count of rows; pass rate; count needed for perfection; overall result • Context: fields for filtering and grouping, and for labelling charts, etc. • Criteria: logical test name; explicit statement of the rule
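A row of the audit table could be assembled as below. The column names are hypothetical labels for the outcome, context and criteria fields described on the slide, and "count needed for perfection" is taken to mean rows minus matches. Output is CSV via the standard-library `csv` module so it can feed charts or reports.

```python
import csv
import io

COLUMNS = ["test_name", "rule_statement", "group", "rows", "matches",
           "pass_rate", "needed_for_perfection"]


def audit_row(test_name, rule_statement, group, matches, rows):
    """One audit-table row combining criteria, context and outcome columns."""
    if not 0 <= matches <= rows:
        raise ValueError("matches must lie between 0 and rows")
    return {
        "test_name": test_name,            # criteria: logical test name
        "rule_statement": rule_statement,  # criteria: explicit statement of the rule
        "group": group,                    # context: filtering, grouping, chart labels
        "rows": rows,                      # outcome: count of rows tested
        "matches": matches,                # outcome: count of rows passing
        "pass_rate": round(matches / rows, 4) if rows else None,
        "needed_for_perfection": rows - matches,  # outcome: rows still failing
    }


def to_csv(audit_rows):
    """Serialise audit rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(audit_rows)
    return buffer.getvalue()


if __name__ == "__main__":
    row = audit_row("pharmacy_has_phone", "Every pharmacy lists a phone number",
                    "pharmacy", matches=90, rows=100)
    print(to_csv([row]))
```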

  14. Code Sample: An Audit Test
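The original code sample on this slide is an image that did not survive. As a stand-in, the following hypothetical audit test checks that opening hours are internally consistent. The function name and the field names (`sessions`, `open`, `close`) are assumptions, not the NHSD data model.

```python
from datetime import time


def audit_opening_hours(records):
    """Hypothetical audit test: every listed session opens before it closes.

    Qualifier: only records that publish at least one session are counted.
    """
    population = [r for r in records if r.get("sessions")]
    matches = sum(
        all(s["open"] < s["close"] for s in r["sessions"]) for r in population
    )
    return {
        "test": "opening_hours_ordered",
        "statement": ("For every service that lists opening hours, "
                      "each session opens before it closes"),
        "rows": len(population),
        "matches": matches,
        "pass_rate": matches / len(population) if population else None,
    }


if __name__ == "__main__":
    services = [
        {"sessions": [{"open": time(8), "close": time(17)}]},
        {"sessions": [{"open": time(18), "close": time(9)}]},
        {"sessions": []},
    ]
    print(audit_opening_hours(services))
```

As written, a session that crosses midnight (for example an after-hours service open 18:00 to 09:00) fails the rule; a real test would need to allow for that.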

  15. Mid-Level Audit Results

  16. Mid-Level Audit Results. Total: 95,335 organisations
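The chart behind these results is not recoverable; only the total of 95,335 organisations survives. The sketch below shows one hypothetical way a mid-level result could roll up child tests, counting an organisation as passing only if it passes every child test. That aggregation rule, and the `roll_up` name, are assumptions; a weighted or average-based roll-up would be equally possible.

```python
def roll_up(results_by_record):
    """Mid-level result: an organisation passes only if it passes every child test.

    results_by_record maps an organisation id to {test_name: bool}.
    """
    total = len(results_by_record)
    passed_all = sum(all(tests.values()) for tests in results_by_record.values())
    per_test = {}
    for tests in results_by_record.values():
        for name, ok in tests.items():
            per_test[name] = per_test.get(name, 0) + int(ok)  # matches per child test
    return {
        "organisations": total,
        "passed_all": passed_all,
        "pass_rate": passed_all / total if total else None,
        "per_test_matches": per_test,
    }


if __name__ == "__main__":
    print(roll_up({
        "org-1": {"has_phone": True, "hours_ordered": True},
        "org-2": {"has_phone": True, "hours_ordered": False},
    }))
```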

  17. Conclusions. 1. The ultimate success of entity matching and other data cleaning techniques depends on data quality audits. 2. A highly visible, self-explanatory auditing system can be implemented in-house. 3. This paradigm can scale with the (Big Data) platform of the data itself.

  18. Conclusions. If we get all this right … Use cases: • Find a Practitioner: GP • Discharge Summary: GP • Find a Service: Nurse Triage • Referral to Service Provider

  19. … we won’t have to worry about this:

  20. Questions

