drupal as a data warehouse
play

Drupal as a Data Warehouse Everybody Into the Data Lake! Gail - PowerPoint PPT Presentation

Drupal as a Data Warehouse Everybody Into the Data Lake! Gail Radecki, CHCP, American Academy of Allergy Asthma & Immunology Ezra Wolfe, EthosCE Devin Zuczek, EthosCE Above all else, show the data Edward Tufte My name is Gail


  1. Drupal as a Data Warehouse Everybody Into the Data Lake! Gail Radecki, CHCP, American Academy of Allergy Asthma & Immunology Ezra Wolfe, EthosCE Devin Zuczek, EthosCE

  2. “ “ Above all else, show the data Edward Tufte

  3. My name is Gail… ...and I’m one of “those” customers. ● Pushy ● A know-just-enough ● Annoying

  4. The World of Continuing Medical Education ● Accreditation/Compliance ● Needs Assessment ● Grants Reconciliation

  5. Data reporting...stayed the same

  6. This is where Community comes in handy ● EthosCE User Group ● Support tickets ● Online Community

  7. The Importance of Outcomes ● Medical Specialty Society ● Needs assessment ● Accreditation

  8. We need a solution!

  9. My name is Ezra. I am the product manager. This is a me with a bad haircut.

  10. The Product: 65+ hospitals, health systems, associations ● Almost 1 million learner accounts ● 7.9 million course enrollments ●

  11. “ All happy families customers are alike; “ each unhappy family customer is unhappy about their reports. Leo Tolstoy

  12. Existing process 1. Customer requests a new report 2. Requirements 3. Development feasibility 4. Back and forth with customer 5. Make the view, put it in code 6. Change management, code review, documentation, tests 7. Custom report is deployed! We are done! 8. Repeat steps #1-7 again because "that" customer forgot something 9. Repeat steps #1-7 again later because of a product schema change

  13. My name is Devin and I am a systems architect. Throwback to PHP 4? This is a duck.

  14. Big data!!! Phase 1 Phase 2 Phase 3 Big data ? Profit!

  15. Making a plan ● Previous solutions involved a suite of modules: Views, Homebox, Charts + HighCharts, Views Data Export. ● Building our own tool would have been a distraction from our core business — we are not data scientists. ● We needed a tool that we could give to customers to report on their own data instead of us doing it.

  16. Vendor selection ● Vendors don’t know about... ○ Drupal ○ php serialization ○ webform ● It’s your job to do your due diligence and ensure you select the correct system

  17. BYODW? (Bring your own data warehouse?)

  18. Did I mention where I work?? ● Non-profit ● Tech maintenance vs. adding features ● We have to justify EVERYTHING

  19. First steps ● Could we point a tool at Drupal and have it report out of the box? ● Do we need a data warehouse? To answer the question of why reporting on Drupal data in its native form is not optimal, we have to look at how the data is stored.

  20. DBA 101: Tables people user_id name employer_name 1 Barry Cuda U. Corp. 2 Abby Normal Gekko & Co. 3 Rita Book U. Corp. 4 Ray O’Sun U. Corp. U. Corp. changed their name. Making this change requires...

  21. DBA 101: Tables people user_id name employer_name 1 Barry Cuda Initech 2 Abby Normal Gekko & Co 3 Rita Book Initech 4 Ray O’Sun Initech U. Corp. changed their name. Making this change requires... 3 row updates. Not good at scale, as this will lock those rows for editing!

  22. Normalizing People uid name employer_name 1 Barry Cuda Initech Eliminate columns with duplicate data 2 Abby Normal Gekko & Co by creating 3 Rita Book Initech separate tables, and identify that data People Employer relation Employer data with a key. uid name eid eid name uid eid 1 Barry X Move data that is 5 Initech 1 5 Cuda not relevant to the 2 Abby X 6 Gekko & Co. 2 6 primary key. Normal 3 Rita X 3 5 Book

  23. DBA 201: Normalizing How do we efficiently store a user, full name, location, and employer? User Profile location Location Profile user_id user_name profile_id loc_id profile_id user_id loc_id province country 1 john 2 3 2 1 3 PA US 2 jane 3 4 3 2 4 NJ US Profile name No data is Profile employer Employer unnecessarily profile_id fullname profile_id emp_id employer_id name dependent. ...that includes the database! 2 John Smith 6 A, LLC. 2 6 An update only requires 1 write. 7 D Corp 3 Jane Doe 3 7

  24. The problem Operational database Reporting database ● Fast writes ● Fast, easy reads ● Holds data ● Holds information ● Relational model not ● Relational model more informative informative to users ● Built for integrity — E.F. Codd, "Further Normalization of the Data Base Relational Model

  25. Another module

  26. Denormalizer module This module pulls from Drupal 1. Denormalized table name defined schema to 2. Entity/table to denormalize programmatically build 3. The primary key to use denormalized tables or views for 4. The changed key to use use in data warehousing flows. Example: A node of type “chinchilla” with a rate field. We could also request the raw vote table.

Recommend


More recommend