Drupal as a Data Warehouse Everybody Into the Data Lake! Gail Radecki, CHCP, American Academy of Allergy Asthma & Immunology Ezra Wolfe, EthosCE Devin Zuczek, EthosCE
“ “ Above all else, show the data Edward Tufte
My name is Gail… ...and I’m one of “those” customers. ● Pushy ● A know-just-enough ● Annoying
The World of Continuing Medical Education ● Accreditation/Compliance ● Needs Assessment ● Grants Reconciliation
Data reporting...stayed the same
This is where Community comes in handy ● EthosCE User Group ● Support tickets ● Online Community
The Importance of Outcomes ● Medical Specialty Society ● Needs assessment ● Accreditation
We need a solution!
My name is Ezra. I am the product manager. This is a me with a bad haircut.
The Product: 65+ hospitals, health systems, associations ● Almost 1 million learner accounts ● 7.9 million course enrollments ●
“ All happy families customers are alike; “ each unhappy family customer is unhappy about their reports. Leo Tolstoy
Existing process 1. Customer requests a new report 2. Requirements 3. Development feasibility 4. Back and forth with customer 5. Make the view, put it in code 6. Change management, code review, documentation, tests 7. Custom report is deployed! We are done! 8. Repeat steps #1-7 again because "that" customer forgot something 9. Repeat steps #1-7 again later because of a product schema change
My name is Devin and I am a systems architect. Throwback to PHP 4? This is a duck.
Big data!!! Phase 1 Phase 2 Phase 3 Big data ? Profit!
Making a plan ● Previous solutions involved a suite of modules: Views, Homebox, Charts + HighCharts, Views Data Export. ● Building our own tool would have been a distraction from our core business — we are not data scientists. ● We needed a tool that we could give to customers to report on their own data instead of us doing it.
Vendor selection ● Vendors don’t know about... ○ Drupal ○ php serialization ○ webform ● It’s your job to do your due diligence and ensure you select the correct system
BYODW? (Bring your own data warehouse?)
Did I mention where I work?? ● Non-profit ● Tech maintenance vs. adding features ● We have to justify EVERYTHING
First steps ● Could we point a tool at Drupal and have it report out of the box? ● Do we need a data warehouse? To answer the question of why reporting on Drupal data in its native form is not optimal, we have to look at how the data is stored.
DBA 101: Tables people user_id name employer_name 1 Barry Cuda U. Corp. 2 Abby Normal Gekko & Co. 3 Rita Book U. Corp. 4 Ray O’Sun U. Corp. U. Corp. changed their name. Making this change requires...
DBA 101: Tables people user_id name employer_name 1 Barry Cuda Initech 2 Abby Normal Gekko & Co 3 Rita Book Initech 4 Ray O’Sun Initech U. Corp. changed their name. Making this change requires... 3 row updates. Not good at scale, as this will lock those rows for editing!
Normalizing People uid name employer_name 1 Barry Cuda Initech Eliminate columns with duplicate data 2 Abby Normal Gekko & Co by creating 3 Rita Book Initech separate tables, and identify that data People Employer relation Employer data with a key. uid name eid eid name uid eid 1 Barry X Move data that is 5 Initech 1 5 Cuda not relevant to the 2 Abby X 6 Gekko & Co. 2 6 primary key. Normal 3 Rita X 3 5 Book
DBA 201: Normalizing How do we efficiently store a user, full name, location, and employer? User Profile location Location Profile user_id user_name profile_id loc_id profile_id user_id loc_id province country 1 john 2 3 2 1 3 PA US 2 jane 3 4 3 2 4 NJ US Profile name No data is Profile employer Employer unnecessarily profile_id fullname profile_id emp_id employer_id name dependent. ...that includes the database! 2 John Smith 6 A, LLC. 2 6 An update only requires 1 write. 7 D Corp 3 Jane Doe 3 7
The problem Operational database Reporting database ● Fast writes ● Fast, easy reads ● Holds data ● Holds information ● Relational model not ● Relational model more informative informative to users ● Built for integrity — E.F. Codd, "Further Normalization of the Data Base Relational Model
Another module
Denormalizer module This module pulls from Drupal 1. Denormalized table name defined schema to 2. Entity/table to denormalize programmatically build 3. The primary key to use denormalized tables or views for 4. The changed key to use use in data warehousing flows. Example: A node of type “chinchilla” with a rate field. We could also request the raw vote table.
Recommend
More recommend