Testing and documenting your data doesn't have to suck (Data Council)

Testing and documenting your data doesn't have to suck. Data Council NYC, Nov 2019. @abeGong. About me (Abe): data scientist/engineer; tech-first and "enterprise"; human-scale, ethical data; first time in NYC as an adult (?!).


  1. Testing and documenting your data doesn’t have to suck Data Council NYC - Nov 2019 @abeGong

  2. About me (Abe) ● Data scientist/engineer ● Tech-first and “enterprise” ● Human-scale, ethical data ● First time in NYC as an adult (?!) @abeGong

  3. Outline 1. A thing we do that is ABSOLUTELY CRAZY 2. How to defeat pipeline debt 3. Volunteers wanted! @abeGong

  4. a thing we do that is ABSOLUTELY CRAZY @abeGong

  6. a thing we do that is ABSOLUTELY CRAZY Undocumented @abeGong

  7. a thing we do that is ABSOLUTELY CRAZY Undocumented Untested @abeGong

  8. a thing we do that is ABSOLUTELY CRAZY Undocumented Untested Unstable @abeGong

  12. Trying to maintain a data system that is untested, undocumented and unstable is ABSOLUTELY CRAZY @abeGong

  13. ? @abeGong

  14. a thing we do that is ABSOLUTELY CRAZY Give the monster a name -> Pipeline debt @abeGong

  15. a thing we do that is ABSOLUTELY CRAZY Give the monster a name. The monster’s name is pipeline debt. -> Pipeline debt @abeGong

  16. Always know what to expect from your data @abeGong

  17. Expectations are assertions about data (great_expectations): expect_column_to_exist, expect_table_row_count_to_be_between, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_between, expect_column_values_to_match_regex, expect_column_values_to_match_strftime_format, expect_column_mean_to_be_between, expect_column_kl_divergence_to_be_less_than, etc. etc. etc. @abeGong
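
The idea that expectations are assertions about data can be sketched in plain Python. This is not the Great Expectations library itself: it is a minimal, standard-library-only illustration that borrows three expectation names from the slide and applies them to a list of dicts. The row format, the shape of the result dictionary (`success`, `failing_rows`), and the sample data are all assumptions made for this example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def expect_column_to_exist(rows: List[Row], column: str) -> Dict[str, Any]:
    """Pass if every row carries the column as a key."""
    failing = [i for i, row in enumerate(rows) if column not in row]
    return {"success": not failing, "failing_rows": failing}


def expect_column_values_to_not_be_null(rows: List[Row], column: str) -> Dict[str, Any]:
    """Pass if no row has None (or a missing key) in the column."""
    failing = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"success": not failing, "failing_rows": failing}


def expect_column_values_to_be_unique(rows: List[Row], column: str) -> Dict[str, Any]:
    """Pass if no value repeats; repeats after the first occurrence are reported."""
    seen = set()
    failing = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value in seen:
            failing.append(i)
        seen.add(value)
    return {"success": not failing, "failing_rows": failing}


# Hypothetical sample data: one null temperature and one duplicated id.
rows = [
    {"id": 1, "room_temp": 68.0},
    {"id": 2, "room_temp": None},
    {"id": 2, "room_temp": 71.5},
]

print(expect_column_to_exist(rows, "id"))
print(expect_column_values_to_not_be_null(rows, "room_temp"))
print(expect_column_values_to_be_unique(rows, "id"))
```

Each check returns a structured result rather than raising, so a pipeline could decide for itself whether a failed expectation should stop the run or only be logged.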

  21. Expectations are assertions about data Expectation Types @abeGong

  22. Expectations are assertions about data Expectation Types Data Sources @abeGong

  23. How to draw an owl 1. Draw some circles 2. Draw the rest of the stupid owl @abeGong

  24. Great Expectations has a bunch of shiny new features @abeGong

  25. Great Expectations has a bunch of shiny new features: Validation Operators, Renderers and Views, Stores, Profilers, Data Context and Data Asset namespace, Expectation Types, Data Sources @abeGong

  29. Set up data testing in a day, not a month. @abeGong

  30. Your docs are your tests, and your tests are your docs. @abeGong Icons created by SBTS from Noun Project

  31. Your docs are your tests, and your tests are your docs. @abeGong https://www.locallyoptimistic.com/post/data_dictionaries/

  32. Your docs are your tests, and your tests are your docs.
      expect_column_values_to_be_between(column="room_temp", min_value=60, max_value=75, mostly=.95)
      “Values in this column should be between 60 and 75, at least 95% of the time.”
      “Warning: more than 5% of values fell outside the specified range of 60 to 75.” @abeGong
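
One way to read this slide is that a single expectation definition can drive both a documentation sentence and a validation warning. The sketch below illustrates that idea in plain Python. It does not use or reproduce the Great Expectations rendering code: the `expectation` dict and the `render_docs` and `validate` helpers are hypothetical names invented for this example, and only the parameter values and the two output sentences come from the slide.

```python
from typing import Dict, List, Optional

# One definition, taken from the slide's parameters.
expectation = {"column": "room_temp", "min_value": 60, "max_value": 75, "mostly": 0.95}


def render_docs(exp: Dict) -> str:
    """Turn the expectation into a human-readable documentation sentence."""
    pct = round(exp["mostly"] * 100)
    return (
        f"Values in this column should be between {exp['min_value']} "
        f"and {exp['max_value']}, at least {pct}% of the time."
    )


def validate(exp: Dict, values: List[float]) -> Optional[str]:
    """Return None if the expectation holds, otherwise a warning message."""
    if not values:
        return None  # nothing to check; treated as passing in this sketch
    in_range = sum(exp["min_value"] <= v <= exp["max_value"] for v in values)
    if in_range / len(values) >= exp["mostly"]:
        return None
    allowed = round((1 - exp["mostly"]) * 100)
    return (
        f"Warning: more than {allowed}% of values fell outside the "
        f"specified range of {exp['min_value']} to {exp['max_value']}."
    )


print(render_docs(expectation))
print(validate(expectation, [68] * 99 + [90]))  # 99% in range
print(validate(expectation, [68] * 9 + [90]))   # 90% in range
```

Because both strings are generated from the same dict, changing `max_value` or `mostly` updates the documentation and the test together, which is the point the slide is making.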

  34. Warning: Great Expectations still has rough edges @abeGong

  35. Warning: Great Expectations still has rough edges: Validation Operators, Renderers and Views, Stores, Profilers, Data Context and Data Asset namespace, Expectation Types, Data Sources @abeGong

  36. Volunteers wanted! 1. Pick a day 2. Work with us 3. Get set up 4. Improve the project How to get in touch: 👌 https://greatexpectations.io/slack @abeGong

  37. Recap @abeGong

  38. Trying to maintain a data system that is untested, undocumented and unstable is ABSOLUTELY CRAZY @abeGong

  39. a thing we do that is ABSOLUTELY CRAZY Give the monster a name. The monster’s name is pipeline debt. -> Pipeline debt @abeGong

  40. To defeat pipeline debt, always know what to expect of your data: expect_column_to_exist, expect_table_row_count_to_be_between, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_between, expect_column_values_to_match_regex, expect_column_values_to_match_strftime_format, expect_column_mean_to_be_between, expect_column_kl_divergence_to_be_less_than, etc. etc. etc. @abeGong

  41. Set up data testing in a day, not a month. @abeGong

  42. Your docs are your tests, and your tests are your docs. @abeGong Icons created by SBTS from Noun Project

  43. Warning: Great Expectations still has rough edges @abeGong

  44. Volunteers wanted! 1. Pick a day 2. Work with us 3. Get set up 4. Improve the project How to get in touch: 👌 https://greatexpectations.io/slack @abeGong

  45. Thank you, New York! https://greatexpectations.io/slack @abeGong
