
Nick Schrock, Founder, Elementl (@schrockn): “Our data is totally broken”



  1. Nick Schrock Founder, Elementl @schrockn

  2. “Our data is totally broken”

  3. “Our data is totally broken” ‣ We don’t know where our data comes from ‣ We don’t know what it means ‣ We cannot reliably process and test it ‣ Our engineers don’t want to deal with it ‣ It isn’t “fun.” It isn’t “sexy.”

  4. WHAT THEY SAY My Job Data Cleaning

  5. WHAT THEY MEAN My Job Data Cleaning Not my job

  6. WHAT THEY MEAN My Job Data Cleaning Not my job

  7. WHAT THEY MEAN My Job ‣ Rolling their own infrastructure ‣ Repeated work Data Cleaning ‣ Maintaining unreliable processes Not my job

  8. FAILURE IS THE NORM

  9. Engineers: I don’t want to touch it. Data scientists: I waste most of my time. Business leaders: Failure is the norm.

  10. 2009: UI development is awful ‣ I spend 80% of my time fighting the browser

  11. 2009: UI development is awful ‣ I spend 80% of my time fighting the browser ‣ We can’t change our UI: there’s no testing ‣ It breaks all the time ‣ Our engineers don’t want to touch it

  12. 2019: A (UI) world transformed ‣ Browsers did get better, but it was the software abstractions that proved decisive.

  13. Scripts → Full applications ‣ React acknowledged complexity ‣ It respected the discipline

  14. React: Frontend Applications ‣ Dagster: Data Applications

  15. PRINCIPLES ‣ Solves a real problem ‣ Incremental adoption path ‣ Preserve tools that work ‣ Immediate value and productivity gains

  16. Data Applications: graphs of functional computations that produce and consume data assets
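A minimal plain-Python sketch of that definition, assuming hypothetical asset names and functions (this is not Dagster's API): each node is a pure function, each edge names the upstream asset it consumes, and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) supplies a valid execution order.

```python
from graphlib import TopologicalSorter


# Hypothetical computations: each is a pure function of the assets it consumes.
def raw_orders():
    return [
        {"id": 1, "amount": 30.0},
        {"id": 2, "amount": -5.0},
        {"id": 3, "amount": 12.5},
    ]


def clean_orders(raw):
    # Drop invalid rows, returning a new list instead of mutating the input.
    return [row for row in raw if row["amount"] >= 0]


def revenue(orders):
    return sum(row["amount"] for row in orders)


# The graph: asset name -> (function that produces it, names of the assets it consumes).
GRAPH = {
    "raw_orders": (raw_orders, []),
    "clean_orders": (clean_orders, ["raw_orders"]),
    "revenue": (revenue, ["clean_orders"]),
}


def materialize(graph):
    """Compute every asset in dependency order and return them keyed by name."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    assets = {}
    for name in sorter.static_order():
        fn, deps = graph[name]
        assets[name] = fn(*(assets[dep] for dep in deps))
    return assets


if __name__ == "__main__":
    print(materialize(GRAPH)["revenue"])  # prints 42.5
```

Because every node is a function of its declared inputs, any intermediate asset can be recomputed or tested in isolation by calling its function directly.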

  17. > pip install dagster

  18. DAGSTER CONCEPTS ‣ Solid: A unit of functional computation ‣ Pipeline: A DAG of solids

  19. on Page Rank

  20. DAGSTER CONCEPTS ‣ Solid ‣ Inputs: Inputs are the data ‣ Config: Config modifies how data is computed ‣ Pipeline
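A minimal plain-Python sketch of the inputs/config split, assuming a hypothetical `filter_orders` function and a hypothetical `min_amount` config key (neither is Dagster's API): the input is the data arriving from upstream, while config is a small run-time parameter that changes how that same data is processed.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_orders(config: Dict[str, Any], orders: List[Row]) -> List[Row]:
    """Solid-like function: `orders` is the input data, `config` tunes the computation."""
    # Hypothetical config key; when absent, nothing is filtered out.
    threshold = config.get("min_amount", 0.0)
    return [row for row in orders if row["amount"] >= threshold]


orders = [{"id": 1, "amount": 30.0}, {"id": 2, "amount": 5.0}]

# Same code and same input data; only the config differs between runs.
default_run = filter_orders({}, orders)                   # keeps ids 1 and 2
strict_run = filter_orders({"min_amount": 10.0}, orders)  # keeps id 1 only
```

Keeping config separate from data means a run can be reconfigured without changing code or upstream inputs.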

  21. DAGSTER CONCEPTS ‣ Solid ‣ Inputs & Config ‣ Pipeline ‣ Dependencies
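Because a pipeline is a DAG of solids whose inputs are wired to upstream outputs, its wiring can be checked before anything executes. A plain-Python sketch, assuming a hypothetical `solid -> {input name: upstream solid}` wiring format (this is not Dagster's dependency API): unknown upstreams and cycles are rejected, and a valid execution order is returned.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical wiring format: solid name -> {input name: upstream solid name}.
Deps = Dict[str, Dict[str, str]]


def validate_pipeline(solids: Set[str], deps: Deps) -> List[str]:
    """Check a pipeline's wiring before running it.

    Every dependency must point at a known solid, and the graph must be acyclic.
    Returns one valid execution order, or raises ValueError.
    """
    for solid, inputs in deps.items():
        if solid not in solids:
            raise ValueError(f"dependencies declared for unknown solid {solid!r}")
        for input_name, upstream in inputs.items():
            if upstream not in solids:
                raise ValueError(f"{solid}.{input_name} depends on unknown solid {upstream!r}")
    # graphlib expects node -> set of predecessors.
    graph = {solid: set(deps.get(solid, {}).values()) for solid in solids}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the cycle.
        raise ValueError(f"pipeline is not a DAG; cycle: {err.args[1]}") from err


solids = {"load", "clean", "report"}
order = validate_pipeline(solids, {"clean": {"raw": "load"}, "report": {"rows": "clean"}})
# order is ["load", "clean", "report"], the only valid order for this chain
```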

  22. Before: After:

  23. DAGSTER CONCEPTS ‣ Solid ‣ Inputs & Config ‣ Pipeline ‣ Dependencies ‣ Context ‣ Logging: Structured Logging ‣ Resources: Connections, Services, Etc
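A minimal sketch of the context idea in plain Python, assuming a context that carries a structured logger and a dictionary of resources (the `Context`, `StructuredLog`, and `InMemoryWarehouse` names are hypothetical, not Dagster's API): log entries become records with named fields that tools can filter, and external connections are injected rather than created inside the computation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StructuredLog:
    """Collects log events as dicts with named fields rather than free-form strings."""

    events: List[Dict[str, Any]] = field(default_factory=list)

    def info(self, message: str, **fields: Any) -> None:
        self.events.append({"level": "INFO", "message": message, **fields})


class InMemoryWarehouse:
    """Hypothetical stand-in resource; a production run might inject a real database client."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[dict]] = {}

    def write(self, table: str, rows: List[dict]) -> None:
        self.tables[table] = list(rows)


@dataclass
class Context:
    """Hypothetical per-run context: a structured logger plus named resources."""

    log: StructuredLog
    resources: Dict[str, Any]


def store_orders(context: Context, orders: List[dict]) -> int:
    # The computation reaches external systems only through context.resources,
    # so the same code can run against a test double or a real service.
    context.resources["warehouse"].write("orders", orders)
    context.log.info("stored orders", table="orders", row_count=len(orders))
    return len(orders)


ctx = Context(log=StructuredLog(), resources={"warehouse": InMemoryWarehouse()})
stored = store_orders(ctx, [{"id": 1}, {"id": 2}])
```

Swapping `InMemoryWarehouse` for a real client changes only the context built for the run, not `store_orders` itself.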

  24. Beautiful, High-Quality Tools: Dagit (Editor, DAG View, Console) ‣ API ‣ Python library ‣ Dagster Libraries and Integrations: PySpark

  25. Dagster: a platform for building tools ‣ Graph of Functional Computations ‣ Queryable and Introspectable ‣ Operable API ‣ Executable and Configurable ‣ Monitorable ‣ Logging and Live Subscriptions
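The “queryable and introspectable” and “monitorable” properties follow from keeping the graph as data rather than as opaque script control flow. A plain-Python sketch under that assumption (the definition, `upstream_of`, and `EventBus` are hypothetical illustrations, not Dagster's or Dagit's API): one function answers a lineage query over the definition, and a run publishes start and finish events that any subscriber receives live.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical pipeline definition kept as plain data: node -> upstream nodes.
DEFINITION: Dict[str, List[str]] = {
    "load": [],
    "clean": ["load"],
    "features": ["clean"],
    "report": ["clean", "features"],
}


def upstream_of(definition: Dict[str, List[str]], node: str) -> Set[str]:
    """Introspection query: every node that `node` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(definition[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(definition[current])
    return seen


class EventBus:
    """Minimal publish/subscribe channel for live run events."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: dict) -> None:
        for callback in self._subscribers:
            callback(event)


def run(definition: Dict[str, List[str]], bus: EventBus) -> None:
    """Walk the graph in dependency order, publishing an event before and after each node."""
    for node in TopologicalSorter(definition).static_order():
        bus.publish({"node": node, "status": "started"})
        # A real run would execute the node's computation here.
        bus.publish({"node": node, "status": "succeeded"})


received: List[dict] = []
bus = EventBus()
bus.subscribe(received.append)  # a UI or monitor would subscribe the same way
run(DEFINITION, bus)
```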

  26. Beautiful, High-Quality Tools: Dagit (Editor, DAG View, Console) ‣ API ‣ Python library ‣ Dagster Libraries and Integrations: PySpark

  27. Beautiful, High-Quality Tools: Dagit (Editor, DAG View, Console) ‣ API ‣ Python library ‣ Dagster Libraries and Integrations: PySpark, Scala, SQL ‣ Spark Runtime ‣ DBs (Snowflake, etc.)

  28. ‣ Open Source, Python Library ‣ Multi-lingual integration ‣ Beautiful Tooling

  29. WHAT ABOUT THOSE DATA SCIENTISTS? ‣ Current status quo: Engineering, Data ‣ Where we need to go: Engineering, Data ‣ Overlap is cultural, driven by

  30. Beautiful, High-Quality Tools: Dagit (Editor, DAG View, Console) ‣ API ‣ Python library ‣ Dagster Libraries and Integrations: PySpark, Scala, SQL ‣ Spark Runtime ‣ DBs (Snowflake, etc.)

  31. Beautiful, High-Quality Tools: Dagit (Editor, DAG View, Console) ‣ API ‣ Python library ‣ Dagster Libraries and Integrations: PySpark, Scala, SQL, Papermill ‣ Spark Runtime ‣ DBs (Snowflake, etc.) ‣ Jupyter

  32. Python library ‣ Dagster Libraries and Integrations: PySpark, Scala, SQL, Papermill ‣ Spark Runtime ‣ DBs (Snowflake, etc.) ‣ Jupyter ‣ Data Engineering, Analysts, Data Science

  33. Data Application: A Graph of Computations ‣ Local Executor, dagit, dagster CLI, API, Airflow ‣ Dagster Libraries and Integrations: dagster- ‣ Data Engineering, Analysts, Data Science
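The slide separates the graph definition from whatever executes it (a local executor, dagit, the CLI, Airflow). A plain-Python sketch of why that separation pays off, assuming a purely illustrative `name -> (function, upstream names)` definition format: the same definition runs under a sequential executor and under a thread-pool executor that starts nodes once their upstreams finish. Neither function is Dagster's executor API, and no Airflow integration is shown.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical definition: name -> (function, upstream names). It says nothing about where it runs.
Definition = Dict[str, Tuple[Callable[..., Any], List[str]]]

DEFINITION: Definition = {
    "a": (lambda: 2, []),
    "b": (lambda a: a + 1, ["a"]),
    "c": (lambda a: a * 10, ["a"]),
    "total": (lambda b, c: b + c, ["b", "c"]),
}


def _dependency_graph(defn: Definition) -> Dict[str, List[str]]:
    return {name: deps for name, (_, deps) in defn.items()}


def run_in_process(defn: Definition) -> Dict[str, Any]:
    """Sequential executor: one node at a time, in dependency order."""
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(_dependency_graph(defn)).static_order():
        fn, deps = defn[name]
        results[name] = fn(*(results[d] for d in deps))
    return results


def run_with_threads(defn: Definition, max_workers: int = 4) -> Dict[str, Any]:
    """Parallel executor: every node whose upstreams are done runs concurrently."""
    sorter = TopologicalSorter(_dependency_graph(defn))
    sorter.prepare()
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit the whole ready batch, then wait for it before asking for more.
            futures = {
                name: pool.submit(defn[name][0], *(results[d] for d in defn[name][1]))
                for name in sorter.get_ready()
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results
```

Both executors return the same results for the same definition; only the scheduling differs, which is the property that lets one graph target several execution substrates.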

  34. DATA ENGINEERING ‣ An emerging discipline ‣ At an inflection point: Scripts → Data Applications

  35. ELEGANT PROGRAMMING MODEL ‣ NEW, BEAUTIFUL TOOLING ‣ FLEXIBLE AND INCREMENTAL

  36. FLEXIBLE AND INCREMENTAL ‣ Use your tools ‣ Preserve your code ‣ Deploy to your infrastructure ‣ Adopt incrementally

  37. And there is a ton of work to do

  38. TEAM: Max Gasner, Nate Kupp, Alex Langenfeld

  39. THANK YOU: Ben Gotow, Mikhail Novikov, Uma Roy

  40. THANK YOU: Abe Gong, Superconductive Health

  41. Join the team. Partner with us. ‣ https://github.com/dagster-io/dagster ‣ https://elementl.com ‣ schrockn@elementl.com ‣ @schrockn
