Airflow, the perfect match in our Analytics Pipeline


  1. FROM AIRFLOW IMPORT DAG: Airflow, the perfect match in our Analytics Pipeline. Sergio Camilo Fandiño Hernández, Senior Business Intelligence Architect @LOVOO

  2. AGENDA: 1. Why we met? 2. How we met? 3. The first date! 4. Fun dates! 5. Is there any dynamic in between? 6. Recap and conclusion

  3. BEFORE THE ACTION STARTS
     About: LOVOO is a dating and social app and the place for chatting, live streaming, watching streams and getting to know people.
     LOVOO:
     - Germany - Dresden & Berlin
     - 2011
     - Acquired by The Meet Group (NASDAQ:MEET) in 2017
     - Top 3 Dating App in Europe
     - + 280 TB of Data
     - ~ 6 TB Monthly Growth
     - + 3 TB daily total aggregated data
     - + 36 TB Swipes (162,824,303,474)

  4. THE TEAM
     - Analytics: 1 Head, 6 Data Analysts, 2 BI Architects
     - Product, Finance, Marketing, Talent Management, Customer Insights, CRM

  5. WAIT… WILL IT BE TOO TECHNICAL?
     My main purpose today is to tell you about our journey with Airflow as well as a few different use cases that could also boost the work of your Analytics/BI team on a daily basis.
     What can you expect?
     - Pieces of code (examples)
     - Way too many screenshots

  6. AGENDA: 1. Why we met? 2. How we met? 3. The first date! 4. Fun dates! 5. Is there any dynamic in between? 6. Recap and conclusion

  7. OUR LAST DATE… On-premise

  8. THE COOL KIDS… We went Cloud

  9. THE PROFILE DETAILS… (architecture diagram labels) Data Processing, Data Loading, Airflow Composer, Backend, Google Kubernetes, Google - Firebase, Google Sheets, EU-Bridge, Pub-Sub, BigQuery, Cloud Storage, Payment Providers, Appsumer, Adjust, CRM, etc…

  10. WHAT REALLY MATTERS… (architecture diagram labels) Analytics, Airflow Composer, Data-Core, Google - Firebase, Google Sheets, BigQuery, Cloud Storage, Payment Providers, Appsumer, Adjust, Redshift, etc…

  11. AGENDA: 1. Why we met? 2. How we met? 3. The first date! 4. Fun dates! 5. Is there any dynamic in between? 6. Recap and conclusion

  12. LEFT SWIPING… Orchestration Tool
      - Identify what is out there
      - Costs?
      - Scalability?
      - Data sources compatibility?
      - Knowledge/Human Resources?

  13. RIGHT SWIPED… Airflow
      - Great community
      - Game changer
      - Mobile App
      - Python
      - BigQuery

  14. A GOOD FIT… Google Cloud Composer
      - Fully Managed Airflow
      - Scalable
      - IAP
      - Secure
      - Focus on building the Analytics data pipeline
      - Ease of implementation

  15. NO RISK, NO FUN… Google Cloud Composer
      - Fully Managed Airflow
      - Scalable
      - IAP
      - Secure
      - Focus on building the Analytics data pipeline
      - Ease of implementation

  16. AGENDA: 1. Why we met? 2. How we met? 3. The first date! 4. Fun dates! 5. Is there any dynamic in between? 6. Recap and conclusion

  17. BREAKING THE ICE… TODO List
      - SQL Scripts -> Data Modeling
      - DAGs
      - Permissions
      - Service Accounts
      - Data Importers
      - Create a Composer Environment
      - How do we deploy? -> CI/CD

  18. GROWING TOGETHER! CI/CD (diagram labels) Slack, YAML, DAGs.py, Cloud Build, Cloud Composer, Trigger, Importers, SQL, Checks, Cloud Storage, Version Control, Passed

  19. GROWING TOGETHER! CI/CD (diagram labels) Slack, YAML, DAGs.py, Cloud Build, Cloud Composer, Trigger, Importers, SQL, Cloud Storage, Version Control
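The CI/CD slides list a "Checks" stage between version control and Cloud Composer, but the checks themselves are not shown. As a minimal sketch of one check such a build step could run, the function below verifies that every DAG file at least parses as valid Python before it is copied to the Composer bucket. The function name and the directory layout are assumptions for illustration, not the team's actual pipeline.

```python
import ast
from pathlib import Path


def find_broken_dag_files(dag_dir):
    """Return (filename, error message) pairs for DAG files that fail to parse."""
    broken = []
    for path in sorted(Path(dag_dir).glob("*.py")):
        try:
            # ast.parse only checks syntax; it does not import or run the DAG.
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            broken.append((path.name, exc.msg))
    return broken
```

A build step could call `find_broken_dag_files("dags")` and exit with a non-zero status when the list is not empty, so that a broken file never reaches the environment.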

  20. AGENDA: 1. Why we met? 2. How we met? 3. The first date! 4. Fun dates! 5. Is there any dynamic in between? 6. Recap and conclusion

  21. WHAT DOES IT LOOK LIKE? Operators, DAGs
      - 26 DAGs
      - Sub-DAGs
      - Branching
      - Jinja Templating
      - Hooks
      - Pools
      - Trigger rules

  22. PRETTY ON THE OUTSIDE… Analytics - Workflow: The Core, Sub DAGs

  23. PRETTY ON THE INSIDE… The Core Sub DAG

  24. COMMUNICATION IS VITAL… Reports! Slack Webhook
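The slide shows reports being pushed to Slack through a webhook; the screenshot itself is not part of this transcript. A minimal sketch of the idea, using only the standard library: a Slack incoming webhook accepts a JSON body with a `text` field sent by HTTP POST. The function names, the message wording and the `row_count` parameter are assumptions made for illustration.

```python
import json
import urllib.request


def build_report_request(webhook_url, dag_id, run_date, row_count):
    """Build (but do not send) the HTTP request for a Slack incoming webhook."""
    text = f"Report ready: {dag_id} for {run_date} loaded {row_count} rows."
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_report(webhook_url, dag_id, run_date, row_count):
    """Send the message; could serve as the python_callable of a PythonOperator."""
    request = build_report_request(webhook_url, dag_id, run_date, row_count)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the request builder separate from the sender makes the message format easy to test without calling Slack.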

  25. COMMUNICATION IS VITAL… Tableau Extracts

  26. COMMUNICATION IS VITAL… Is Airflow finished? (by the way, this is branching…)

  27. COMMUNICATION IS VITAL… Is Airflow finished? (by the way, this is branching…)
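The "Is Airflow finished?" slides point at branching without the graph being visible here. As a rough sketch of how branching works in Airflow: a `BranchPythonOperator` runs a callable that returns the `task_id` to follow, and the other downstream task is skipped. The task names, the `core_finished` flag and the Airflow 2.x import paths are assumptions; the actual DAG from the talk is not shown in the transcript.

```python
def choose_next_task(core_finished):
    """Return the task_id to follow; the branch operator skips the other one."""
    return "refresh_tableau_extracts" if core_finished else "notify_not_finished"


def build_dag():
    """Wire the branch into a DAG.

    Imports are local so the decision logic above can be tested without an
    Airflow installation (Airflow 2.x import paths are assumed).
    """
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import BranchPythonOperator, PythonOperator

    with DAG(
        dag_id="branching_sketch",
        start_date=datetime(2020, 1, 1),
        catchup=False,
    ) as dag:
        branch = BranchPythonOperator(
            task_id="is_airflow_finished",
            python_callable=choose_next_task,
            op_kwargs={"core_finished": True},  # placeholder input
        )
        refresh = PythonOperator(
            task_id="refresh_tableau_extracts",
            python_callable=lambda: print("refreshing extracts"),
        )
        notify = PythonOperator(
            task_id="notify_not_finished",
            python_callable=lambda: print("core run not finished"),
        )
        # Only the task whose id the callable returns will run.
        branch >> [refresh, notify]
    return dag
```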

  28. BECAUSE SH!T HAPPENS! Error Alerting
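The error-alerting slide has no surviving code. One common way to get alerts in Airflow is an `on_failure_callback`, which receives the task context as a dictionary when a task fails. The sketch below only formats the alert; the message layout and the names `analytics_core` and `load_swipes` are made up for the dry run.

```python
from types import SimpleNamespace


def format_failure_message(context):
    """Turn an Airflow task context into a one-line alert."""
    ti = context["task_instance"]
    return (
        f"Task failed: {ti.dag_id}.{ti.task_id} "
        f"(try {ti.try_number}): {context.get('exception')}"
    )


def alert_on_failure(context):
    """Candidate on_failure_callback; swap print for a Slack or email call."""
    print(format_failure_message(context))


# Passing the callback through default_args applies it to every task in a DAG.
default_args = {"on_failure_callback": alert_on_failure}

# Stand-in context for a dry run outside Airflow (hypothetical names).
fake_context = {
    "task_instance": SimpleNamespace(
        dag_id="analytics_core", task_id="load_swipes", try_number=1
    ),
    "exception": ValueError("quota exceeded"),
}
```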

  29. BEING FLEXIBLE IS A BIG FLEX! Integrating Data Sources (this code belongs to the DAG.py file)

  30. BEING FLEXIBLE IS A BIG FLEX! Integrating Data Sources (this code belongs to the DAG.py file)

  31. BEING FLEXIBLE IS A BIG FLEX! Integrating Data Sources (this code belongs to the importer.py file)

  32. BEING FLEXIBLE IS A BIG FLEX! Integrating Data Sources (this pseudo-code belongs to the importer.py file)

  33. BEING FLEXIBLE IS A BIG FLEX! Integrating Data Sources: 2 Tables - 2 Days -> ELT in BQ
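The code screenshots for DAG.py and importer.py did not survive in this transcript. The split between the two files can be sketched roughly as follows: the importer module holds plain Python that extracts one day of one source, and the DAG file only schedules it. Everything here is an assumption for illustration, including `fetch_rows`, the file naming and the sample row. Newline-delimited JSON is used because BigQuery can load that format, leaving the transformation to SQL afterwards (ELT).

```python
import json
from pathlib import Path


def fetch_rows(source, day):
    """Placeholder for the real API or database call of a data source."""
    return [{"source": source, "day": day, "revenue": 12.5}]


def run_import(source, day, out_dir):
    """Extract one day of one source into a newline-delimited JSON file."""
    rows = fetch_rows(source, day)
    out_path = Path(out_dir) / f"{source}_{day}.json"
    with out_path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")
    return str(out_path)


# In the DAG file, a PythonOperator could call the importer with the run date
# filled in by Jinja templating (op_kwargs is a templated field):
# PythonOperator(
#     task_id="import_appsumer",
#     python_callable=run_import,
#     op_kwargs={"source": "appsumer", "day": "{{ ds }}", "out_dir": "/tmp"},
# )
```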

  34. SCHEDULING CUSTOM CODE: Data Importers
      - Redshift
      - Firebase (very dynamic)
      - Google Cloud Storage (Adjust, Merger)
      - Appsumer, Shopify, Paypal, AppStore, Adyen
      - S3 Storage

  35. AGENDA: 1. Why we met? 2. How we met? 3. The first date! 4. Fun dates! 5. Is there any dynamic in between? 6. Recap and conclusion

  36. YES, VERY DYNAMIC… Creating Tasks Dynamically

  37. YES, VERY DYNAMIC… Creating Tasks Dynamically
      1. Create a plain-text file with a meaningful structure
      2. Create a task based on a PythonOperator
      3. Define and write your callable (your custom code)

  38. YES, VERY DYNAMIC… Creating Tasks Dynamically: JSON File

  39. YES, VERY DYNAMIC… Creating Tasks Dynamically (this code belongs to the DAG.py file)

  40. YES, VERY DYNAMIC… Creating Tasks Dynamically (this code belongs to the DAG.py file)

  41. YOUR CODE GOES HERE: Creating Tasks Dynamically (this is your custom code, pseudo-code)

  42. YOUR CODE GOES HERE: Creating Tasks Dynamically (this is your custom code, pseudo-code)
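The three steps for creating tasks dynamically (a structured text file, a PythonOperator per entry, and a callable with the custom code) can be sketched as below. The JSON layout, the table names and `process_table` are hypothetical stand-ins, since the original JSON file and DAG code are only present as screenshots; Airflow 2.x import paths are assumed.

```python
import json

# Step 1: a structured text file, inlined here (hypothetical content).
CONFIG_TEXT = """
[
  {"task_id": "export_events", "table": "events", "days_back": 1},
  {"task_id": "export_payments", "table": "payments", "days_back": 2}
]
"""


def load_task_specs(config_text):
    """Parse the structured text into one spec per task."""
    specs = json.loads(config_text)
    task_ids = [spec["task_id"] for spec in specs]
    if len(task_ids) != len(set(task_ids)):
        raise ValueError("task_id values must be unique within a DAG")
    return specs


def process_table(table, days_back):
    """Step 3: the callable; your custom code goes here."""
    return f"processed {table} for the last {days_back} day(s)"


def build_dag(config_text=CONFIG_TEXT):
    """Step 2: create one PythonOperator per spec.

    Imports are local so the helpers above run without Airflow installed.
    """
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="dynamic_tasks_sketch",
        start_date=datetime(2020, 1, 1),
        catchup=False,
    ) as dag:
        for spec in load_task_specs(config_text):
            PythonOperator(
                task_id=spec["task_id"],
                python_callable=process_table,
                op_kwargs={"table": spec["table"], "days_back": spec["days_back"]},
            )
    return dag
```

With this layout, adding a task means adding an entry to the JSON file; the DAG code itself stays unchanged.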

  43. AGENDA: 1. Why we met? 2. How we met? 3. The first date! 4. Fun dates! 5. Is there any dynamic in between? 6. Recap and conclusion

  44. DON'T OVERDO IT: Recap and Conclusion

  45. IT WAS A MATCH… Recap and Conclusion
      - Using an Alpha version (Google Composer) in Production was challenging!
      - Focus on what's important - Google Cloud Composer
      - Airflow offers a bunch of Operators out of the box (OOTB)
      - Always room for improvement
      - No magic recipe to use - stay flexible
