FROM AIRFLOW IMPORT DAG
Airflow: the perfect match in our Analytics Pipeline
Sergio Camilo Fandiño Hernández, Senior Business Intelligence Architect @LOVOO

AGENDA
1. Why we met?
2. How we met?
3. The first date!
4. Fun dates!
5. Is there any dynamic in between?
6. Recap and conclusion

BEFORE THE ACTION STARTS
About LOVOO
LOVOO is a dating and social app and the place for chatting, live streaming, watching streams and getting to know people.
- Germany (Dresden & Berlin)
- Since 2011
- Acquired by The Meet Group (NASDAQ:MEET) in 2017
- Top 3 dating app in Europe
- 280+ TB of data
- ~6 TB monthly growth
- 3+ TB daily total aggregated data
- 36+ TB of swipes (162,824,303,474)

THE TEAM
Analytics
- 1 Head
- 6 Data Analysts
- 2 BI Architects

- Product
- Finance
- Marketing
- Talent Management
- Customer Insights
- CRM

WAIT… WILL IT BE TOO TECHNICAL?
What can you expect?
My main purpose today is to tell you about our journey with Airflow, as well as a few different use cases that could also boost the daily work of your Analytics/BI team.
• Pieces of code (examples)
• Way too many screenshots

OUR LAST DATE…
On-premise

THE COOL KIDS…
We went Cloud

THE PROFILE DETAILS…
[Architecture diagram. Labels: Data Processing, Data Loading, Airflow / Composer, Backend, Google Kubernetes, Google Firebase, Google Sheets, EU-Bridge, Pub-Sub, BigQuery, Cloud Storage, Payment Providers, Appsumer, Adjust, CRM, etc…]

WHAT REALLY MATTERS…
[Architecture diagram. Labels: Analytics, Airflow / Composer, Data-Core, Google Firebase, Google Sheets, BigQuery, Cloud Storage, Payment Providers, Appsumer, Adjust, Redshift, etc…]

LEFT SWIPING…
Orchestration Tool
- Identify what is out there
- Costs?
- Scalability?
- Data sources compatibility?
- Knowledge/Human Resources?

RIGHT SWIPED…
Airflow
- Great community
- Game changer
- Mobile App
- Python
- BigQuery

A GOOD FIT…
Google Cloud Composer
- Fully Managed Airflow
- Scalable
- IAP
- Secure
- Focus on building the Analytics data pipeline
- Ease of implementation

NO RISK, NO FUN…
Google Cloud Composer
- Fully Managed Airflow
- Scalable
- IAP
- Secure
- Focus on building the Analytics data pipeline
- Ease of implementation

BREAKING THE ICE…
TODO List
- SQL Scripts -> Data Modeling
- DAGs
- Permissions
- Service Accounts
- Data Importers
- Create a Composer Environment
- How do we deploy? -> CI/CD

GROWING TOGETHER!
CI/CD
[Pipeline diagram. Labels: Version Control, YAML, DAGs.py, Importers, SQL, Trigger, Cloud Build, Checks, Passed, Cloud Storage, Cloud Composer, Slack]

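The slides do not detail the "Checks" step of the CI/CD flow. As a minimal sketch of one check such a pipeline could run before DAG files are synced to Cloud Storage, the snippet below confirms that every Python file at least compiles. The function name `find_broken_files` and the directory layout are assumptions for illustration, not LOVOO's actual setup.

```python
import tempfile
from pathlib import Path


def find_broken_files(root):
    """Return (path, error message) pairs for .py files under root that fail to compile."""
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            # compile() only parses the file; nothing in it is executed.
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            failures.append((str(path), f"line {exc.lineno}: {exc.msg}"))
    return failures


# Demo on a throwaway directory holding one valid and one broken file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good_dag.py").write_text("x = 1\n", encoding="utf-8")
    Path(tmp, "bad_dag.py").write_text("def broken(:\n", encoding="utf-8")
    for path, error in find_broken_files(tmp):
        print(f"FAILED {path}: {error}")
```

A build step could call this on the repository's DAG folder and exit with a non-zero status when the returned list is not empty.
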
WHAT DOES IT LOOK LIKE?
Operators, DAGs
• 26 DAGs
• Sub-DAGs
• Branching
• Jinja Templating
• Hooks
• Pools
• Trigger rules

PRETTY ON THE OUTSIDE…
Analytics - Workflow
[Figure: workflow graph. Labels: The Core, Sub DAGs]

PRETTY ON THE INSIDE…
[Figure: The Core Sub DAG]

COMMUNICATION IS VITAL…
Reports!
Slack Webhook

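The report notification itself is only shown as a screenshot in the talk. A minimal sketch of posting a message to a Slack incoming webhook with the standard library could look like this; the function names, the message wording and the example URL are assumptions, and the actual POST is left commented out because it needs a real webhook URL.

```python
import json
import urllib.request


def build_report_message(report_name, run_date, url):
    """Build the JSON payload for a Slack incoming webhook."""
    text = f"Report *{report_name}* for {run_date} is ready: {url}"
    return {"text": text}


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_report_message("daily_kpis", "2020-07-01", "https://example.com/report")
print(json.dumps(payload))
# post_to_slack("<your webhook URL>", payload)  # requires a real webhook URL
```

In a DAG, a function like `post_to_slack` could be the callable of a PythonOperator placed after the report task.
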
COMMUNICATION IS VITAL…
Tableau Extracts

COMMUNICATION IS VITAL…
Is Airflow finished?
By the way, this is branching…

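The branching screenshot is not preserved, so the condition below is invented for illustration. In Airflow, the callable of a BranchPythonOperator returns the task_id of the branch to follow. A minimal sketch of such a callable, assuming two hypothetical downstream tasks named `notify_on_time` and `notify_delayed`:

```python
from datetime import datetime, time


def choose_notification(finished_at, deadline=time(8, 0)):
    """Return the task_id of the branch to follow.

    finished_at: datetime at which the pipeline finished.
    deadline: latest time of day that still counts as on time (assumed value).
    """
    if finished_at.time() <= deadline:
        return "notify_on_time"
    return "notify_delayed"


print(choose_notification(datetime(2020, 7, 1, 7, 30)))
print(choose_notification(datetime(2020, 7, 1, 9, 0)))
```

Airflow then runs the returned task and skips the other branch.
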
BECAUSE SH!T HAPPENS!
Error Alerting

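The alerting code is not shown in the extracted slides. Airflow lets a task define an `on_failure_callback`, which receives the task context as a dictionary. The sketch below only formats an alert text from that context; the function name and message layout are assumptions, and a `SimpleNamespace` stands in for the real task instance so the snippet runs without Airflow.

```python
from types import SimpleNamespace


def format_failure_alert(context):
    """Turn an Airflow task context into a short alert text."""
    ti = context["task_instance"]
    error = context.get("exception")
    return (
        f"Task failed: {ti.dag_id}.{ti.task_id}\n"
        f"Error: {error}\n"
        f"Log: {ti.log_url}"
    )


# Stand-in for the context Airflow passes to on_failure_callback.
fake_context = {
    "task_instance": SimpleNamespace(
        dag_id="analytics_core",
        task_id="load_swipes",
        log_url="https://example.com/log",
    ),
    "exception": ValueError("table not found"),
}
print(format_failure_alert(fake_context))
```

A callback that sends this text to Slack or e-mail could be set once for all tasks through the DAG's `default_args`.
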
BEING FLEXIBLE IS A BIG FLEX!
Integrating Data Sources
This code belongs to the DAG.py file

BEING FLEXIBLE IS A BIG FLEX!
Integrating Data Sources
This code belongs to the importer.py file

BEING FLEXIBLE IS A BIG FLEX!
Integrating Data Sources
This pseudo-code belongs to the importer.py file

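The importer code from the slides is not preserved. As a rough sketch of what an importer module might do, the function below fetches one day of rows from a source and writes them as newline-delimited JSON, a format BigQuery can load. The names `run_import` and `fake_fetch` and the file naming are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_import(fetch_rows, source, ds, out_dir):
    """Fetch one day of rows and write them as newline-delimited JSON.

    fetch_rows: callable taking the date string and returning an iterable of dicts.
    ds: the day to import, as 'YYYY-MM-DD'.
    Returns the path of the written file.
    """
    out_path = Path(out_dir) / f"{source}_{ds}.json"
    with out_path.open("w", encoding="utf-8") as handle:
        for row in fetch_rows(ds):
            handle.write(json.dumps(row) + "\n")
    return out_path


def fake_fetch(ds):
    # Stand-in for a real API or database client.
    return [{"day": ds, "revenue": 10}, {"day": ds, "revenue": 20}]


with tempfile.TemporaryDirectory() as tmp:
    path = run_import(fake_fetch, "payments", "2020-07-01", tmp)
    lines = path.read_text(encoding="utf-8").splitlines()
    print(len(lines), "rows written to", path.name)
```

Keeping the fetch logic behind a plain callable means the DAG file only needs to pass the source name and the run date.
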
BEING FLEXIBLE IS A BIG FLEX!
Integrating Data Sources
2 Tables - 2 Days -> ELT in BQ

SCHEDULING CUSTOM CODE
Data Importers
• Redshift
• Firebase (very dynamic)
• Google Cloud Storage (Adjust, Merger)
• Appsumer, Shopify, PayPal, App Store, Adyen
• S3 Storage

YES, VERY DYNAMIC…
Creating Tasks Dynamically

YES, VERY DYNAMIC…
Creating Tasks Dynamically
1. Create a plain-text file with a meaningful structure
2. Create a task based on a PythonOperator
3. Define and write your callable (your custom code)

YES, VERY DYNAMIC…
Creating Tasks Dynamically
JSON File

YES, VERY DYNAMIC…
Creating Tasks Dynamically
This code belongs to the DAG.py file

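The JSON file and the DAG code are only shown as screenshots in the talk. The following sketch illustrates the general pattern of reading a structured file and creating one task per entry. The config structure, the task names and the `create_tasks` helper are all hypothetical. A task factory is passed in so the snippet runs without Airflow installed; here `dict` stands in for the operator.

```python
import json

# Hypothetical config; the real JSON file from the talk is not preserved.
CONFIG_JSON = """
[
  {"task_id": "import_adjust", "source": "adjust", "table": "installs"},
  {"task_id": "import_appsumer", "source": "appsumer", "table": "spend"}
]
"""


def run_importer(source, table, **context):
    """Placeholder callable: your custom import code goes here."""
    return f"imported {source}.{table}"


def create_tasks(config, make_task):
    """Create one task per config entry via the given task factory."""
    tasks = []
    for entry in config:
        # Everything except the task_id is handed to the callable.
        op_kwargs = {k: v for k, v in entry.items() if k != "task_id"}
        tasks.append(
            make_task(
                task_id=entry["task_id"],
                python_callable=run_importer,
                op_kwargs=op_kwargs,
            )
        )
    return tasks


# Without Airflow installed, a dict stands in for the operator.
tasks = create_tasks(json.loads(CONFIG_JSON), make_task=dict)
for task in tasks:
    print(task["task_id"], task["op_kwargs"])
```

Inside a DAG file, `make_task` could be `functools.partial(PythonOperator, dag=dag)`, since PythonOperator accepts `task_id`, `python_callable` and `op_kwargs`. Adding a data source then means adding an entry to the JSON file.
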
YOUR CODE GOES HERE
Creating Tasks Dynamically
This is your custom code (pseudo-code)

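The custom code on these slides is not preserved. As one hedged example of what such a callable might contain, the sketch below builds a query against a daily-sharded table named `events_YYYYMMDD`, the naming used by the Firebase export to BigQuery. The function name, the dataset name `analytics_123` and the query itself are assumptions for illustration.

```python
def build_daily_query(dataset, ds, **_):
    """Build a query against one daily-sharded table.

    dataset would come from the task's op_kwargs; ds is the run date as
    'YYYY-MM-DD', which Airflow supplies in the task context.
    Extra context keys are accepted and ignored.
    """
    table = f"{dataset}.events_{ds.replace('-', '')}"
    return (
        "SELECT event_name, COUNT(*) AS events "
        f"FROM `{table}` GROUP BY event_name"
    )


query = build_daily_query(dataset="analytics_123", ds="2020-07-01")
print(query)
```

Because the table name is derived from the run date, the same callable can serve every daily run and every backfill.
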
DON'T OVERDO IT
Recap and Conclusion

IT WAS A MATCH…
Recap and Conclusion
- Using an Alpha version (Google Cloud Composer) in production was challenging!
- Focus on what's important - Google Cloud Composer
- Airflow ships with a bunch of operators out of the box
- Always room for improvement
- No magic recipe to use - stay flexible