Course ratings
INTRODUCTION TO DATA ENGINEERING
Vincent Vankrunkelsven, Data Engineer @ DataCamp
Ratings at DataCamp
Recommend using ratings
- Get rating data
- Clean and calculate top-recommended courses
- Recalculate daily
- Example usage: user's dashboard
As an ETL process
It's an ETL process!
The database
Course: course_id, title, description, programming_language
Rating: user_id, course_id, rating
The database relationship
Course: course_id, title, description, programming_language
Rating: user_id, course_id, rating
Rating.course_id refers to Course.course_id.
Let's practice!
From ratings to recommendations
The recommendations table

user_id  course_id  rating
1        1          4.8
1        74         4.78
1        21         4.5
2        32         4.9

The rating column holds the estimated rating of a course the user hasn't taken yet.
Recommendation techniques
- Matrix factorization
- Building Recommendation Engines with PySpark
Common sense transformation

Inputs
Course: course_id, title, description, programming_language
Rating: user_id, course_id, rating

Output
Recommendations:
user_id  course_id  rating
1        1          4.8
1        74         4.78
1        21         4.5
2        32         4.9
Average course ratings

Average course rating
course_id  avg_rating
1          4.8
74         4.78
21         4.5
32         4.9

We want to recommend highly rated courses.
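The averaging step can be sketched with a pandas group-by. This is a minimal illustration, not the course's own implementation: the function name `avg_rating_per_course` and the sample ratings are hypothetical, and only the column names come from the slides.

```python
import pandas as pd

# Hypothetical ratings; the values are made up for illustration.
rating = pd.DataFrame({
    "user_id":   [1, 2, 3, 1, 2],
    "course_id": [1, 1, 1, 74, 74],
    "rating":    [5.0, 4.0, 4.5, 5.0, 3.0],
})


def avg_rating_per_course(rating: pd.DataFrame) -> pd.DataFrame:
    """Return one row per course with the mean of its ratings."""
    return (
        rating.groupby("course_id", as_index=False)["rating"]
        .mean()
        .rename(columns={"rating": "avg_rating"})
    )


avg_course_rating = avg_rating_per_course(rating)
print(avg_course_rating)
```

The result has one row per `course_id`, which makes it easy to join back onto candidate courses later.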
Use the right programming language

Rating
user_id  course_id  programming_language  rating
1        1          r                     4.8
1        74         sql                   4.78
1        21         sql                   4.5
1        32         python                4.9

Recommend a SQL course for the user with id 1.
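One way to find the language a user has rated most is to count ratings per user and language and keep the top count. The sketch below uses the rows from the table above; the function name `most_rated_language` and the alphabetical tie-break are assumptions, not something the slides specify.

```python
import pandas as pd

# Rows taken from the Rating table on the slide.
rating = pd.DataFrame({
    "user_id": [1, 1, 1, 1],
    "course_id": [1, 74, 21, 32],
    "programming_language": ["r", "sql", "sql", "python"],
    "rating": [4.8, 4.78, 4.5, 4.9],
})


def most_rated_language(rating: pd.DataFrame) -> pd.DataFrame:
    """Return, per user, the language with the most ratings."""
    counts = (
        rating.groupby(["user_id", "programming_language"])
        .size()
        .reset_index(name="n_ratings")
    )
    # Highest count first per user; ties are broken alphabetically (assumption).
    counts = counts.sort_values(
        ["user_id", "n_ratings", "programming_language"],
        ascending=[True, False, True],
    )
    top = counts.drop_duplicates("user_id")
    return top[["user_id", "programming_language"]].reset_index(drop=True)


user_language = most_rated_language(rating)
print(user_language)
```

For user 1 this picks `sql`, because two of the four rated courses are SQL courses.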
Recommend new courses

Rating
user_id  course_id  programming_language  rating
1        1          r                     4.8
1        74         sql                   4.78
1        21         sql                   4.5
1        32         python                4.9

Don't recommend the combinations already in the rating table.
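Excluding combinations that already exist in the rating table amounts to an anti-join. A possible sketch with pandas uses `merge(..., indicator=True)` and keeps only rows found on the left side. The candidate list and the function name `drop_already_rated` are hypothetical; course 99 is an invented unrated course.

```python
import pandas as pd

# Hypothetical candidate pairs; course 99 is invented for illustration.
candidates = pd.DataFrame({"user_id": [1, 1, 1], "course_id": [74, 21, 99]})

# User and course pairs from the Rating table on the slide.
rated = pd.DataFrame({"user_id": [1, 1, 1, 1], "course_id": [1, 74, 21, 32]})


def drop_already_rated(candidates: pd.DataFrame, rated: pd.DataFrame) -> pd.DataFrame:
    """Keep only candidate pairs that do not appear in the rating table."""
    merged = candidates.merge(
        rated[["user_id", "course_id"]].drop_duplicates(),
        on=["user_id", "course_id"],
        how="left",
        indicator=True,  # adds a "_merge" column: "left_only" or "both"
    )
    new_pairs = merged.loc[merged["_merge"] == "left_only", ["user_id", "course_id"]]
    return new_pairs.reset_index(drop=True)


eligible = drop_already_rated(candidates, rated)
print(eligible)
```

Only the pair (1, 99) survives, since user 1 has already rated courses 74 and 21.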
Our recommendation transformation
- Use the technology that the user has rated most
- Don't recommend courses that the user has already rated
- Recommend the three highest rated courses from the remaining combinations
Rating
user_id  course_id  programming_language  rating
1        12         sql                   4.78
1        52         sql                   4.5
1        32         r                     4.9

Recommend the three highest rated SQL courses which are not 12 and 52.
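The final step, picking the three highest rated courses per user from the remaining combinations, could look like the sketch below. It assumes two inputs that are already prepared: average ratings per course and eligible user and course pairs. The function name `top_n_recommendations` and all sample values are hypothetical.

```python
import pandas as pd

# Hypothetical inputs, invented for illustration.
avg_course_rating = pd.DataFrame({
    "course_id": [3, 7, 9, 11, 15],
    "avg_rating": [4.9, 4.2, 4.7, 4.6, 4.8],
})
courses_to_recommend = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2],
    "course_id": [3, 7, 9, 11, 15],
})


def top_n_recommendations(
    avg_course_rating: pd.DataFrame,
    courses_to_recommend: pd.DataFrame,
    n: int = 3,
) -> pd.DataFrame:
    """Return the n highest rated eligible courses for each user."""
    merged = courses_to_recommend.merge(avg_course_rating, on="course_id")
    # Best courses first within each user.
    merged = merged.sort_values(["user_id", "avg_rating"], ascending=[True, False])
    top = merged.groupby("user_id").head(n)
    return top.rename(columns={"avg_rating": "rating"}).reset_index(drop=True)


recommendations = top_n_recommendations(avg_course_rating, courses_to_recommend)
print(recommendations)
```

User 1 has four eligible courses here, so the lowest rated one (course 7) is dropped; user 2 has a single eligible course and keeps it.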
Let's practice!
Scheduling daily jobs
What you've done so far
- Extract using extract_course_data() and extract_rating_data()
- Clean up NA values using transform_fill_programming_language()
- Average course ratings per course: transform_avg_rating()
- Get eligible user and course id pairs: transform_courses_to_recommend()
- Calculate the recommendations: transform_recommendations()
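As an illustration of the clean-up step, the sketch below fills missing `programming_language` values with a default. The slides do not show which value the course uses, so the `"unknown"` default, the function name `fill_programming_language`, and the sample courses are all assumptions.

```python
import pandas as pd

# Hypothetical course data with one missing language.
courses = pd.DataFrame({
    "course_id": [1, 2, 3],
    "title": ["Course A", "Course B", "Course C"],
    "programming_language": ["r", None, "sql"],
})


def fill_programming_language(courses: pd.DataFrame, default: str = "unknown") -> pd.DataFrame:
    """Return a copy with missing programming_language values replaced by `default`."""
    filled = courses.copy()
    filled["programming_language"] = filled["programming_language"].fillna(default)
    return filled


cleaned = fill_programming_language(courses)
print(cleaned)
```

Working on a copy leaves the extracted data untouched, which keeps each transform step easy to test on its own.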
Loading to Postgres
- Use the calculations in data products
- Update daily
- Example use case: sending out e-mails with recommendations
The loading phase

recommendations.to_sql(
    "recommendations",
    db_engine,
    if_exists="append",
)
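To see what `if_exists` does across repeated runs, the sketch below loads the same table twice into an in-memory SQLite database, which stands in for Postgres here only so the example runs without a server. The sample rows come from the recommendations table earlier in the slides; `index=False` is an added choice to avoid writing the DataFrame index as a column.

```python
import sqlite3

import pandas as pd

recommendations = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "course_id": [1, 74, 21, 32],
    "rating": [4.8, 4.78, 4.5, 4.9],
})

# In-memory SQLite connection as a stand-in for the Postgres engine (assumption).
conn = sqlite3.connect(":memory:")

# Two "daily" runs with append: rows accumulate.
recommendations.to_sql("recommendations", conn, if_exists="append", index=False)
recommendations.to_sql("recommendations", conn, if_exists="append", index=False)
rows_after_append = conn.execute("SELECT COUNT(*) FROM recommendations").fetchone()[0]

# A run with replace: the table is dropped and recreated first.
recommendations.to_sql("recommendations", conn, if_exists="replace", index=False)
rows_after_replace = conn.execute("SELECT COUNT(*) FROM recommendations").fetchone()[0]

print(rows_after_append, rows_after_replace)
conn.close()
```

With `append`, a daily job keeps adding rows to the table, so consumers may need a timestamp or a cleanup step; `replace` keeps only the latest calculation.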
def etl(db_engines):
    # Extract the data
    courses = extract_course_data(db_engines)
    rating = extract_rating_data(db_engines)
    # Clean up courses data
    courses = transform_fill_programming_language(courses)
    # Get the average course ratings
    avg_course_rating = transform_avg_rating(rating)
    # Get eligible user and course id pairs
    courses_to_recommend = transform_courses_to_recommend(
        rating,
        courses,
    )
    # Calculate the recommendations
    recommendations = transform_recommendations(
        avg_course_rating,
        courses_to_recommend,
    )
    # Load the recommendations into the database.
    # db_engine is not defined inside this function; it is assumed to be
    # the data warehouse engine available in the enclosing scope.
    load_to_dwh(recommendations, db_engine)
Creating the DAG

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

# Cron expression "0 0 * * *": run every day at midnight
dag = DAG(dag_id="recommendations", schedule_interval="0 0 * * *")

task_recommendations = PythonOperator(
    task_id="recommendations_task",
    python_callable=etl,
    # etl() expects db_engines; it is assumed to be defined elsewhere in this module
    op_kwargs={"db_engines": db_engines},
    dag=dag,
)
Let's practice!
Congratulations
Introduction to data engineering
- Identify the tasks of a data engineer
- What kind of tools they use
- Cloud service providers
Data engineering toolbox
- Databases
- Parallel computing & frameworks (Spark)
- Workflow scheduling with Airflow
Extract, Transform and Load (ETL)
- Extract: get data from several sources
- Transform: perform transformations using parallel computing
- Load: load data into target database
Case study: DataCamp
- Fetch data from multiple sources
- Transform to form recommendations
- Load into target database
Good job!