Course ratings: INTRODUCTION TO DATA ENGINEERING


  1. Course ratings. INTRODUCTION TO DATA ENGINEERING. Vincent Vankrunkelsven, Data Engineer @ DataCamp

  2. Ratings at DataCamp

  3. Recommend using ratings: get rating data; clean and calculate top-recommended courses; recalculate daily; example usage: user's dashboard

  4. As an ETL process: It's an ETL process!

  5. The database

          Course: course_id, title, description, programming_language
          Rating: user_id, course_id, rating

  6. The database relationship: Rating.course_id refers to Course.course_id

          Course: course_id, title, description, programming_language
          Rating: user_id, course_id, rating
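
To make the two tables concrete, the sketch below builds them in an in-memory SQLite database and reads them back with pandas. The column names follow the slide. The column types, the sample titles and descriptions, and the use of SQLite instead of the course's database are assumptions for illustration only.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY,
        title TEXT,
        description TEXT,
        programming_language TEXT
    );
    CREATE TABLE rating (
        user_id INTEGER,
        course_id INTEGER REFERENCES course(course_id),
        rating REAL
    );
    -- Made-up rows for illustration.
    INSERT INTO course VALUES (1, 'Sample R course', 'Made-up row', 'r');
    INSERT INTO course VALUES (74, 'Sample SQL course', 'Made-up row', 'sql');
    INSERT INTO rating VALUES (1, 1, 4.8);
    INSERT INTO rating VALUES (1, 74, 4.78);
    """
)

# Read both tables into DataFrames.
courses = pd.read_sql("SELECT * FROM course", conn)
ratings = pd.read_sql("SELECT * FROM rating", conn)

# Join each rating to its course through the shared course_id column.
rated_courses = ratings.merge(courses, on="course_id")
```

The merge on `course_id` is what the relationship between the two tables makes possible: every rating row picks up the title and programming language of the course it refers to.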

  7. Let's practice!

  8. From ratings to recommendations. Vincent Vankrunkelsven, Data Engineer @ DataCamp

  9. The recommendations table

          user_id  course_id  rating
          1        1          4.8
          1        74         4.78
          1        21         4.5
          2        32         4.9

     rating: the estimated rating of a course the user hasn't taken yet.

  10. Recommendation techniques: Matrix factorization; Building Recommendation Engines with PySpark

  11. Common sense transformation: the Course and Rating tables are combined into the Recommendations table.

          Course: course_id, title, description, programming_language
          Rating: user_id, course_id, rating

          Recommendations
          user_id  course_id  rating
          1        1          4.8
          1        74         4.78
          1        21         4.5
          2        32         4.9

  12. Average course ratings

          Average course rating
          course_id  avg_rating
          1          4.8
          74         4.78
          21         4.5
          32         4.9

      We want to recommend highly rated courses.
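
A minimal sketch of how such an average could be computed with pandas, assuming the ratings are available as a DataFrame with the Rating table's columns. The sample rows are made up, and the slides do not show how `transform_avg_rating()` is actually implemented.

```python
import pandas as pd

# Made-up ratings: two users rated course 1, two users rated course 74.
ratings = pd.DataFrame(
    {
        "user_id": [1, 2, 1, 3],
        "course_id": [1, 1, 74, 74],
        "rating": [5.0, 4.0, 4.5, 5.0],
    }
)

# One row per course with the mean of all its ratings.
avg_course_rating = (
    ratings.groupby("course_id", as_index=False)["rating"]
    .mean()
    .rename(columns={"rating": "avg_rating"})
)
```

The result has the same two columns as the table on the slide, `course_id` and `avg_rating`.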

  13. Use the right programming language

          Rating
          user_id  course_id  programming_language  rating
          1        1          r                     4.8
          1        74         sql                   4.78
          1        21         sql                   4.5
          1        32         python                4.9

      Recommend SQL course for user with id 1.
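
One way to find the language a user has rated most, sketched with pandas on the rows from the table above. This is an illustration, not the course's implementation, and ties between languages are resolved arbitrarily here.

```python
import pandas as pd

# Rows copied from the Rating table on the slide.
rating = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 1],
        "course_id": [1, 74, 21, 32],
        "programming_language": ["r", "sql", "sql", "python"],
        "rating": [4.8, 4.78, 4.5, 4.9],
    }
)

# Count how many courses each user rated per language.
counts = (
    rating.groupby(["user_id", "programming_language"])
    .size()
    .reset_index(name="n_rated")
)

# Keep, for every user, the language with the highest count.
top_language = (
    counts.sort_values(["user_id", "n_rated"], ascending=[True, False])
    .drop_duplicates("user_id")[["user_id", "programming_language"]]
    .reset_index(drop=True)
)
```

User 1 rated two SQL courses and one course each in R and Python, so `top_language` holds `sql` for that user.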

  14. Recommend new courses

          Rating
          user_id  course_id  programming_language  rating
          1        1          r                     4.8
          1        74         sql                   4.78
          1        21         sql                   4.5
          1        32         python                4.9

      Don't recommend the combinations already in the rating table.
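
Excluding combinations that already exist is an anti-join. A hedged sketch with pandas follows. The rated pairs come from the table above, while the candidate course ids 96 and 97 are made up.

```python
import pandas as pd

# (user_id, course_id) pairs already present in the rating table.
rated = pd.DataFrame({"user_id": [1, 1, 1, 1], "course_id": [1, 74, 21, 32]})

# Candidate pairs; courses 96 and 97 are hypothetical unrated courses.
candidates = pd.DataFrame({"user_id": [1, 1, 1, 1], "course_id": [21, 74, 96, 97]})

# Left join with an indicator column, then keep rows found only on the left.
merged = candidates.merge(
    rated, on=["user_id", "course_id"], how="left", indicator=True
)
not_yet_rated = merged.loc[
    merged["_merge"] == "left_only", ["user_id", "course_id"]
].reset_index(drop=True)
```

Only courses 96 and 97 remain, because user 1 has already rated 21 and 74.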

  15. Our recommendation transformation: use the technology that the user has rated most; don't recommend courses that the user already rated; recommend the three highest rated courses from the remaining combinations

  16. Rating

          user_id  course_id  programming_language  rating
          1        12         sql                   4.78
          1        52         sql                   4.5
          1        32         r                     4.9

      Recommend the three highest rated SQL courses which are not 12 and 52.
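
The last step, picking the three highest rated courses per user from the eligible pairs, could look like the sketch below. It assumes the average ratings and the eligible (user, course) pairs have already been computed. Course ids 60 to 63 and all average ratings are made up.

```python
import pandas as pd

# Made-up average ratings for a small catalogue of SQL courses.
avg_course_rating = pd.DataFrame(
    {
        "course_id": [12, 52, 60, 61, 62, 63],
        "avg_rating": [4.78, 4.5, 4.6, 4.9, 4.2, 4.7],
    }
)

# Eligible pairs for user 1: SQL courses other than 12 and 52.
courses_to_recommend = pd.DataFrame(
    {"user_id": [1, 1, 1, 1], "course_id": [60, 61, 62, 63]}
)

# Attach the average rating to every eligible pair.
scored = courses_to_recommend.merge(avg_course_rating, on="course_id")

# Sort by rating within each user and keep the top three rows per user.
recommendations = (
    scored.sort_values(["user_id", "avg_rating"], ascending=[True, False])
    .groupby("user_id")
    .head(3)
    .rename(columns={"avg_rating": "rating"})
    .reset_index(drop=True)
)
```

The output has the columns of the recommendations table (`user_id`, `course_id`, `rating`), with courses 61, 63 and 60 recommended to user 1 in that order.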

  17. Let's practice!

  18. Scheduling daily jobs. Vincent Vankrunkelsven, Data Engineer @ DataCamp

  19. What you've done so far: extract using extract_course_data() and extract_rating_data(); clean up NA values using transform_fill_programming_language(); average course ratings per course: transform_avg_rating(); get eligible user and course id pairs: transform_courses_to_recommend(); calculate the recommendations: transform_recommendations()
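
The slide names the clean-up function but not its body. One plausible shape for such a step is sketched below, assuming missing languages show up as NA in the `programming_language` column. The function name `fill_programming_language` and the `"unknown"` placeholder are hypothetical and not taken from the course.

```python
import pandas as pd


def fill_programming_language(
    courses: pd.DataFrame, default: str = "unknown"
) -> pd.DataFrame:
    """Return a copy of courses with missing languages replaced by default."""
    # fillna with a dict only touches the listed column.
    return courses.fillna({"programming_language": default})


# Made-up course rows; course 74 has no language recorded.
courses = pd.DataFrame(
    {
        "course_id": [1, 74, 21],
        "programming_language": ["r", None, "sql"],
    }
)
cleaned = fill_programming_language(courses)
```

`fillna` returns a new DataFrame, so the input is left unchanged and the function stays easy to chain with the other transform steps.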

  20. Loading to Postgres: use the calculations in data products; update daily; example use case: sending out e-mails with recommendations

  21. The loading phase

          recommendations.to_sql(
              "recommendations",
              db_engine,
              if_exists="append",
          )
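
To see what `if_exists="append"` does across repeated runs, here is a small self-contained sketch. It uses an in-memory SQLite connection in place of the Postgres engine (an assumption, so the example runs without a database server), reuses the rows from the recommendations table shown earlier, and passes `index=False`, which the slide's call does not.

```python
import sqlite3

import pandas as pd

# Rows from the recommendations table shown earlier in the deck.
recommendations = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2],
        "course_id": [1, 74, 21, 32],
        "rating": [4.8, 4.78, 4.5, 4.9],
    }
)

# SQLite stands in for the Postgres engine (assumption).
conn = sqlite3.connect(":memory:")

# Simulate two daily runs of the load step.
recommendations.to_sql("recommendations", conn, if_exists="append", index=False)
recommendations.to_sql("recommendations", conn, if_exists="append", index=False)

row_count = conn.execute("SELECT COUNT(*) FROM recommendations").fetchone()[0]
```

After two runs the table holds eight rows, because each run adds its rows to the existing ones. If the table should only contain the latest run, `if_exists="replace"` is the alternative.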

  22. The etl() function

          def etl(db_engines):
              # Extract the data
              courses = extract_course_data(db_engines)
              rating = extract_rating_data(db_engines)
              # Clean up courses data
              courses = transform_fill_programming_language(courses)
              # Get the average course ratings
              avg_course_rating = transform_avg_rating(rating)
              # Get eligible user and course id pairs
              courses_to_recommend = transform_courses_to_recommend(
                  rating,
                  courses,
              )
              # Calculate the recommendations
              recommendations = transform_recommendations(
                  avg_course_rating,
                  courses_to_recommend,
              )
              # Load the recommendations into the database
              load_to_dwh(recommendations, db_engines)

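  23. Creating the DAG

          from airflow.models import DAG
          from airflow.operators.python_operator import PythonOperator

          # Cron expression "0 0 * * *": run every day at midnight
          dag = DAG(dag_id="recommendations", schedule_interval="0 0 * * *")

          task_recommendations = PythonOperator(
              task_id="recommendations_task",
              python_callable=etl,
              # etl() takes db_engines, assumed to be defined elsewhere
              op_kwargs={"db_engines": db_engines},
              # Attach the task to the DAG
              dag=dag,
          )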

  24. Let's practice!

  25. Congratulations. Vincent Vankrunkelsven, Data Engineer @ DataCamp

  26. Introduction to data engineering: identify the tasks of a data engineer; what kind of tools they use; cloud service providers

  27. Data engineering toolbox: databases; parallel computing & frameworks (Spark); workflow scheduling with Airflow

  28. Extract, Transform and Load (ETL): Extract: get data from several sources; Transform: perform transformations using parallel computing; Load: load data into target database

  29. Case study (DataCamp): fetch data from multiple sources; transform to form recommendations; load into target database

  30. Good job!
