COMPARISON OF CATEGORICAL PROPERTIES OFFERED BY MULTIPLE MOOC PLATFORMS
Using an Automated Web Crawler in Python with Scrapy
Bachelor Thesis - Introduction Presentation
Louis Mbuyu
Examiner: Prof. Dr. François Bry
Supervisors: Prof. Dr. François Bry, Yingding Wang
25.01.18
AGENDA
1. Motivation
2. Research Topic
3. Project Plan
4. Technical Details
5. Challenges
6. Demo
1. Motivation
Motivation
• Irom - Intelligent Recommender Of MOOCs
• MOOC - Massive Open Online Course
The goal of Irom
• To improve learning and studying at the university.
• To develop an intelligent MOOC search engine.
My motivation
• Define a unified categorical set across all MOOC platforms.
Motivation
Link: https://irom.pms.ifi.lmu.de/#/home
Motivation - MOOC
MOOC (Massive Open Online Course)
• Massive - Unlimited learners
• Open - No requirements
• Online - Open access via the web
• Course - Filmed lectures/videos, readings
Motivation - Popular MOOC platforms
Motivation - MOOC platforms by size (number of courses)
• Coursera: ca. 5,000
• FutureLearn: ca. 1,000
• Udacity: ca. 200
• Udemy: ca. 40,000
• Edx: ca. 2,000
• Open2Study: ca. 2,000
Motivation - Diverse categories
Motivation - Behind my research question
A unified categorical set across all platforms allows users to browse through the categories on Irom.
Motivation - Advantages
• Browse & create new subcategories, e.g. “Top Courses”
Motivation - Advantages
• Easier to recommend similar courses
2. Research Topic
Research Topic
COMPARISON OF CATEGORICAL PROPERTIES OFFERED BY MULTIPLE MOOC PLATFORMS
• Tasks:
  • Define a unified MOOC model.
  • Crawl 6 platforms and extract ca. 40,000 courses.
  • Define a unified categorical set across all platforms.
Research Topic - Unified MOOC Model
{
  "title": String,
  "courseUrl": String,
  "imageUrl": String,
  "description": String,
  "duration": Int,
  "category": String,
  …
}
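The unified model above could be represented in Python roughly as follows. This is a minimal sketch, not the thesis implementation: the class name `Course`, the helper `to_json`, and all example values are hypothetical, and the fields elided by "…" on the slide are deliberately left out.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Course:
    """One course record with the six fields named on the slide."""
    title: str
    courseUrl: str
    imageUrl: str
    description: str
    duration: int  # the slide does not state the unit
    category: str


def to_json(course: Course) -> str:
    """Serialise a course to a JSON object with the model's key names."""
    return json.dumps(asdict(course), ensure_ascii=False)


# Hypothetical example values, for illustration only.
example = Course(
    title="Intro to Data Science",
    courseUrl="https://example.org/course",
    imageUrl="https://example.org/course.png",
    description="Learn data science with Python.",
    duration=8,
    category="Data Science",
)
print(to_json(example))
```

Keeping the JSON key names as attribute names means a crawler for any platform only has to fill in the same six fields to produce comparable records.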
Research Topic
[Figure: courses 1 to m of the category Data Science, each a record in the unified MOOC model (title, courseUrl, imageUrl, description, duration, category), are reduced to the n most frequently occurring words of the category: [Data, Science, Machine, Learning, Python, …, n]]
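The step from course records to the n most frequently occurring words can be sketched as below. This is an illustration under assumptions: the function name `top_n_words`, the tiny stop-word list, the simple letters-only tokenisation, and the sample descriptions are all hypothetical and not taken from the thesis.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list is much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "with", "for", "is"}


def top_n_words(descriptions, n):
    """Return the n most frequent non-stop words across all descriptions."""
    counts = Counter()
    for text in descriptions:
        # Lowercase the text and keep runs of letters as words.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


# Hypothetical course descriptions from one category.
descriptions = [
    "Data science with Python",
    "Machine learning for data science",
    "Data analysis in Python",
]
print(top_n_words(descriptions, 3))
```

Running this per category yields one word list per category, which is the input the comparison step needs.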
Research Topic
New course from Udacity - n most frequently occurring words in its description: [Data, Science, Machine, Driving, Car, …, n]
Data Science - n most frequently occurring words: [Data, Science, Machine, Learning, Python, …, n]
Compare the two word lists.
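The slide only says that the two word lists are compared, without naming a measure. One simple possibility, shown here purely as an assumption, is the share of the new course's top words that also appear in the category's top words. The function name `overlap_score` is hypothetical; the word lists are the visible entries from the slide.

```python
def overlap_score(course_words, category_words):
    """Share of the course's top words that also occur in the category's top words."""
    course_set = {w.lower() for w in course_words}
    category_set = {w.lower() for w in category_words}
    if not course_set:
        return 0.0
    return len(course_set & category_set) / len(course_set)


# Visible entries of the two word lists from the slide.
category_words = ["Data", "Science", "Machine", "Learning", "Python"]
course_words = ["Data", "Science", "Machine", "Driving", "Car"]

score = overlap_score(course_words, category_words)
print(score)  # 3 of the 5 course words are shared
```

A new course could then be assigned to the category with the highest score, although any threshold or tie-breaking rule would have to be chosen during evaluation.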
3. Project Plan
Project Plan - Timeline
• Dec 17 - Build web crawlers: Crawler for 6 platforms. Define unified MOOC model.
• Jan 18 - Analysis of categories: Remove stop words. Most occurring words.
• Feb 18 - Compare & Evaluate: Categorical properties. Define unified categorical set.
• 12 Mar 18 - Deadline
4. Technical Details
Technical Details
5. Challenges
Challenges
• Crawling websites that rely on JavaScript
• Defining the unified MOOC model
• Websites changing their layout
6. Quick Demo of web crawling