WORKSHOP: REAL LIVE DATA FOR CS COURSES CCSC:NE 2018 @ UNIVERSITY OF NEW HAMPSHIRE NADEEM ABDUL HAMID
WARM-UP
http://services.faa.gov/docs/services/airport/
“HELLO WORLD” / SINBAD - JAVA
import core.data.*;

public class Warmup {
    public static void main(String[] args) {
        DataSource ds = DataSource.connect(
            "http://services.faa.gov"
            + "/airport/status/MHT?format=application/xml");
        ds.load();
        System.out.println(ds.fetchString("Name"));
        System.out.println(ds.fetchString("Status/Reason"));
        System.out.println(ds.fetchString("Weather/Temp"));
    }
}
“HELLO WORLD” / SINBAD - PYTHON
from sinbad import *

ds = Data_Source.connect("http://services.faa.gov/airport/"
                         + "status/MHT?format=application/xml")
ds.load()
print(ds.fetch("Name", "Status/Reason", "Weather/Temp"))
“HELLO WORLD” / SINBAD - RACKET
(require sinbad)

(define ds
  (sail-to "http://services.faa.gov/airport/status/MHT?format=application/xml"
           (manifest)
           (load)))

(fetch ds "Name" "Status/Reason" "Weather/Temp")
TUTORIAL PLAN
▸ Welcome; Intros [5 mins.]
▸ SINBAD: Overview [15 mins.]
▸ Q&A
▸ Install & Setup [10 mins.]
▸ Quick Start Tutorials [40 mins.]
▸ (regroup/break - 10 mins.)
▸ Data Source Exploration • Course Activity Generation [40 mins.]
▸ (regroup/break - 10 mins.)
▸ Discussion • Feedback [30 mins.]
▸ Wrap up [15 mins.]
MOTIVATION
▸ The “Age of Big Data”
▸ Incorporate the use of online data sets in introductory programming courses (CS1/2)
▸ Provide a simple interface
  ▸ Hide I/O connection, parsing, extracting, data binding
REAL LIVE DATA FOR CS COURSES
▸ Motivate & engage: Make it really “real”
  ▸ Direct access via URL, e.g. “https://data.baltimorecity.gov/Culture-Arts/Restaurants/k5ry-ef3g/”
  ▸ Live data (e.g. current weather conditions, airport status, currency exchange rates)
  ▸ Support open-ended assignments/projects (students can choose data sources)
▸ Target learning outcomes
  ▸ Easily access atomic data/collections of atomic data
  ▸ Binding happens automagically for structured data
  ▸ Students decide/control data representation
  ▸ Grade functions, not data access
▸ Don’t use up course time: Make it smooth
  ▸ Syntax <= Scanner (e.g. in Java)
  ▸ Transparently parse, cache, convert data
  ▸ Seamlessly integrate with course assignments
APPROACH
▸ Infer necessary information about the data source automatically
  ▸ Data format, schema, field names, etc.
▸ Provide transparent local caching
▸ Sampling facilities for testing & development
▸ Provide runtime binding to data structures/classes in the user code
▸ Deal as gracefully as possible with messy data (e.g. missing fields)
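To make the "infer the data format" idea concrete, here is a minimal Python sketch that guesses whether a piece of text is JSON, XML, or CSV by trying each parser in turn. It uses only the standard library and is not Sinbad's actual inference logic; the function name `infer_format` and the sample strings are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def infer_format(text):
    """Guess whether text is JSON, XML, or CSV; return None if unsure.

    Hypothetical helper for illustration, not part of Sinbad.
    """
    stripped = text.strip()
    if not stripped:
        return None

    # Try JSON first: json.loads raises ValueError on anything else.
    try:
        json.loads(stripped)
        return "JSON"
    except ValueError:
        pass

    # Then XML: a well-formed document parses without ParseError.
    try:
        ET.fromstring(stripped)
        return "XML"
    except ET.ParseError:
        pass

    # Finally CSV: require several rows that all share one column count > 1.
    rows = [row for row in csv.reader(io.StringIO(stripped)) if row]
    widths = {len(row) for row in rows}
    if len(rows) > 1 and len(widths) == 1 and widths.pop() > 1:
        return "CSV"
    return None


# Made-up sample inputs, one per format.
print(infer_format('{"name": "MHT"}'))
print(infer_format("<a><b>1</b></a>"))
print(infer_format("name,age\nAda,36\n"))
```

A real library would also look at the URL extension and the HTTP content type before falling back to content sniffing like this.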
USAGE
▸ 3-step approach: • Connect • Load • Fetch
▸ Infer data format if possible: XML, CSV, JSON
▸ Display inferred structure of data
▸ Fetching atomic values
  ▸ provide a path into the data
▸ Structured data:
  ▸ provide multiple paths of data supplied to the class/struct constructor (Java/Racket)
▸ Collections: lists / arrays (Java)
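The idea of fetching by path, and of handing several paths to a constructor, can be sketched in plain Python over an inline XML sample. This is an illustration using the standard library only, not how Sinbad is implemented; `fetch_paths`, the `Airport` class, and the sample document (whose field names mirror the paths in the warm-up example but whose values are made up) are all assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Made-up sample document; values are invented for illustration.
SAMPLE_XML = """
<AirportStatus>
  <Name>Example Regional</Name>
  <Status><Reason>No known delays</Reason></Status>
  <Weather><Temp>50.0 F</Temp></Weather>
</AirportStatus>
"""


def fetch_paths(root, *paths):
    """Return the text at each slash-separated path (None if missing).

    Hypothetical helper: one path gives one value, several give a list.
    """
    values = []
    for path in paths:
        node = root.find(path)
        has_text = node is not None and node.text is not None
        values.append(node.text.strip() if has_text else None)
    return values[0] if len(values) == 1 else values


@dataclass
class Airport:
    """User-defined representation; the student decides its fields."""
    name: str
    reason: str
    temp: str


root = ET.fromstring(SAMPLE_XML.strip())

# Atomic value: a single path into the data.
print(fetch_paths(root, "Name"))

# Structured data: several paths supplied to the constructor, in order.
airport = Airport(*fetch_paths(root, "Name", "Status/Reason", "Weather/Temp"))
print(airport)
```

A missing path yields `None` here rather than an exception, which is one simple way to tolerate messy data.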
OTHER FUNCTIONALITY
▸ Data source specifications
▸ Query parameters
▸ Cache control
▸ Processing support
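As a rough picture of what query parameters and caching involve under the hood, the sketch below appends parameters to a base URL and derives a stable cache key from the result. It uses only the Python standard library; `build_url` and `cache_key` are hypothetical names, and this is not Sinbad's own mechanism.

```python
import hashlib
from urllib.parse import urlencode


def build_url(base, params):
    """Append query parameters to a base URL (hypothetical helper)."""
    if not params:
        return base
    # safe="/" keeps values such as "application/xml" readable.
    return base + "?" + urlencode(params, safe="/")


def cache_key(url):
    """Map a full URL to a fixed-length name usable as a cache file name."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


url = build_url("http://services.faa.gov/airport/status/MHT",
                {"format": "application/xml"})
print(url)
print(cache_key(url))
```

Because the key depends on the full URL, the same data source queried with different parameters is cached separately.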
SOME EXPERIENCE
▸ CSC 103: Creative Computing (@Berry College)
  ▸ Processing-based; Tutorial-style labs
▸ Sample data sets used/discovered by students:
  (Asterisk indicates data set discovered by students)

  Name                                   Source                           Type          Records
  *1000 songs to hear before you die     opendata.socrata.com             XML           1,000
  Abalone data set                       UCI Machine Learning Repository  CSV           4,177
  *Airport Weather Mashup                NWS + FAA                        XML           fixed
  *Chicago life expectancy by community  data.cityofchicago.org           XML           ~80
  Earthquake feeds                       US Geological Survey             JSON          variable
  *Fuel economy data                     US EPA                           XML           35,430
  *Jeopardy! question archive            reddit                           JSON          216,930
  Live auction data                      Ebay                             XML           100/page
  Magic the Gathering card data          mtgjson.com                      JSON          variable
  Microfinance loan data                 Kiva                             XML           variable
  *SEC Rushing Leaders 2014              ESPN                             CSV (manual)  variable

▸ Students get engaged and excited!
MORE EXPERIENCE
▸ CSC120: “Designing Programs: Problem-solving & Abstraction”
  ▸ CS1 based on How to Design Programs (htdp.org)
  ▸ Functional, using Racket-lite
▸ Sinbad integrated in labs/assignments throughout semester (first time in progress)
▸ Actual coding exercises the same, just tie in to data
[Figure: student survey, “What features of the course do you like or dislike?”; visible scale labels: “Really like”, “Neither like nor dislike”]
▸ “I really enjoy the variety and creativity of the different problems that we do, both in class and on the homework.”
▸ “...I really enjoy the assignments we have because we’re applying what we’re learning to real things, like weather, …”
▸ “...I really like the type of data we’re analyzing in the labs and assignments because it’s realistic…”
Sample Activities/Data Sources
● Current weather conditions (NWS)
  ○ Generate an image from data
  ○ Load and compare two locations
  ○ Interactive statewide weather conditions app
● Bike Share Data (NYC, Chicago)
  ○ Arithmetic, conditionals
    ■ Compute usage fees based on type of user
    ■ Latitude/longitude computations
  ○ Representing structured data, building strings
    ■ Generate Google map URL from a trip record
  ○ Simple list algorithms
    ■ What age is the youngest rider in the data? The oldest?
  ○ Filtering, aggregation
    ■ If we arrange riders into 5-year age ranges, which range do the most riders fall into?
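A few of the bike-share activities above might look like the following in Python once the trip records have been loaded. This is only a sketch: the records, field names, and `DATA_YEAR` are made up for illustration and do not come from the NYC or Chicago data sets, and in a course the list would come from a Sinbad fetch rather than a literal.

```python
from collections import Counter
from urllib.parse import urlencode

# Hypothetical trip records (invented values and field names).
trips = [
    {"birth_year": 1985, "start": (40.7127, -74.0059), "end": (40.7306, -73.9866)},
    {"birth_year": 1990, "start": (40.7484, -73.9857), "end": (40.7580, -73.9855)},
    {"birth_year": 1988, "start": (40.7061, -73.9969), "end": (40.7127, -74.0059)},
    {"birth_year": 2000, "start": (40.7306, -73.9866), "end": (40.7484, -73.9857)},
    {"birth_year": 1962, "start": (40.7580, -73.9855), "end": (40.7061, -73.9969)},
]
DATA_YEAR = 2018  # assumption: the year the trips were recorded


def rider_age(trip):
    """Approximate rider age from birth year."""
    return DATA_YEAR - trip["birth_year"]


def google_map_url(trip):
    """Build a Google Maps directions URL from a trip's start and end."""
    (lat1, lon1), (lat2, lon2) = trip["start"], trip["end"]
    query = urlencode({"api": 1,
                       "origin": f"{lat1},{lon1}",
                       "destination": f"{lat2},{lon2}"})
    return "https://www.google.com/maps/dir/?" + query


def age_range(age):
    """Label the 5-year range an age falls into, e.g. 33 -> '30-34'."""
    low = (age // 5) * 5
    return f"{low}-{low + 4}"


ages = [rider_age(t) for t in trips]
youngest, oldest = min(ages), max(ages)                  # simple list algorithms
range_counts = Counter(age_range(a) for a in ages)       # aggregation
most_common_range = range_counts.most_common(1)[0][0]

print(youngest, oldest, most_common_range)
print(google_map_url(trips[0]))
```

The functions are what gets graded; swapping the literal list for live data does not change them.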
Q & A ?
ACTIVITY: QUICK START TUTORIAL
▸ Go to: http://cs.berry.edu/sinbad
▸ Follow the Installation instructions for your language.
▸ Follow the Quick Start link for your language.
▸ End at ~10:10am.
DATA SOURCE EXPLORATION • COURSE ACTIVITY GENERATION
▸ Data Bazaar @ cs.berry.edu/sinbad: Curated data sources + sample code
▸ Let’s break into groups (2/3/4?)
▸ Pick a data source (Kiva, Divvy, …) & a language
▸ Explore the data source website
▸ Look at sample Sinbad code for your language
▸ Pick one or several programming competencies (handout)
▸ Develop a course assignment, i.e. write solution code if possible
▸ Stop @ ~11:00am to share & discuss.
DISCUSSION / FEEDBACK
CHALLENGES
▸ Students understanding data formats/representations (HTML vs XML)
▸ Data is messy
▸ Maintaining balance: avoid students getting lost in data
▸ Data exploration/browsing
▸ Error handling/debugging
▸ Developer/API registration
▸ Repository of assignments, labs, examples
▸ Evaluation
RELATED WORK
▸ CORGIS Dataset Project - http://think.cs.vt.edu/corgis/
  ▸ Great tips from experience: https://think.cs.vt.edu/pragmatics/
▸ BRIDGES Project - http://bridgesuncc.github.io/
CONCLUSION
▸ Facilitate incorporation of online data sources into programming assignments
  ▸ Painlessly
  ▸ Seamlessly
Use a data set in your next assignment! cs.berry.edu/sinbad