WORKSHOP: REAL LIVE DATA FOR CS COURSES
CCSC:NE 2018 @ UNIVERSITY OF NEW HAMPSHIRE NADEEM ABDUL HAMID
WARM-UP
http://services.faa.gov/docs/services/airport/
“HELLO WORLD” / SINBAD — JAVA
import core.data.*;

public class Warmup {
    public static void main(String[] args) {
        DataSource ds = DataSource.connect("http://services.faa.gov"
                + "/airport/status/MHT?format=application/xml");
        ds.load();
        System.out.println(ds.fetchString("Name"));
        System.out.println(ds.fetchString("Status/Reason"));
        System.out.println(ds.fetchString("Weather/Temp"));
    }
}
“HELLO WORLD” / SINBAD — PYTHON
from sinbad import *

ds = Data_Source.connect("http://services.faa.gov/airport/"
                         + "status/MHT?format=application/xml")
ds.load()
print(ds.fetch("Name", "Status/Reason", "Weather/Temp"))
“HELLO WORLD” / SINBAD — RACKET
(require sinbad)

(define ds
  (sail-to "http://services.faa.gov/airport/status/MHT?format=application/xml"
           (manifest)
           (load)))

(fetch ds "Name" "Status/Reason" "Weather/Temp")
TUTORIAL PLAN
▸ Welcome; Intros [5 mins.]
▸ SINBAD: Overview [15 mins.]
▸ Q&A
▸ Install & Setup [10 mins.]
▸ Quick Start Tutorials [40 mins.]
▸ (regroup/break - 10 mins.)
▸ Data Source Exploration • Course Activity Generation [40 mins.]
▸ (regroup/break - 10 mins.)
▸ Discussion • Feedback [30 mins.]
▸ Wrap up [15 mins.]
MOTIVATION
▸ The “Age of Big Data”
▸ Incorporate the use of online data sets in introductory programming courses (CS1/2)
▸ Provide a simple interface
▸ Hide I/O connection, parsing, extracting, data binding
REAL LIVE DATA FOR CS COURSES
▸ Motivate & engage: Make it really “real”
  ▸ Direct access via URL, e.g. “https://data.baltimorecity.gov/Culture-Arts/Restaurants/k5ry-ef3g/”
  ▸ Live data (e.g. current weather conditions, airport status, currency exchange rates)
  ▸ Support open-ended assignments/projects (students can choose data sources)
▸ Target learning outcomes
  ▸ Easily access atomic data/collections of atomic data
  ▸ Binding happens automagically for structured data
  ▸ Students decide/control data representation
  ▸ Grade functions, not data access
▸ Don’t use up course time: Make it smooth
  ▸ Syntax <= Scanner (e.g. in Java)
  ▸ Transparently parse, cache, convert data
  ▸ Seamlessly integrate with course assignments
APPROACH
▸ Infer necessary information about the data source automatically
  ▸ Data format, schema, field names, etc.
▸ Provide transparent local caching
▸ Sampling facilities for testing & development
▸ Provide runtime binding to data structures/classes in the user code
▸ Deal as gracefully as possible with messy data (e.g. missing fields)
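To make "infer the data format automatically" concrete, here is a minimal sketch using only the Python standard library. It is not Sinbad's actual inference logic; the function name guess_format and its heuristics are assumptions made for illustration.

```python
import csv
import json
import xml.etree.ElementTree as ET


def guess_format(text):
    """Guess whether text is JSON, XML, or CSV; return None if unsure."""
    stripped = text.strip()
    if not stripped:
        return None
    # JSON documents start with an object or array and must parse cleanly.
    if stripped[0] in "{[":
        try:
            json.loads(stripped)
            return "JSON"
        except ValueError:
            pass
    # XML documents start with a tag or declaration and must parse cleanly.
    if stripped[0] == "<":
        try:
            ET.fromstring(stripped)
            return "XML"
        except ET.ParseError:
            pass
    # Fall back to CSV when every row has the same number (> 1) of columns.
    rows = list(csv.reader(stripped.splitlines()))
    widths = {len(row) for row in rows}
    if len(widths) == 1 and widths.pop() > 1:
        return "CSV"
    return None


print(guess_format('{"Name": "Manchester"}'))
print(guess_format("<Status><Reason>None</Reason></Status>"))
print(guess_format("name,age\nAnn,20\nBob,31"))
```

A real library would likely also consult the URL's file extension and the HTTP content type before sniffing the content itself.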
USAGE
▸ 3-step approach: Connect • Load • Fetch
▸ Infer data format if possible: XML, CSV, JSON
▸ Display inferred structure of data
▸ Fetching atomic values
  ▸ provide a path into the data
▸ Structured data:
  ▸ provide multiple paths of data supplied to the class/struct constructor (Java/Racket)
▸ Collections: lists / arrays (Java)
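The idea of fetching by path, and of feeding several paths to a constructor, can be sketched in plain Python. This is an illustration only, not Sinbad's implementation: the helpers fetch and fetch_records, the Quake class, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


def fetch(data, path):
    """Follow a slash-separated path into nested dicts.

    When a step lands on a list, the rest of the path is applied to
    every element, so the result is a list of values.
    """
    if path == "":
        return data
    if isinstance(data, list):
        return [fetch(item, path) for item in data]
    key, _, rest = path.partition("/")
    return fetch(data[key], rest)


def fetch_records(data, constructor, *paths):
    """Fetch one list per path and pass matching values to constructor."""
    columns = [fetch(data, p) for p in paths]
    return [constructor(*values) for values in zip(*columns)]


@dataclass
class Quake:
    title: str
    mag: float


# Hand-written sample shaped like a small GeoJSON feed (not real data).
sample = {
    "metadata": {"count": 2},
    "features": [
        {"properties": {"title": "Quake A", "mag": 1.2}},
        {"properties": {"title": "Quake B", "mag": 2.0}},
    ],
}

print(fetch(sample, "metadata/count"))
for quake in fetch_records(sample, Quake,
                           "features/properties/title",
                           "features/properties/mag"):
    print(quake)
```

The student writes only the Quake class and the list of paths; the traversal and the constructor calls stay out of sight, which is the point of "grade functions, not data access".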
OTHER FUNCTIONALITY
▸ Data source specifications
▸ Query parameters
▸ Cache control
▸ Processing support
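As a rough picture of what cache control with a timeout means, the sketch below reloads data only after a timeout has elapsed. The TimedCache class and its parameters are hypothetical and are not part of Sinbad; a fake clock is used so the demo runs instantly.

```python
import time


class TimedCache:
    """Call loader() at most once per timeout_seconds; reuse the result otherwise."""

    def __init__(self, loader, timeout_seconds, clock=time.monotonic):
        self.loader = loader
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.value = None
        self.loaded_at = None

    def load(self):
        now = self.clock()
        expired = (self.loaded_at is None
                   or now - self.loaded_at >= self.timeout_seconds)
        if expired:
            self.value = self.loader()
            self.loaded_at = now
        return self.value


# Demo with a fake clock so the behavior is visible without waiting.
fake_now = [0.0]
loads = []
cache = TimedCache(lambda: loads.append("load") or len(loads),
                   timeout_seconds=300, clock=lambda: fake_now[0])
first = cache.load()     # loader runs
fake_now[0] = 120.0
second = cache.load()    # still fresh, loader is skipped
fake_now[0] = 300.0
third = cache.load()     # timed out, loader runs again
print(first, second, third)  # prints: 1 1 2
```

With this shape, student code can call load() as often as it likes without hitting the remote service on every call.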
SOME EXPERIENCE
▸ CSC 103: Creative Computing (@Berry College)
▸ Processing-based; Tutorial-style labs
▸ Sample data sets used/discovered by students:

(Asterisk indicates data set discovered by students)
Name | Source | Type | Records
*1000 songs to hear before you die | | XML | 1,000
Abalone data set | UCI Machine Learning Repository | CSV | 4,177
*Airport Weather Mashup | NWS + FAA | XML | fixed
*Chicago life expectancy by community | data.cityofchicago.org | XML | ~80
Earthquake feeds | US Geological Survey | JSON | variable
*Fuel economy data | US EPA | XML | 35,430
*Jeopardy! question archive | reddit | JSON | 216,930
Live auction data | Ebay | XML | 100/page
Magic the Gathering card data | mtgjson.com | JSON | variable
Microfinance loan data | Kiva | XML | variable
*SEC Rushing Leaders 2014 | ESPN | CSV (manual) | variable

▸ Students get engaged and excited!
MORE EXPERIENCE
▸ “Designing Programs: Problem-solving & Abstraction”
  ▸ CS1 based on How to Design Programs (htdp.org)
  ▸ Functional, using Racket-lite
  ▸ Sinbad integrated in labs/assignments throughout semester (first time in progress)
  ▸ Actual coding exercises the same, just tie in to data

“I really enjoy the variety and creativity of the different problems that we do, both in class and on the homework.”
“...I really enjoy the assignments we have because we’re applying what we’re learning to real things, like weather, …”
“...I really like the type of data we’re analyzing in the labs and assignments because it’s realistic…”
Experience - CSC120
Survey question: “What features of the course do you like or dislike?”
[Chart: survey responses; legend labels include “Really like” and “Neither like nor dislike”]
Sample Activities/Data Sources
○ Generate an image from data
○ Load and compare two locations
○ Interactive statewide weather conditions app
○ Arithmetic, conditionals
  ■ Compute usage fees based on type of user
  ■ Latitude/longitude computations
○ Representing structured data, building strings
  ■ Generate Google map URL from a trip record
○ Simple list algorithms
  ■ What age is the youngest rider in the data? The oldest?
○ Filtering, aggregation
  ■ If we arrange riders into 5-year age ranges, which range do the most riders fall into?
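One possible solution sketch for the list and aggregation questions above, in plain Python. The rider ages are made up for illustration; in an actual assignment they would be fetched from the chosen data source.

```python
from collections import Counter

# Hypothetical rider ages, made up for this example.
ages = [19, 23, 24, 31, 22, 45, 27, 21, 38, 24]


def age_range(age, width=5):
    """Return the (low, high) bounds of the width-year range containing age."""
    low = (age // width) * width
    return (low, low + width - 1)


# Count how many riders fall into each 5-year range.
counts = Counter(age_range(a) for a in ages)
busiest, riders = counts.most_common(1)[0]

print(f"Youngest: {min(ages)}, oldest: {max(ages)}")
print(f"Most riders are aged {busiest[0]}-{busiest[1]} ({riders} riders)")
```

The coding exercise is the same as it would be with a hard-coded list; only the origin of the ages changes once live data is tied in.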
Q & A
ACTIVITY
Go to: http://cs.berry.edu/sinbad
Follow Installation instructions for your language.
Follow Quick Start link for your language.
End at ~10:10am.
DATA SOURCE EXPLORATION • COURSE ACTIVITY GENERATION
▸ Data Bazaar @ cs.berry.edu/sinbad: Curated data sources + sample code
▸ Let’s break into groups (2/3/4?)
▸ Pick a data source (Kiva, Divvy, …) & a language
▸ Explore the data source website
▸ Look at sample Sinbad code for your language
▸ Pick one or several programming competencies (handout)
▸ Develop a course assignment - i.e. write solution code if possible
▸ Stop @ ~11:00am to share & discuss.
CHALLENGES
▸ Students understanding data formats/representations (HTML vs XML)
▸ Data is messy
▸ Maintaining balance - avoid students getting lost in data
▸ Data exploration/browsing
▸ Error handling/debugging
▸ Developer/API registration
▸ Repository of assignments, labs, examples
▸ Evaluation
RELATED WORK
▸ CORGIS Dataset Project - http://think.cs.vt.edu/corgis/
  ▸ Great tips from experience: https://think.cs.vt.edu/pragmatics/
▸ BRIDGES Project - http://bridgesuncc.github.io/
CONCLUSION
▸ Facilitate incorporation of online data sources into programming assignments
  ▸ Painlessly
  ▸ Seamlessly
cs.berry.edu/sinbad
Use a data set in your next assignment!
DATA SOURCE SPECIFICATION FILE
with URL to a project or informational page about the data source.
supplied (required and optional) query parameters or path parameters. The latter are user-provided strings that are substituted in for placeholders in the URL path.
particular data source object (such as a header for CSV files).
structures and fields from the source with various helpful annotations such as textual descriptions of fields that can be displayed by printUsageString().
{
  "name": "Geographical Data - Peru",
  "format": "TSV",
  "path": "http://download.geonames.org/export/dump/PE.zip",
  "infourl": "http://www.geonames.org/",
  "options": [
    { "name": "fileentry", "value": "PE.txt" },
    { "name": "header",
      "value": "geoid,name,asciiname,altnames,lat,long,feature-class,feature-code,cc,cc2,admin1,admin2,admin3,admin4,pop,elev,dem,tz,mod" }
  ]
}
DataSource.connectUsing("geospec-pe.spec");
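To show how such a specification could be consumed, here is a small sketch that reads an abbreviated copy of the spec above with Python's standard json module. It is only an illustration of the file's structure, not how Sinbad processes spec files; the header value is shortened for brevity, and the variable names are assumptions.

```python
import json

# Abbreviated copy of the Peru spec (header shortened for this example).
spec_text = """
{
  "name": "Geographical Data - Peru",
  "format": "TSV",
  "path": "http://download.geonames.org/export/dump/PE.zip",
  "infourl": "http://www.geonames.org/",
  "options": [
    { "name": "fileentry", "value": "PE.txt" },
    { "name": "header", "value": "geoid,name,asciiname" }
  ]
}
"""

spec = json.loads(spec_text)

# Turn the list of name/value pairs into a simple lookup table.
options = {opt["name"]: opt["value"] for opt in spec["options"]}
columns = options["header"].split(",")

print(f"{spec['name']} ({spec['format']}) from {spec['path']}")
print(f"Entry inside the archive: {options['fileentry']}")
print(f"Column names: {columns}")
```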
FUTURE
▸ GUI tools
▸ Additional data formats
▸ HTML tables, web scrapers (regexps)
▸ Customized for popular APIs (ebay, twitter, etc.)
▸ Streaming, pagination, sampling…
▸ More curriculum resources (CS1/2)
▸ Evaluation of effectiveness
BART, ET AL. FIGURE 2
import java.util.List;
import java.util.HashSet;
import realtimeweb.earthquakeservice.main.EarthquakeService;
import realtimeweb.earthquakeservice.domain.Earthquake;

public class EarthquakeDemo {
    public static void main(String[] args) throws EarthquakeException {
        // Use the EarthquakeService library
        EarthquakeService es = EarthquakeService.getInstance();
        es.connect(); // Remove to use the local cache
        // 5 minute delay, but if we use the cache no delay is needed!
        int DELAY = 5 * 60 * 1000;
        HashSet<Earthquake> seenQuakes = new HashSet<Earthquake>();
        // Poll service regularly
        while (true) {
            // Get all earthquakes in the past hour
            List<Earthquake> latest = es.getEarthquakes(History.ALL);
            // Check if this is a new earthquake
            for (Earthquake e : latest) {
                if (!seenQuakes.contains(e)) {
                    // Report new earthquakes
                    System.out.println("New quake!");
                    seenQuakes.add(e);
                }
            }
            // Delay to avoid spamming the weather service
            Thread.sleep(DELAY);
        }
    }
}
EQUIVALENT
import core.data.*;
import java.util.Date;
import java.util.HashSet;
import java.util.List;

public class EarthquakeDemo {
    public static void main(String[] args) {
        int DELAY = 5; // 5 minute cache delay

        DataSource ds = DataSource.connectAs("JSON",
                "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.geojson");
        ds.setCacheTimeout(DELAY);
        ds.load();
        ds.printUsageString();

        HashSet<Earthquake> quakes = new HashSet<Earthquake>();
        while (true) {
            ds.load(); // this only actually reloads data when the cache times out
            List<Earthquake> latest = ds.fetchList("Earthquake",
                    "features/properties/title",
                    "features/properties/time",
                    "features/properties/mag",
                    "features/properties/url");
            for (Earthquake e : latest) {
                if (!quakes.contains(e)) {
                    System.out.println("New quake!... " + e.description
                            + " (" + e.date() + ") info at: " + e.url);
                    quakes.add(e);
                }
            }
        }
    }
}
PLUS…
// this class may be instructor-provided, or left to students to define as an exercise
class Earthquake {
    String description;
    long timestamp;
    float magnitude;
    String url;

    public Earthquake(String description, long timestamp, float magnitude, String url) {
        this.description = description;
        this.timestamp = timestamp;
        this.magnitude = magnitude;
        this.url = url;
    }

    public Date date() {
        return new Date(timestamp);
    }

    public boolean equals(Object o) {
        // introductory CS students would probably implement a simpler version of this
        if (o.getClass() != this.getClass()) return false;
        Earthquake that = (Earthquake) o;
        return that.description.equals(this.description)
                && that.timestamp == this.timestamp
                && that.magnitude == this.magnitude;
    }

    public int hashCode() {
        // technically, hashCode() should be overridden if equals() is
        return (int) (31 * (31 * this.description.hashCode() + this.timestamp)
                + this.magnitude);
    }
}
OUTPUT (1)
The following data is available:
a structure with fields:
{
  type : *
  metadata : a structure with fields:
  {
    api : *
    count : *
    generated : *
    status : *
    title : *
    url : *
  }
  features : A list of:
    structures with fields:
    {
      id : *
      type : *
      geometry : a structure with fields:
      {
        type : *
        coordinates : A list of:
          *
      }
      properties : a structure with fields:
      {
        alert : *
        cdi : *
        code : *
        ...
        type : *
        types : *
        tz : *
        updated : *
        url : *
      }
    }
}
OUTPUT (2)
New quake!... M 0.8 - 3km WSW of Tahoe Vista, California (Wed Jul 06 21:50:36 EDT 2016) info at: http://earthquake.usgs.gov/earthquakes/eventpage/nn00550830
New quake!... M 1.2 - 9km WNW of Cobb, California (Wed Jul 06 21:44:22 EDT 2016) info at: http://earthquake.usgs.gov/earthquakes/eventpage/nc72659116
New quake!... M 1.2 - 47km E of Cape Yakataga, Alaska (Wed Jul 06 21:37:45 EDT 2016) info at: http://earthquake.usgs.gov/earthquakes/eventpage/ak13742485
New quake!... M 1.5 - 3km WSW of Tahoe Vista, California (Wed Jul 06 21:30:39 EDT 2016) info at: http://earthquake.usgs.gov/earthquakes/eventpage/nc72659111
New quake!... M 2.0 - 90km N of Redoubt Volcano, Alaska (Wed Jul 06 21:16:04 EDT 2016) info at: http://earthquake.usgs.gov/earthquakes/eventpage/ak13742480