SLIDE 1

WORKSHOP: REAL LIVE DATA FOR CS COURSES

CCSC:NE 2018 @ UNIVERSITY OF NEW HAMPSHIRE
NADEEM ABDUL HAMID

SLIDE 2

REAL LIVE DATA FOR CS COURSES

WARM-UP

http://services.faa.gov/docs/services/airport/

SLIDE 3

“HELLO WORLD” / SINBAD — JAVA

import core.data.*;

public class Warmup {
    public static void main(String[] args) {
        DataSource ds = DataSource.connect("http://services.faa.gov"
                + "/airport/status/MHT?format=application/xml");
        ds.load();
        System.out.println(ds.fetchString("Name"));
        System.out.println(ds.fetchString("Status/Reason"));
        System.out.println(ds.fetchString("Weather/Temp"));
    }
}

SLIDE 4

“HELLO WORLD” / SINBAD — PYTHON

from sinbad import *

ds = Data_Source.connect("http://services.faa.gov/airport/" +
                         "status/MHT?format=application/xml")
ds.load()
print(ds.fetch("Name", "Status/Reason", "Weather/Temp"))

SLIDE 5

“HELLO WORLD” / SINBAD — RACKET

(require sinbad)

(define ds
  (sail-to "http://services.faa.gov/airport/status/MHT?format=application/xml"
           (manifest)
           (load)))

(fetch ds "Name" "Status/Reason" "Weather/Temp")

SLIDE 6

TUTORIAL PLAN

▸ Welcome; Intros [5 mins.]
▸ SINBAD: Overview [15 mins.]
▸ Q&A
▸ Install & Setup [10 mins.]
▸ Quick Start Tutorials [40 mins.]
▸ (regroup/break - 10 mins.)
▸ Data Source Exploration • Course Activity Generation [40 mins.]
▸ (regroup/break - 10 mins.)
▸ Discussion • Feedback [30 mins.]
▸ Wrap up [15 mins.]

SLIDE 7

MOTIVATION

▸ The “Age of Big Data”
▸ Incorporate the use of online data sets in introductory programming courses (CS1/2)
▸ Provide a simple interface
▸ Hide I/O connection, parsing, extracting, data binding

SLIDE 8

REAL LIVE DATA FOR CS COURSES

▸ Motivate & engage: Make it really “real”
  ▸ Direct access via URL, e.g. “https://data.baltimorecity.gov/Culture-Arts/Restaurants/k5ry-ef3g/”
  ▸ Live data (e.g. current weather conditions, airport status, currency exchange rates)
  ▸ Support open-ended assignments/projects (students can choose data sources)
▸ Target learning outcomes
  ▸ Easily access atomic data/collections of atomic data
  ▸ Binding happens automagically for structured data
  ▸ Students decide/control data representation
  ▸ Grade functions, not data access
▸ Don’t use up course time: Make it smooth
  ▸ Syntax <= Scanner (e.g. in Java)
  ▸ Transparently parse, cache, convert data
  ▸ Seamlessly integrate with course assignments

SLIDE 9

APPROACH

▸ Infer necessary information about the data source automatically
  ▸ Data format, schema, field names, etc.
▸ Provide transparent local caching
▸ Sampling facilities for testing & development
▸ Provide runtime binding to data structures/classes in the user code
▸ Deal as gracefully as possible with messy data (e.g. missing fields)
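
To make the idea of inferring a data format concrete, here is a minimal Python sketch. It is not Sinbad's actual inference logic: the function name infer_format and the heuristic (try JSON, then XML, then fall back to CSV if the first line has a comma) are assumptions chosen purely for illustration.

```python
import json
import xml.etree.ElementTree as ET


def infer_format(text):
    """Guess whether text is JSON, XML, or CSV (hypothetical heuristic)."""
    sample = text.lstrip()
    if sample.startswith(("{", "[")):
        try:
            json.loads(sample)
            return "JSON"
        except ValueError:
            pass  # looked like JSON but did not parse; keep trying
    if sample.startswith("<"):
        try:
            ET.fromstring(sample)
            return "XML"
        except ET.ParseError:
            pass  # looked like XML but did not parse; keep trying
    # Fall back to CSV when the first line contains a comma delimiter.
    first_line = sample.splitlines()[0] if sample else ""
    return "CSV" if "," in first_line else "UNKNOWN"
```

A real library would likely also look at the URL extension or the HTTP content type; this sketch inspects only the content itself.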

SLIDE 10

USAGE

▸ 3-step approach:
  • Connect
  • Load
  • Fetch
▸ Infer data format if possible: XML, CSV, JSON
▸ Display inferred structure of data
▸ Fetching atomic values
  ▸ provide a path into the data
▸ Structured data:
  ▸ provide multiple paths of data supplied to the class/struct constructor (Java/Racket)
▸ Collections: lists / arrays (Java)
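
The path-based fetching described above can be sketched in plain Python, without the Sinbad library. The helpers fetch and fetch_objects, the Quake class, and the sample data are all hypothetical; they only illustrate how a slash-separated path can select atomic values and how several paths can feed a constructor.

```python
from dataclasses import dataclass


def fetch(data, path):
    """Follow a slash-separated path into nested dicts, mapping over lists."""
    def walk(node, keys):
        if not keys:
            return node
        if isinstance(node, list):
            # A list in the middle of a path yields one result per element.
            return [walk(item, keys) for item in node]
        return walk(node[keys[0]], keys[1:])
    return walk(data, path.split("/"))


@dataclass
class Quake:
    title: str
    mag: float


def fetch_objects(data, cls, *paths):
    """Fetch one column per path and pass each row to the constructor."""
    columns = [fetch(data, p) for p in paths]
    return [cls(*row) for row in zip(*columns)]


# Hypothetical sample shaped like a small GeoJSON-style feed.
data = {"features": [
    {"properties": {"title": "M 0.8 - Tahoe Vista", "mag": 0.8}},
    {"properties": {"title": "M 1.2 - Cobb", "mag": 1.2}},
]}

mags = fetch(data, "features/properties/mag")
quakes = fetch_objects(data, Quake,
                       "features/properties/title",
                       "features/properties/mag")
```

Here mags is a list of magnitudes and quakes is a list of Quake objects. This sketch raises KeyError on a missing field, whereas a library aimed at messy data would need a gentler policy.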

SLIDE 11

OTHER FUNCTIONALITY

▸ Data source specifications
▸ Query parameters
▸ Cache control
▸ Processing support

SLIDE 12

SOME EXPERIENCE

▸ CSC 103: Creative Computing (@Berry College)
▸ Processing-based; Tutorial-style labs
▸ Sample data sets used/discovered by students:
▸ Students get engaged and excited!

(Asterisk indicates data set discovered by students)

Name                                   Source                           Type          Records
*1000 songs to hear before you die     opendata.socrata.com             XML           1,000
Abalone data set                       UCI Machine Learning Repository  CSV           4,177
*Airport Weather Mashup                NWS + FAA                        XML           fixed
*Chicago life expectancy by community  data.cityofchicago.org           XML           ~80
Earthquake feeds                       US Geological Survey             JSON          variable
*Fuel economy data                     US EPA                           XML           35,430
*Jeopardy! question archive            reddit                           JSON          216,930
Live auction data                      Ebay                             XML           100/page
Magic the Gathering card data          mtgjson.com                      JSON          variable
Microfinance loan data                 Kiva                             XML           variable
*SEC Rushing Leaders 2014              ESPN                             CSV (manual)  variable

SLIDE 13

MORE EXPERIENCE

▸ CSC120: “Designing Programs: Problem-solving & Abstraction”
  ▸ CS1 based on How to Design Programs (htdp.org)
  ▸ Functional, using Racket-lite
  ▸ Sinbad integrated in labs/assignments throughout semester (first time in progress)
  ▸ Actual coding exercises the same, just tie in to data

Student responses to “What features of the course do you like or dislike?”:

“I really enjoy the variety and creativity of the different problems that we do, both in class and on the homework.”
“...I really enjoy the assignments we have because we’re applying what we’re learning to real things, like weather, …”
“...I really like the type of data we’re analyzing in the labs and assignments because it’s realistic…”

[Chart: survey responses; legend entries “Really like” and “Neither like nor dislike”]

SLIDE 14

Sample Activities/Data Sources

  • Current weather conditions (NWS)
      ○ Generate an image from data
      ○ Load and compare two locations
      ○ Interactive statewide weather conditions app
  • Bike Share Data (NYC, Chicago)
      ○ Arithmetic, conditionals
          ■ Compute usage fees based on type of user
          ■ Latitude/longitude computations
      ○ Representing structured data, building strings
          ■ Generate Google map URL from a trip record
      ○ Simple list algorithms
          ■ What age is the youngest rider in the data? The oldest?
      ○ Filtering, aggregation
          ■ If we arrange riders into 5-year age ranges, which range do the most riders fall into?
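
As an illustration of what solution code for a few of the bike share activities might look like, here is a small Python sketch. The trip records, field names (birth_year, start_lat, start_lng), and the reference year are made up for the example and do not come from any real bike share feed; the map link uses the documented Google Maps search URL form.

```python
from collections import Counter

# Hypothetical trip records; real bike share data uses different field names.
trips = [
    {"birth_year": 1990, "start_lat": 41.88, "start_lng": -87.63},
    {"birth_year": 1992, "start_lat": 41.89, "start_lng": -87.62},
    {"birth_year": 1985, "start_lat": 41.90, "start_lng": -87.64},
    {"birth_year": 2001, "start_lat": 41.87, "start_lng": -87.65},
    {"birth_year": 1991, "start_lat": 41.86, "start_lng": -87.61},
]


def map_url(trip):
    """Build a Google Maps search URL for the start point of a trip."""
    return ("https://www.google.com/maps/search/?api=1&query="
            f"{trip['start_lat']},{trip['start_lng']}")


def rider_ages(records, year):
    """Compute each rider's age in the given year."""
    return [year - r["birth_year"] for r in records]


def busiest_age_range(ages, width=5):
    """Return (low, high) of the width-year age range with the most riders."""
    buckets = Counter(age // width * width for age in ages)
    low, _count = buckets.most_common(1)[0]
    return (low, low + width - 1)


ages = rider_ages(trips, 2018)
youngest, oldest = min(ages), max(ages)
```

The same three functions map onto the competencies listed above: string building, simple list algorithms, and filtering/aggregation.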

SLIDE 15

Q & A

?

SLIDE 16

QUICK START TUTORIAL

ACTIVITY

Go to: http://cs.berry.edu/sinbad
Follow Installation instructions for your language.
Follow Quick Start link for your language.
End at ~10:10am.

SLIDE 17

DATA SOURCE EXPLORATION • COURSE ACTIVITY GENERATION

▸ Data Bazaar @ cs.berry.edu/sinbad: Curated data sources + sample code
▸ Let’s break into groups (2/3/4?)
▸ Pick a data source (Kiva, Divvy, …) & a language
▸ Explore the data source website
▸ Look at sample Sinbad code for your language
▸ Pick one or several programming competencies (handout)
▸ Develop a course assignment - i.e. write solution code if possible
▸ Stop @ ~11:00am to share & discuss.

SLIDE 18

DISCUSSION / FEEDBACK

SLIDE 19

CHALLENGES

▸ Students understanding data formats/representations (HTML vs XML)
▸ Data is messy
▸ Maintaining balance - avoid students getting lost in data
▸ Data exploration/browsing
▸ Error handling/debugging
▸ Developer/API registration
▸ Repository of assignments, labs, examples
▸ Evaluation

SLIDE 20

RELATED WORK

▸ CORGIS Dataset Project - http://think.cs.vt.edu/corgis/
  ▸ Great tips from experience: https://think.cs.vt.edu/pragmatics/
▸ BRIDGES Project - http://bridgesuncc.github.io/

SLIDE 21

CONCLUSION

▸ Facilitate incorporation of online data sources into programming assignments
  ▸ Painlessly
  ▸ Seamlessly

SLIDE 22

cs.berry.edu/sinbad

Use a data set in your next assignment!

SLIDE 23

SLIDE 24

DATA SOURCE SPECIFICATION FILE

  • Data source URL and format.
  • Human-friendly name and description, along with URL to a project or informational page about the data source.
  • A specification of pre-supplied and user-supplied (required and optional) query parameters or path parameters. The latter are user-provided strings that are substituted in for placeholders in the URL path.
  • Programmatic options specific to the particular data source object (such as a header for CSV files).
  • Cache settings, such as cache directory path or timeout.
  • A data schema defining the exposed data structures and fields from the source with various helpful annotations such as textual descriptions of fields that can be displayed by printUsageString().

{
  "name": "Geographical Data - Peru",
  "format": "TSV",
  "path": "http://download.geonames.org/export/dump/PE.zip",
  "infourl": "http://www.geonames.org/",
  "options": [
    { "name": "fileentry", "value": "PE.txt" },
    { "name": "header",
      "value": "geoid,name,asciiname,altnames,lat,long,feature-class,feature-code,cc,cc2,admin1,admin2,admin3,admin4,pop,elev,dem,tz,mod" }
  ]
}

DataSource.connectUsing("geospec-pe.spec");
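
To illustrate how a specification file with path parameters could be consumed, here is a rough Python sketch. It does not reproduce Sinbad's implementation: the {country} placeholder syntax, the helper names load_spec and resolve_path, and the trimmed-down spec text are all assumptions made for this example.

```python
import json

# Hypothetical spec: the {country} placeholder syntax is an assumption.
SPEC_TEXT = """
{
  "name": "Geographical Data",
  "format": "TSV",
  "path": "http://download.geonames.org/export/dump/{country}.zip",
  "options": [
    { "name": "header", "value": "geoid,name,asciiname" }
  ]
}
"""


def load_spec(text):
    """Parse a spec and flatten its options list into a name -> value dict."""
    spec = json.loads(text)
    spec["options"] = {opt["name"]: opt["value"]
                       for opt in spec.get("options", [])}
    return spec


def resolve_path(spec, **params):
    """Substitute user-supplied strings for placeholders in the URL path."""
    path = spec["path"]
    for key, value in params.items():
        path = path.replace("{" + key + "}", value)
    if "{" in path:
        raise ValueError("missing path parameter in " + path)
    return path


spec = load_spec(SPEC_TEXT)
url = resolve_path(spec, country="PE")
```

With country="PE" the resolved URL matches the Peru download path shown in the spec above; leaving the parameter out raises an error instead of requesting a malformed URL.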

SLIDE 25

TEACHING CS COURSES WITH REAL LIVE DATA

FUTURE

▸ GUI tools
▸ Additional data formats
▸ HTML tables, web scrapers (regexps)
▸ Customized for popular APIs (ebay, twitter, etc.)
▸ Streaming, pagination, sampling…
▸ More curriculum resources (CS1/2)
▸ Evaluation of effectiveness

SLIDE 26

SLIDE 27

BART, ET AL. FIGURE 2

import java.util.List;
import java.util.HashSet;
import realtimeweb.earthquakeservice.main.EarthquakeService;
import realtimeweb.earthquakeservice.domain.Earthquake;

public class EarthquakeDemo {
    public static void main(String[] args) throws EarthquakeException {
        // Use the EarthquakeService library
        EarthquakeService es = EarthquakeService.getInstance();
        es.connect(); // Remove to use the local cache
        // 5 minute delay, but if we use the cache no delay is needed!
        int DELAY = 5 * 60 * 1000;
        HashSet<Earthquake> seenQuakes = new HashSet<Earthquake>();
        // Poll service regularly
        while (true) {
            // Get all earthquakes in the past hour
            List<Earthquake> latest = es.getEarthquakes(History.ALL);
            // Check if this is a new earthquake
            for (Earthquake e : latest) {
                if (!seenQuakes.contains(e)) {
                    // Report new earthquakes
                    System.out.println("New quake!");
                    seenQuakes.add(e);
                }
            }
            // Delay to avoid spamming the weather service
            Thread.sleep(DELAY);
        }
    }
}

SLIDE 28

EQUIVALENT

import core.data.*;
import java.util.Date;
import java.util.HashSet;
import java.util.List;

public class EarthquakeDemo {
    public static void main(String[] args) {
        int DELAY = 5; // 5 minute cache delay
        DataSource ds = DataSource.connectAs("JSON",
                "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.geojson");
        ds.setCacheTimeout(DELAY);
        ds.load();
        ds.printUsageString();

        HashSet<Earthquake> quakes = new HashSet<Earthquake>();
        while (true) {
            ds.load(); // this only actually reloads data when the cache times out
            List<Earthquake> latest = ds.fetchList("Earthquake",
                    "features/properties/title",
                    "features/properties/time",
                    "features/properties/mag",
                    "features/properties/url");
            for (Earthquake e : latest) {
                if (!quakes.contains(e)) {
                    System.out.println("New quake!... " + e.description
                            + " (" + e.date() + ") info at: " + e.url);
                    quakes.add(e);
                }
            }
        }
    }
}
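
The comment on ds.load() in the loop notes that data is only reloaded once the cache times out. A minimal Python sketch of that idea follows; the TimedCache class and its injectable clock are hypothetical, and the sketch caches in memory, whereas a library like the one shown might cache to disk.

```python
import time


class TimedCache:
    """Reload a value only after a timeout has elapsed (illustrative only)."""

    def __init__(self, loader, timeout_seconds, clock=time.monotonic):
        self._loader = loader            # function that fetches fresh data
        self._timeout = timeout_seconds
        self._clock = clock              # injectable so the sketch is testable
        self._value = None
        self._loaded_at = None

    def load(self):
        """Return cached data, calling the loader only when it has expired."""
        now = self._clock()
        expired = (self._loaded_at is None
                   or now - self._loaded_at >= self._timeout)
        if expired:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

Calling load() in a tight loop is then cheap: the loader runs on the first call and again only after timeout_seconds have passed.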

SLIDE 29

PLUS…

// this class may be instructor-provided, or left to students to define as an exercise
class Earthquake {
    String description;
    long timestamp;
    float magnitude;
    String url;

    public Earthquake(String description, long timestamp, float magnitude, String url) {
        this.description = description;
        this.timestamp = timestamp;
        this.magnitude = magnitude;
        this.url = url;
    }

    public Date date() {
        return new Date(timestamp);
    }

    public boolean equals(Object o) {
        // introductory CS students would probably implement a simpler version of this
        if (o.getClass() != this.getClass()) return false;
        Earthquake that = (Earthquake) o;
        return that.description.equals(this.description)
                && that.timestamp == this.timestamp
                && that.magnitude == this.magnitude;
    }

    public int hashCode() {
        // technically, hashCode() should be overridden if equals() is
        return (int) (31 * (31 * this.description.hashCode() + this.timestamp) + this.magnitude);
    }
}

SLIDE 30

OUTPUT (1)

The following data is available:
a structure with fields:
{
  type : *
  metadata : a structure with fields:
  {
    api : *
    count : *
    generated : *
    status : *
    title : *
    url : *
  }
  features : A list of:
    structures with fields:
    {
      id : *
      type : *
      geometry : a structure with fields:
      {
        type : *
        coordinates : A list of: *
      }
      properties : a structure with fields:
      {
        alert : *
        cdi : *
        code : *
        ...
        type : *
        types : *
        tz : *
        updated : *
        url : *
      }
    }
}

SLIDE 31

OUTPUT (2)

New quake!... M 0.8 - 3km WSW of Tahoe Vista, California (Wed Jul 06 21:50:36 EDT 2016) info at: http://earthquake.usgs.gov/earthquakes/eventpage/nn00550830
New quake!... M 1.2 - 9km WNW of Cobb, California (Wed Jul 06 21:44:22 EDT 2016) info at: http://earthquake.usgs.gov/earthquakes/eventpage/nc72659116
New quake!... M 1.2 - 47km E of Cape Yakataga, Alaska (Wed Jul 06 21:37:45 EDT 2016) info at: http://earthquake.usgs.gov/earthquakes/eventpage/ak13742485
New quake!... M 1.5 - 3km WSW of Tahoe Vista, California (Wed Jul 06 21:30:39 EDT 2016) info at: http://earthquake.usgs.gov/earthquakes/eventpage/nc72659111
New quake!... M 2.0 - 90km N of Redoubt Volcano, Alaska (Wed Jul 06 21:16:04 EDT 2016) info at: http://earthquake.usgs.gov/earthquakes/eventpage/ak13742480