Public Data: Enhancing Data Discovery and Exploration
Benjamin Yolken (yolken@google.com)
September 2011
Overview
● Disseminating public statistics
● Google tools
  ○ Public Data Explorer
  ○ Fusion Tables
  ○ Refine
● Conclusion
Disseminating Statistics
Objective: Make public statistics accessible, useful, and well-organized.
Public statistics
Accessible...
(1) Access: Data need to be online and findable
  ● Provider web sites
  ● Third-party aggregators
  ● Search engines
(2) Understanding: Statisticians aren't the only users
  ● Lay users: teachers, students, journalists, policy makers
  ● Computers: search engines
  ● If data are not accessible to non-experts, they can go unused or, worse, be misused
Useful...
There are many distractions today; tables and simple plots are not enough.
We need to engage not just users' eyes but also their brains.
Well-organized...
Go beyond flat lists of data by organizing them along:
● Topics
● Time periods
● Geographic regions
● Formats
● Languages, etc.
Ultimately, this depends on having good metadata.
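A minimal sketch of metadata-driven organization, in Python: each catalog entry carries the facets listed above, and a query can filter on any combination of them. The `Dataset` type, the `find` and `facet_counts` helpers, and the sample entries are hypothetical illustrations, not part of any Google product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalog entry carrying the facets listed on the slide."""
    name: str
    topic: str
    first_year: int
    last_year: int
    region: str
    fmt: str
    language: str


def find(catalog, *, topic=None, region=None, year=None, fmt=None, language=None):
    """Return the datasets that match every facet that was specified."""
    matches = []
    for ds in catalog:
        if topic is not None and ds.topic != topic:
            continue
        if region is not None and ds.region != region:
            continue
        # A dataset matches a year if the year falls inside its coverage period.
        if year is not None and not ds.first_year <= year <= ds.last_year:
            continue
        if fmt is not None and ds.fmt != fmt:
            continue
        if language is not None and ds.language != language:
            continue
        matches.append(ds)
    return matches


def facet_counts(catalog, facet):
    """Count datasets per value of one facet, e.g. for a browsing sidebar."""
    counts = {}
    for ds in catalog:
        value = getattr(ds, facet)
        counts[value] = counts.get(value, 0) + 1
    return counts


# Illustrative entries only; not real datasets.
catalog = [
    Dataset("Unemployment rate", "labour", 1990, 2010, "EU", "csv", "en"),
    Dataset("Population by age", "demography", 2000, 2010, "US", "csv", "en"),
    Dataset("Labour force survey", "labour", 2005, 2010, "FR", "xls", "fr"),
]
```

For example, `find(catalog, topic="labour", year=2008)` returns the two labour datasets whose coverage includes 2008. The point is that every one of these queries is only possible if the metadata was recorded in the first place.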
Google Tools
Public Data Explorer (PDE)
What it is:
● Stand-alone product for interactively exploring and visualizing rich datasets
● Visualizations can be shared or embedded on third-party sites
What it's good for:
● Reaching out to non-expert users
● Driving traffic to your site
● Categorical, aggregated, time-series data
Caveats:
● Datasets must be in Dataset Publishing Language (DSPL) format
  ○ Some tools are available to help
  ○ Converters from other formats, such as SDMX, are in progress
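The "categorical, aggregated, time-series" shape can be illustrated with a short Python sketch that rolls detailed records up into one row per (category, year) combination and writes the result as CSV. This is only an assumed preprocessing step for preparing such a table; the function names and sample records are hypothetical, and the code does not interact with the PDE itself.

```python
import csv
import io
from collections import defaultdict


def aggregate_slice(records, dims, metric):
    """Sum `metric` over records grouped by the `dims` columns.

    Returns one row per distinct combination of dimension values, sorted.
    """
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        totals[key] += float(rec[metric])
    return [{**dict(zip(dims, key)), metric: total}
            for key, total in sorted(totals.items())]


def to_csv(rows, columns):
    """Serialize a list of row dicts as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical region-level records, rolled up to country and year.
records = [
    {"country": "A", "region": "north", "year": "2009", "population": "4"},
    {"country": "A", "region": "south", "year": "2009", "population": "6"},
    {"country": "A", "region": "north", "year": "2010", "population": "5"},
    {"country": "B", "region": "east", "year": "2010", "population": "3"},
]
rows = aggregate_slice(records, ["country", "year"], "population")
csv_text = to_csv(rows, ["country", "year", "population"])
```

Here `csv_text` holds one row per country and year (for example, country A in 2009 totals 10.0), which is the kind of compact, pre-aggregated table that interactive time-series charts work well with.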
PDE: Demo
PDE: Embed
Fusion Tables
What it is:
● Product for creating, editing, and sharing tabular data
What it's good for:
● Table edits and transformations: joining, filtering, aggregating, etc.
● Static visualizations, particularly maps
● Exposing data to users via APIs
Caveats:
● Not connected to PDE (yet)
● Less useful for exploring time series
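The join, filter, and aggregate operations listed above can be sketched in plain Python over lists of dictionaries. This illustrates the operations themselves, not the Fusion Tables interface or API; the helper names and sample tables are hypothetical.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Join two lists of row dicts on `key`, keeping only matching rows."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def filter_rows(rows, predicate):
    """Keep the rows for which `predicate(row)` is true."""
    return [row for row in rows if predicate(row)]


def group_sum(rows, by, value):
    """Sum the `value` column within each distinct value of the `by` column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[by]] += row[value]
    return dict(totals)


# Hypothetical sample tables.
sales = [
    {"store": "s1", "amount": 120.0},
    {"store": "s2", "amount": 80.0},
    {"store": "s1", "amount": 30.0},
    {"store": "s3", "amount": 50.0},  # no matching store row; dropped by the join
]
stores = [{"store": "s1", "region": "North"}, {"store": "s2", "region": "South"}]

joined = inner_join(sales, stores, "store")
large = filter_rows(joined, lambda row: row["amount"] >= 50)
totals_by_region = group_sum(large, "region", "amount")
```

Chaining the three steps this way (join, then filter, then aggregate) mirrors how such transformations are typically composed on a shared table before it is visualized or exposed to other users.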
Fusion Tables: Demo
Google Refine
What it is:
● Desktop-based tool for cleaning up and transforming tabular data
What it's good for:
● Bulk data transformations
● Faceted data browsing
● Outlier detection and cleanup
Caveats:
● No collaboration features (yet)
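Faceting and cleanup can be sketched in Python: a text facet counts distinct values, and a key-collision "fingerprint" groups variant spellings that likely refer to the same thing. The fingerprint function is modeled on the clustering method documented for Google Refine (now OpenRefine), but it is a simplified reimplementation, not Refine's code; the helper names and sample values are hypothetical.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def text_facet(values):
    """Count each distinct value, most frequent first (ties keep first-seen order)."""
    return Counter(values).most_common()


def fingerprint(value):
    """Simplified key-collision fingerprint, modeled on Refine's method."""
    key = value.strip().lower()
    # Fold accented letters to ASCII, e.g. "é" -> "e".
    key = unicodedata.normalize("NFKD", key).encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation.
    key = key.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate whitespace-separated tokens, so word order does not matter.
    return " ".join(sorted(set(key.split())))


def clusters(values):
    """Return groups of distinct values whose fingerprints collide."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical messy column values.
cities = ["New York", "new york ", "York, New", "Boston", "boston", "Chicago"]
```

Here `clusters(cities)` proposes merging the three New York variants and the two Boston variants. In practice a person would review each proposed cluster before merging, since key collisions can also group genuinely distinct entries.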
Conclusion
● We need to make statistics accessible, useful, and organized
● Google has tools that can help
● Key advice: think about the users and their needs
● This is a really exciting area; we have only scratched the surface of what's possible
Thank you! Questions?
Appendix
PDE Intro Video
PDE: Metadata
Dataset Publishing Language (DSPL)
● Designed for interactive exploration and visualization
● Open source, released under a BSD license
● Combines data tables (CSV) with metadata (XML)
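To show how DSPL pairs the two parts, here is a Python sketch that writes a small CSV table and an XML metadata document describing it. The element names (info, concepts, slices, tables) and the namespace URI follow the general shape of DSPL as understood here but should be treated as assumptions; a publishable dataset needs more (for example, time dimensions, concept value tables, and provider information) and should be checked against the DSPL specification.

```python
import csv
import io
import xml.etree.ElementTree as ET

DSPL_NS = "http://schemas.google.com/dspl/2010"  # assumed DSPL namespace URI


def _el(parent, tag, text=None, **attrib):
    """Append a namespaced child element, optionally with text content."""
    node = ET.SubElement(parent, f"{{{DSPL_NS}}}{tag}", attrib)
    if text is not None:
        node.text = text
    return node


def _name(parent, label):
    """Add an <info><name><value>label</value></name></info> block."""
    _el(_el(_el(parent, "info"), "name"), "value", label)


def build_bundle(rows):
    """Return (xml_text, csv_text) for a one-slice country/population dataset."""
    # Data side: a plain CSV table.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["country", "population"])
    writer.writerows(rows)
    csv_text = buf.getvalue()

    # Metadata side: XML describing the concepts, the slice, and the table.
    ET.register_namespace("", DSPL_NS)  # serialize as the default namespace
    root = ET.Element(f"{{{DSPL_NS}}}dspl")
    _name(root, "Example population dataset")
    concepts = _el(root, "concepts")
    for cid, label, ctype in [("country", "Country", "string"),
                              ("population", "Population", "integer")]:
        concept = _el(concepts, "concept", id=cid)
        _name(concept, label)
        _el(concept, "type", ref=ctype)
    slice_ = _el(_el(root, "slices"), "slice", id="population_slice")
    _el(slice_, "dimension", concept="country")
    _el(slice_, "metric", concept="population")
    _el(slice_, "table", ref="population_table")
    table = _el(_el(root, "tables"), "table", id="population_table")
    _el(table, "column", id="country", type="string")
    _el(table, "column", id="population", type="integer")
    # The table points at the CSV file that holds the actual values.
    _el(_el(table, "data"), "file", "population.csv", format="csv", encoding="utf-8")
    return ET.tostring(root, encoding="unicode"), csv_text


# Hypothetical data: two countries and their populations.
xml_text, csv_text = build_bundle([["A", 10], ["B", 3]])
```

The design choice worth noting is the split itself: the CSV stays a plain table that any tool can read, while the XML carries the meaning (which columns are dimensions, which are metrics, and what they are called) that an exploration tool needs.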
PDE: Dataset Creation and Upload
Recommendations
More recommendations