
Public Data: Enhancing Data Discovery and Exploration (Benjamin Yolken)

  1. Public Data: Enhancing Data Discovery and Exploration
     Benjamin Yolken (yolken@google.com), September 2011

  2. Overview
     ● Disseminating public statistics
     ● Google tools
        ○ Public Data Explorer
        ○ Fusion Tables
        ○ Refine
     ● Conclusion

  3. Disseminating Statistics

  4. Objective: Make public statistics accessible, useful, and well-organized.

  5. Public statistics (2)

  6. Public statistics (4)

  7. Accessible...
     (1) Access: Data need to be online and findable
        ● Provider web sites
        ● Third-party aggregators
        ● Search engines
     (2) Understanding: Statisticians aren't the only users
        ● Lay users: teachers, students, journalists, policy makers
        ● Computers: search engines
        ● If data are not accessible to non-experts, they can go unused or, worse, be misused

  8. Useful...
     ● There are a lot of distractions today: tables and simple plots are not enough
     ● We need to engage not just users' eyes, but also their brains

  9. Well-organized...
     Go beyond flat lists of data, organizing them by:
        ● Topics
        ● Time periods
        ● Geographic regions
        ● Formats
        ● Languages, etc.
     Ultimately, this depends on having good metadata.
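To illustrate how metadata lets the same collection be organized beyond a flat list, here is a minimal sketch in Python. It assumes a hypothetical catalog in which each dataset carries topic, region, period, format, and language fields; the `DatasetRecord` class, the helper names, and the sample entries are all illustrative and not taken from any real catalog or Google product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record describing one published dataset."""
    title: str
    topic: str
    region: str
    period: str
    file_format: str
    language: str


def index_by(records, field):
    """Map each value of one metadata field to the sorted titles that have it."""
    index = defaultdict(list)
    for record in records:
        index[getattr(record, field)].append(record.title)
    return {value: sorted(titles) for value, titles in index.items()}


def filter_by(records, **criteria):
    """Return the records whose metadata matches every field=value pair."""
    return [
        record for record in records
        if all(getattr(record, name) == value for name, value in criteria.items())
    ]


# Illustrative catalog entries (hypothetical, not real datasets).
catalog = [
    DatasetRecord("Unemployment rate", "labor", "US", "2000-2010", "CSV", "en"),
    DatasetRecord("Population by age", "demographics", "EU", "1990-2010", "CSV", "en"),
    DatasetRecord("Regional jobless rate", "labor", "FR", "2000-2010", "XLS", "fr"),
]

if __name__ == "__main__":
    print(index_by(catalog, "topic"))
    print([r.title for r in filter_by(catalog, topic="labor", language="en")])
```

The same records can be re-indexed by `region`, `period`, or `language` without touching the data itself; the organization comes entirely from the metadata fields.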

  10. Google Tools

  11. Public Data Explorer (PDE)
      What it is:
         ● Stand-alone product for interactively exploring and visualizing rich datasets
         ● Visualizations can be shared or embedded on third-party sites
      What it's good for:
         ● Reaching out to non-expert users
         ● Driving traffic to your site
         ● Categorical, aggregated, time-series data
      Caveats:
         ● Datasets must be in Dataset Publishing Language (DSPL) format
            ○ Some tools are available to help
            ○ Converters from other formats, such as SDMX, are in progress

  12. PDE: Demo

  13. PDE: Embed

  14. Fusion Tables
      What it is:
         ● Product for creating, editing, and sharing tabular data
      What it's good for:
         ● Table edits and transformations: joining, filtering, aggregating, etc.
         ● Static visualizations, particularly maps
         ● Exposing data to users via APIs
      Caveats:
         ● Not connected to PDE (yet)
         ● Less suited to time-series exploration
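The table transformations listed for Fusion Tables (joining, filtering, aggregating) can be sketched in plain Python as follows. This is a generic illustration of the operations only; it does not use or imitate the Fusion Tables API, and the helper names (`join`, `filter_rows`, `aggregate`) and sample tables are hypothetical.

```python
from collections import defaultdict


def join(left, right, key):
    """Inner join: pair each left row with every right row sharing `key`."""
    right_by_key = defaultdict(list)
    for row in right:
        right_by_key[row[key]].append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right_by_key.get(l_row[key], [])
    ]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def aggregate(rows, group_col, value_col):
    """Sum value_col within each distinct value of group_col."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)


# Hypothetical sample tables (illustrative values only).
counts = [
    {"area": "A", "year": 2010, "count": 5},
    {"area": "B", "year": 2010, "count": 3},
    {"area": "C", "year": 2010, "count": 7},
    {"area": "A", "year": 2000, "count": 4},
]
areas = [
    {"area": "A", "zone": "North"},
    {"area": "B", "zone": "North"},
    {"area": "C", "zone": "South"},
]

# Join on the shared key, keep one year, then total per zone.
joined = join(counts, areas, "area")
recent = filter_rows(joined, lambda row: row["year"] == 2010)

if __name__ == "__main__":
    print(aggregate(recent, "zone", "count"))  # {'North': 8, 'South': 7}
```

Chaining join, filter, and aggregate in that order mirrors a common workflow: merge a lookup table into the data, narrow to the slice of interest, then summarize.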

  15. Fusion Tables: Demo

  16. Google Refine
      What it is:
         ● Desktop-based tool for cleaning up and transforming tabular data
      What it's good for:
         ● Bulk data transformations
         ● Faceted data browsing
         ● Outlier detection and cleanup
      Caveats:
         ● No collaboration features (yet)
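Two of the Refine capabilities listed above, faceted browsing and cleanup of inconsistent values, can be sketched as follows. The facet is simply a count of each distinct value in a column. The cleanup step groups values by a normalized "fingerprint" key (trim, lowercase, strip punctuation, sort unique tokens), which is similar in spirit to Refine's key-collision clustering but is a simplified approximation, not its actual implementation. Function names and sample values are hypothetical.

```python
import string
from collections import Counter, defaultdict


def text_facet(values):
    """Count how often each distinct value appears (a simple text facet)."""
    return Counter(values)


def fingerprint(value):
    """Normalize a string into a key: trim, lowercase, drop ASCII punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group distinct values sharing a fingerprint; keep only groups with
    more than one variant (likely spellings of the same thing)."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


def canonicalize(values):
    """Replace each value with the most frequent variant in its cluster
    (ties go to the variant seen first)."""
    counts = Counter(values)
    best = {}
    for value in counts:
        key = fingerprint(value)
        if key not in best or counts[value] > counts[best[key]]:
            best[key] = value
    return [best[fingerprint(value)] for value in values]


# Hypothetical messy column of city names.
cities = ["New York", "new york", "New York.", "Boston", "boston ", "Chicago", "New York"]

if __name__ == "__main__":
    print(text_facet(cities))
    print(cluster(cities))
    print(canonicalize(cities))
```

Picking the most frequent variant as the canonical value is one simple policy; in an interactive tool a user would typically review each cluster before merging.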

  17. Google Refine

  18. Conclusion
      ● We need to make statistics accessible, useful, and organized
      ● Google has tools that can help
      ● Key advice: think about the users and their needs
      ● This is a really exciting area; we have only scratched the surface of what's possible

  19. Thank you! Questions?

  20. Appendix

  21. PDE Intro Video

  22. PDE: Metadata
      Dataset Publishing Language (DSPL)
         ● Designed for interactive exploration and visualization
         ● Released under the open-source BSD license
         ● Combines data tables (CSV) with metadata (XML)
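To make the "data tables (CSV) plus metadata (XML)" split concrete, here is a minimal sketch using only the Python standard library. It writes a small CSV table and an XML file describing its columns, then reads the metadata back and checks that the CSV header matches. The XML element names (`dataset`, `table`, `column`, `file`) are a simplified, hypothetical stand-in and are not the actual DSPL schema, which defines its own namespaces, concepts, and slices; the sample rows are illustrative only.

```python
import csv
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample rows: (year, region, value). Not real statistics.
ROWS = [("2009", "north", "12"), ("2010", "north", "15"), ("2010", "south", "9")]
COLUMNS = [("year", "date"), ("region", "string"), ("value", "integer")]


def write_bundle(directory):
    """Write data.csv plus a simplified metadata.xml; return the XML path."""
    csv_path = os.path.join(directory, "data.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([name for name, _ in COLUMNS])
        writer.writerows(ROWS)

    # Simplified (non-DSPL) metadata: one table with typed columns and its file.
    root = ET.Element("dataset", name="example")
    table = ET.SubElement(root, "table", id="example_table")
    for name, col_type in COLUMNS:
        ET.SubElement(table, "column", id=name, type=col_type)
    ET.SubElement(table, "file", format="csv").text = "data.csv"

    xml_path = os.path.join(directory, "metadata.xml")
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)
    return xml_path


def load_bundle(xml_path):
    """Read the metadata, load the CSV it names, and verify the header."""
    table = ET.parse(xml_path).getroot().find("table")
    expected = [col.get("id") for col in table.findall("column")]
    csv_path = os.path.join(os.path.dirname(xml_path), table.find("file").text)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        if header != expected:
            raise ValueError(f"CSV header {header} does not match metadata {expected}")
        return [dict(zip(header, row)) for row in reader]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(load_bundle(write_bundle(tmp)))
```

Keeping the column descriptions in a separate metadata file is what lets a tool validate the table and interpret its columns without guessing from the CSV alone.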

  23. PDE: Dataset Creation and Upload
