
Data Handling: Import, Cleaning and Visualisation Lecture 7: Data - PowerPoint PPT Presentation



  1. 9/12/2019 Data Handling: Import, Cleaning and Visualisation
     Lecture 7: Data Sources, Data Gathering, Data Import
     Prof. Dr. Ulrich Matter, 24/10/2019
     file:///home/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/07_data_import.html#1 1/54

  2. Recap: Programming with Data

  3. Loops
     · Repeatedly execute a sequence of commands.
     · Known or unknown number of iterations.
     · Types: ‘for-loop’ and ‘while-loop’.
       - ‘for-loop’: number of iterations typically known.
       - ‘while-loop’: number of iterations typically not known.

  4. for-loop
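
The for-loop slide carries no code in this transcript. The following is a minimal sketch of a for-loop in R, assuming a small made-up numeric vector; the names `numbers` and `total` are illustrative and do not come from the slides.

```r
# hypothetical input vector (not from the slides)
numbers <- c(1, 2, 3, 4, 5)

# running total, updated once per iteration
total <- 0

# the number of iterations is known in advance: one per element
for (i in seq_along(numbers)) {
  total <- total + numbers[i]
}

print(total)
```

Using `seq_along(numbers)` rather than `1:length(numbers)` avoids an unintended iteration when the vector is empty.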

  5. while-loop
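
The while-loop slide carries no code in this transcript either. As a rough sketch, assuming a made-up starting value and stopping rule (the names `x` and `n_iter` are illustrative), a while-loop in R could look like this:

```r
# hypothetical starting value (not from the slides)
x <- 1

# counter to record how many iterations were needed
n_iter <- 0

# keep doubling until x exceeds 100; the number of iterations
# is not fixed up front but follows from the condition
while (x <= 100) {
  x <- x * 2
  n_iter <- n_iter + 1
}

print(x)
print(n_iter)
```

The loop stops as soon as the condition `x <= 100` evaluates to `FALSE`, so the body must change `x` in a way that eventually makes this happen.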

  6. Booleans and logical statements

         2 + 2 == 4
         ## [1] TRUE
         3 + 3 == 7
         ## [1] FALSE
         4 != 7
         ## [1] TRUE

  7. Booleans and logical statements

         condition <- TRUE
         if (condition) {
           print("This is true!")
         } else {
           print("This is false!")
         }
         ## [1] "This is true!"

  8. R functions
     · f : X → Y
     · ‘Take a variable/parameter value X as input and provide value Y as output’
     · For example, 2 × X = Y.
     · R functions take ‘parameter values’ as input, process those values according to a predefined program, and ‘return’ the results.
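
The mapping 2 × X = Y can be written as an R function. This is a minimal sketch; the function name `double_value` is hypothetical and not taken from the slides.

```r
# hypothetical function implementing Y = 2 * X
double_value <- function(x) {
  y <- 2 * x   # process the input value
  return(y)    # return the result
}

# a single input value
double_value(3)

# R arithmetic is vectorised, so a vector of inputs works as well
double_value(c(1, 2, 3))
```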

  9. R functions

         # define our own function to compute the mean, given a numeric vector
         my_mean <- function(x) {
           x_bar <- sum(x) / length(x)
           return(x_bar)
         }
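
To see `my_mean()` in use, the sketch below repeats the definition from the slide and applies it to a made-up vector (`a` is an illustrative name), then compares the result with R's built-in `mean()`.

```r
# definition as on the slide
my_mean <- function(x) {
  x_bar <- sum(x) / length(x)
  return(x_bar)
}

# made-up example data (not from the slides)
a <- c(2, 4, 6, 8)

# apply our own function
my_mean(a)

# check against the built-in implementation
all.equal(my_mean(a), mean(a))
```

Note that this simple version returns `NA` as soon as the input contains a missing value, because `sum()` does so by default.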

  10. Today: Putting it All Together

  11. Putting it all together
      · You know what ‘data’ is …
      · You know how digital data is stored …
      · You know how to write computer code …
      · You know the basics of programming in R …
      These are the basics you need to handle data properly! This is the foundation of data science!

  12. We are ready to start the data science journey
      The first key bottleneck in the data pipeline: Gather and import the data!

  13. Sources/formats in economics

  14. Sources/formats in economics
      · CSV (typical for rectangular/table-like data)
      · Variants of CSV (tab-delimited, fixed-width, etc.)
      · XML and JSON (useful for complex/high-dimensional data sets)
      · HTML (a markup language to define the structure and layout of webpages)
      · Unstructured text
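
To illustrate how CSV and its tab-delimited variant relate, here is a small sketch using base R only. The two text snippets and all object names are made up for illustration; real data would normally be read from a file path or URL instead of inline text.

```r
# the same made-up table, once comma-separated, once tab-delimited
csv_text <- "name,age\nAnna,34\nBen,28"
tsv_text <- "name\tage\nAnna\t34\nBen\t28"

# read.csv() expects commas as the delimiter
df_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# read.delim() expects tabs as the delimiter
df_tsv <- read.delim(text = tsv_text, stringsAsFactors = FALSE)

# both calls should yield the same rectangular data frame
all.equal(df_csv, df_tsv)
```

Only the delimiter differs between the two inputs; the resulting table structure (rows as observations, columns as variables) is the same.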

  15. Sources/formats in economics
      · Excel spreadsheets (.xls)
      · Formats specific to statistical software packages (SPSS: .sav, Stata: .dta, etc.)
      · Built-in R datasets
      · Binary formats

  16. Data Gathering Procedure

  17. Organize your data pipeline!
      · One R script to gather/import data.
      · The beginning of your data pipeline!

  18. A Template/Blueprint
      Tell your future self what this script is all about

          #######################################################################
          # Data Handling Course: Example Script for Data Gathering and Import
          #
          # Imports data from ...
          # Input: links to data sources (data comes in ... format)
          # Output: cleaned data as CSV
          #
          # U. Matter, St. Gallen, 2018
          #######################################################################

  19. Script sections
      · Recall: programming tasks can often be split into smaller tasks.
      · Use sections to implement task-by-task and keep order.
      · In RStudio: Use ---------- to indicate the beginning of sections.
      · Start with a ‘meta’-section.

  20. Script sections

          #######################################################################
          # Data Handling Course: Example Script for Data Gathering and Import
          #
          # Imports data from ...
          # Input: links to data sources (data comes in ... format)
          # Output: cleaned data as CSV
          #
          # U. Matter, St. Gallen, 2018
          #######################################################################

          # SET UP --------------
          # load packages
          library(tidyverse)

          # set fixed variables
          INPUT_PATH <- "/rawdata"
          OUTPUT_FILE <- "/final_data/datafile.csv"

  21. Script sections
      Finally, we add sections with the actual code (in the case of a data import script, maybe one section per data source).

          #######################################################################
          # Project XY: Data Gathering and Import
          #
          # This script is the first part of the data pipeline of project XY.
          # It imports data from ...
          # Input: links to data sources (data comes in ... format)
          # Output: cleaned data as CSV
          #
          # U. Matter, St. Gallen, 2018
          #######################################################################

          # SET UP --------------
          # load packages
          library(tidyverse)

          # set fixed variables
          INPUT_PATH <- "/rawdata"
          OUTPUT_FILE <- "/final_data/datafile.csv"
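
The slides stop at the set-up section. The following sketch shows how one import section and one export section might follow it. Everything beyond the `INPUT_PATH` and `OUTPUT_FILE` names is an assumption made for illustration: the paths point to a temporary directory, the raw file `source_a.csv` is made up on the spot so that the script runs on its own, and base R functions are used instead of the tidyverse.

```r
# SET UP --------------

# set fixed variables
# (assumption: a temporary directory stands in for the project folders)
INPUT_PATH <- file.path(tempdir(), "rawdata")
OUTPUT_FILE <- file.path(tempdir(), "final_data", "datafile.csv")

# make sure the folders exist
dir.create(INPUT_PATH, showWarnings = FALSE, recursive = TRUE)
dir.create(dirname(OUTPUT_FILE), showWarnings = FALSE, recursive = TRUE)

# create a made-up raw data file so that this sketch is self-contained
writeLines(c("id,value", "1,10", "2,20"),
           file.path(INPUT_PATH, "source_a.csv"))

# IMPORT SOURCE A --------------

# hypothetical data source: one section per source
source_a <- read.csv(file.path(INPUT_PATH, "source_a.csv"))

# EXPORT --------------

# write the imported data to the output file as CSV
write.csv(source_a, OUTPUT_FILE, row.names = FALSE)
```

Keeping paths in upper-case variables at the top means that moving the project only requires changing the set-up section, not every import call.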

  22. Loading/Importing Rectangular Data

  23. Loading built-in datasets
      To load a built-in dataset, simply use the data() function:

          data(swiss)
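
Once loaded, the dataset is available as an ordinary data frame. A short sketch of how one might take a first look at it, using only base R functions:

```r
# load the built-in dataset (ships with R's 'datasets' package)
data(swiss)

# number of rows and columns
dim(swiss)

# first three observations
head(swiss, 3)

# variable names and types
str(swiss)
```

Calling `data()` without arguments lists the datasets available in the currently loaded packages.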
