
Data Viz, April 2, 2020. Data Science CSCI 1951A, Brown University



  1. Data Viz. April 2, 2020. Data Science CSCI 1951A, Brown University. Instructor: Ellie Pavlick. HTAs: Josh Levin, Diane Mutako, Sol Zitter

  2. Announcements • Videos on if you can! Use raise-hand feature for questions. • Any questions/concerns logistically? • Extra Office Hours tomorrow

  3. Today • Questions from previous lectures? (Dimensionality Reduction, Classification, Regularization) • Data Viz tips and best practices

  4. When do I do data viz during a project?

  5. When do I do data viz during a project? • Hypothesis: CS students sleep less than Brown students in general

  6. When do I do data viz during a project? • Hypothesis: CS students sleep less than Brown students in general • Viz #1: Quick side-by-side histogram of CS students’ sleep vs. the rest. Means + CIs

  7. When do I do data viz during a project? • Hypothesis: CS students sleep less than Brown students in general • Viz #1: Quick side-by-side histogram of CS students’ sleep vs. the rest. Means + CIs • Run linear regression, control for various things, find large coefficient on whether student has two concentrations

  8. When do I do data viz during a project? • Hypothesis: CS students sleep less than Brown students in general • Viz #1: Quick side-by-side histogram of CS students’ sleep vs. the rest. Means + CIs • Run linear regression, control for various things, find large coefficient on whether student has two concentrations • Viz #2: Quick histograms (or box-whiskers maybe) of hours of sleep vs. number of concentrations

  9. When do I do data viz during a project? • Hypothesis: CS students sleep less than Brown students in general • Viz #1: Quick side-by-side histogram of CS students’ sleep vs. the rest. Means + CIs • Run linear regression, control for various things, find large coefficient on whether student has two concentrations • Viz #2: Quick histograms (or box-whiskers maybe) of hours of sleep vs. number of concentrations • Viz #3: Quick histogram of number of concentrations for CS vs. non-CS students

  10. When do I do data viz during a project? • Hypothesis: CS students sleep less than Brown students in general • Viz #1: Quick side-by-side histogram of CS students’ sleep vs. the rest. Means + CIs • Run linear regression, control for various things, find large coefficient on whether student has two concentrations • Viz #2: Quick histograms (or box-whiskers maybe) of hours of sleep vs. number of concentrations • Viz #3: Quick histogram of number of concentrations for CS vs. non-CS students • Viz #4: Final polished visualizations for poster/paper/report

  11. When do I do data viz during a project? • Hypothesis: CS students sleep less than Brown students in general • Viz #1: Quick side-by-side histogram of CS students’ sleep vs. the rest. Means + CIs • While not converged: run linear regression, control for various things, find large coefficient on whether student has two concentrations; Viz #ia: Quick histograms (or box-whiskers maybe) of hours of sleep vs. number of concentrations; Viz #ib: Quick histogram of number of concentrations for CS vs. non-CS students • Viz #N+1: Final polished visualizations for poster/paper/report
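A quick "Viz #1" of the kind described above might look like the sketch below: a side-by-side histogram of two groups, with each group's mean and an approximate 95% confidence interval overlaid. The sleep numbers are synthetic and every function name here is made up for illustration; nothing in it comes from the course's data or code.

```python
import math
import random
import statistics


def mean_ci(values, z=1.96):
    """Return (mean, half_width) of a normal-approximation 95% CI."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * sem


def make_fake_sleep_data(seed=0):
    """Synthetic hours-of-sleep samples (assumption: NOT real survey data)."""
    rng = random.Random(seed)
    cs = [rng.gauss(6.5, 1.0) for _ in range(200)]
    rest = [rng.gauss(7.0, 1.0) for _ in range(200)]
    return cs, rest


def plot_sleep(cs, rest, path="sleep_hist.png"):
    """Save a side-by-side histogram with means and CI bands."""
    # Imported lazily so mean_ci() can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Passing a list of datasets draws the bars for each bin side by side.
    ax.hist([cs, rest], bins=15, color=["C0", "C1"],
            label=["CS", "Everyone else"])
    for values, color in ((cs, "C0"), (rest, "C1")):
        mean, half = mean_ci(values)
        ax.axvline(mean, color=color, linestyle="--")  # group mean
        ax.axvspan(mean - half, mean + half, color=color, alpha=0.2)  # CI
    ax.set_xlabel("Hours of sleep per night")
    ax.set_ylabel("Number of students")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_sleep(*make_fake_sleep_data())` writes `sleep_hist.png`. For an exploratory plot like this, speed matters more than polish, so default styling is fine.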

  12. When do I do data viz during a project? • At the very start of analysis, to find out wth is going on in my data • Periodically throughout, to vet the quantitative trends I am seeing • At the very end of a project, to showcase the results

  13. When do I do data viz during a project? • At the very start of analysis, to find out wth is going on in my data • Periodically throughout, to vet the quantitative trends I am seeing • At the very end of a project, to showcase the results More important (matplotlib, Excel, whatever is easy)

  14. When do I do data viz during a project? Most attention, ’cause it’s fun ;) (D3, etc.) • At the very start of analysis, to find out wth is going on in my data • Periodically throughout, to vet the quantitative trends I am seeing • At the very end of a project, to showcase the results

  15. When do I do data viz during a project? • At the very start of analysis, to find out wth is going on in my data • Periodically throughout, to vet the quantitative trends I am seeing • At the very end of a project, to showcase the results You are the main audience; the goal is to make sure you understand what you are looking at.

  16. When do I do data viz during a project? Everyone else is the main audience. The goal is to make your point as clearly and concisely as possible. • At the very start of analysis, to find out wth is going on in my data • Periodically throughout, to vet the quantitative trends I am seeing • At the very end of a project, to showcase the results

  17. So many bad figures… Diane Maggie Neil

  18. My “three pillars”* of Data Viz *:)

  19. My “three pillars” of Data Viz • Your figures should speak for themselves. The analysis should be understandable and your conclusions should be obviously supported, without too much effort.

  20. My “three pillars” of Data Viz • Your figures should speak for themselves. The analysis should be understandable and your conclusions should be obviously supported, without too much effort. • Don’t obfuscate the data or Hide the prOcess you used to come to your coNclusions. GivE people enough data So that They can disagree with You if they want to.

  21. My “three pillars” of Data Viz • Your figures should speak for themselves. The analysis should be understandable and your conclusions should be obviously supported, without too much effort. • Don’t obfuscate the data or Hide the prOcess you used to come to your coNclusions. GivE people enough data So that They can disagree with You if they want to. • Minimalism: Substance over style. Make your point concisely, without redundant or distracting information or ornamentation.

  22. Ellie rants about culture for 2 seconds. Indulge me… “form follows function”

  23. Great tangent to go on… Edward Tufte: dogma of data viz

  24. My “three pillars” of Data Viz • Your figures should speak for themselves. The analysis should be understandable and your conclusions should be obviously supported, without too much effort.

  25. Missing or Cryptic Labels [Figure: chart titled “Learning curve” with unlabeled axes; y ticks 0 to 100.]

  26. Missing or Cryptic Labels [Figure: the “Learning curve” chart with labels added. y-axis: Classification Accuracy (%), 0 to 100; x-axis: Training Size, 10 to 1000000.]
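The fix shown on this slide (a title, axis names, and units) takes only a few lines in matplotlib. The sketch below assumes made-up accuracy numbers and a hypothetical function name; only the title and axis names are taken from the slide.

```python
def plot_learning_curve(sizes, accuracy_pct, path="learning_curve.png"):
    """Save a learning curve with a title, axis labels, and units."""
    if len(sizes) != len(accuracy_pct):
        raise ValueError("sizes and accuracy_pct must have the same length")
    # Imported lazily so the check above works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(sizes, accuracy_pct, marker="o")
    ax.set_xscale("log")  # training sizes span several orders of magnitude
    ax.set_ylim(0, 100)
    ax.set_title("Learning curve")
    ax.set_xlabel("Training Size (log scale)")
    ax.set_ylabel("Classification Accuracy (%)")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical numbers for illustration only; not read off the slide.
SIZES = [10, 100, 1000, 10000, 100000]
ACCURACY_PCT = [52.0, 61.0, 74.0, 83.0, 88.0]
```

`plot_learning_curve(SIZES, ACCURACY_PCT)` writes the labeled figure to `learning_curve.png`.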

  27. Skewed or Crunched Data [Figure: histogram of Population 1 and Population 2. y-axis: Frequency, 0 to 10000; x-axis: Age, 20 to 100.]

  28. Skewed or Crunched Data. Sometimes can use logs (but say you did so…) [Figure: the same histogram replotted. y-axis: Log Frequency, 0 to 10; x-axis: Age, 20 to 100.]
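One way to apply the log trick and "say you did so" is to put the scale in the axis label. This is a rough sketch under assumptions: the helper names are invented, the bins are equal-width, and the y-axis is switched to a log scale, which is a different route from plotting log-transformed counts directly.

```python
def bin_counts(values, lo, hi, n_bins):
    """Count values in n_bins equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # A value equal to hi is placed in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def plot_log_histogram(groups, lo, hi, n_bins, path="age_log_hist.png"):
    """groups maps a label (e.g. "Population 1") to a list of ages."""
    # Imported lazily so bin_counts() works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    width = (hi - lo) / n_bins
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    fig, ax = plt.subplots()
    for label, values in groups.items():
        ax.bar(centers, bin_counts(values, lo, hi, n_bins),
               width=width, alpha=0.5, label=label)
    ax.set_yscale("log")  # keeps the small group visible next to the big one
    ax.set_xlabel("Age")
    ax.set_ylabel("Frequency (log scale)")  # tell the reader about the log
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

Stating the scale in the label is the minimum; a caption sentence noting the log axis helps readers who skim.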

  29. Skewed or Crunched Data [Figure: histogram. y-axis: Frequency, 0 to 90; x-axis: Age, 20 to 100.]

  30. Skewed or Crunched Data. Sometimes can remove outliers (but say you did so…) [Figure: histogram. y-axis: Frequency, 0 to 40; x-axis: Age, 4 to 20.]
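If outliers are dropped, the figure itself can report how many and by what rule. The sketch below is one possible way to do that; the function names and the cutoff-based rule are assumptions for illustration, not the method used for the slide.

```python
def drop_outliers(values, lo, hi):
    """Return (kept, n_removed), keeping only values with lo <= v <= hi."""
    kept = [v for v in values if lo <= v <= hi]
    return kept, len(values) - len(kept)


def outlier_note(n_removed, lo, hi):
    """Text to put on the figure so readers know what was dropped."""
    return f"{n_removed} values outside [{lo}, {hi}] removed"


def plot_trimmed_histogram(values, lo, hi, path="age_trimmed.png"):
    """Save a histogram of the kept values, with the removal stated."""
    # Imported lazily so the helpers above work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    kept, n_removed = drop_outliers(values, lo, hi)
    fig, ax = plt.subplots()
    ax.hist(kept, bins=16)
    ax.set_xlabel("Age")
    ax.set_ylabel("Frequency")
    ax.set_title(outlier_note(n_removed, lo, hi))  # say you did so
    fig.savefig(path)
    plt.close(fig)
```

Returning the removed count from `drop_outliers` makes it hard to forget the disclosure, since the caller has to do something with it.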

  31. Skewed or Crunched Data. Sometimes better to analyze separately. (Look at your data!) [Figure: the Age histogram (y-axis: Frequency, 0 to 90; x-axis: Age, 20 to 100) shown with two separate panels, one for ages 4 to 20 and one for ages 90 to 100, each with y ticks 0 to 40.]

  32. Skewed or Crunched Data [Figure: chart with unlabeled axes; y ticks 0 to 100, x ticks 20 to 100.]

  33. Skewed or Crunched Data. Sometimes better to split into multiple charts… [Figure: five small charts side by side, each with y ticks 0 to 100 and x ticks 20 to 100.]
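Splitting one crowded chart into several small ones is often called "small multiples". A minimal sketch, assuming the data arrives as `(group, x, y)` records and that the caller supplies the axis names (the slide's axes are unlabeled, so none are guessed here); the function names are hypothetical.

```python
from collections import defaultdict


def split_by_group(records):
    """Turn (group, x, y) records into {group: ([xs], [ys])}."""
    groups = defaultdict(lambda: ([], []))
    for group, x, y in records:
        xs, ys = groups[group]
        xs.append(x)
        ys.append(y)
    return dict(groups)


def plot_small_multiples(records, xlabel, ylabel, path="multiples.png"):
    """Save one panel per group, all sharing the same axes."""
    # Imported lazily so split_by_group() works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    groups = split_by_group(records)
    n = len(groups)
    # Shared axes keep the panels directly comparable.
    fig, axes = plt.subplots(1, n, sharex=True, sharey=True,
                             squeeze=False, figsize=(3 * n, 3))
    for ax, (name, (xs, ys)) in zip(axes[0], groups.items()):
        ax.plot(xs, ys)
        ax.set_title(name)
        ax.set_xlabel(xlabel)
    axes[0][0].set_ylabel(ylabel)  # label the shared y-axis once
    fig.savefig(path)
    plt.close(fig)
```

Sharing the x and y ranges across panels is a design choice: it costs some resolution per panel but prevents readers from comparing bars drawn on different scales.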

  34. Chart/Data Type Mismatch [Figure: pie chart titled “Company Earnings by Year (in 2012 millions)”; legend: 2013 to 2017; slice values: 1.3, 2.3, 1.7, 2.1, 2.1, 2.0.]

  35. Chart/Data Type Mismatch. Not really interpretable as “parts of a whole”… [Figure: the same pie chart, “Company Earnings by Year (in 2012 millions)”.]

  36. Chart/Data Type Mismatch [Figure: chart titled “Company Earnings by Year (in millions)”; y ticks 1.3 to 2.3; x-axis: years 2012 to 2017.]

  37. Chart/Data Type Mismatch [Figure: bar chart titled “Earnings Gap in Canada is Smaller”; y-axis: Earnings, 0 to 70; groups: US, Canada; series: College, No College.]

  38. Chart/Data Type Mismatch [Figure: bar chart titled “Earnings Gap in Canada is Smaller”; y-axis: Earnings Gap, 0 to 16; bars: US, Canada.]
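When the claim is about a gap, plotting the gap directly saves the reader from subtracting bar heights by eye. The sketch below computes the difference first and then plots one bar per country. The earnings figures and function names are hypothetical placeholders; the slide does not give the underlying values or units.

```python
def earnings_gap(earnings):
    """Map country -> (College earnings minus No College earnings)."""
    return {country: groups["College"] - groups["No College"]
            for country, groups in earnings.items()}


# Hypothetical values for illustration only; not the slide's data.
EARNINGS = {
    "US": {"College": 60.0, "No College": 45.0},
    "Canada": {"College": 50.0, "No College": 42.0},
}


def plot_gap(earnings, path="earnings_gap.png"):
    """Save a bar chart of the gap itself, one bar per country."""
    # Imported lazily so earnings_gap() works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    gap = earnings_gap(earnings)
    fig, ax = plt.subplots()
    ax.bar(list(gap.keys()), list(gap.values()))
    ax.set_ylim(bottom=0)  # bars should start at zero
    ax.set_ylabel("Earnings Gap (College minus No College)")
    ax.set_title("Earnings Gap in Canada is Smaller")
    fig.savefig(path)
    plt.close(fig)
```

With the placeholder data, `earnings_gap(EARNINGS)` gives a gap of 15.0 for the US and 8.0 for Canada, so the chart's title and its bars make the same point.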

  39. Clicker Question! What is the biggest problem with this? (a) Crunched/Skewed Data (b) Missing/Cryptic Labels (c) Chart/Data Type Mismatch (d) It’s just ugly [Figure: chart titled “States I have lived in”; y-axis: Years, 0 to 18; x-axis: Michigan, Maryland, Pennsylvania, New York, Rhode Island.]


  42. My “three pillars” of Data Viz • Don’t obfuscate the data or Hide the prOcess you used to come to your coNclusions. GivE people enough data So that They can disagree with You if they want to.
