how do we solve the world s spreadsheet problem
play

How Do We Solve the Worlds Spreadsheet Problem? Alex Rasmussen - PowerPoint PPT Presentation

How Do We Solve the Worlds Spreadsheet Problem? Alex Rasmussen @alexras Hi, Im Alex! @alexras alexras.info freenome.com bitsondisk.com My Background 2009-2013 : really fast sorting 2013-2016 : data wrangling 2017-2018 :


  1. How Do We Solve 
 the World’s Spreadsheet Problem? Alex Rasmussen 
 @alexras

  2. Hi, I’m Alex! @alexras alexras.info freenome.com bitsondisk.com

  3. My Background 2009-2013 : really fast sorting 2013-2016 : data wrangling 2017-2018 : cancer- fi ghting robots

  4. I think / worry a lot 
 about spreadsheets.

  5. Today’s focus: spreadsheet data (for compute, felienne.com)

  6. This talk: 1. Spreadsheets are great 2. Spreadsheets are a problem 3. How we can fi x it

  7. This talk: 1. Spreadsheets are great 2. Spreadsheets are a problem 3. How we can fi x it

  8. What’s so great about spreadsheets?

  9. Spreadsheets are 
 Ubiquitous

  10. 1.2 billion O ffi ce users (~16% of humans) 


  11. 1.2 billion O ffi ce users (~16% of humans) 
 60 million O ffi ce 365 customers 


  12. 1.2 billion O ffi ce users (~16% of humans) 
 60 million O ffi ce 365 customers 
 >5 million businesses use Google Apps

  13. Spreadsheets are 
 Approachable

  14. Spreadsheets are 
 Flexible

  15. Data grids

  16. Data grids Graphs

  17. Data grids Graphs Anything tabular

  18. Data grids Graphs Anything tabular Full-scale “apps”

  19. Tatsuo Horiuchi (b. 1940) Kegon Falls , 2007 AutoShape on canvas

  20. https://pasokonga.com/

  21. So if spreadsheets are ubiquitous , approachable , and fl exible , 
 what’s the problem?

  22. This talk: 1. Spreadsheets are great 2. Spreadsheets are a problem 3. How we can fi x it

  23. Problem #1: Data Types

  24. Automatic type conversion can cause serious problems .

  25. DEC1

  26. DEC1 12/1

  27. RIKEN Identi fi er 2310009E13

  28. RIKEN Identi fi er 2310009E13 2.31E+13

  29. “We confirmed gene name errors in 987 supplementary files from 704 published articles ( 19.6% of all articles ).” 
 https://genomebiology.biomedcentral.com/ articles/10.1186/s13059-016-1044-7 –Johnny Appleseed

  30. False Equivalence 000123 
 = 00123 = 123 True if they’re integers, 
 but what if they’re strings ?

  31. Enumerated Types

  32. Enumerated Types “Prostate Cancer”

  33. Enumerated Types “Prostate Cancer” “prostate cancer”

  34. Enumerated Types “Prostate Cancer” “prostate cancer” “prostatecancer”

  35. Enumerated Types “Prostate Cancer” “prostate cancer” “prostatecancer” “PC”

  36. Enumerated Types “Prostate Cancer” “prostate cancer” “prostatecancer” “PC” “prostate”

  37. Enumerated Types “Prostate Cancer” “prostate cancer” “prostatecancer” “PC” “prostate” “prostrate”

  38. List Validations? Sheet Protection? Easy to add Easy to remove by accident Hard to enforce

  39. Data loss! False equivalence! Ontological chaos! Mass hysteria!

  40. Problem #2: Queryability

  41. Inside a spreadsheet, things are pretty good!

  42. Formulas! Pivot Tables! Filters!

  43. What about querying across spreadsheets?

  44. Get and Transform

  45. No Mac support.

  46. Structure changes? Type changes? Column Renames? Have fun re-loading .

  47. And what about joins ?

  48. There’s VLOOKUP =VLOOKUP(“Product 1”, Prices!$A$2:$B$9,2,FALSE) … but, like, eww.

  49. Data inside a spreadsheet is hard to connect to data outside that spreadsheet.

  50. Summary : 
 Spreadsheets are 
 bad at types and 
 hard to query

  51. This talk: 1. Spreadsheets are great 2. Spreadsheets are a problem 3. How we can fi x it

  52. What about databases ?

  53. Databases are great in ways that spreadsheets aren’t .

  54. Databases are great at data type de fi nition and enforcement .

  55. So Many Types of Types! Numeric Enumerated XML Monetary Geometric JSON Character Network Address Arrays Binary Bit String Composite Date/Time Text Search Range Boolean UUID Pseudo-Types

  56. Databases are purpose-built for queries and joins.

  57. BUT

  58. Databases 
 aren’t as approachable 
 as spreadsheets.

  59. $ psql -d postgres psql (10.4, server 9.6.9) Type "help" for help. postgres=#

  60. Databases 
 aren’t as fl exible 
 as spreadsheets.

  61. 
 Databases are good at storing and querying data. 
 But that’s it.

  62. Spreadsheets and databases have complementary skillsets.

  63. So, what do we do about it?

  64. How to Solve Your Spreadsheet Problem

  65. 1. Identify the use case. 2. Stop the spread. 3. Back fi ll.

  66. 1. Identify the use case. 2. Stop the spread. 3. Back fi ll.

  67. Every spreadsheet solves a problem . What is that problem?

  68. What’s the business need? How much data is there? How fast does it change? How frequent are additions?

  69. 1. Identify the use case. 2. Stop the spread. 3. Back fi ll.

  70. Give new data a structured home .

  71. No custom apps. At least at fi rst .

  72. Optimize for Speed

  73. 1. Identify the use case. 2. Stop the spread. 3. Back fi ll.

  74. It’s time for some Data Wrangling . (yee-haw )

  75. Writing one-o ff scripts is sometimes the best option.

  76. https://www.trifacta.com/start-wrangling/

  77. Infer wrangle “recipe” from high-level actions. https://www.trifacta.com/start-wrangling/

  78. Another Option : Programming 
 By Example

  79. FlashRelate A B C D 1 Albania 1,000 1950 FRA 1 A B C D E R . . . 2 Albania 930 1981 FRA 1 1 value year value year Comments 2 Albania 1,000 1950 930 1981 FRA 1 . . . 5 Austria 3,139 1951 FRA 3 3 Austria 3,139 1951 3,177 1955 FRA 3 6 Austria 3,177 1955 FRA 3 4 Belgium 541 1947 601 1950 5 Bulgaria 2,964 1947 3,259 1958 FRA 1 . . . 9 Belgium 541 1947 6 Czech . . . 2,416 1950 2,503 1960 NC 10 Belgium 601 1950 . . . (a) . . . (b) Provide example rows, 
 synthesize layout transformations. https://github.com/microsoft/prose

  80. Foofah Provide input/output sample, synthesize 
 layout and syntactic transformations. https://github.com/umich-dbgroup/foofah

  81. How to Solve Your Spreadsheet Problem 1. Identify the use case. 2. Stop the spread. 3. Back fi ll.

  82. What about 
 the future ?

  83. Spreadsheets aren’t going anywhere, 
 for good reason .

  84. Learn from the spreadsheet.

  85. Meet the users where they are .

  86. Thank you. @alexras Consulting Inquiries : contact@bitsondisk.com

Recommend


More recommend