
Teaching old type systems new tricks with type providers

Teaching old type systems new tricks with type providers. Tomas Petricek, University of Kent and The Alan Turing Institute. http://tomasp.net, tomas@tomasp.net, @tomaspetricek


  1. Teaching old type systems new tricks with type providers. Tomas Petricek, University of Kent and The Alan Turing Institute. http://tomasp.net, tomas@tomasp.net, @tomaspetricek

  2. DATA SCIENCE

  3. DEMO: Open, reproducible data visualizations

  4. Tooling for data science: The gap between spreadsheets and programming

  5. Tooling for data science: Making programming languages a bit easier

  6. Tooling for data science: Learning from spreadsheet interaction model

  7. Reading data: Unsafe dynamic access in a typed language

       using System;
       using System.Xml.Linq;

       var url = "http://dvd.netflix.com/Top100RSS";
       var rss = XDocument.Load(url);
       var channel = rss.Element("rss").Element("channel");
       foreach (var item in channel.Elements("item"))
       {
           // Not found! Element("text") returns null, so .Value fails at run time.
           Console.WriteLine(item.Element("text").Value);
       }

  8. Reading data: Unsafe dynamic access in a typed language

       using System;
       using System.Xml.Linq;

       var url = "http://dvd.netflix.com/Top100RSS";
       var rss = XDocument.Load(url);
       var channel = rss.Element("rss").Element("channel");
       foreach (var item in channel.Elements("item"))
       {
           // Works only because the element name happens to match the feed.
           Console.WriteLine(item.Element("title").Value);
       }
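
The same failure mode can be reproduced in a few lines of Python. This is a minimal sketch, not part of the talk: the inline feed `SAMPLE_RSS` and the helper `read_items` are hypothetical stand-ins for the remote RSS document, and the point is only that the element name is a plain string that nothing checks before run time.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline feed standing in for the remote RSS document.
SAMPLE_RSS = """<rss><channel>
  <item><title>First film</title></item>
  <item><title>Second film</title></item>
</channel></rss>"""


def read_items(xml_text, field):
    """Return the text of `field` for every <item>, or None where it is absent."""
    channel = ET.fromstring(xml_text).find("channel")
    results = []
    for item in channel.findall("item"):
        element = item.find(field)  # the name is just a string; no checker sees it
        results.append(element.text if element is not None else None)
    return results


titles = read_items(SAMPLE_RSS, "title")  # the element exists in every item
missing = read_items(SAMPLE_RSS, "text")  # wrong name: only discovered at run time
print(titles, missing)
```

Both calls are accepted without complaint; the second one silently yields `None` values, which is the kind of mistake a provided type would turn into a compile-time error.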

  9. Reading data: Accessing data from external data sources
     - Languages do not understand data
     - There is rarely an explicit schema
     - Manually define types to capture it
     - Easier in dynamic languages

  10. Aggregating data: Athletes by number of gold medals from Rio 2016

        import pandas as pd

        olympics = pd.read_csv("olympics.csv")
        (olympics[olympics["Games"] == "Rio (2016)"]
            .groupby("Athlete")
            .agg({"Gold": sum})
            .sort_values(by="Gold", ascending=False)
            .head(8))

      Slide callouts: "Unknown file" (the CSV path) and "Column name" (the quoted column strings).

  11. Aggregating data: Language and data source features you need to know
      - Python dictionaries: {"key": value}
      - Generalised indexers: .[ condition ]
      - Operation names: sort_values
      - Data column names: "Athlete"
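
To make the "column names are just strings" point concrete, here is a small standard-library sketch of the same aggregation. The rows in `SAMPLE_CSV` are made up for illustration and are not real Olympic results; `gold_by_athlete` is a hypothetical helper, not something from the talk.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; not real Olympic results.
SAMPLE_CSV = """Games,Athlete,Gold
Rio (2016),Athlete A,2
Rio (2016),Athlete B,1
Rio (2016),Athlete A,1
London (2012),Athlete B,3
"""


def gold_by_athlete(csv_text, games, gold_column="Gold"):
    """Sum the gold-medal column per athlete for one edition of the games."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Games"] == games:
            totals[row["Athlete"]] += int(row[gold_column])
    return totals.most_common()  # sorted by total, highest first


ranking = gold_by_athlete(SAMPLE_CSV, "Rio (2016)")

# A misspelled column name is accepted by the language and fails only when run.
try:
    gold_by_athlete(SAMPLE_CSV, "Rio (2016)", gold_column="Golds")
    typo_detected = False
except KeyError:
    typo_detected = True
```

The misspelled `"Golds"` is indistinguishable from a valid name until the lookup actually happens, which is the gap the rest of the talk addresses.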

  12. TYPE PROVIDERS

  13. ∅ ⊢ e : τ

  14. π ( ) ⊢ e : τ

  15. DEMO: Reading data from an RSS feed

  16. F# Data library: Type providers for structured data
      - Structural shape inference
      - Language integration via type providers
      - Relative type safety

  17. Samples {title : string, author : {age : int}} and {author : {age : float}} give the common shape {title : option<string>, author : {age : float}}

  18. Samples {coordinates : {lng : num, lat : num}} and string give the common shape {coordinates : {lng : num, lat : num}} + string
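
The two shape examples above can be approximated with a short Python sketch. It is a simplified illustration under stated assumptions, not the F# Data algorithm: shapes are encoded as strings, dicts (records), `("option", s)` and `("sum", a, b)` tuples, all of which are hypothetical encodings chosen here, and sums of more than two shapes are not flattened.

```python
def infer(value):
    """Infer a shape from one sample value (records are dicts)."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return {key: infer(item) for key, item in value.items()}
    raise TypeError(f"unsupported sample: {value!r}")


def is_option(shape):
    return isinstance(shape, tuple) and shape[0] == "option"


def optional(shape):
    return shape if is_option(shape) else ("option", shape)


def join(a, b):
    """Common shape of two shapes: widen numbers, mark missing fields optional."""
    if a == b:
        return a
    if is_option(a) or is_option(b):
        inner_a = a[1] if is_option(a) else a
        inner_b = b[1] if is_option(b) else b
        return optional(join(inner_a, inner_b))
    if isinstance(a, str) and isinstance(b, str) and {a, b} == {"int", "float"}:
        return "float"
    if isinstance(a, dict) and isinstance(b, dict):
        merged = {}
        for key in list(a) + [k for k in b if k not in a]:
            if key in a and key in b:
                merged[key] = join(a[key], b[key])
            else:
                merged[key] = optional(a[key] if key in a else b[key])
        return merged
    return ("sum", a, b)  # incompatible shapes fall back to a sum


book = join(infer({"title": "x", "author": {"age": 1}}),
            infer({"author": {"age": 1.5}}))
place = join(infer({"coordinates": {"lng": 1.0, "lat": 2.0}}), infer("somewhere"))
```

Here `book` has an optional `title` and a `float` age, mirroring slide 17, and `place` is a sum of a record and `string`, mirroring slide 18.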

  19. Shape inference: Pragmatic design choices for usability
      - Prefers records for tooling
      - Predictable and stable
      - Open world assumption about sums

  20. DEMO: Aggregating Olympic medalists

  21. Dot-driven development: Encoding complex logic via simple member access
      - Type providers for member generation
      - Laziness for scaling to large hierarchies
      - Fancy types for the masses!

  22. Row types and phantom types

      Row types to track names and types of fields:

      $$\frac{\Gamma \vdash e : [f_1 : \tau_1, \ldots, f_n : \tau_n]}{\Gamma \vdash e.\mathsf{drop}\ f_i : [f_1 : \tau_1, \ldots, f_{i-1} : \tau_{i-1}, f_{i+1} : \tau_{i+1}, \ldots, f_n : \tau_n]}$$

      Embed row types in provided nominal types:

      $$\frac{\Gamma \vdash e : C_1}{\Gamma \vdash e.\mathsf{drop}\ f_i : C_2} \quad \text{where} \quad \begin{array}{l} \mathit{fields}(C_1) = \{f_1 : \tau_1, \ldots, f_n : \tau_n\} \\ \mathit{fields}(C_2) = \{f_1 : \tau_1, \ldots, f_{i-1} : \tau_{i-1}, f_{i+1} : \tau_{i+1}, \ldots, f_n : \tau_n\} \end{array}$$
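
The idea of embedding a row type in generated nominal classes can be mimicked in Python. This is only a sketch under assumptions: `provide` and the `drop_<field>` member names are hypothetical, and since Python is dynamically typed the example shows lazy member generation rather than static checking.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def provide(fields):
    """Return a class for a row, given as a tuple of (name, type_name) pairs.

    Each field f gets a member `drop_f` that leads to the class for the
    remaining fields. That class is built only when the member is accessed,
    and the cache makes equal rows share one class.
    """
    namespace = {"fields": dict(fields)}
    for name, _ in fields:
        remaining = tuple(f for f in fields if f[0] != name)
        namespace["drop_" + name] = property(
            lambda self, rest=remaining: provide(rest)()
        )
    class_name = "Row_" + "_".join(name for name, _ in fields)
    return type(class_name, (), namespace)


c1 = provide((("Athlete", "string"), ("Gold", "int"), ("Games", "string")))()
c2 = c1.drop_Gold  # plays the role of e.drop f_i : C_2
```

After dropping `Gold`, the resulting object no longer offers `drop_Gold`, so dropping the same column twice is impossible by construction. Dropping columns in a different order ends at the same cached class.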

  23. Fancy types for the masses!: Powerful idea that works in other contexts
      - Row types and phantom types
      - Session types for communication
      - Add your own fancy type here!

  24. BEHIND THE SCENES

  25. Relative type safety: Well typed programs do not go wrong. (As long as the world is well-behaved.)

  26. F# Data and safety

      Given representative samples and an input value:

      $$S(d) \sqsubset S(d_1, \ldots, d_n)$$

      Any program written using a type provider reduces:

      $$e_{\mathrm{user}}[x \leftarrow \mathsf{new}\ C(d)] \rightsquigarrow^{*} v$$
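
A rough way to read this property: if the input fits the shape inferred from the samples, user code that touches only members guaranteed by that shape cannot get stuck. The Python sketch below illustrates that reading under assumptions. `conforms` is a simplified stand-in for the relation between S(d) and S(d_1, ..., d_n), and `SHAPE` and `author_age` are hypothetical names introduced for the example.

```python
def conforms(value, shape):
    """True when `value` fits `shape` (simplified stand-in for the shape relation)."""
    if isinstance(shape, tuple) and shape[0] == "option":
        return value is None or conforms(value, shape[1])
    if isinstance(shape, dict):
        # Open world: extra fields in the input are allowed.
        return isinstance(value, dict) and all(
            conforms(value.get(key), field_shape)
            for key, field_shape in shape.items()
        )
    if shape == "float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if shape == "int":
        return isinstance(value, int) and not isinstance(value, bool)
    if shape == "string":
        return isinstance(value, str)
    return False


# Shape assumed to have been inferred from representative samples.
SHAPE = {"title": ("option", "string"), "author": {"age": "float"}}


def author_age(doc):
    """User code: relies only on members the shape guarantees."""
    return float(doc["author"]["age"])


good = {"author": {"age": 41}, "publisher": "extra field"}
bad = {"author": {"age": "unknown"}}

good_ok = conforms(good, SHAPE)
bad_ok = conforms(bad, SHAPE)
age = author_age(good)  # safe because `good` conforms to SHAPE
```

Calling `author_age(bad)` would raise at run time, which matches the "as long as the world is well-behaved" caveat: safety holds relative to the input fitting the samples.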

  27. DEMO: Handling schema change and errors

  28. F# Data and schema change: Provided type can change only in limited ways
      - C[e] → C[e.M]
      - C[e] → C[match e with …]
      - C[e] → C[int(e)]

  29. Structure of a type provider

      Context maps names to definitions and nested contexts:

      $$L(C) = \mathsf{type}\ C(\overline{x : \tau}) = \overline{m},\ L'$$

      Pivot provider takes schema and provides a class with context:

      $$\mathsf{pivot}(F) = C, L$$

  30. DEMO: Fancy types in action

  31. Pivot type provider: Generate classes that drop individual columns

  32. JSON type provider: Generate class corresponding to a record shape
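
As an illustration of generating a class from a record shape, here is a minimal Python sketch. It is not how F# Data is implemented: `provide_class` and `CONVERTERS` are hypothetical names, shapes use the ad hoc encoding of strings, dicts and `("option", primitive)` tuples, and options are supported for primitive fields only.

```python
CONVERTERS = {"int": int, "float": float, "string": str}


def provide_class(name, shape):
    """Build a class with one read-only property per field of a record shape."""

    def make_getter(field, field_shape):
        if isinstance(field_shape, dict):
            # Nested record: wrap the nested data in its own generated class.
            nested = provide_class(name + "_" + field, field_shape)
            return lambda self: nested(self._data[field])
        if isinstance(field_shape, tuple) and field_shape[0] == "option":
            inner = CONVERTERS[field_shape[1]]
            return lambda self: (
                inner(self._data[field]) if field in self._data else None
            )
        convert = CONVERTERS[field_shape]
        return lambda self: convert(self._data[field])

    def init(self, data):
        self._data = data

    members = {"__init__": init}
    for field, field_shape in shape.items():
        members[field] = property(make_getter(field, field_shape))
    return type(name, (), members)


Doc = provide_class(
    "Doc", {"title": ("option", "string"), "author": {"age": "float"}}
)
doc = Doc({"author": {"age": 41}})
```

With this input, `doc.title` is `None`, `doc.author.age` is converted to `41.0`, and names outside the shape, such as `doc.publisher`, do not exist as members.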

  33. SUMMARY

  34. Future work: Making programming with data easier
      - Learning from spreadsheets
      - Understanding programmer interactions
      - Handling joins and data cleaning
      - Read, analyse and visualize!

  35. DEMO: Learning from spreadsheets

  36. Thank you! Teaching old type systems new tricks with type providers
      - Dot-driven: Towards minimal calculus of interactions
      - Fancy types: Encoding row types via type providers
      - Relative safety: Necessity when working with data

      Tomas Petricek: tomas@tomasp.net, @tomaspetricek, tomasp.net/academic, thegamma.net, fslab.org, gamma.turing.ac.uk

  37. References
      - Don Syme, Keith Battocchi, Kenji Takeda, Donna Malayeri and Tomas Petricek. Themes in Information-Rich Functional Programming for Internet-Scale Data Sources. In proceedings of DDFP 2013.
      - Tomas Petricek, Gustavo Guerra and Don Syme. Types from data: Making structured data first-class citizens in F#. PLDI 2016.
      - Tomas Petricek. Data exploration through dot-driven development. In proceedings of ECOOP 2017.
