
Polyglot data science the force awakens with F#, R and D3.js



  1. Polyglot data science the force awakens with F#, R and D3.js
     Evelina Gabasova @evelgab
     Tomas Petricek @tomaspetricek

  2. Part I F# with type providers

  3. fslab.org: Doing data science using F#
     The data science workflow
     Data access with type providers
     Interactive analysis with .NET and R libraries
     Visualization with HTML/PDF charts and reports
     High-quality open-source libraries

  4. LINQ before it was cool :-)

       var res = StockData.MSFT
         .Where(stock => stock.Close - stock.Open > 7.0)
         .Select(stock => stock.Date)

     Looking under the cover
     Extension methods take Func<T1, T2> delegates
     Immutable because it returns a new IEnumerable
     Functional design allows method chaining

  5. LINQ before it was cool :-)

       StockData.MSFT
       |> Array.filter (fun stock -> stock.Close - stock.Open > 7.0)
       |> Array.map (fun stock -> stock.Date)

     Looking under the cover
     Pipeline operator for composing functions
     Lambda functions written using fun
     Immutable lists, sequences, arrays, etc.

  6. Charting libraries for F#
     XPlot: cross-platform, HTML-based (recommended)
     F# Charting: flexible but Windows-only library
     Other options: FnuPlot and R provider
     For latest information, see FsLab.org, the F# data science homepage

  7. Charting with XPlot
     Draw sin for values from 0 to 2π:

       [| 0.0 .. 0.1 .. 6.3 |]
       |> Array.map (fun x -> x, sin x)
       |> Chart.Line

     Uses Google Charts behind the scenes.
     [Chart: line plot of sin x for x from 0.0 to 6.0]

  8. What are type providers?

  9. Type provider patterns
     Providers for a specific data source

       let wb = WorldBankData.GetDataContext()
       wb.Countries.India.Indicators.``Population, total``

     Parameterized provider for a data format

       type Rss = XmlProvider<"data/bbc.xml">
       Rss.Load(url).Channel.Description

  10. TASK: Star Wars movie profits
      [Chart: "Star Wars - rating and box office"; axes: Year, Box office]

  11. github.com/evelinag/polyglot-data-science

  12. Part II Visualization with D3.js

  13. The Star Wars social network

  14. D3.js visualizations made easier Gallery of examples

  15. D3.js social network visualization Force-directed network layout

  16. Part III Analyzing social networks with R

  17. Social network analysis
      Who is the most central character?
      How do the movies compare with each other?

  18. The R language: a "domain-specific" language for statistical analysis

  19. Very quick R intro

        # assignment
        x <- 1
        x = 1

        # variable and function names
        x
        x.y
        read.csv

  20. Very quick R intro: pipeline
      |> turns into %>%

        install.packages("magrittr")
        library(magrittr)

        xs <- c(1,2,3,4,5,6,7,8,9,10)
        xs %>% mean

  21. Network analysis with igraph
      Links: igraph website, igraph documentation

        install.packages("igraph")
        library(igraph)

  22. Creating igraph network

        library(igraph)
        g <- graph(edges)

      edges = list of nodes
      n1, n2, n3, n4, n5, ... represents (n1, n2), (n3, n4), ...
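
The flat edge format on slide 22 pairs up consecutive entries. A minimal JavaScript sketch of that pairing, independent of igraph; the character names and links are made-up sample data and `toEdgePairs` is a hypothetical helper:

```javascript
// Hypothetical flat vertex list in the n1, n2, n3, n4, ... form.
// The names are sample data only, not the talk's data set.
const flatEdges = ["LUKE", "R2-D2", "LUKE", "LEIA", "LEIA", "HAN"];

// Pair up consecutive entries: (n1, n2), (n3, n4), ...
function toEdgePairs(flat) {
  if (flat.length % 2 !== 0) {
    throw new Error("flat edge list must have an even number of entries");
  }
  const pairs = [];
  for (let i = 0; i < flat.length; i += 2) {
    pairs.push([flat[i], flat[i + 1]]);
  }
  return pairs;
}

const edgePairs = toEdgePairs(flatEdges);
console.log(edgePairs);
```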

  23. Calculating degree

        d <- degree(graph)

  24. F#

        open RProvider.igraph
        let degree = R.degree(network)

  25. F#: export JSON into list of edges
      R: perform the network analysis

  26. Degree

  27. Degree

  28. Degree

  29. Degree
      $\mathrm{Degree}(v) = \text{number of links } v \leftrightarrow v', \quad v \neq v'$
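
As an illustration of the degree definition, here is a small JavaScript sketch on a made-up four-node network. The edge list and the `degrees` helper are assumptions for the example. It counts distinct neighbours, so repeated links between the same pair count once, which may differ from how a given library treats multi-edges.

```javascript
// Hypothetical undirected links; not taken from the talk's data set.
const degreeEdges = [["A", "B"], ["A", "C"], ["B", "C"], ["C", "D"]];

// Degree(v) = number of links v <-> v' with v != v'.
function degrees(edges) {
  const neighbours = new Map();
  const link = (u, v) => {
    if (!neighbours.has(u)) neighbours.set(u, new Set());
    neighbours.get(u).add(v);
  };
  for (const [u, v] of edges) {
    if (u === v) continue; // skip self-links, as the definition requires v != v'
    link(u, v);
    link(v, u);
  }
  const result = {};
  for (const [node, set] of neighbours) result[node] = set.size;
  return result;
}

const degreeByNode = degrees(degreeEdges);
console.log(degreeByNode); // { A: 2, B: 2, C: 3, D: 1 }
```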

  30. Betweenness

  31. Betweenness

  32. Betweenness

  33. Betweenness

  34. Betweenness

  35. Betweenness
      $S_v$ = number of shortest paths between $a$ and $b$ through $v$
      $S$ = number of shortest paths between $a$ and $b$
      $\mathrm{Betweenness}(v)_{ab} = \frac{S_v}{S}$

  36. Betweenness
      $S_v$ = number of shortest paths between $a$ and $b$ through $v$
      $S$ = number of shortest paths between $a$ and $b$
      $\mathrm{Betweenness}(v) = \sum_{ab} \frac{S_v}{S}$
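
The betweenness sum can be computed for small unweighted networks with breadth-first search. The sketch below uses Brandes' accumulation scheme, which is a technique swapped in for illustration and not something the slides specify. The edge list is made up, and the sum is assumed to run over unordered pairs a, b that do not include v itself.

```javascript
// Hypothetical undirected network: a square A-B-D-C-A with a tail D-E.
const bEdges = [["A", "B"], ["B", "D"], ["A", "C"], ["C", "D"], ["D", "E"]];

function betweenness(edges) {
  const adj = new Map();
  const link = (u, v) => {
    if (!adj.has(u)) adj.set(u, new Set());
    adj.get(u).add(v);
  };
  for (const [u, v] of edges) {
    if (u !== v) { link(u, v); link(v, u); }
  }

  const score = new Map([...adj.keys()].map((n) => [n, 0]));
  for (const s of adj.keys()) {
    // Breadth-first search from s, counting shortest paths (sigma).
    const dist = new Map([[s, 0]]);
    const sigma = new Map([[s, 1]]);
    const preds = new Map();
    const queue = [s];
    for (let head = 0; head < queue.length; head++) {
      const v = queue[head];
      for (const w of adj.get(v)) {
        if (!dist.has(w)) { dist.set(w, dist.get(v) + 1); queue.push(w); }
        if (dist.get(w) === dist.get(v) + 1) {
          sigma.set(w, (sigma.get(w) || 0) + sigma.get(v));
          if (!preds.has(w)) preds.set(w, []);
          preds.get(w).push(v);
        }
      }
    }
    // Walk back from the farthest nodes, accumulating the S_v / S shares.
    const delta = new Map();
    for (let i = queue.length - 1; i >= 0; i--) {
      const w = queue[i];
      const dw = delta.get(w) || 0;
      for (const v of preds.get(w) || []) {
        const share = (sigma.get(v) / sigma.get(w)) * (1 + dw);
        delta.set(v, (delta.get(v) || 0) + share);
      }
      if (w !== s) score.set(w, score.get(w) + dw);
    }
  }
  // Each unordered pair (a, b) was visited from both ends, so halve the totals.
  for (const [node, value] of score) score.set(node, value / 2);
  return score;
}

const bScores = betweenness(bEdges);
console.log(Object.fromEntries(bScores));
```

In this sample network D sits on every shortest path to E, so it gets the highest score.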

  37. Network structure
      How do the movies differ?
      Size
      Density
      Clustering coefficient

  38. Density

  39. Density

  40. Density
      $\mathrm{Density} = \frac{\text{Existing connections}}{\text{Potential connections}} = \frac{\text{Existing connections}}{\frac{1}{2} N (N - 1)}$
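
A minimal JavaScript sketch of the density formula on a made-up network. The edge list and the `density` helper are assumptions for the example; node names are assumed to be strings, and repeated or self links are ignored.

```javascript
// Hypothetical undirected links; not taken from the talk's data set.
const densityEdges = [["A", "B"], ["A", "C"], ["B", "C"], ["C", "D"]];

// Density = existing connections / (N * (N - 1) / 2)
function density(edges) {
  const nodes = new Set();
  const links = new Set();
  for (const [u, v] of edges) {
    nodes.add(u);
    nodes.add(v);
    // Store each undirected link once, whichever way round it was listed.
    if (u !== v) links.add([u, v].sort().join("\u0000"));
  }
  const n = nodes.size;
  const potential = (n * (n - 1)) / 2;
  return potential === 0 ? 0 : links.size / potential;
}

const networkDensity = density(densityEdges);
console.log(networkDensity); // 4 existing links out of 6 potential ones
```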

  41. Clustering coefficient

  42. Clustering coefficient

  43. Clustering coefficient

  44. Clustering coefficient

  45. Clustering coefficient

  46. Clustering coefficient

  47. Clustering coefficient
      $K_v$ = number of neighbours of $v$
      $E_v$ = number of links between neighbours of $v$
      $\mathrm{Clustering}(v) = \frac{E_v}{\frac{1}{2} K_v (K_v - 1)}$

  48. Clustering coefficient
      $K_v$ = number of neighbours of $v$
      $E_v$ = number of links between neighbours of $v$
      $\mathrm{Clustering}(\text{network}) = \frac{1}{N} \sum_v \frac{E_v}{\frac{1}{2} K_v (K_v - 1)}$
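
The two clustering formulas can be sketched in JavaScript as follows. The edge list and helper names are made up for the example. Nodes with fewer than two neighbours make the denominator zero; this sketch assumes they contribute 0 to the average, which is one possible convention and not something the slides state.

```javascript
// Hypothetical undirected links; not taken from the talk's data set.
const clusterEdges = [["A", "B"], ["A", "C"], ["B", "C"], ["C", "D"]];

function buildAdjacency(edges) {
  const adj = new Map();
  const link = (u, v) => {
    if (!adj.has(u)) adj.set(u, new Set());
    adj.get(u).add(v);
  };
  for (const [u, v] of edges) {
    if (u !== v) { link(u, v); link(v, u); }
  }
  return adj;
}

// Clustering(v) = E_v / (K_v * (K_v - 1) / 2)
function localClustering(adj, v) {
  const nb = [...adj.get(v)];
  const k = nb.length; // K_v
  if (k < 2) return 0; // assumption: undefined cases count as 0
  let linksBetween = 0; // E_v
  for (let i = 0; i < k; i++) {
    for (let j = i + 1; j < k; j++) {
      if (adj.get(nb[i]).has(nb[j])) linksBetween++;
    }
  }
  return linksBetween / ((k * (k - 1)) / 2);
}

// Clustering(network) = average of Clustering(v) over all N nodes
function networkClustering(adj) {
  if (adj.size === 0) return 0;
  let total = 0;
  for (const v of adj.keys()) total += localClustering(adj, v);
  return total / adj.size;
}

const clusterAdj = buildAdjacency(clusterEdges);
console.log(localClustering(clusterAdj, "C"));
console.log(networkClustering(clusterAdj));
```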

  49. Size
      [Bar chart: Number of characters, Episode 1 to Episode 7]

  50. Density
      [Bar chart: Network density (%), Episode 1 to Episode 7]

  51. Clustering coefficient
      [Bar chart: Clustering coefficient (transitivity), Episode 1 to Episode 7]

  52. CONCLUSIONS

  53. F# Software Foundation (www.fsharp.org)
      non-profit, books and tutorials, cross-platform, community, data science, commercial support, open-source contributions, machine learning, web and cloud, consulting, user groups, research

  54. The Learning Pyramid

  55. Community chat and Q&A
      #fsharp on Twitter
      Stack Overflow F# tag
      Open source on GitHub
      Visual F# repo: github.com/Microsoft/visualfsharp
      F# Compiler and core libraries: github.com/fsharp
      F# Incubation project space: github.com/fsprojects
      FsLab Organization repository: github.com/fslaborg
      More resources

  56. Scott Wlaschin's fsharpforfunandprofit.com
      F# Books and Resources: fsharp.org/about/learning.html

  57. The Force Awakens Evelina Gabasova @evelgab evelina@evelinag.com www.evelinag.com Tomas Petricek @tomaspetricek tomas@tomasp.net www.tomasp.net
