Spice up your website with Machine Learning! Evelina Gabasova @evelgab
F# Snippets fssnip.net
Searching through F# snippets: over 1600 snippets; over 1100 different tags
Do we need a custom system?
Great opportunity to create a custom machine learning system!
Nguyen A et al.: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. 2015.
Using machine learning in production: dependence on training data; inputs
User-generated inputs
PART I: Finding related snippets. If you liked this F# code, you'll also like ...
Simple information retrieval: common terms
Bag of words: ignore order of words; separate text and code
Term frequency
Term    Snippet 1   Snippet 2
async   3           0
x       15          15
The     2           2
code    1           1
...     ...         ...
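A minimal sketch of counting term frequencies for one snippet under the bag-of-words view. The tokenizer regex and the sample snippet are assumptions made for illustration; the slides do not say how the original system splits snippets into terms.

```python
import re
from collections import Counter

def term_frequencies(text):
    """Count how often each term occurs, ignoring word order (bag of words)."""
    # Hypothetical tokenizer: identifiers made of letters, digits and underscores.
    terms = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)
    return Counter(terms)

# Made-up sample snippet, not taken from fssnip.net.
snippet = "let x = async { return x + x }"
tf = term_frequencies(snippet)
print(tf["x"], tf["async"], tf["List"])
```

A `Counter` returns 0 for terms that never occur, which matches the zero entries in the table.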
Inverse document frequency: relative importance of terms. $\mathrm{idf}(\text{term}) = \log \dfrac{\text{number of snippets}}{\text{number of snippets with term}}$
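A worked example may help here. It assumes a natural logarithm (the slide does not fix the base) and a collection of exactly 1600 snippets; the counts of 1600 and 16 snippets containing a term are made up for illustration.

```latex
\begin{align*}
\mathrm{idf}(\text{term in all 1600 snippets}) &= \ln\frac{1600}{1600} = \ln 1 = 0 \\
\mathrm{idf}(\text{term in 16 snippets}) &= \ln\frac{1600}{16} = \ln 100 \approx 4.61
\end{align*}
```

A term that appears everywhere carries no weight, while a rare term is weighted heavily.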
Vector representation: TF-IDF (term frequency - inverse document frequency). $\mathrm{tfidf}(\text{term}, \text{snippet}) = \mathrm{tf}(\text{term}, \text{snippet}) \times \mathrm{idf}(\text{term})$
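The formula could be sketched as follows. This assumes raw counts for tf and a natural logarithm for idf; the function name `tfidf_vectors` and the three tiny tokenized snippets are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(snippets):
    """Map each tokenized snippet to a {term: tf-idf weight} dictionary."""
    n = len(snippets)
    # Document frequency: number of snippets that contain each term.
    df = Counter(term for terms in snippets for term in set(terms))
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(terms).items()}
        for terms in snippets
    ]

# Made-up tokenized snippets for illustration.
snippets = [
    ["async", "x", "x"],
    ["List", "x"],
    ["List", "Array", "x"],
]
vectors = tfidf_vectors(snippets)
```

Because `x` occurs in every snippet, its idf is zero and it drops out of every vector, however often it is used.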
Demo
Vector representation of snippets
Snippet    x      List    Array   ...
snippet1   0      0.17    0       ...
snippet2   0      0.04    0.001   ...
snippet3   0.23   0.005   0.31    ...
snippet4   0      0       0       ...
...
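Once snippets are vectors, related snippets can be found by comparing vectors. The slides do not name a similarity measure; the sketch below assumes cosine similarity, a common choice in information retrieval, and uses only the three columns visible in the table (x, List, Array). The helper `most_related` is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Values for the three visible columns of the table: x, List, Array.
vectors = {
    "snippet1": [0, 0.17, 0],
    "snippet2": [0, 0.04, 0.001],
    "snippet3": [0.23, 0.005, 0.31],
    "snippet4": [0, 0, 0],
}

def most_related(name):
    """Return the other snippet with the highest cosine similarity to `name`."""
    others = [other for other in vectors if other != name]
    return max(others, key=lambda other: cosine_similarity(vectors[name], vectors[other]))

print(most_related("snippet1"))
```

Cosine similarity ignores vector length, so a short snippet and a long one that rely on the same terms still come out as similar.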
PART II: Suggesting tags
Making sense of user-generated tags: async, #async, async mailprocessor, async paraller, Async sequences, asyncseq, asynchronous, Asynchronous Processing, Asynchronous Programming, asynchronous sequence, asynchronous workflows
Edit distance: regex vs. regexp; sports vs. ports; pi vs. API
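The pairs above can be checked with a short sketch. It assumes the Levenshtein variant of edit distance (single-character insertions, deletions and substitutions); the slides do not say which variant was used.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]

print(edit_distance("regex", "regexp"))
print(edit_distance("sports", "ports"))
print(edit_distance("pi".lower(), "API".lower()))
```

All three pairs are one edit apart once case is ignored, yet only the first pair names the same thing. Edit distance alone cannot tell spelling variants from unrelated tags.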
Machine learning: from snippets to tags
Associations: string and parser; async and MailboxProcessor; sequence and exception
Naive Bayes: Why do you call me naive?
Why naive? string and parser; async and MailboxProcessor; sequence and exception
Building a predictor
Tag probabilities: Bayes theorem. $p(A \mid B) = \dfrac{p(B \mid A)\, p(A)}{p(B)}$
Tag probabilities: Bayes theorem. $p(\text{tag} \mid \text{snippet}) \propto p(\text{tag})\, p(\text{snippet} \mid \text{tag})$
Tag probabilities: Bayes theorem. $p(\text{tag} \mid \text{snippet}) \propto p(\text{tag}) \prod_{\text{term}} p(\text{term} \mid \text{tag})$
1. Prior probabilities: $p(\text{tag}) \approx \dfrac{\text{number of snippets with the tag}}{\text{number of snippets}}$
2. Tag likelihood: How frequent is the term among snippets that have the tag? $p(\text{term} \mid \text{tag}) = \dfrac{\text{number of snippets with the term and tag}}{\text{number of snippets with the tag}}$
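Both estimates are plain counting, as the following sketch shows. The data layout (dictionaries with `terms` and `tags` sets) and the four toy snippets are assumptions for illustration, and the code assumes every queried tag occurs on at least one snippet.

```python
def prior(tag, snippets):
    """p(tag): fraction of all snippets that carry the tag."""
    return sum(tag in s["tags"] for s in snippets) / len(snippets)

def likelihood(term, tag, snippets):
    """p(term | tag): fraction of snippets with the tag that contain the term."""
    tagged = [s for s in snippets if tag in s["tags"]]
    return sum(term in s["terms"] for s in tagged) / len(tagged)

# Made-up training data: the set of terms and the set of tags per snippet.
snippets = [
    {"terms": {"async", "MailboxProcessor"}, "tags": {"async"}},
    {"terms": {"async", "x"}, "tags": {"async"}},
    {"terms": {"List", "x"}, "tags": {"list"}},
    {"terms": {"string", "parser"}, "tags": {"parsing"}},
]

print(prior("async", snippets))
print(likelihood("MailboxProcessor", "async", snippets))
print(likelihood("List", "async", snippets))
```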
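Naive Bayes prediction: $p(\text{tag} \mid \text{snippet}) \propto p(\text{tag}) \prod_{\text{term}} p(\text{term} \mid \text{tag})$. Is $\dfrac{p(\text{tag} \mid \text{snippet})}{p(\neg\text{tag} \mid \text{snippet})} > 1$?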
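A possible sketch of the prediction step: compute the unnormalized score for "has the tag" and for "does not have the tag", then compare them. Both scores share the same normalizing constant, so comparing them directly is equivalent to asking whether the ratio exceeds 1, and it avoids dividing by zero. The function names and toy data are hypothetical, and the code assumes both groups of snippets are non-empty.

```python
def score(terms, tag, has_tag, snippets):
    """Unnormalized p(class) * product of p(term | class), where class is tag or not-tag."""
    group = [s for s in snippets if (tag in s["tags"]) == has_tag]
    result = len(group) / len(snippets)  # prior of the class
    for term in terms:
        # Likelihood of the term within the class, estimated by counting.
        result *= sum(term in s["terms"] for s in group) / len(group)
    return result

def predict(terms, tag, snippets):
    """True if the tag is more probable than its absence for a snippet with these terms."""
    return score(terms, tag, True, snippets) > score(terms, tag, False, snippets)

# Made-up training data for illustration.
snippets = [
    {"terms": {"async", "MailboxProcessor"}, "tags": {"async"}},
    {"terms": {"async", "x"}, "tags": {"async"}},
    {"terms": {"List", "x"}, "tags": {"list"}},
    {"terms": {"async", "List"}, "tags": {"list"}},
]

print(predict({"async", "x"}, "async", snippets))
print(predict({"List"}, "async", snippets))
```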
The theory is always nicer: What if there is no snippet tagged async that contains List?
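In that case the counted likelihood is zero, and a single zero wipes out the whole product. The slides do not say how the talk resolves this; one standard remedy, assumed here, is additive (Laplace) smoothing. The function name, the `alpha` parameter and the toy data are hypothetical.

```python
def smoothed_likelihood(term, tag, snippets, alpha=1.0):
    """p(term | tag) with add-alpha smoothing, so unseen pairs never get probability zero."""
    tagged = [s for s in snippets if tag in s["tags"]]
    with_term = sum(term in s["terms"] for s in tagged)
    # Each term is either present or absent (two outcomes), hence 2 * alpha below.
    return (with_term + alpha) / (len(tagged) + 2 * alpha)

# Made-up training data: no snippet tagged async contains List.
snippets = [
    {"terms": {"async", "MailboxProcessor"}, "tags": {"async"}},
    {"terms": {"async", "x"}, "tags": {"async"}},
    {"terms": {"List", "x"}, "tags": {"list"}},
]

print(smoothed_likelihood("List", "async", snippets))
print(smoothed_likelihood("async", "async", snippets))
```

The unseen pair now gets a small but non-zero probability, so other terms in the snippet can still influence the prediction.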
Demo
Do you really need a custom system? Domain representation; what are important features; machine learning is fun!
Learning more: F# snippets (fssnip.net); F# snippets on GitHub (github.com/fssnippets); The F# Foundation (www.fsharp.org); FsLab Package (www.fslab.org); Introduction to information retrieval (informationretrieval.org)
Workshop: Polyglot Data Science: The Force Awakens. Friday, April 1. Data science, F#, R, D3.js ... and Star Wars!
Thank you! @evelgab github.com/evelinag evelinag.com