transducing for fun and profit

  1. Transducing for fun and profit simon@metabase.com @sbelak

  2. Clojure at a glance
     • (lisp (running-on :JVM))
     • Functional, dynamic, immutable
     • Excellent concurrency and state-management primitives
     • Unparalleled data manipulation

  3. Anatomy of a transducer
     • Transducers decomplect recursion mechanism, transformation, building the output, and access mechanism
     • 3 user-facing “protocols”: transducer, reducing fn, CollReduce

  4. transducer and reducing function
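The slide's code is not preserved in this transcript. As a minimal sketch of the idea, assuming plain Clojure 1.7+ and using the hypothetical names `sum-rf` and `mapping`: a reducing function has three arities (init, completion, step), and a transducer is simply a function from one reducing function to another.

```clojure
;; A reducing function has three arities: init, completion and step.
(defn sum-rf
  ([] 0)                          ; init: the starting accumulator
  ([acc] acc)                     ; completion: finalise the result
  ([acc x] (+ acc x)))            ; step: fold one input into the accumulator

;; A transducer takes a reducing function and returns a new one.
;; `mapping` is a hypothetical hand-rolled equivalent of (map f).
(defn mapping [f]
  (fn [rf]
    (fn
      ([] (rf))                   ; delegate init
      ([acc] (rf acc))            ; delegate completion
      ([acc x] (rf acc (f x)))))) ; transform the input, then delegate

(transduce (mapping inc) sum-rf [1 2 3])  ;=> 9
(into [] (mapping inc) [1 2 3])           ;=> [2 3 4]
```

The transformation (`mapping inc`) knows nothing about how the output is built: the same transducer feeds a sum in one call and a vector in the other.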

  5. transducer and reducing function
     • Using a transducer to wrap/keep state
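A hedged illustration of keeping state inside a transducer, following the pattern `clojure.core`'s own stateful transducers (such as `take` and `dedupe`) use: the state lives in a `volatile!` created when the transducer is applied to a reducing function. `running-total` is a hypothetical name, not code from the talk.

```clojure
(defn running-total
  "Hypothetical stateful transducer: emits the running sum of its inputs."
  []
  (fn [rf]
    ;; The volatile is created when the transducer is applied to a
    ;; reducing function, so every transduce/into call gets fresh state.
    (let [total (volatile! 0)]
      (fn
        ([] (rf))
        ([acc] (rf acc))
        ([acc x] (rf acc (vswap! total + x)))))))

(def xf (running-total))
(into [] xf [1 2 3 4])  ;=> [1 3 6 10]
(into [] xf [5])        ;=> [5]  (state does not leak between calls)
```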

  6. Wrap Java
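One possible reading of “Wrap Java” is a reducing function whose accumulator is a mutable Java object. The sketch below assumes that reading and wraps the JDK class `java.util.DoubleSummaryStatistics`; `summary-stats` is a hypothetical name. Because `transduce` calls the init arity once per reduction, each call gets its own Java object, and the completion arity converts it into an immutable Clojure map.

```clojure
(import 'java.util.DoubleSummaryStatistics)

(defn summary-stats
  "Hypothetical reducing function wrapping a mutable Java accumulator."
  ([] (DoubleSummaryStatistics.))                  ; init: fresh Java object
  ([^DoubleSummaryStatistics stats]                ; completion: back to data
   {:count (.getCount stats)
    :min   (.getMin stats)
    :max   (.getMax stats)
    :mean  (.getAverage stats)})
  ([^DoubleSummaryStatistics stats x]              ; step: mutate in place
   (.accept stats (double x))
   stats))

(transduce (map :price) summary-stats [{:price 1} {:price 2} {:price 6}])
;=> {:count 3, :min 1.0, :max 6.0, :mean 3.0}
```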

  7. CollReduce protocol
     • Get the next element
     • Makes transducing data-structure-agnostic, allowing us to (re)use transducers for things such as core.async channels
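A minimal sketch of the access-mechanism side: `transduce` (and `into`, which goes through it) falls back to `clojure.core.protocols/coll-reduce` for values that are not `IReduceInit`, so any type implementing `CollReduce` can be transduced. `countdown` is a hypothetical example source, not from the talk.

```clojure
(require '[clojure.core.protocols :as p])

(defn countdown
  "Hypothetical reducible source yielding n, n-1, ..., 1 without ever
  building a seq: reducing it just runs a loop."
  [n]
  (reify p/CollReduce
    ;; No initial value: the first element (n) seeds the accumulator.
    (coll-reduce [_ f]
      (if (pos? n)
        (p/coll-reduce (countdown (dec n)) f n)
        (f)))
    ;; With an initial value; stops early when a step returns `reduced`.
    (coll-reduce [_ f init]
      (loop [i n, acc init]
        (cond
          (reduced? acc) @acc
          (pos? i)       (recur (dec i) (f acc i))
          :else          acc)))))

(transduce (map inc) + (countdown 3))   ;=> 9
(into [] (take 2) (countdown 5))        ;=> [5 4]
```

Honouring `reduced?` in the loop is what lets early-terminating transducers such as `take` stop the traversal.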

  8. Transducing an async channel

  9. Composing transducers
     1. comp transducers
     2. Reducing function and transducer
     3. github.com/henrygarner/redux (post-complete, fuse)
     • Data structure that can be manipulated like any other
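A sketch of the first two items, assuming only `clojure.core`. Transducers compose with `comp` and are applied to inputs left to right. `post-complete*` is a hypothetical helper written for illustration in the spirit of redux's `post-complete`; it is not that library's code. `odd-squares` and `mean-rf` are likewise illustrative names.

```clojure
;; 1. comp: inputs flow through (filter odd?) first, then the squaring map.
(def odd-squares (comp (filter odd?) (map #(* % %))))

(into [] odd-squares (range 6))  ;=> [1 9 25]

;; 2. Wrapping a reducing function: apply f to the completed result.
(defn post-complete* [rf f]
  (fn
    ([] (rf))
    ([acc] (f (rf acc)))
    ([acc x] (rf acc x))))

;; Mean as "accumulate [sum count], then divide on completion".
(def mean-rf
  (post-complete*
   (fn ([] [0 0])
       ([acc] acc)
       ([[sum n] x] [(+ sum x) (inc n)]))
   (fn [[sum n]] (when (pos? n) (/ sum n)))))

(transduce odd-squares mean-rf (range 6))  ;=> 35/3
```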


  10. On-line/streaming analysis

  11. Metabase ❤ (github.com/metabase/metabase)
      • Open source analytics tool (runs on-premises)
      • Building a “data scientist in a box”
      • Hundreds to billions of rows
      • Some DBs optimised for analytics, some not

  12. Many batch algorithms can be turned into online ones
      • Parallelize independent computations
      • Find a recursive relation
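Finding a recursive relation is what turns a batch formula into an online one. As an illustration (not necessarily the algorithm used in the talk or in kixi.stats), Welford's recurrence updates a running mean and sum of squared deviations one element at a time, so the accumulator stays constant-size. `variance-rf` is a hypothetical name; it returns the sample variance (dividing by n − 1) and `nil` for fewer than two values.

```clojure
(defn variance-rf
  "Hypothetical online sample variance via Welford's recurrence:
    mean_n = mean_{n-1} + (x - mean_{n-1}) / n
    m2_n   = m2_{n-1}   + (x - mean_{n-1}) * (x - mean_n)
  The accumulator is only [n mean m2]."
  ([] [0 0.0 0.0])
  ([[n _ m2]]
   (when (> n 1) (/ m2 (dec n))))   ; completion: sample variance
  ([[n mean m2] x]
   (let [n'    (inc n)
         delta (- x mean)
         mean' (+ mean (/ delta n'))]
     [n' mean' (+ m2 (* delta (- x mean')))])))

(transduce identity variance-rf [2 4 4 4 5 5 7 9])
;; approximately 4.5714 (= 32/7)
```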

  13. github.com/MastodonC/kixi.stats
      • Count
      • (Arithmetic) mean
      • Geometric mean
      • Harmonic mean
      • Median
      • Variance
      • Interquartile range
      • Standard deviation
      • Standard error
      • Skewness
      • Kurtosis
      • Covariance
      • Covariance matrix
      • Correlation
      • Correlation matrix
      • Simple linear regression
      • Standard error of the mean
      • Standard error of the estimate
      • Standard error of the prediction
      • …

  14. Single-pass analysis
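Single-pass analysis means computing several statistics in one traversal of the data. The sketch below uses a hypothetical `fuse*` helper modelled loosely on the idea behind redux's `fuse` (a map of reducing functions in, a map of results out); it is not that library's implementation, and for brevity it ignores early termination (`reduced`) from the inner reducing functions. `stats-rf` is likewise an illustrative name.

```clojure
(defn fuse*
  "Hypothetical helper: runs every reducing fn in the map `rfs` over the
  same input and returns a map of key -> completed result."
  [rfs]
  (fn
    ([] (into {} (for [[k rf] rfs] [k (rf)])))
    ([accs] (into {} (for [[k rf] rfs] [k (rf (get accs k))])))
    ([accs x]
     (reduce-kv (fn [m k rf] (assoc m k (rf (get m k) x))) accs rfs))))

(def stats-rf
  (fuse* {:count (fn ([] 0) ([n] n) ([n _] (inc n)))
          :sum   +
          :max   (fn ([] nil) ([m] m) ([m x] (if (nil? m) x (max m x))))}))

(transduce (map :amount) stats-rf [{:amount 3} {:amount 10} {:amount 4}])
;=> {:count 3, :sum 17, :max 10}
```

The input is walked once, yet each statistic stays an independent, reusable reducing function.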

  15. Data = code
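One hedged reading of “Data = code”: because transducers compose with `comp`, a pipeline can be described as plain data and turned into a transducer on demand. `ops`, `spec->xf` and `pipeline` are hypothetical names chosen for this sketch.

```clojure
(def ops
  "Hypothetical registry mapping keywords to transducer constructors."
  {:map    map
   :filter filter
   :remove remove
   :take   take})

(defn spec->xf
  "Builds a transducer from a vector of [op arg] steps. The pipeline is
  plain data, so it can be stored, inspected or rewritten with ordinary
  collection functions before it becomes code."
  [spec]
  (apply comp (for [[op arg] spec] ((ops op) arg))))

(def pipeline [[:filter even?] [:map inc] [:take 2]])

(into [] (spec->xf pipeline) (range 10))                  ;=> [1 3]
(into [] (spec->xf (conj pipeline [:map -])) (range 10))  ;=> [-1 -3]
```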

  16. Using transducers is worth it for the composition alone

  17. Annoyances
      • Can only transduce one coll at a time
      • Always have to pass in an xf (especially annoying when using redux)
      • Having functions that return a transducer or not is error prone
      • Inconsistent support for transducers in core library

  18. Questions simon@metabase.com @sbelak
