Analyzing Pwned Passwords with Spark
Kelley Robinson (@kelleyrobinson), Developer Evangelist
@KELLEYROBINSON BIG DATA & SECURITY
• Spark: then and now
• The state of passwords
• Spark in action
• Big Data ∩ Security
Apache Spark Ecosystem
Spark Abstractions
• Then: RDD (Resilient Distributed Dataset)
• Now: DataFrames / Datasets
RDDs
• Immutable & distributed collection
• Unstructured data
• Low-level transformation and control
https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
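
The best-practice page linked above recommends `reduceByKey` over `groupByKey` because values are combined inside each partition before any records are shuffled between nodes. The following is a minimal plain-Python sketch of that idea, not Spark code: the partitions are invented sample data, the helper names `shuffle_group_by_key` and `shuffle_reduce_by_key` are hypothetical, and counting shuffled records is only a rough stand-in for network cost.

```python
from collections import defaultdict

# Hypothetical input: word-count pairs already split across two partitions.
PARTITIONS = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]


def shuffle_group_by_key(partitions):
    """groupByKey-style: every (key, value) pair is shuffled, then summed."""
    shuffled = [pair for part in partitions for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def shuffle_reduce_by_key(partitions, func):
    """reduceByKey-style: combine per partition first, then shuffle partials."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            # Merge into the partition-local partial result for this key.
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    totals = {}
    for key, value in shuffled:
        # Merge the partial results that arrived from each partition.
        totals[key] = func(totals[key], value) if key in totals else value
    return totals, len(shuffled)


if __name__ == "__main__":
    grouped_totals, grouped_records = shuffle_group_by_key(PARTITIONS)
    reduced_totals, reduced_records = shuffle_reduce_by_key(
        PARTITIONS, lambda x, y: x + y
    )
    print("same totals:", grouped_totals == reduced_totals)
    print("records shuffled:", grouped_records, "vs", reduced_records)
```

Both approaches produce the same totals; the difference is how many records have to move. The gap grows with the number of repeated keys per partition, which is the case the linked page warns about.
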
Datasets
• Structured data
• Strongly typed
• Fast
• SQL DSLs
https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html
Apache Spark Ecosystem
Scala has the most robust language API
https://www.slideshare.net/databricks/composable-parallel-processing-in-apache-spark-and-weld
https://twitter.com/CamJo89/status/996497423621996544
https://twitter.com/dog_rates/status/986762231290490881
Benefits
• Fast
• Flexible
• Good for exploration
• Proven for large systems
Challenges
• Opaque error messages
• Operationalizing
• Documentation
http://heather.miller.am/blog/launching-a-spark-cluster-part-1.html
👎💰 The missing Spark documentation
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/
THANK YOU! @kelleyrobinson
Spark Resources
• Apache Spark
• Jacek's Spark Documentation
• Zeppelin
• RDDs vs. Datasets
• Running Spark on a Cluster

Security Resources
• Pwned Passwords
• Reverse SHA1 hashes
• LastPass and 1Password
• 2FA Guides
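
For readers following up on the Pwned Passwords resource listed above, here is a minimal sketch of looking up a password in that kind of data. It assumes the downloadable file format of one uppercase SHA-1 hash per line followed by a colon and a breach count (`HASH:COUNT`). The sample lines and their counts are invented for illustration, and `sha1_upper`, `load_counts`, and `breach_count` are hypothetical helper names, not part of any library.

```python
import hashlib


def sha1_upper(password):
    """Return the uppercase hex SHA-1 digest of a UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def load_counts(lines):
    """Parse 'HASH:COUNT' lines (assumed format) into a dict of hash -> count."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        digest, _, count = line.partition(":")
        counts[digest] = int(count)
    return counts


def breach_count(password, counts):
    """Return how often the password appears in the parsed data, or 0."""
    return counts.get(sha1_upper(password), 0)


# Invented sample data: the hashes are real SHA-1 digests computed here,
# but the counts are made up for this example.
SAMPLE_LINES = [
    f"{sha1_upper('password')}:1000",
    f"{sha1_upper('123456')}:2000",
]

if __name__ == "__main__":
    counts = load_counts(SAMPLE_LINES)
    print(breach_count("password", counts))
    print(breach_count("correct horse battery staple", counts))
```

An in-memory dict is only workable for a small sample like this one; the full corpus is far larger, which is the motivation for processing it with a distributed engine such as Spark.
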