Mastering Data with Spark and ML
Strata London 2019
About Me
- IIT Delhi, 1998
- Founder and CEO, Nube Technologies
- Strata Data San Jose Program Committee
- Speaker at Spark Summit, Strata, GIDS, etc.
Nube
- India-based startup
- Deep technical problems with an enterprise solution
- ML, Big Data, UX
This talk today
- Problem statement
- Our approach
Simple business asks
- Customer LTV (lifetime value)
- Best supplier for a part
- Supplier payment terms
- Householding
- Cross-sell opportunities
- M&A
Actual Data
Actual data
- Silos
- Data quality
- Volumes
Challenges
- Variety of sources
- Scale
- Capturing rules for matching and merging
- Working across different business entities
Wishlist
- Any source and format
- Any entity type
- Any volume
Reifier
AI-powered data management: matching and merging different data sources to build a holistic view.
- MDM (master data management)
- Fraud and analytics
- Sales and marketing
- Customer AML/KYC, cross-sell and upsell
- Data enrichment
- Reference data management
- Data quality
Our stack
Wishlist
- Any source and format
- Any entity type
- Any volume
Any source and format
- Based on RDDs
- Custom source and sink formats, written by us or borrowed from the community
Any source/sink, any format
- Elastic
- Cassandra
Problems with RDDs
- Record-wise reading was good, but adding structure to the data was left to us
- reifier.Tuple: an indexed data structure
- A development and maintenance nightmare
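The transcript does not include reifier.Tuple itself, so the following is a hypothetical Python illustration of the maintenance problem the slide names: positional access couples every caller to the column order, whereas named, schema-carrying rows (what Datasets provide) do not. The record layout and the Customer class are assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical position-indexed record, in the spirit of an indexed tuple:
# every caller must remember that index 0 is the id, 1 the name, 2 the city.
raw = ("r1", "Acme Corp", "London")
name_by_index = raw[1]  # silently reads the wrong field if a column is inserted before "name"


@dataclass(frozen=True)
class Customer:
    """Hypothetical typed record: fields are addressed by name, not by position."""
    record_id: str
    name: str
    city: str


def from_indexed(t: tuple) -> Customer:
    # The only place that knows the positional layout; everything else uses names.
    record_id, name, city = t
    return Customer(record_id=record_id, name=name, city=city)


customer = from_indexed(raw)
print(customer.name)  # Acme Corp
```

If the source later gains a column, only from_indexed changes; code written against customer.name keeps working, which is the property an indexed tuple lacks.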
Reifier 2.0
- Datasets
- Pipe abstraction
Building a Dataset through a Pipe
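The code on this slide did not survive extraction. As a hypothetical sketch only, a pipe-style abstraction can be read as "a description of a source (format plus properties) and a registry that maps each format to a reader producing structured rows". Pipe, READERS and read_pipe are illustrative names, not Reifier's API, and plain Python lists stand in for Spark Datasets.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Pipe:
    """Hypothetical pipe: names a source and says how its data is formatted."""
    name: str
    fmt: str                                   # e.g. "csv" or "json"
    props: Dict[str, str] = field(default_factory=dict)


def read_csv(text: str, props: Dict[str, str]) -> List[Row]:
    delimiter = props.get("delimiter", ",")
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_json_lines(text: str, props: Dict[str, str]) -> List[Row]:
    # One JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Registry: supporting a new source format means registering one reader.
READERS: Dict[str, Callable[[str, Dict[str, str]], List[Row]]] = {
    "csv": read_csv,
    "json": read_json_lines,
}


def read_pipe(pipe: Pipe, text: str) -> List[Row]:
    """Dispatch on the pipe's format to build structured rows."""
    try:
        reader = READERS[pipe.fmt]
    except KeyError:
        raise ValueError(f"no reader registered for format {pipe.fmt!r}") from None
    return reader(text, pipe.props)


rows = read_pipe(Pipe("customers", "csv", {"delimiter": "|"}), "id|name\n1|Acme\n2|Globex\n")
print(rows)
```

In Spark, a reader step like this would typically delegate to spark.read.format(...).options(...).load(...) and return a Dataset, so the pipe only has to carry the format name and its options.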
Spark Integration
- Tried Livy etc., but it meant an additional dependency
- Finally settled on two ways to integrate: a local SparkContext, and the SparkLauncher
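SparkLauncher (package org.apache.spark.launcher) is a Java/Scala API for starting a Spark application as a child process, essentially a programmatic spark-submit. To keep the new examples in one language, this sketch only assembles the equivalent spark-submit command line in Python and does not run it. The jar name, main class, master and configuration values are hypothetical placeholders.

```python
from typing import Dict, List, Optional


def spark_submit_args(app_jar: str, main_class: str, master: str,
                      deploy_mode: str = "client",
                      conf: Optional[Dict[str, str]] = None,
                      app_args: Optional[List[str]] = None) -> List[str]:
    """Build a spark-submit argument list (not executed here)."""
    args = ["spark-submit", "--class", main_class, "--master", master,
            "--deploy-mode", deploy_mode]
    for key, value in sorted((conf or {}).items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_jar)      # the application jar comes after all options
    args += app_args or []    # anything after the jar is passed to the main class
    return args


cmd = spark_submit_args("reifier-job.jar", "com.example.MatchJob", "yarn",
                        conf={"spark.executor.memory": "4g"},
                        app_args=["--input", "customers"])
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would launch the job if spark-submit is on the PATH. SparkLauncher's startApplication() additionally returns a SparkAppHandle for tracking the application's state, which a bare subprocess call does not give you.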
Wishlist
- Any source and format
- Any entity type
- Any volume
Any entity type
- Traditional rule-based systems fail
- AI to the rescue
- Also Cassandra
Reifier Interactive Learner
Reifier Interactive Learner
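The transcript does not say how the Interactive Learner chooses which examples to show a user. One common approach, assumed here rather than taken from the talk, is uncertainty sampling: score candidate pairs with the current model and ask a person to label the pairs whose match probability is closest to 0.5. The pair ids and scores below are made up for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def most_uncertain(scores: Dict[Pair, float], k: int) -> List[Pair]:
    """Return the k pairs whose match probability is closest to 0.5 (ties broken by pair id)."""
    return sorted(scores, key=lambda pair: (abs(scores[pair] - 0.5), pair))[:k]


# Hypothetical match probabilities from the current model.
scores = {
    ("r1", "r2"): 0.97,  # confident match
    ("r1", "r3"): 0.52,  # uncertain
    ("r2", "r4"): 0.08,  # confident non-match
    ("r3", "r5"): 0.45,  # uncertain
}
to_label = most_uncertain(scores, k=2)
print(to_label)  # [('r1', 'r3'), ('r3', 'r5')]
```

Labels collected this way would become new training examples; the model is retrained and the next most uncertain pairs are shown, so user effort goes where the model is least sure.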
Any scale
- Add Spark to the mix
- Ouch, a Cartesian join: 1 million records means on the order of a trillion comparisons
- Learn what to join
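The arithmetic behind "on the order of a trillion": comparing each of n = 1,000,000 records with every other gives n(n-1)/2 = 499,999,500,000 distinct pairs, and a full Cartesian join of the table with itself produces n^2 = 10^12 ordered pairs. The usual remedy is blocking, which compares only records that share a key. The slide says Reifier learns what to join; this sketch substitutes a fixed, hand-picked key (the first three letters of a name) purely to show the effect.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def all_pairs_count(n: int) -> int:
    """Distinct unordered pairs when every record is compared with every other."""
    return n * (n - 1) // 2


def blocked_pairs(records: Dict[str, str], key_len: int = 3) -> List[Tuple[str, str]]:
    """Candidate pairs sharing a blocking key (a fixed stand-in for a learned one)."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    for record_id, name in records.items():
        blocks[name.lower()[:key_len]].append(record_id)
    pairs: List[Tuple[str, str]] = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))  # compare only within a block
    return sorted(pairs)


print(all_pairs_count(1_000_000))  # 499999500000

records = {"r1": "Acme Corp", "r2": "ACME Corporation", "r3": "Globex",
           "r4": "Globex Inc", "r5": "Initech"}
print(all_pairs_count(len(records)), blocked_pairs(records))
```

Here 10 possible pairs shrink to 2 candidates. In Spark the same idea is an equi-join on a blocking-key column instead of crossJoin, so the shuffle groups records by key and the quadratic work is confined to each block.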
AutoML
- Build multiple models based on the training data
- Optimize for accuracy and performance
- Use Spark to train and assess different models
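A toy sketch of the selection step only: candidate models are scored on held-out labelled pairs, the most accurate wins, and ties go to the cheaper model, mirroring "optimize for accuracy and performance". The candidates here are hypothetical similarity-threshold classifiers with made-up costs; the talk does not say which algorithms Reifier tries. In Spark this loop is typically expressed with spark.ml's ParamGridBuilder plus CrossValidator or TrainValidationSplit.

```python
from dataclasses import dataclass
from typing import List, Tuple

Example = Tuple[float, bool]  # (similarity score, is-match label)


@dataclass(frozen=True)
class Candidate:
    name: str
    threshold: float  # predict a match when similarity >= threshold
    cost: int         # hypothetical relative cost of running the model

    def predict(self, similarity: float) -> bool:
        return similarity >= self.threshold


def accuracy(model: Candidate, data: List[Example]) -> float:
    correct = sum(model.predict(sim) == label for sim, label in data)
    return correct / len(data)


def select(candidates: List[Candidate], holdout: List[Example]) -> Candidate:
    # Highest accuracy wins; among equally accurate models prefer the cheaper one.
    return max(candidates, key=lambda m: (accuracy(m, holdout), -m.cost))


holdout = [(0.95, True), (0.81, True), (0.62, False), (0.40, False), (0.77, True)]
candidates = [
    Candidate("strict", 0.9, 1),
    Candidate("balanced", 0.7, 2),
    Candidate("balanced_costly", 0.7, 5),
    Candidate("loose", 0.5, 1),
]
best = select(candidates, holdout)
print(best.name)  # balanced
```

"balanced" and "balanced_costly" are equally accurate on this holdout, so the cost term decides between them.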
Cassandra
- Any entity
- Any scale
Cassandra: Training
- Primary key: Cluster Id, Record Id
- Secondary index: r_isMatch
Cassandra: Entity
- Primary key: Record Id
- Secondary index: Cluster Id
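What the two key layouts buy, sketched with in-memory Python dicts standing in for Cassandra tables (no Cassandra driver is used). In CQL, PRIMARY KEY (cluster_id, record_id) makes cluster_id the partition key and record_id a clustering column, so all training rows for one cluster sit together; the entity table is keyed by record id alone. Treating r_isMatch as a per-row boolean label is an assumption, as are the sample ids.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Training table stand-in: partition by cluster_id, then rows by record_id,
# mirroring PRIMARY KEY (cluster_id, record_id).
training: Dict[str, Dict[str, dict]] = defaultdict(dict)

# Entity table stand-in: keyed by record_id alone.
entity: Dict[str, dict] = {}


def add_training_row(cluster_id: str, record_id: str, is_match: bool) -> None:
    training[cluster_id][record_id] = {"r_isMatch": is_match}


def cluster_rows(cluster_id: str) -> List[str]:
    # Served by the primary key: a single partition, rows ordered by record_id.
    return sorted(training.get(cluster_id, {}))


def rows_with_flag(is_match: bool) -> List[Tuple[str, str]]:
    # What the r_isMatch secondary index serves: a lookup that may span many partitions.
    return sorted((c, r) for c, rows in training.items()
                  for r, row in rows.items() if row["r_isMatch"] == is_match)


def members_of(cluster_id: str) -> List[str]:
    # What the Cluster Id secondary index on the entity table serves.
    return sorted(r for r, row in entity.items() if row["cluster_id"] == cluster_id)


add_training_row("c1", "r1", True)
add_training_row("c1", "r2", True)
add_training_row("c2", "r3", False)
entity.update({"r1": {"cluster_id": "c1"}, "r2": {"cluster_id": "c1"},
               "r3": {"cluster_id": "c2"}})

print(cluster_rows("c1"))            # ['r1', 'r2']
print(rows_with_flag(False))         # [('c2', 'r3')]
print(entity["r3"]["cluster_id"])    # c2
print(members_of("c1"))              # ['r1', 'r2']
```

The primary keys answer the frequent, targeted questions (all rows of a cluster; the cluster of a record) with direct lookups, while the secondary indexes cover the reverse questions without a second copy of the data.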
Elastic
- Free-flowing search
- Ad hoc analytics
- Real-time
- Plugin
Thank you!
www.nubetech.co
sonal@nubetech.co