APACHE BIG DATA CONFERENCE How to transform data into money using Big Data technologies

INTRO THE FIRST SPARK-BASED BIG DATA PLATFORM RELEASED After almost a decade developing Big Data projects at Paradigma, we became early adopters of Spark through its R&D department, which led to the creation of Stratio.

MY PROFILE SKILLS JORGE LOPEZ-MALLA After working with traditional processing methods, I started to do some R&D Big Data projects and fell in love with the Big Data world. Currently I'm doing some awesome Big Data projects at Stratio.

MY PROFILE SKILLS ALBERTO RODRÍGUEZ DE LEMA Since graduating I have been programming for more than 10 years. I've built high-performance, scalable web applications for companies such as Indra Systems, Prudential and Springer Verlag Ltd. @ardlema

STRATIO GO TO SPACE
SPARK-BASED BD PLATFORM The first Spark-based big data platform released
ENTERPRISE SPARK On-premise & cloud, our platform is geared towards helping companies
PURE SPARK The only pure Spark platform, the only global solution
OPEN-SOURCE SOLUTIONS Our enterprise solutions are based on open source technologies

OUR CLIENT MIDDLE EAST TELCO COMPANY 9.500 mil. daily events processed. 9.2 mil. clients

USE CASES

USE CASES 1 MANAGEMENT & NORMALIZATION OF DATA SOURCES

USE CASES 2 NETWORK COVERAGE IMPROVEMENT

USE CASES 3 PEOPLE GATHERING

USE CASES 4 DATA MONETIZATION

TECHNICAL CHALLENGES

TECHNICAL PROBLEMS 1 Huge volume of data 2 Huge size of data 3 Distributed processing 4 Hard to read 5 Recognized patterns

1 HUGE VOLUME OF DATA SOLUTION APACHE HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

1 HUGE VOLUME OF DATA 9500 mil. daily CSV records -> circa 16 GB. Requirements: high availability, concurrent file reads

2 HUGE SIZE OF DATA SOLUTION APACHE PARQUET

2 HUGE SIZE OF DATA 16.5 GB of daily event information stored as CSV text in HDFS. 4.3 GB of daily event information stored as Parquet files in HDFS. STORAGE IMPROVEMENT Circa 70%

2 HUGE SIZE OF DATA Time to count daily CSV events -> 6.2 minutes. Time to count daily Parquet events -> 1 minute. READ PROCESS IMPROVEMENT Circa 80%
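
The two "circa" figures on the slides above can be checked with simple arithmetic. The sketch below is only illustrative: the helper name `percent_reduction` is an assumption introduced here, not something from the talk, and the inputs are the sizes and timings quoted on the slides.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Figures quoted on the slides (the helper above is hypothetical).
storage_saving = percent_reduction(16.5, 4.3)  # GB as CSV vs GB as Parquet
read_saving = percent_reduction(6.2, 1.0)      # minutes to count CSV vs Parquet

print(f"Storage reduction: {storage_saving:.1f}%")
print(f"Read time reduction: {read_saving:.1f}%")
```

Both results, about 74% and about 84%, are consistent with the rounded "circa 70%" and "circa 80%" stated on the slides.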

3 DISTRIBUTED PROCESSING SOLUTION APACHE SPARK

3 DISTRIBUTED PROCESSING - REQUIREMENTS Complex algorithms with the minimum amount of resources. Reduction of processing time in order to obtain data while it is still useful

3 DISTRIBUTED PROCESSING - REQUIREMENTS Sharing the cluster with legacy processes. Use of legacy process outputs without any change

4 HARD TO READ SOLUTION SCALA + APACHE SPARK

4 HARD TO READ Reduced development time. LOCs dramatically reduced. Number of classes dramatically reduced

4 HARD TO READ Test and application readability improvements. DSLs make our lives easier. Spark makes MapReduce jobs even simpler
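
To make the map and reduce idea on this slide concrete, here is a minimal sketch that counts events per key with a map step followed by a reduce-by-key step. It is plain Python, not Spark or Scala, so that it runs without a cluster. The sample records, the `client_id,cell_id` layout, and the function names are all hypothetical and do not come from the talk.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical CSV-like event records: "client_id,cell_id".
events = [
    "c1,cellA",
    "c2,cellB",
    "c3,cellA",
    "c1,cellA",
]


def map_event(line: str) -> tuple[str, int]:
    """Map step: emit a (cell_id, 1) pair for each event line."""
    _client_id, cell_id = line.split(",")
    return cell_id, 1


def reduce_by_key(pairs):
    """Reduce step: group pairs by key, then sum the values of each group."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}


events_per_cell = reduce_by_key(map(map_event, events))
print(events_per_cell)
```

In Spark the same two steps are typically written as a `map` followed by `reduceByKey` on an RDD, with the grouping and distribution across the cluster handled by the framework.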

5 RECOGNIZED PATTERNS SOLUTION APACHE SPARK MLLIB

5 RECOGNIZED PATTERNS Millions of data points processed in order to obtain mathematical models. Complex mathematical algorithms applied to obtain accurate weekly behaviors

THANK YOU UNITED STATES Tel: (+1) 408 5998830. EUROPE Tel: (+34) 91 828 64 73. contact@stratio.com www.stratio.com
