
APACHE BIG DATA CONFERENCE: How to transform data into money using Big Data technologies



  1. APACHE BIG DATA CONFERENCE How to transform data into money using Big Data technologies

  2. INTRO: THE FIRST SPARK-BASED BIG DATA PLATFORM RELEASED. After almost a decade developing Big Data projects at Paradigma, we became early adopters of Spark through its R&D department, which led to the creation of Stratio.

  3. MY PROFILE / SKILLS: JORGE LOPEZ-MALLA. After working with traditional processing methods, I started doing some R&D Big Data projects and fell in love with the Big Data world. I am currently working on some awesome Big Data projects at Stratio.

  4. MY PROFILE / SKILLS: ALBERTO RODRÍGUEZ DE LEMA. Since graduating, I have been programming for more than 10 years. I have built high-performance, scalable web applications for companies such as Indra Systems, Prudential and Springer Verlag Ltd. @ardlema

  5. STRATIO GO TO SPACE. SPARK-BASED BD: The first Spark-based big data platform released. ENTERPRISE SPARK PLATFORM: On-premise & cloud, our platform is geared towards helping companies. PURE SPARK: The only pure Spark platform, the only global solution. OPEN-SOURCE SOLUTIONS: Our enterprise solutions are based on open source technologies.

  6. OUR CLIENT: MIDDLE EAST TELCO COMPANY. 9,500 mil. daily events processed; 9.2 mil. clients

  7. USE CASES

  8. USE CASES 1: MANAGEMENT & NORMALIZATION OF DATA SOURCES

  9. USE CASES 1: MANAGEMENT & NORMALIZATION OF DATA SOURCES

  10. USE CASES 2: NETWORK COVERAGE IMPROVEMENT

  11. USE CASES 3: PEOPLE GATHERING

  12. USE CASES 3: PEOPLE GATHERING

  13. USE CASES 4: DATA MONETIZATION

  14. USE CASES 4: DATA MONETIZATION

  15. USE CASES 4: DATA MONETIZATION

  16. TECHNICAL CHALLENGES

  17. TECHNICAL PROBLEMS: 1 Huge volume of data; 2 Huge size of data; 3 Distributed processing; 4 Hard to read; 5 Recognized patterns

  18. 1 HUGE VOLUME OF DATA. SOLUTION: APACHE HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

  19. 1 HUGE VOLUME OF DATA. 9,500 mil. daily CSV records -> circa 16 GB. Requirements: high availability; concurrent file reads

  20. 2 HUGE SIZE OF DATA. SOLUTION: APACHE PARQUET

  21. 2 HUGE SIZE OF DATA. 16.5 GB of daily event information stored as CSV text in HDFS; 4.3 GB of daily event information stored as Parquet files in HDFS. STORAGE IMPROVEMENT: circa 70%

  22. 2 HUGE SIZE OF DATA. Time to count daily CSV events -> 6.2 minutes. Time to count daily Parquet events -> 1 minute. READ PROCESS IMPROVEMENT: circa 80%

  23. 3 DISTRIBUTED PROCESSING. SOLUTION: APACHE SPARK

  24. 3 DISTRIBUTED PROCESSING - REQUIREMENTS. Complex algorithms with the minimum amount of resources. Reduction of the processing time in order to obtain data while it is still in use.

  25. 3 DISTRIBUTED PROCESSING - REQUIREMENTS. Sharing the cluster with legacy processes. Use of legacy process outputs without any change.

  26. 4 HARD TO READ. SOLUTION: SCALA + APACHE SPARK

  27. 4 HARD TO READ. Reduced development time. LOCs dramatically reduced. Number of classes dramatically reduced.

  28. 4 HARD TO READ. Test and application readability improvements. DSLs make our lives easier. Spark makes MapReduce jobs even simpler.

  29. 5 RECOGNIZED PATTERNS. SOLUTION: APACHE SPARK MLLIB

  30. 5 RECOGNIZED PATTERNS. Millions of records processed in order to obtain mathematical models. Complex mathematical algorithms applied to obtain accurate weekly behaviors.

  31. THANK YOU. UNITED STATES Tel: (+1) 408 5998830. EUROPE Tel: (+34) 91 828 64 73. contact@stratio.com www.stratio.com
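
The "circa 70%" and "circa 80%" improvements quoted on slides 21 and 22 can be checked with a short calculation. The sketch below uses only the four figures given on those slides; the constant and function names are illustrative and do not come from the deck.

```python
# Figures quoted on slides 21 and 22 of the deck.
CSV_SIZE_GB = 16.5       # daily events stored as CSV text in HDFS
PARQUET_SIZE_GB = 4.3    # the same daily events stored as Parquet files
CSV_COUNT_MIN = 6.2      # minutes to count the daily CSV events
PARQUET_COUNT_MIN = 1.0  # minutes to count the daily Parquet events


def reduction_pct(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


storage_saving = reduction_pct(CSV_SIZE_GB, PARQUET_SIZE_GB)
read_saving = reduction_pct(CSV_COUNT_MIN, PARQUET_COUNT_MIN)

print(f"storage reduction: {storage_saving:.1f}%")    # storage reduction: 73.9%
print(f"count-time reduction: {read_saving:.1f}%")    # count-time reduction: 83.9%
```

Both results are slightly above the rounded values on the slides, so the quoted figures are, if anything, conservative.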

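Slide 28 states that Spark makes MapReduce jobs simpler. The deck shows no code, so the following is only a rough sketch of the map and reduce-by-key pattern such a job follows, written in plain Python as a stand-in rather than with the Spark API (Spark's RDD API exposes comparable `map` and `reduceByKey` operations). The sample lines, the two-column layout and the function names are hypothetical; the real CSV schema is not given in the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical sample lines ("client_id,event_type"); the deck does not
# show the real CSV schema, so this layout is an assumption.
SAMPLE_CSV_LINES = [
    "c1,call",
    "c2,sms",
    "c1,data",
    "c3,call",
]


def map_to_pair(line: str) -> Tuple[str, int]:
    """Map step: turn one CSV line into an (event_type, 1) pair."""
    _client_id, event_type = line.split(",")
    return event_type, 1


def reduce_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the values of all pairs that share a key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


events_per_type = reduce_by_key(map(map_to_pair, SAMPLE_CSV_LINES))
print(events_per_type)  # {'call': 2, 'sms': 1, 'data': 1}
```

In a distributed engine the same two steps run in parallel across partitions of the input, which is what makes the pattern attractive for the daily volumes described on slide 19.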