big data in hybrid worlds

BIG DATA IN HYBRID WORLDS: The Story of M - PowerPoint PPT Presentation



  1. BIG DATA IN HYBRID WORLDS: The Story of M

  2. Hi! I’m Florian, CEO of Dataiku, maker of Data Science Studio, the « Photoshop for Data Science ». React on Twitter: @fdouetteau #BigDataParis. COMMUNITY EDITION (it’s FREE): http://www.dataiku.com/dss/trynow/

  3. Big or Small: Startup, Big Firm

  4. HOW DO PEOPLE TAKE DECISIONS

  5. BUYING DECISIONS: Should I buy it?

  6. SOCIAL DECISIONS: Should I talk to him?

  7. Business Decisions: M LIKE MEETING

  8. Business Intelligence

  9. Business Intelligence

  10. Volume, Variety, Velocity: IN 2001, man (actually Gartner) invented big data

  11. WHAT IF THE META GROUP HAD CHOSEN ANOTHER LETTER? Capacity, Complexity, Celerity. Size, Serendipity, Speed. Big, Blur, Blazing.

  12. Or Combine: Com….. Bu.. Sh..

  13. BIG DATA RELIGION?

  14. M LIKE METRICS

  15. M LIKE METRICS: How much does it cost to produce and maintain a metric? How many metrics do I need? Do I follow the right metrics? Do I have enough data?

  16. More Metrics Means More Means
 • Self-Service: Build your own metrics
 • Analytical Capabilities: Find your patterns
 • Large Volume: Store it all

  17. More Metrics Means More Application [Chart. Vertical axis: Sheer Curiosity, Optimization, Mission Critical. Horizontal axis: Small / Structured to Large / Diverse. Zones: DATA MINING, DATA EXPLORATION, CLASSIC BI, LARGE PRODUCTION PLATFORM. Examples: Analyze Each Tweet; Customer Consumption for Anti-Churn in Utilities; Web Navigation for E-Merchant; Ticket Data for Discounts in Retail; Filings for Fraud in Insurance; Reporting for Finance in Any Industry; Phone Call Logs for Security; RTB Data for Advertising]

  18. TODAY EACH ONE HAS ITS STORE [Chart. Vertical axis: Sheer Curiosity, Optimization, Mission Critical. Horizontal axis: Small / Structured to Large / Diverse. Zones: DATA MINING, DATA EXPLORATION, CLASSIC BI, LARGE PRODUCTION PLATFORM. Stores: DATA MINING REPOSITORIES, DATA LAKE, DATA WAREHOUSING, GOOGLE LIKE PLATFORM]

  19. it’s not just about the metrics

  20. DATA DRIVEN BUSINESS

  21. Problem is the human
 • Cannot take decisions in seconds
 • Limited sight (100 rows)
 • Limited short term memory (10k rows)?

  22. M LIKE MACHINE

  23. Rise of AI
 • 1974 - 1993: AI Winters
 • 1997: Deep Blue
 • 2005: Autonomous Vehicle
 • 2011: Watson’s Jeopardy
 • 2012: Google Cat

  24. APPLICATIONS OF MACHINE LEARNING TO BUSINESS PROBLEMS: Churn, Segmentation, Recommender, Lifetime Value, Volume Forecast, Risk Score, Hot Location, Pricing, Ranking, Event Paths, Fraud. www.dataiku.com

  25. PREDICTIVE MAIN COMFORT ZONE [Chart. Vertical axis: Sheer Curiosity, Optimization, Mission Critical. Horizontal axis: Small / Structured to Large / Diverse. Annotations: Not Enough “Hard” Examples So That You Can Learn; Not Enough Data To Learn From? Examples: Analyze Each Tweet; Customer Consumption for Anti-Churn in Utilities; Web Navigation for E-Merchant; Ticket Data for Discounts in Retail; Filings for Fraud in Insurance; Reporting for Finance in Any Industry; Phone Call Logs for Security; RTB Data for Advertising]

  26. Welcome to Technoslavia [Map. Regions: Hadoop; Machine Learning Mystery Land; NOSQL Nihiland; Scalability Central; SQL Columnar Republic; Real-time island; Statistician Old House; Data Clean Wasteland; Visualization County. Tools shown: Ceph, Sphere, Elastic Search, Cassandra, SOLR, Scikit-Learn, GraphLAB, Kafka, Flume, prediction.io, jubatus, MongoDB, Spark, Mahout, Riak, WEKA, Membase, MLBase, LibSVM, Storm, R, InfiniDB, Drill, Spark SQL, RapidMiner, Hive, Impala, Pig, Pandas, …, Kibana, Cascading, Talend] Dataiku - Pig, Hive and Cascading

  27. Embrace Many Skills Many-Sets
 • DREAM JOB: Business Analyst, BI Manager, Data Scientist
 • REAL JOB: Data Cleaner, Data Plumber, Data Waiter

  28. HOW CAN WE IMPROVE THE RELEVANCE OF OUR ANSWERS BY ANALYZING USER BEHAVIOR? [Diagram. Behavior signals marked ✓ or ✗: search reformulation; no answer; click on a professional listing; top search; navigation or filter click. Figures: >200M searches; 20 M; >10 occurrences; 1.4M queries; 0.5M prioritized queries. Steps: analysis & corrections; automation]

  29. “PREDICTIVE CONTENT MANAGEMENT” FROM PAGES JAUNES [Diagram. Components: pagesjaunes.fr; other reference datasets; directory; crawl; hadoop PIG+Hive; interpretation engine; indexing management; exploration; Scikit-learn; export; Machine]

  30. Optimizing Last Mile with Data Science Studio
 • Historical delivery and retrieval data
 • Cleaning and temporal enrichment of data
 • Data aggregation by geographic location
 • Modeling of a score for each delivery
 • Incorporation of new deliveries to the existing model
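
The steps on slide 30 can be sketched roughly as follows. This is a minimal illustration in plain Python, not the workflow actually built in Data Science Studio: the sample records, the `features` helper, the `LateDeliveryModel` class, the two-decimal location grid, the morning/afternoon split, and the smoothed late-rate score are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical historical records: (ISO timestamp, latitude, longitude, delivered_late)
HISTORY = [
    ("2015-03-02T09:15:00", 48.8566, 2.3522, True),
    ("2015-03-02T09:40:00", 48.8570, 2.3530, False),
    ("2015-03-02T18:05:00", 48.8566, 2.3522, True),
    ("2015-03-03T09:20:00", 48.8600, 2.3400, False),
    ("2015-03-03T18:30:00", None, 2.3522, True),  # incomplete: dropped by cleaning
]


def features(timestamp, lat, lon):
    """Cleaning and temporal enrichment: return (cell, period) or None if incomplete."""
    if timestamp is None or lat is None or lon is None:
        return None
    hour = datetime.fromisoformat(timestamp).hour
    period = "morning" if hour < 12 else "afternoon"  # temporal enrichment
    cell = (round(lat, 2), round(lon, 2))             # coarse geographic grid cell
    return cell, period


class LateDeliveryModel:
    """Stand-in model: smoothed share of late deliveries per (cell, period)."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # key -> [late, total]

    def update(self, records):
        """Aggregate records by location and period; also used for new deliveries."""
        for timestamp, lat, lon, late in records:
            key = features(timestamp, lat, lon)
            if key is None or late is None:
                continue  # skip records that fail cleaning
            stats = self.counts[key]
            stats[0] += int(late)
            stats[1] += 1

    def score(self, timestamp, lat, lon):
        """Score one delivery: estimated probability of being late (Laplace-smoothed)."""
        key = features(timestamp, lat, lon)
        if key is None:
            return None
        late, total = self.counts.get(key, (0, 0))
        return (late + 1) / (total + 2)


model = LateDeliveryModel()
model.update(HISTORY)                                         # fit on history
print(model.score("2015-03-04T09:00:00", 48.8568, 2.3525))    # score a planned delivery
model.update([("2015-03-04T09:00:00", 48.8568, 2.3525, True)])  # incorporate its outcome
```

Keeping only running counts means new deliveries can be folded into the existing model without reprocessing the history; a real scoring model would likely use richer features and a trained classifier.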

  31. EXPLORE NEW WORLDS [Chart. Vertical axis: Sheer Curiosity, Optimization, Mission Critical. Horizontal axis: Small / Structured to Large / Diverse. Zones: NOT BEING RELEVANT (DANGER ZONE); EXPLORE POTENTIAL; Analytics Self Service; Predictive Cluster; Optimize Existing BI Capabilities; Build Mandatory Large Volume Capabilities]

  32. www.dataiku.com
