BIG DATA IN HYBRID WORLDS The Story of M
Hi! I’m Florian, CEO of Dataiku, maker of Data Science Studio, the "Photoshop for Data Science". React on Twitter: @fdouetteau #BigDataParis. COMMUNITY EDITION (it’s FREE): http://www.dataiku.com/dss/trynow/
Big or Small: Startup, Big Firm
HOW DO PEOPLE TAKE DECISIONS
BUYING DECISIONS: Should I buy it?
SOCIAL DECISIONS: Should I talk to him?
Business Decisions: M LIKE MEETING
Business Intelligence
Volume, Variety, Velocity: in 2001 man (actually META Group, later acquired by Gartner) invented big data
WHAT IF THE META GROUP HAD CHOSEN ANOTHER LETTER? Capacity, Complexity, Celerity. Size, Serendipity, Speed. Big, Blur, Blazing.
Or Combine: Com….. Bu.. Sh..
BIG DATA RELIGION?
M LIKE METRICS
M LIKE METRICS: How much does it cost to produce and maintain a metric? How many metrics do I need? Do I follow the right metrics? Do I have enough data?
More Metrics Means More Means • Self-Service: build your own metrics • Analytical Capabilities: find your patterns • Large Volume: store it all
More Metrics Means More Applications
Axes: Small, Structured to Large, Diverse; Sheer Curiosity, Optimization, Mission Critical
Areas: DATA MINING, DATA EXPLORATION, CLASSIC BI, LARGE PRODUCTION PLATFORM
Examples: Analyze Each Tweet; Customer Consumption for Anti-Churn in Utilities; Web Navigation for E-Merchant; Ticket Data for Discounts in Retail; Filings for Fraud in Insurance; Reporting for Finance in Any Industry; Phone Call Logs for Security; RTB Data for Advertising
TODAY EACH HAS ITS OWN STORE
Axes: Small, Structured to Large, Diverse; Sheer Curiosity, Optimization, Mission Critical
Areas: DATA MINING, DATA EXPLORATION, CLASSIC BI, LARGE PRODUCTION PLATFORM
Stores: DATA MINING REPOSITORIES, DATA LAKE, DATA WAREHOUSING, GOOGLE-LIKE PLATFORM
It’s not just about the metrics
DATA-DRIVEN BUSINESS
Problem is the human: cannot take decisions in seconds; limited sight (100 rows); limited short-term memory (10k rows)?
M LIKE MACHINE
Rise of AI: 1974-1993 AI Winters; 1997 Deep Blue; 2005 Autonomous Vehicle; 2011 Watson’s Jeopardy; 2012 Google Cat
APPLICATIONS OF MACHINE LEARNING TO BUSINESS PROBLEMS: Churn, Segmentation, Recommender, Lifetime Value, Volume Forecast, Risk Score, Hot Location, Pricing, Ranking, Event Paths, Fraud. www.dataiku.com
PREDICTIVE MAIN COMFORT ZONE
Axes: Small, Structured to Large, Diverse; Sheer Curiosity, Optimization, Mission Critical
Annotations: Not Enough "Hard" Examples So That You Can Learn; Not Enough Data to Learn From?
Examples: Analyze Each Tweet; Customer Consumption for Anti-Churn in Utilities; Web Navigation for E-Merchant; Ticket Data for Discounts in Retail; Filings for Fraud in Insurance; Reporting for Finance in Any Industry; Phone Call Logs for Security; RTB Data for Advertising
Welcome to Technoslavia
Map labels: Hadoop, Machine Learning, NoSQL, Nihiland, Ceph, Scalability Central, Mystery Land, Sphere, Elastic Search, Cassandra, SOLR, Scikit-Learn, GraphLAB, Kafka, Flume, prediction.io, jubatus, MongoDB, Spark, Mahout, Riak, WEKA, Membase, MLBase, LibSVM, Storm, R, SQL Columnar Republic, Real-time Island, InfiniDB, Drill, Spark SQL, RapidMiner, Hive, Impala, Pig, Panda …, Kibana, Cascading, Statistician Old House, Talend, Data Clean Wasteland, Visualization County
Dataiku - Pig, Hive and Cascading
Embrace Many Skills, Many Sets. DREAM JOB: Business Analyst, BI Manager, Data Scientist. REAL JOB: Data Cleaner, Data Plumber, Data Waiter.
HOW CAN WE IMPROVE THE RELEVANCE OF OUR ANSWERS BY ANALYZING USER BEHAVIOR? Positive (✓) and negative (✗) signals: reformulated search; no answer; click on a professional; top search; navigation or filter click. Figures: >200M searches; 20M; >10 occurrences; 1.4M queries; 0.5M prioritized queries. Analysis & corrections; automation.
"PREDICTIVE CONTENT MANAGEMENT" FROM PAGES JAUNES. Diagram labels: pagesjaunes.fr, other reference datasets, directory, crawl, Hadoop, Pig + Hive, interpretation engine, index management, exploration, scikit-learn, export, machine.
Optimizing Last Mile with Data Science Studio: (1) historical delivery and retrieval data; (2) cleaning and temporal enrichment of data; (3) data aggregation by geographic location; (4) modeling of a score for each delivery; (5) incorporation of new deliveries to the existing model
EXPLORE NEW WORLDS
Axes: Small, Structured to Large, Diverse; Sheer Curiosity, Optimization, Mission Critical
Labels: NOT BEING RELEVANT: DANGER ZONE; EXPLORE POTENTIAL; Analytics; Self Service; Predictive; Cluster; Optimize Existing BI Capabilities; Build Mandatory Large Volume Capabilities
www.dataiku.com