
Data Analysis and Map-Reduce with MongoDB and pymongo

Alexander C. S. Hendorf, EuroPython 2015, Bilbao, @opotoc


  1. Data Analysis and Map-Reduce with MongoDB and pymongo. Alexander C. S. Hendorf, EuroPython 2015, Bilbao. @opotoc

  2. Alexander C. S. Hendorf • Mannheim, Germany • IT is my 'second career' • developer @my own company opotoc IT GmbH • mongoDB MUG organiser • speaker, sometimes trainer • EP2015 program WG co-chair

  3. Today: 1. mongoDB / document-oriented database 2. What's the mongoDB aggregation framework? 3. Pipeline model 4. Pipeline stages 5. Map-Reduce in mongoDB. Plus some live demos.

  4. Document-oriented databases in 15 seconds • a database holds collections • a collection holds documents • a document is a JSON-like object, e.g. { "_id": 1, "say": "Hello" } • no schema enforced
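In pymongo terms, a document is an ordinary Python dict. The sketch below reuses the slide's example document and adds a second, hypothetical document with different fields to illustrate that no schema is enforced. The helper `store` is an illustrative name, and its `collection` argument is assumed to be a pymongo `Collection`; nothing here is taken from the talk's demo code.

```python
# The document from the slide, written as a Python dict.
hello = {"_id": 1, "say": "Hello"}

# Hypothetical second document: different fields, same collection.
# MongoDB does not enforce a schema, so both can be stored side by side.
other = {"_id": 2, "say": "Hola", "tags": ["greeting", "es"]}

documents = [hello, other]


def store(collection, docs):
    """Insert docs and return their ids.

    `collection` is assumed to be a pymongo Collection (or any object
    offering a compatible insert_many method).
    """
    result = collection.insert_many(docs)
    return result.inserted_ids
```

Because the two dicts do not share the same set of keys, any code reading them back has to tolerate missing fields, for example with `doc.get("tags", [])`.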

  5. mongoDB aggregation framework • introduced with mongoDB 2.2 in 2012 • framework for data aggregation • documents enter a multi-stage pipeline that transforms them into aggregated results • designed to be straightforward • all operations have an optimization phase that attempts to reshape the pipeline for improved performance
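The multi-stage idea can be modelled in plain Python as a toy: each stage is a function that takes a stream of documents and returns a new stream, and the pipeline feeds one stage's output into the next. This is only a mental model under stated assumptions, not how MongoDB implements aggregation; the stage factories and the sample data are hypothetical.

```python
from collections import Counter
from functools import reduce


def match(predicate):
    """Stage factory: keep only documents for which predicate(doc) is true."""
    return lambda docs: (d for d in docs if predicate(d))


def project(*fields):
    """Stage factory: keep only the named fields of each document."""
    return lambda docs: ({f: d[f] for f in fields if f in d} for d in docs)


def group_count(key):
    """Stage factory: count documents per value of `key`."""
    def stage(docs):
        counts = Counter(d[key] for d in docs)
        # Sorted only to make this toy deterministic; a real $group stage
        # does not guarantee any output order.
        return [{"_id": k, "count": n} for k, n in sorted(counts.items())]
    return stage


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next one."""
    return list(reduce(lambda stream, stage: stage(stream), stages, docs))


# Hypothetical sample data, not taken from the talk's dataset.
sample = [
    {"country": "DE", "genre": "rock", "plays": 12},
    {"country": "DE", "genre": "pop", "plays": 3},
    {"country": "ES", "genre": "rock", "plays": 7},
    {"country": "ES", "genre": "rock", "plays": 0},
]

result = run_pipeline(sample, [
    match(lambda d: d["plays"] > 0),
    project("genre"),
    group_count("genre"),
])
print(result)
```

Filtering first keeps the later stages small, which is the same reason a selective stage is usually placed early in a real pipeline.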

  6. A pipeline is like a relay race: $match, $project, $group. Slide captions: get the baton, do something smart, present nicely.
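With pymongo, the three stages named on the slide are written as a list of dicts and handed to `Collection.aggregate`. The following is a minimal sketch: the field names `country` and `genre` are hypothetical (the transcript does not show the real schema), and `aggregate_genres` is an illustrative helper, not code from the talk.

```python
# Hypothetical field names ("country", "genre"); the talk's real schema
# is not shown in this transcript.
pipeline = [
    # 1. Filter: only documents for one country pass on.
    {"$match": {"country": "DE"}},
    # 2. Reshape: keep just the field the next stage needs.
    {"$project": {"_id": 0, "genre": 1}},
    # 3. Aggregate: count documents per genre.
    {"$group": {"_id": "$genre", "count": {"$sum": 1}}},
]


def aggregate_genres(collection):
    """Run the pipeline and return the results as a list.

    `collection` is assumed to be a pymongo Collection.
    """
    return list(collection.aggregate(pipeline))
```

Against a running server this might be called as `aggregate_genres(MongoClient().mydb.playlists)`, where `mydb` and `playlists` are placeholder names.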

  7. • mongoDB 3.0 • WiredTiger storage engine • driver: pymongo • dataset 37GB, compressed with WT ~9GB • collection of playlists from the iTunes Music Store • playlists that appeared in some chart sometime in the past 3 years somewhere around the world
