Data Acquisition
Axel Ngonga
Lead Data Acquisition
BIG Data PPF
http://big-project.eu
Motivation
● Increasing amount of data
  ○ 4K new pictures on Instagram
  ○ 100K tweets
  ○ 800K new pieces of content on Facebook
  ○ …
Motivation
● Big data technologies for
  ○ Improved business intelligence
  ○ Secure decisions
  ○ Customized services
  ○ …
● Use cases
  ○ Mission planning
  ○ Trade market
  ○ Customized services
  ○ Crime prediction
  ○ ...
Definition
● Data acquisition stands for
  ○ Selection of data sources
  ○ Collection of information from these sources
  ○ Filtering and cleaning of the collected data
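The three steps of the definition can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Source` class, the function names and the sample records are all hypothetical and do not come from the slides or from any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Set, Tuple


@dataclass
class Source:
    """Hypothetical description of a data source (not from the slides)."""
    name: str
    kind: str                          # e.g. "log", "xml", "csv"
    read: Callable[[], Iterable[str]]  # returns the raw records


def select_sources(sources: List[Source], wanted: Set[str]) -> List[Source]:
    # Step 1: selection of data sources, here simply by type of data.
    return [s for s in sources if s.kind in wanted]


def collect(sources: List[Source]) -> Iterator[Tuple[str, str]]:
    # Step 2: collection of information from the selected sources.
    for source in sources:
        for record in source.read():
            yield source.name, record


def clean(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    # Step 3: filtering and cleaning: trim whitespace, drop empty
    # records and drop exact duplicates.
    seen: Set[str] = set()
    for name, record in records:
        text = record.strip()
        if not text or text in seen:
            continue
        seen.add(text)
        yield name, text


def acquire(sources: List[Source], wanted: Set[str]) -> List[Tuple[str, str]]:
    return list(clean(collect(select_sources(sources, wanted))))


# Made-up sample data, for illustration only.
sources = [
    Source("web-log", "log", lambda: ["GET /index ", "", "GET /index", "POST /login"]),
    Source("feed", "xml", lambda: ["<item>a</item>"]),
    Source("sensor", "csv", lambda: ["1,2,3"]),
]
result = acquire(sources, {"log", "xml"})
```

Because `collect` and `clean` are generators, records flow through one at a time and only the duplicate filter keeps state in memory, which fits the low memory consumption that acquisition pipelines usually aim for.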
Overview
[Diagram: several data sources (DS), a Processing stage and a Storage stage, with the label "(cleaning, classification)"]
More than 3 Vs
● The 9(?) Vs of Big Data Acquisition
  ○ Volume
  ○ Velocity
  ○ Variety
  ○ Vocabulary
  ○ Variability (security models, ownership)
  ○ Veracity (trustworthiness of data)
  ○ Visibility (integrated view of data)
  ○ Value (worth of data for data consumer)
  ○ Visualization
Requirements
● Extensibility of protocols
● High scalability of approaches
● Low memory consumption
● Parallelism
● Elasticity
● Fast ROI
● High throughput (real-time)
Technology Overview
● Gathering
  ○ Advanced Message Queuing Protocol (AMQP)
    ■ Wire-level protocol
    ■ OASIS Standard since Oct. 2012
    ■ Large number of implementations incl. RabbitMQ, SwiftMQ, Apache ActiveMQ, Windows Azure Service Bus
  ○ JMS 2.0
  ○ Kestrel (Memcached)
  ○ Apache Kafka
  ○ Apache Flume (log data)
  ○ Facebook Scribe (log data)
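The gathering tools listed above share one basic pattern: producers hand messages to a queue, and consumers take them off at their own pace. The sketch below imitates that pattern with Python's standard-library `queue.Queue` as an in-process stand-in for a broker. It does not speak AMQP or use any of the listed products; the function names, the queue capacity and the sample messages are assumptions made for illustration.

```python
import queue
import threading
from typing import List

SENTINEL = None  # assumed end-of-stream marker, not part of any protocol


def producer(q: queue.Queue, messages: List[str]) -> None:
    for message in messages:
        # put() blocks while the queue is full, so a slow consumer
        # throttles the producer instead of exhausting memory.
        q.put(message)
    q.put(SENTINEL)


def consumer(q: queue.Queue, sink: List[str]) -> None:
    while True:
        message = q.get()
        if message is SENTINEL:
            break
        # Stand-in for real processing of the gathered message.
        sink.append(message.upper())


def run(messages: List[str], capacity: int = 2) -> List[str]:
    q: queue.Queue = queue.Queue(maxsize=capacity)
    sink: List[str] = []
    threads = [
        threading.Thread(target=producer, args=(q, messages)),
        threading.Thread(target=consumer, args=(q, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink


delivered = run(["log line 1", "log line 2", "log line 3"])
```

With a single producer and a single consumer the FIFO queue preserves message order. A real deployment would replace the in-process queue with a broker so that producers and consumers can run on different machines and scale independently.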
Technology Overview
● Processing
  ○ Facebook Scribe (Aggregation)
  ○ Twitter Storm (Stream Data Processing, Analysis)
  ○ MOA (Massive Online Analysis, esp. classification)
  ○ Hadoop (Distributed Processing)
  ○ InfoSphere Streams (Analysis)
Technology Overview
● Storage
  ○ MongoDB (BSON)
  ○ Apache CouchDB (JSON)
  ○ Neo4j (Graph DB)
  ○ Oracle NoSQL
  ○ IBM DB2 NoSQL
● Holistic Frameworks
  ○ Oracle's Big Data Suite
  ○ IBM's Big Data Suite
  ○ Karmasphere
Tool Matrix
Simple Recipe
1. Which of the 9 Vs are important for me?
2. What are my sources?
  ○ Protocols
  ○ Velocity
  ○ Type of data (logs, XML, …)
  ○ ...
3. What’s my current storage architecture?
  ○ NoSQL?
  ○ Distributed?
Thank You! Questions?
Axel Ngonga
University of Leipzig
AKSW Research Group
ngonga@informatik.uni-leipzig.de
http://aksw.org/AxelNgonga
http://big-project.eu