
Data Acquisition: Axel Ngonga, Lead Data Acquisition, BIG Data PPF



  1. Data Acquisition: Axel Ngonga, Lead Data Acquisition, BIG Data PPF, http://big-project.eu

  2. Motivation ● Increasing amount of data ○ 4K new pictures on Instagram ○ 100K tweets ○ 800K new pieces of content on Facebook ○ …

  3. Motivation

  4. Motivation ● Big data technologies for ○ Improved business intelligence ○ Secure decisions ○ Customized services ○ … ● Use Cases ○ Mission planning ○ Trade market ○ Customized services ○ Criminality prediction ○ ...

  5. Definition ● Data acquisition comprises ○ Selecting data sources ○ Collecting information from these sources ○ Filtering and cleaning the data
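The three steps of the definition (select sources, collect from them, filter and clean) can be chained as a small pipeline. This is a minimal sketch under stated assumptions: the source names, the JSON-lines record format and the cleaning rules (strip whitespace, drop empty or malformed records, de-duplicate per source and id) are all hypothetical choices for illustration, not part of the original talk.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw sources: each maps a source name to raw JSON lines.
RAW_SOURCES: Dict[str, List[str]] = {
    "tweets": ['{"id": 1, "text": "  Big Data  "}', "not json", '{"id": 1, "text": "Big Data"}'],
    "logs": ['{"id": 7, "text": "   "}', '{"id": 8, "text": "disk full"}'],
    "sensors": ['{"id": 9, "text": "21.5"}'],
}


def select_sources(sources: Dict[str, List[str]], wanted: Iterable[str]) -> Dict[str, List[str]]:
    """Step 1: keep only the sources relevant to the use case."""
    wanted_set = set(wanted)
    return {name: lines for name, lines in sources.items() if name in wanted_set}


def collect(sources: Dict[str, List[str]]) -> Iterator[Tuple[str, dict]]:
    """Step 2: read every selected source, parsing each line as JSON."""
    for name, lines in sources.items():
        for line in lines:
            try:
                yield name, json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed record: drop it


def filter_and_clean(records: Iterable[Tuple[str, dict]]) -> List[dict]:
    """Step 3: normalise text, drop empty records, remove (source, id) duplicates."""
    seen = set()
    cleaned = []
    for source, rec in records:
        text = str(rec.get("text", "")).strip()
        key = (source, rec.get("id"))
        if not text or key in seen:
            continue
        seen.add(key)
        cleaned.append({"source": source, "id": rec.get("id"), "text": text})
    return cleaned


result = filter_and_clean(collect(select_sources(RAW_SOURCES, ["tweets", "logs"])))
print(result)
```

Using generators for the collection step means records flow through one at a time instead of being loaded into memory all at once.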

  6. Overview [Diagram: several data sources (DS) → Processing (cleaning, classification) → Storage]
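The overview architecture (several data sources feeding a processing stage that cleans and classifies, followed by storage) can be sketched with threads and a shared queue. This is an illustrative model only, assuming hypothetical log-line sources, a toy classifier that buckets events by log level, and a plain dictionary standing in for the storage layer.

```python
import queue
import threading
from typing import Dict, List

# Hypothetical data sources (DS): each produces a few raw log events.
SOURCES: Dict[str, List[str]] = {
    "ds1": ["ERROR disk full", "INFO started"],
    "ds2": ["WARN slow response", "error timeout"],
    "ds3": ["INFO heartbeat"],
}

STOP = None  # sentinel marking the end of one source's stream


def producer(name: str, events: List[str], q: "queue.Queue") -> None:
    """A data source pushing its events into the shared processing queue."""
    for event in events:
        q.put((name, event))
    q.put(STOP)


def classify(event: str) -> str:
    """Toy processing step: clean the event, then classify it by log level."""
    level = event.strip().split(" ", 1)[0].upper()
    return level if level in {"ERROR", "WARN", "INFO"} else "OTHER"


def run_pipeline(sources: Dict[str, List[str]]) -> Dict[str, List[str]]:
    q: "queue.Queue" = queue.Queue(maxsize=100)  # bounded, so fast sources block
    storage: Dict[str, List[str]] = {}  # stand-in for the storage layer
    threads = [threading.Thread(target=producer, args=(n, ev, q)) for n, ev in sources.items()]
    for t in threads:
        t.start()
    finished = 0
    while finished < len(threads):  # consume until every source has sent STOP
        item = q.get()
        if item is STOP:
            finished += 1
            continue
        name, event = item
        storage.setdefault(classify(event), []).append(f"{name}: {event}")
    for t in threads:
        t.join()
    return storage


store = run_pipeline(SOURCES)
print(store)
```

Because the sources run concurrently, the order of entries within each storage bucket can vary between runs; the bounded queue gives simple back-pressure when sources outpace processing.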

  7. More than 3 Vs ● The 9(?) Vs of Big Data Acquisition ○ Volume ○ Velocity ○ Variety ○ Vocabulary ○ Variability (security models, ownership) ○ Veracity (trustworthiness of data) ○ Visibility (integrated view of data) ○ Value (worth of data for data consumer) ○ Visualization

  8. Requirements ● Extensibility of protocols ● High scalability of approaches ● Low memory consumption ● Parallelism ● Elasticity ● Fast ROI ● High throughput (real-time)
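Two of these requirements, low memory consumption and high throughput, are often met by processing a stream in fixed-size batches rather than materialising it. The sketch below assumes a hypothetical generator as the event source; the `batched` helper is written out here for illustration (Python 3.12 also ships `itertools.batched`, which yields tuples instead of lists).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield fixed-size batches from a (possibly unbounded) stream.

    Only one batch is held in memory at a time, so memory use depends on
    `size`, not on the length of the stream.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def event_source(n: int) -> Iterator[int]:
    """Hypothetical source producing n events lazily (stand-in for a socket or log)."""
    yield from range(n)


total = 0
largest_batch = 0
for batch in batched(event_source(1_000_000), size=1000):
    largest_batch = max(largest_batch, len(batch))
    total += sum(batch)  # per-batch work; could also be handed to a worker pool
print(total, largest_batch)
```

Batching also makes parallelism straightforward, since each batch is an independent unit of work.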

  9. Technology Overview ● Gathering ○ Advanced Message Queuing Protocol (AMQP) ■ Wire-level protocol ■ OASIS Standard since Oct. 2012 ■ Many implementations, including RabbitMQ, SwiftMQ, Apache ActiveMQ, Windows Azure Service Bus ○ JMS 2.0 ○ Kestrel (Memcached protocol) ○ Apache Kafka ○ Apache Flume (log data) ○ Facebook Scribe (log data)
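Several of the gathering tools above decouple producers from consumers through a broker. As a rough illustration of the log-based approach associated with Apache Kafka (keyed partitions, consumer groups that track their own read offsets), here is a toy in-memory model. The `TopicLog` class and its methods are hypothetical and are not the Kafka client API; the byte-sum partitioner is a deliberately simple stand-in for a real hash function.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TopicLog:
    """Toy in-memory model of a partitioned, append-only log (hypothetical)."""

    def __init__(self, partitions: int = 2) -> None:
        self.partitions: List[List[str]] = [[] for _ in range(partitions)]
        # Next offset to read, per (consumer group, partition).
        self.offsets: Dict[Tuple[str, int], int] = defaultdict(int)

    def publish(self, key: str, message: str) -> int:
        """Append a message; messages with the same key land in the same partition."""
        p = sum(key.encode()) % len(self.partitions)  # deterministic toy hash
        self.partitions[p].append(message)
        return p

    def poll(self, group: str, max_messages: int = 10) -> List[str]:
        """Return unread messages for a consumer group and advance its offsets."""
        batch: List[str] = []
        for p, log in enumerate(self.partitions):
            start = self.offsets[(group, p)]
            chunk = log[start:start + max_messages - len(batch)]
            batch.extend(chunk)
            self.offsets[(group, p)] = start + len(chunk)
            if len(batch) >= max_messages:
                break
        return batch


log = TopicLog(partitions=2)
log.publish("user-a", "login")
log.publish("user-a", "click")
log.publish("user-b", "logout")

first = log.poll("analytics")   # reads everything published so far
second = log.poll("analytics")  # nothing new for this group
other = log.poll("archive")     # an independent group starts from offset 0
print(first, second, other)
```

Because messages are retained and each group keeps its own offsets, several consumers can read the same data at their own pace; ordering is preserved only within a partition.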

  10. Technology Overview ● Processing ○ Facebook Scribe (Aggregation) ○ Twitter Storm (Stream Data Processing, Analysis) ○ MOA (Massive Online Analysis, esp. classification) ○ Hadoop (Distributed Processing) ○ InfoSphere Streams (Analysis)
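Stream processors such as Storm typically compute rolling aggregates over recent data, for example trending hashtags. A minimal sketch of a time-based sliding-window counter follows; the class name, the window length and the sample stream are hypothetical, and it assumes events arrive with non-decreasing timestamps.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowCounter:
    """Counts items whose timestamp lies in (now - window, now]."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.events: Deque[Tuple[float, str]] = deque()
        self.counts: Counter = Counter()

    def add(self, timestamp: float, item: str) -> None:
        """Record an item; assumes timestamps arrive in non-decreasing order."""
        self.events.append((timestamp, item))
        self.counts[item] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        """Drop events that have fallen out of the window."""
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n: int) -> List[Tuple[str, int]]:
        """Return the n most frequent items currently in the window."""
        return self.counts.most_common(n)


stream = [(0, "#bigdata"), (1, "#kafka"), (2, "#bigdata"), (11, "#storm"), (12, "#storm")]
counter = SlidingWindowCounter(window=10)
for ts, tag in stream:
    counter.add(ts, tag)
print(counter.top(3))
```

Each event is added and evicted exactly once, so the cost per event is constant on average and memory is bounded by the number of events in one window.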

  11. Technology Overview ● Storage ○ MongoDB (BSON) ○ Apache CouchDB (JSON) ○ Neo4j (Graph DB) ○ Oracle NoSQL ○ IBM DB2 NoSQL ● Holistic Frameworks ○ Oracle's Big Data Suite ○ IBM's Big Data Suite ○ Karmasphere
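The storage options differ mainly in their data model. The sketch below stores one hypothetical acquired tweet in two ways: as a self-contained JSON document (the model used by document stores such as CouchDB, and by MongoDB via BSON, a binary JSON-like format) and as nodes plus labelled edges (the model used by graph databases such as Neo4j). Plain Python structures stand in for both databases; no real database API is used.

```python
import json
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]  # (label, key), e.g. ("User", "alice")

# One acquired record (hypothetical example data).
tweet = {"id": 42, "user": "alice", "text": "Acquiring data with #kafka",
         "tags": ["kafka"], "mentions": ["bob"]}

# Document model: the record is stored as one self-contained JSON document.
document_store: Dict[int, str] = {tweet["id"]: json.dumps(tweet)}

# Graph model: entities become nodes, relationships become labelled edges.
nodes: Set[Node] = set()
edges: List[Tuple[Node, str, Node]] = []


def add_edge(src: Node, rel: str, dst: Node) -> None:
    """Add a labelled edge, registering both endpoints as nodes."""
    nodes.update((src, dst))
    edges.append((src, rel, dst))


tweet_node: Node = ("Tweet", str(tweet["id"]))
add_edge(("User", tweet["user"]), "POSTED", tweet_node)
for tag in tweet["tags"]:
    add_edge(tweet_node, "TAGGED", ("Tag", tag))
for user in tweet["mentions"]:
    add_edge(tweet_node, "MENTIONS", ("User", user))

# "Whom does tweet 42 mention?": a single key lookup in the document model,
doc_answer = json.loads(document_store[42])["mentions"]
# and an edge traversal in the graph model.
graph_answer = [dst[1] for src, rel, dst in edges if src == tweet_node and rel == "MENTIONS"]
# The reverse question ("which tweets mention bob?") follows the edges the other way.
mentioned_bob = [src[1] for src, rel, dst in edges if rel == "MENTIONS" and dst == ("User", "bob")]
print(doc_answer, graph_answer, mentioned_bob)
```

In the document model, the reverse question would need a scan over all documents or a secondary index, which is one reason highly connected data is often kept in a graph store.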

  12. Tool Matrix

  13. Simple Recipe 1. Which of the 9 Vs are important for me? 2. What are my sources? ○ Protocols ○ Velocity ○ Type of data (logs, XML, …) ○ ... 3. What’s my current storage architecture? ○ NoSQL? ○ Distributed?

  14. Thank You! Questions? Axel Ngonga University of Leipzig AKSW Research Group ngonga@informatik.uni-leipzig.de http://aksw.org/AxelNgonga http://big-project.eu

  15. Questionnaire
