Mining Large Datasets: Case of Mining Graph Data in the Cloud

  1. Mining Large Datasets: Case of Mining Graph Data in the Cloud. Sabeur Aridhi, PhD in Computer Science, with Laurent d’Orazio, Mondher Maddouri and Engelbert Mephu Nguifo. 16/05/2014. Sabeur Aridhi Mining Large Datasets - Big Data Forum - Lyon 1 / 50

  2. Context and motivations. Application domains: computer networks, social networks, bioinformatics (protein structures) and chemoinformatics (chemical compounds). Graph representation: data modeling; identifying relationship patterns and rules.

  4. Context and motivations. Mining graph data: graph mining aims to find patterns, hidden relations and behaviors in data. Goals of graph mining: computing graph properties (density, diameter, radius, ...); mining substructures (paths, trees, subgraphs) from graph databases, i.e. the Frequent Subgraph Mining (FSM) task.
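The diameter and radius listed above can both be derived from node eccentricities computed by breadth-first search. The following is a minimal Python sketch, assuming an undirected, connected, unweighted graph stored as an adjacency dict; the function names are hypothetical. The eccentricity of a node is its largest shortest-path distance to any other node; the diameter is the largest eccentricity and the radius the smallest.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def eccentricities(adj):
    """Eccentricity of each node: its largest BFS distance to any other node.

    Assumes an undirected, connected graph; raises ValueError otherwise.
    """
    ecc = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        ecc[node] = max(dist.values())
    return ecc


def diameter(adj):
    """Largest eccentricity over all nodes."""
    return max(eccentricities(adj).values())


def radius(adj):
    """Smallest eccentricity over all nodes."""
    return min(eccentricities(adj).values())
```

On the path graph a-b-c-d this gives a diameter of 3 and a radius of 2. Running one BFS per node costs O(|V|(|V|+|E|)), which is only practical for small graphs; this cost is part of why large graphs call for distributed processing.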

  8. Context and motivations. Availability of graph data: exponential growth in both the size and the number of graphs in databases. Available graph data sources: the Protein Data Bank (PDB) contains 95,280 protein 3D structures; Facebook loads 60 terabytes of new data every day [Thusoo 2010]; Google processes 20 petabytes of data per day [Dean 2008]. The 3Vs of Big Data (Volume, Velocity and Variety). Availability of cloud computing environments.

  10. Context and motivations. In this work, we are interested in FSM from graph databases. Frequent subgraph mining algorithms: there are various approaches to FSM, but existing approaches are mainly tested on centralized computing systems and evaluated on relatively small databases. Few works address FSM in the cloud.
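To make the FSM task concrete, here is a minimal Python sketch of only its first level: counting the support of single labeled edges across a graph database. It assumes the graph-transaction setting (the support of a pattern is the number of graphs that contain it) and a hypothetical input format of (node-label dict, labeled edge list) pairs; the function names are also hypothetical. Complete FSM algorithms go on to grow larger candidate subgraphs from these frequent edges, which requires subgraph isomorphism tests not shown here.

```python
from collections import Counter


def edge_pattern(node_labels, u, v, edge_label):
    """Canonical 1-edge pattern: endpoint labels sorted so (A, x, B) == (B, x, A)."""
    a, b = sorted((node_labels[u], node_labels[v]))
    return (a, edge_label, b)


def frequent_edges(graph_db, min_support):
    """Return {pattern: support} for 1-edge patterns with support >= min_support.

    Support is counted per graph: a pattern occurring several times
    in the same graph still adds only 1 to its support.
    """
    support = Counter()
    for node_labels, edges in graph_db:
        # A set, so each distinct pattern is counted once per graph.
        patterns = {edge_pattern(node_labels, u, v, lab) for u, v, lab in edges}
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


# Usage: three small molecule-like graphs (hypothetical data).
db = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "double")]),
    ({0: "C", 1: "O"}, [(0, 1, "double")]),
    ({0: "C", 1: "C", 2: "N"}, [(0, 1, "single"), (1, 2, "single"), (0, 2, "single")]),
]
print(frequent_edges(db, 2))
```

With a minimum support of 2, the C-C single bond and the C=O double bond are frequent, while the C-N single bond (present only in the third graph, although twice) is not.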

  11. Goals and questions: distributed FSM from large graph databases; data/computation distribution; tuning cloud parameters.

  12. Outline: 1. Background, 2. Contributions, 3. Conclusion.

  14. Outline: 1. Background (Graph mining; Cloud computing; Frameworks for large data processing in the cloud; Related works). 2. Contributions (Distributed subgraph mining in the cloud). 3. Conclusion (Contributions; Prospects).

  15. Background. Graph: a graph is denoted as $G = (V, E)$, where $V$ is a set of nodes and $E$ is a set of edges. Subgraph: a graph $G' = (V', E')$ is a subgraph of another graph $G = (V, E)$ iff $V' \subseteq V$ and $E' \subseteq E \cap (V' \times V')$. Density: the density of a graph $G = (V, E)$ is calculated by $\mathrm{density}(G) = \frac{2 \cdot |E|}{|V| \cdot (|V| - 1)}$.
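The subgraph and density definitions translate directly into code. This is a minimal Python sketch, assuming a simple undirected graph whose nodes are held in a set and whose edges are each stored as a two-element frozenset; the helper names are hypothetical. Under that assumption, an edge lies in $V' \times V'$ exactly when both of its endpoints belong to $V'$.

```python
def is_subgraph(v_sub, e_sub, v, e):
    """Check that G' = (v_sub, e_sub) is a subgraph of G = (v, e).

    Edges are undirected and stored as frozensets {u, w}.
    """
    if not v_sub <= v:  # V' must be a subset of V
        return False
    # Every edge of E' must be in E and connect only nodes of V'.
    return all(edge in e and edge <= v_sub for edge in e_sub)


def density(v, e):
    """density(G) = 2|E| / (|V| (|V| - 1)) for a simple undirected graph."""
    n = len(v)
    if n < 2:
        raise ValueError("density is undefined for fewer than 2 nodes")
    return 2 * len(e) / (n * (n - 1))


# Usage: a triangle {1, 2, 3} with a pendant node 4 attached to 3.
V = {1, 2, 3, 4}
E = {frozenset(p) for p in [(1, 2), (2, 3), (1, 3), (3, 4)]}
print(density(V, E))
print(is_subgraph({1, 2, 3}, {frozenset((1, 2)), frozenset((2, 3))}, V, E))
```

Here the density is 8 / 12, about 0.67. Because the formula counts each undirected edge once against the $|V|(|V|-1)/2$ possible pairs, it ranges from 0 (no edges) to 1 (complete graph).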

  17. Background. Cloud computing: a large number of computers connected via the Internet; applications delivered as services; hardware and system software delivered as services; pay as you go; cloud services can be rapidly and elastically provisioned.

  18. Background. Service models: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).

  20. Background. MapReduce framework: a framework for processing huge datasets on a large number of computers, designed to cope with task and node failures.
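The MapReduce programming model can be illustrated with a single-process Python sketch of its three phases (map, shuffle, reduce), here computing node degrees from an edge list that has been split into chunks. This is not Hadoop or any real MapReduce API: the function names (map_degree, shuffle, reduce_degree, run_mapreduce) are hypothetical, and the fault tolerance mentioned on the slide is not modelled.

```python
from collections import defaultdict
from itertools import chain


def map_degree(edge):
    """Map phase: an undirected edge (u, v) emits (u, 1) and (v, 1)."""
    u, v = edge
    return [(u, 1), (v, 1)]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(key, values):
    """Reduce phase: sum the 1s emitted for a node to obtain its degree."""
    return key, sum(values)


def run_mapreduce(splits, mapper, reducer):
    """Apply mapper to every record of every input split, then shuffle and reduce."""
    mapped = chain.from_iterable(mapper(record) for split in splits for record in split)
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


# Usage: four edges spread over two input splits.
splits = [[("a", "b"), ("b", "c")], [("c", "a"), ("c", "d")]]
print(run_mapreduce(splits, map_degree, reduce_degree))
```

Because each record is mapped independently and each key is reduced independently, the splits and keys could be handled by different machines; that independence is what lets a real MapReduce runtime parallelize the work and re-execute failed tasks.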

  23. Background. SPARK framework: a general engine for large-scale data processing that combines SQL, streaming and complex analytics. It offers several high-level operators that make it easy to build parallel applications.

  24. Background. SHARK framework: a distributed SQL query engine for Hadoop, based on SPARK, that uses the existing Hive client and metastore.
