Optimised Framework based on Rough Set Theory for Big Data

  1. Optimised Framework based on Rough Set Theory for Big Data Pre-processing in Certain and Imprecise Contexts. Presented by Dr. Zaineb Chelly Dagdia at Recent Trends in Knowledge Compilation, Dagstuhl Seminar 17381. Marie Sklodowska-Curie Actions, Individual Fellowship.

  6. Outline: 1 Introduction; 2 Background; 3 RoSTBiDFramework; 4 Conclusion.

  9. The MSC project. Proposal title: “Optimised Framework based on Rough Set Theory for Big Data Pre-processing in Certain and Imprecise Contexts”. Duration: 24 months. Panel: Information Science and Engineering (ENG). Descriptor: Machine learning, statistical data processing and applications. The project started on 1 March 2017.

  13. Partner organisations. Host: Aberystwyth University, UK. Partner organisations: University of Birmingham, UK; University of Paris 13, France; University of Granada, Spain; a non-academic partner, France.

  14. Outline: 1 Introduction; 2 Background (Big Data, Rough Set Theory); 3 RoSTBiDFramework; 4 Conclusion.

  16. Big Data, specification: “Datasets which could not be captured, managed, and processed by general computers within an acceptable scope.” [Apache Hadoop, 2010]. Having bigger data requires different approaches: techniques, tools, and architecture.

  20. Distributed processing with Apache Spark: Apache Spark is a cluster computing technology designed for fast computation. It is based on the MapReduce model. It performs in-memory cluster computing, which increases the processing speed of an application. It relies on Resilient Distributed Datasets (RDDs), which support in-memory computation.

  22. MapReduce: MapReduce divides the workload into multiple independent tasks and schedules them across cluster nodes. Data are distributed to all the nodes of the cluster as they are loaded in. Data are split into chunks, which are managed by different nodes in the cluster. Even though the file chunks are distributed across several machines, they form a single namespace.
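
The split, map, and reduce idea described on this slide can be imitated on a single machine. The sketch below is only an illustration in plain Python, not Hadoop or Spark code: the function names (`split_into_chunks`, `map_task`, `reduce_task`) and the toy records are assumptions made for the example, and the "nodes" are simulated by list elements.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Split the records into roughly equal chunks, one per simulated node."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_task(chunk):
    """Independent task: count the class labels seen in one chunk."""
    return Counter(label for _, label in chunk)


def reduce_task(left, right):
    """Merge two partial counts into a single count."""
    return left + right


# Hypothetical toy data: (object id, class label) pairs.
records = [("x1", "yes"), ("x2", "no"), ("x3", "yes"), ("x4", "yes"), ("x5", "no")]

chunks = split_into_chunks(records, n_chunks=2)
# On a real cluster each map task would run on the node holding its chunk.
partial_counts = [map_task(chunk) for chunk in chunks]
totals = reduce(reduce_task, partial_counts, Counter())
print(dict(totals))
```

Because each map task only reads its own chunk, the tasks do not depend on one another, which is what allows a scheduler to place them on different nodes.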

  24. Rough Set Theory, basic concepts: the indiscernibility relation; the lower approximation; the upper approximation; the boundary region; the positive region; the dependency of attributes.
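
To make the first four concepts concrete, here is a minimal sketch using the standard rough set definitions: objects are indiscernible when they share the same values on the chosen attributes, the lower approximation collects the equivalence classes fully contained in the target set, the upper approximation collects those that overlap it, and the boundary region is the difference between the two. The decision table and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def ind_classes(table, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return list(groups.values())


def lower_approx(table, attrs, target):
    """Union of the classes that lie entirely inside the target set."""
    return set().union(*[c for c in ind_classes(table, attrs) if c <= target])


def upper_approx(table, attrs, target):
    """Union of the classes that share at least one object with the target set."""
    return set().union(*[c for c in ind_classes(table, attrs) if c & target])


# Hypothetical toy decision table.
table = {
    "o1": {"headache": "yes", "temp": "high", "flu": "yes"},
    "o2": {"headache": "yes", "temp": "high", "flu": "no"},
    "o3": {"headache": "no", "temp": "high", "flu": "yes"},
    "o4": {"headache": "no", "temp": "normal", "flu": "no"},
}
attrs = ["headache", "temp"]
target = {obj for obj, row in table.items() if row["flu"] == "yes"}

lower = lower_approx(table, attrs, target)
upper = upper_approx(table, attrs, target)
boundary = upper - lower
print(lower, upper, boundary)
```

Here `o1` and `o2` are indiscernible on the two condition attributes but differ on `flu`, so they end up in the boundary region rather than in the lower approximation.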

  28. RST for Feature Selection:
      1. Calculate the IND of the classes;
      2. Generate all the possible combinations of features;
      3. For each combination: calculate the IND; calculate the lower approximation; calculate the positive region; calculate the dependency;
      4. Select the reduct(s) where the feature set is composed of minimal features and the DEP of the feature set equals the DEP of the data set (all the features).
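
The four steps above can be sketched as a brute-force search. This is a small single-machine illustration under stated assumptions, not the RoSTBiD implementation: the names `partition`, `dependency`, and `find_reducts` are hypothetical, "minimal" is read as smallest cardinality, and the search stops at the first subset size that reaches the dependency of the full feature set instead of enumerating every combination.

```python
from collections import defaultdict
from itertools import combinations


def partition(rows, attrs):
    """Equivalence classes (sets of row indices) of IND(attrs)."""
    groups = defaultdict(set)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].add(i)
    return list(groups.values())


def dependency(rows, features, decision):
    """Dependency degree: size of the positive region divided by the number of rows."""
    decision_classes = partition(rows, [decision])        # step 1: IND of the classes
    positive = set()
    for block in partition(rows, features):               # IND of the combination
        if any(block <= d for d in decision_classes):     # block is in a lower approximation
            positive |= block                             # so it joins the positive region
    return len(positive) / len(rows)


def find_reducts(rows, features, decision):
    """Smallest feature subsets whose dependency equals that of all features."""
    full_dep = dependency(rows, features, decision)
    for size in range(1, len(features) + 1):              # step 2: combinations by size
        reducts = [set(c) for c in combinations(features, size)
                   if dependency(rows, c, decision) == full_dep]  # steps 3 and 4
        if reducts:
            return reducts
    return []


# Hypothetical toy data: d is the XOR of a and b, and c duplicates a.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]
reducts = find_reducts(rows, ["a", "b", "c"], "d")
print(reducts)
```

The number of feature combinations grows exponentially with the number of features, so a search of this kind is only practical for small feature sets.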

  29. Outline: 1 Introduction; 2 Background; 3 RoSTBiDFramework (Challenges, Research methodology, Proposed solution); 4 Conclusion.

  32. Current state: It has become difficult to quickly acquire the most useful information from the huge amount of data at hand. It is therefore necessary to perform data (pre-)processing as a first step!

  33. State-of-the-art: Sequential and MapReduce-based dimensionality reduction techniques involve the user for parameterisation; are not able to deal with the veracity aspect of the data; are not able to deal with the computational requirements of the data.
