
  1. High-Performance Outlier Detection Algorithm for Finding Blob-Filaments in Plasma
     Lingfei Wu¹, Kesheng Wu², Alex Sim², Michael Churchill³, Jong Y. Choi⁴, Andreas Stathopoulos¹, CS Chang³, and Scott Klasky⁴
     ¹ College of William and Mary
     ² Lawrence Berkeley National Laboratory
     ³ Princeton Plasma Physics Laboratory
     ⁴ Oak Ridge National Laboratory
     BDAC-SC14

  2. Outline
     • Introduction
     • Related work
     • Blob detection
     • Hybrid parallel
     • Evaluations
     • Conclusion

  3. What is an outlier?
     An outlier is a data object that deviates significantly from the rest of the objects, as if it were generated by a different mechanism.[1]
     • Outliers could be errors or noise to be eliminated
     • Outliers can lead to the discovery of important information in the data
     Outlier detection is employed in a variety of applications:
     • Fraud detection
     • Time-series monitoring
     • Medical care
     • Public safety and security
     [1] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (Southeast Asia Edition), Morgan Kaufmann, 2006.

  4. Our goal
     Outlier detection is an important task in many safety-critical environments:
     • An outlier must be detected in real time
     • Suitable feedback must be provided to alert the control system
     • The size of the data sets demands fast and scalable outlier detection methods
     Our goal: apply outlier detection techniques to effectively tackle the fusion blob detection problem on extremely large parallel machines
     • Massive amounts of data are generated by fusion experiments/simulations
     • Near-real-time understanding of the data is needed to predict performance

  5. Blobs in fusion
     What is fusion, and why fusion?
     • Fusion is a viable energy source for the future
     • Fossil fuels will run out soon; solar and wind have limited potential
     • Advantages of fusion: inexhaustible, clean, and safe

  6. Blobs in fusion
     Blobs are intermittent bursts of particles near the edge of the confined plasma ⇒ driven by turbulence.
     Blobs are bad for fusion performance because they:
     • Transport heat and particles away from the confined plasma
     • May damage the main chamber wall
     • Lead to increased levels of neutrals and impurities, bypassing control mechanisms
     Blob detection is a very important task!

  7. Big data challenges in fusion energy
     Fusion experiments generate massive amounts of data:
     • Diagnostic measurements last from a few to several hundred seconds, generating large amounts of data, ∼ gigabytes to terabytes!
     • Large-scale fusion simulations generate ∼ a few tens of terabytes per second!

  8. Big data challenges in fusion energy
     Difficulties in large-scale data analysis:
     • Existing data analysis is often single-threaded, slow, and only for post-run analysis
     • Fusion experiments demand real-time data analysis
     • E.g., ICEE aims to apply blob detection to monitor the health of fusion experiments in KSTAR
     Real-time blob detection is a very challenging task!

  9. Three approaches for blob detection
     • Single threshold & conditional averaging: the exact criterion varies; averaging may destroy important information
     • Image analysis techniques: very sensitive to the setting of parameters; hard to apply one generic method to all images
     • Contouring method & thresholding: cannot perform real-time blob detection; may miss blobs at the edge; is still post-run analysis

  10. An efficient blob detection approach
     • Our approach: an outlier detection algorithm for efficiently finding blobs in fusion simulations/experiments
       ◦ Two-step outlier detection with various criteria after normalizing the local intensity
       ◦ Leverage a fast connected component labeling (CCL) method to find blob components on a refined triangular mesh
     • Contributions:
       ◦ A new method that, unlike the contouring method, does not miss blobs at the edge of the region of interest
       ◦ Targets the more challenging in-shot and between-shot analysis
       ◦ The first research work to achieve blob detection in a few milliseconds

  11. Outlier detection algorithm for finding blobs
     Sketch of the proposed outlier detection algorithm [figure]

  12. Refine mesh in the region of interest
     [Figure: "Magnetic Fields in Poloidal Plane": original poloidal-plane mesh (R from 1.2 to 2.4 m, Z from −1 to 1 m) and the refined region of interest (R from 2.25 to 2.32 m, Z from −0.25 to 0.25 m)]
     • Create 4 times as many triangles by adding new vertices at the midpoints of the three original edges
     • Apply recursively until the desired resolution is reached
     • The number of refinement levels depends on the specific data set and the demanded resolution
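The 1-to-4 midpoint refinement described on this slide can be sketched as follows. This is a minimal Python sketch, assuming the mesh is stored as a vertex coordinate array plus a triangle index array; the names `refine_once` and `refine` are hypothetical and not taken from the authors' code. A midpoint is created once per edge and shared by the two triangles on either side, so the refined mesh stays conforming.

```python
import numpy as np


def refine_once(vertices, triangles):
    """Split every triangle into 4 using the midpoints of its three edges.

    vertices:  (V, 2) float array of (R, Z) coordinates (assumed layout)
    triangles: (T, 3) int array of vertex indices (assumed layout)
    Shared edges reuse a single midpoint vertex.
    """
    new_vertices = [tuple(v) for v in vertices]
    midpoint_index = {}  # undirected edge (i, j) -> index of its midpoint

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_index:
            midpoint_index[key] = len(new_vertices)
            new_vertices.append(tuple((vertices[i] + vertices[j]) / 2.0))
        return midpoint_index[key]

    new_triangles = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # Three corner triangles plus the central one.
        new_triangles += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.array(new_vertices, dtype=float), np.array(new_triangles, dtype=int)


def refine(vertices, triangles, levels):
    """Apply refine_once recursively; each level multiplies the triangle count by 4."""
    for _ in range(levels):
        vertices, triangles = refine_once(vertices, triangles)
    return vertices, triangles


V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
T = np.array([[0, 1, 2]])
V2, T2 = refine(V, T, levels=2)
print(len(T2), len(V2))  # 16 15
```

In the spirit of the slide, `levels` would be chosen per data set and target resolution, and only triangles inside the region of interest would be passed in.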

  13. Two-step outlier detection to identify blobs
     Motivation for two-step outlier detection for finding blobs:
     [Figure: a contour plot in the region of interest]

  14. Two-step outlier detection to identify blobs
     Apply exploratory data analysis to analyze the underlying distribution of the local normalized density:
     [Figure: density distribution fitting using 50 bins; x-axis: normalized electron density (n_e/n_e0), 0 to 12; y-axis: number of points in each bin (×10^4); (a) Extreme Value Distribution, (b) Log Normal Distribution]
     $$
     \begin{cases}
     N(r_i, z_i, t) - \mu > \alpha\,\sigma, & \forall (r_i, z_i) \in \Gamma,\\
     N(r_i, z_i, t) - \mu_2 > \beta\,\sigma_2, & \forall (r_i, z_i) \in \Gamma_2.
     \end{cases}
     $$
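The two thresholding criteria on this slide can be read as a two-step filter. The Python sketch below is one possible reading, not the authors' implementation: it assumes µ and σ are the mean and standard deviation of the normalized density N over the region of interest Γ, that Γ₂ is the set of points passing the first test, and that µ₂ and σ₂ are recomputed over Γ₂ only. The function name `two_step_outliers` and the example values of α and β are hypothetical.

```python
import numpy as np


def two_step_outliers(density, alpha, beta):
    """Boolean mask of blob candidates among the points of Gamma.

    density: 1-D array of normalized density N(r_i, z_i, t) at the vertices
             inside the region of interest Gamma.
    ASSUMPTION: Gamma_2 = points that pass step 1, and mu_2, sigma_2 are
    recomputed over Gamma_2 only.
    """
    density = np.asarray(density, dtype=float)

    # Step 1: N - mu > alpha * sigma, with statistics over all of Gamma.
    mu, sigma = density.mean(), density.std()
    step1 = density - mu > alpha * sigma
    if not step1.any():
        return step1

    # Step 2: N - mu_2 > beta * sigma_2, with statistics over Gamma_2.
    gamma2 = density[step1]
    mu2, sigma2 = gamma2.mean(), gamma2.std()
    mask = np.zeros_like(step1)
    mask[step1] = gamma2 - mu2 > beta * sigma2
    return mask


# Flat background of 1.0 plus four elevated points (hypothetical data).
n_e = np.array([1.0] * 96 + [5.0, 5.0, 9.0, 9.0])
print(np.flatnonzero(two_step_outliers(n_e, alpha=2.0, beta=0.5)))  # [98 99]
```

Recomputing the statistics on the first-step candidates lets the second test separate strong peaks from moderately elevated points that the first test alone would keep; whether this matches the slide's intent for Γ₂ is an assumption.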

  15. A fast connected component labeling algorithm
     We apply an efficient connected component labeling algorithm on a refined triangular mesh to find blob components:
     • This is a two-pass approach; in the first pass, each triangle is scanned
     • Unnecessary memory accesses are reduced when any vertex in a triangle is found to be connected with others
     • After the label array is filled, the union-find trees are flattened
     • A second pass corrects the labels, and all blob candidate components are found
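A two-pass, union-find style labeling over mesh triangles can be sketched as below. This is a minimal Python sketch under stated assumptions: two outlier vertices are treated as connected when they share a triangle, `label_blobs` is a hypothetical name, and the slide's memory-access optimization is not reproduced. It follows the same outline: scan triangles and merge, flatten the union-find trees, then relabel in a second pass.

```python
import numpy as np


def label_blobs(triangles, is_outlier):
    """Label connected outlier vertices on a triangular mesh.

    triangles:  iterable of (a, b, c) vertex-index triples
    is_outlier: length-V sequence of bools from the detection step
    Returns an int array: 0 = background, 1..K = component labels.
    ASSUMPTION: vertices are connected iff they share a triangle.
    """
    n = len(is_outlier)
    parent = np.arange(n)

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:  # path compression
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smaller index becomes the root

    # First pass: scan each triangle once and merge its outlier vertices.
    for tri in triangles:
        members = [v for v in tri if is_outlier[v]]
        for v in members[1:]:
            union(members[0], v)

    # Flatten: every vertex now points directly at its root.
    for v in range(n):
        parent[v] = find(v)

    # Second pass: map roots to consecutive labels 1..K.
    labels = np.zeros(n, dtype=int)
    root_to_label = {}
    for v in range(n):
        if is_outlier[v]:
            labels[v] = root_to_label.setdefault(parent[v], len(root_to_label) + 1)
    return labels


tris = [(0, 1, 2), (1, 2, 3), (3, 4, 5)]
flags = [True, True, False, False, True, True]
print(label_blobs(tris, flags))  # [1 1 0 0 2 2]
```

Each resulting label corresponds to one blob candidate component; later filtering (for example by size) would operate on these labels.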

  16. Parallelization of the blob detection approach
     A hybrid MPI/OpenMP parallelization on many-core processor architectures:
     • High level: use MPI to allocate n processes to process the time frames
     • Low level: use OpenMP to accelerate the computations within each frame with m threads
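The two levels of this decomposition can be illustrated with a Python stand-in. This sketch does not use MPI or OpenMP: `frames_for_rank` models the high level as a round-robin assignment of time frames to n ranks (an assumption; the slide does not state the distribution), and `threshold_in_chunks` models the low level by splitting one frame's vertices into m contiguous chunks processed by a `ThreadPoolExecutor`. A plain threshold stands in for the per-frame detection kernel, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def frames_for_rank(num_frames, n_procs, rank):
    """High level (MPI stand-in): round-robin frame assignment.
    In an MPI code, rank and n_procs would come from the communicator."""
    return list(range(rank, num_frames, n_procs))


def threshold_in_chunks(density, threshold, m_threads):
    """Low level (OpenMP stand-in): m contiguous chunks, one thread each."""
    density = np.asarray(density, dtype=float)
    n = len(density)
    bounds = np.linspace(0, n, m_threads + 1).astype(int)
    mask = np.zeros(n, dtype=bool)

    def work(k):
        lo, hi = bounds[k], bounds[k + 1]
        # Chunks are disjoint, so threads never write the same entries.
        mask[lo:hi] = density[lo:hi] > threshold

    with ThreadPoolExecutor(max_workers=m_threads) as pool:
        list(pool.map(work, range(m_threads)))  # list() re-raises worker errors
    return mask


def process_rank(frames, n_procs, rank, threshold, m_threads):
    """What one rank would do: handle its own frames, each with m threads."""
    return {t: threshold_in_chunks(frames[t], threshold, m_threads)
            for t in frames_for_rank(len(frames), n_procs, rank)}
```

Because each frame is independent, the high level needs no communication during detection; the low level only splits work inside a frame, which is the part OpenMP threads would share.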

  17. Results: same time frame + four planes

  18. Results: same plane + four time frames
