Active Data Mining of Correspondence for Qualitative Assessment of Scientific Computations
Chris Bailey-Kellogg, Purdue Computer Sciences, http://www.cs.purdue.edu/homes/cbk/
Naren Ramakrishnan, Virginia Tech Computer Science, http://people.cs.vt.edu/~ramakris/
Data-Driven Characterization of Scientific Computations
• Choice of solver depends on problem characteristics (e.g. matrix sensitivity) and algorithm performance (e.g. convergence).
• Empirical characterization (rather than analytical) is appropriate under imperfect domain knowledge or lack of theory: low-level computational experiments → high-level properties.
• Example: a spectral portrait illustrates eigenstructure under perturbations of different magnitudes.
[Figure: spectral portrait with labeled level curves. Eigenvalues inside a given curve are indistinguishable under perturbation of that magnitude; this suggests the numerical precision necessary.]
Active Data Mining with SAL
Abstract Description
• Spatial aggregation (bottom-up): uniform operators and data types for extracting multi-layer structures in spatial data.
• Ambiguity-directed sampling (top-down): focus data collection on difficult choice points.
• Underlying domain knowledge: continuity, locality.
[Figure: SAL pipeline. Input field → aggregate (N-graph) → classify (equivalence classes) → redescribe (higher-level objects); Sample, Interpolate, and Localize close the top-down loop at ambiguities.]
Simple example:
[Figure: three panels. (1) Input points (values not shown); (2) Aggregate (localize computation); (3) Classify (group connected points with similar-enough value).]
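The aggregate/classify steps in this example can be sketched in a few lines. This is a minimal illustration, not the SAL library itself: `aggregate` and `classify` are hypothetical names, and the neighborhood structure here is a simple distance threshold rather than SAL's general N-graph.

```python
import numpy as np

def aggregate(points, radius):
    """Localize computation: connect points to neighbors within `radius`."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if np.linalg.norm(points[i] - points[j]) <= radius]

def classify(points, values, edges, tol):
    """Group connected points with similar-enough values (union-find)."""
    parent = list(range(len(points)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in edges:
        if abs(values[i] - values[j]) <= tol:
            parent[find(i)] = find(j)
    classes = {}
    for i in range(len(points)):
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())

# Two well-separated clusters with distinct values (illustrative data)
pts = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [5.0, 5.0], [5.5, 5.0]])
vals = np.array([1.0, 1.1, 1.05, 3.0, 3.1])
groups = classify(pts, vals, aggregate(pts, 1.0), tol=0.2)
# groups: two equivalence classes, {0, 1, 2} and {3, 4}
```

The same two operators, applied layer by layer, compose the hierarchical spatial objects that the correspondence extension below builds on.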
Correspondence Extension to SAL
• Key idea: identify mutually-reinforcing relationships among features of spatial objects, in order to combat noise and sparsity.
• SAL is particularly conducive: aggregation composes hierarchical spatial objects.
• Mechanism:
1. Establish analogy as a relation among lower-level constituents of higher-level objects. Ex: adjacent points of neighboring iso-contours. [Figure: iso-contours with cross-contour point adjacencies.]
2. Abstract the lower-level analogy into a higher-level correspondence. Ex: parameterized curve deformation.
• Bridge the lower-/higher-level gap: the analogy's meaning is derived from higher-level context; abstraction enables computation of global properties (containment, breaks, overall quality).
• Directly usable in ambiguity-directed sampling to address difficulties in correspondence.
Application 1: Matrix Spectral Portrait Analysis
The spectral portrait of a matrix A plots the complex map:
P(z) = log10( ||A||_2 · ||(A − zI)^{-1}||_2 )
[Figure: spectral portrait with labeled level curves.]
Singularities occur at the eigenvalues; level curves capture eigenvalues that are "equivalent" with respect to perturbations (i.e. the curve at level k contains the eigenvalues of all perturbed matrices A + E with ||E||_2 ≤ 10^{-k} ||A||_2).
Perturbation-equivalence indicates sensitivity to numerical error. Ex: eigenvalues 2 & 3 are most sensitive, then 4, then 1.
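The portrait can be computed directly from the definition, using the identity ||(A − zI)^{-1}||_2 = 1/σ_min(A − zI). A minimal sketch; the matrix, grid, and function name are illustrative, not from the paper:

```python
import numpy as np

def spectral_portrait(A, re, im):
    """P(z) = log10(||A||_2 * ||(A - zI)^{-1}||_2) over a grid,
    using ||(A - zI)^{-1}||_2 = 1 / sigma_min(A - zI)."""
    norm_A = np.linalg.norm(A, 2)
    I = np.eye(A.shape[0])
    P = np.empty((len(im), len(re)))
    for r, y in enumerate(im):
        for c, x in enumerate(re):
            # singular values come back in descending order; take the smallest
            s_min = np.linalg.svd(A - (x + 1j * y) * I, compute_uv=False)[-1]
            P[r, c] = np.log10(norm_A / s_min)
    return P

# Illustrative non-normal matrix with eigenvalues 2 and 3
A = np.array([[2.0, 100.0], [0.0, 3.0]])
re = np.linspace(1.25, 4.25, 7)
im = np.linspace(-1.0, 1.0, 5)
P = spectral_portrait(A, re, im)
# P is largest near the eigenvalues on the real axis
```

Level curves of P at value k then bound the 10^{-k}||A||_2-perturbation-equivalent regions shown in the portrait.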
Correspondence-Based Merge Identification
Approach: compute a merge tree, indicating the perturbation levels at which eigenvalues become indistinguishable, by finding correspondences among level curves.
1. Sample perturbation levels on a regular grid; interpolate iso-curves.
[Figure: sampled grid and interpolated iso-curves.]
2. Aggregate curve points in a Delaunay triangulation.
[Figure: triangulated curve points.]
3. Analogy: cross-curve edges in the triangulation.
[Figure: cross-curve edges between neighboring iso-curves.]
4. Correspondence abstraction: merge events in the tree.
[Figure: merge trees over perturbation levels for eigenvalues (1,0)–(4,0).]
5. Evaluate confidence in the correspondence: fraction of points matched; angular separation between "separating" samples.
6. Sample to ensure curve locations are adequately constrained by separating samples (so the curves couldn't have merged at a smaller perturbation level).
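The analogy of step 3 can be sketched with SciPy's Delaunay triangulation. The function name and data are hypothetical; two concentric arcs stand in for neighboring level curves:

```python
import numpy as np
from scipy.spatial import Delaunay

def cross_curve_edges(points, labels):
    """Step 3's analogy: Delaunay edges joining points on different iso-curves."""
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:
        for i in range(3):
            a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
            if labels[a] != labels[b]:
                edges.add((int(a), int(b)))
    return edges

# Two concentric arcs standing in for neighboring level curves
t = np.linspace(0.0, np.pi, 8)
pts = np.vstack([np.c_[np.cos(t), np.sin(t)],        # inner curve (label 0)
                 2 * np.c_[np.cos(t), np.sin(t)]])   # outer curve (label 1)
labels = np.array([0] * 8 + [1] * 8)
edges = cross_curve_edges(pts, labels)
matched = {i for e in edges for i in e}   # points participating in the analogy
```

The fraction `len(matched) / len(pts)` is one simple confidence measure in the spirit of step 5's "fraction of points matched".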
Results
• (2n − 3)!! possible binary merge trees; most are never explicitly considered (they would have low confidence).
• Initial grid: one sample between each pair of eigenvalues, extending one unit beyond the bounding box.
• Subsample or expand the grid when merge events are poorly separated.
• Tested on a variety of polynomial companion matrices with different numbers and spacings of roots.
• High-confidence model selection after 1–3 subsamples and 1–3 grid expansions.
• Substantially less computation than a "one-size-fits-all" approach, plus a confidence metric and explainability.
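The (2n − 3)!! count grows quickly, which is why explicit enumeration of merge trees is avoided. A quick check of the double factorial (`merge_tree_count` is a hypothetical helper):

```python
def merge_tree_count(n):
    """(2n - 3)!! rooted binary merge trees over n leaves (n >= 2)."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

counts = [merge_tree_count(n) for n in range(2, 7)]  # [1, 3, 15, 105, 945]
```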
Application 2: Matrix Jordan Form
Jordan form analysis:
• Input: matrix A of dimension n, with r ≤ n independent eigenvectors whose eigenvalues λ_i have multiplicity ρ_i.
• Jordan decomposition into r upper-triangular "blocks":
B^{-1} A B = diag(J_1, J_2, …, J_r), where each ρ_i-by-ρ_i block J_i has λ_i on the diagonal and 1s on the superdiagonal.
• Typical algorithms are numerically unstable.
Graphical Analysis of Jordan Form
• Infer multiplicity from eigenvalue perturbations: a perturbation δ splits λ_i into computed values
λ_i + |δ|^{1/ρ_i} e^{iφ/ρ_i}
• As the phase φ of the perturbation δ ranges over multiples of π, the computed values are vertices of a regular 2ρ_i-gon, centered on λ_i, with diameter determined by |δ|.
• Ex: the 8-by-8 Brunet matrix with structure (−1)^1 (−2)^1 (7)^3 (7)^3, focusing on the Jordan block for the first (7)^3:
[Figure: perturbed eigenvalues forming a hexagon around λ = 7; axis scale 10^{-3}.]
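This splitting is easy to verify numerically: perturbing the corner of a ρ-by-ρ Jordan block by δ moves its eigenvalues onto a circle of radius |δ|^{1/ρ} around λ. A sketch with illustrative values, assuming λ = 7 and ρ = 3 as in the Brunet example:

```python
import numpy as np

rho, lam, delta = 3, 7.0, 1e-6
J = lam * np.eye(rho) + np.diag(np.ones(rho - 1), 1)  # Jordan block for lam
J[-1, 0] = delta                                       # corner perturbation
eig = np.linalg.eigvals(J)

# The perturbed eigenvalues lie on a circle of radius |delta|^(1/rho) about lam
radii = np.abs(eig - lam)
```

Varying the phase of δ over multiples of π sweeps these roots through the vertices of the 2ρ-gon described above.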
Correspondence-Based Symmetry Analysis
Approach: compute the Jordan structure by identifying portrait symmetry (i.e. auto-correspondence), abstracted as rotation by π/ρ around the eigenvalue.
1. Sample points by random normwise perturbation at magnitude(s) of interest.
[Figure: sampled perturbed eigenvalues around 7.]
2. Aggregate triples of points into triangles.
3. Analogy among triangle vertices by congruence (computed via geometric hashing).
[Figure: congruent triangles among sampled points.]
4. Correspondence as a rotation (x, y, θ) overlaying vertices of congruent triangles.
[Figure: two recovered models. Left: eigenvalue = (7.00, 0.00), rotation = 60.46°, ρ = 3. Right: eigenvalue = (7.00, 0.00), rotation = 60.13°, ρ = 3.]
5. Evaluate confidence in the correspondence: distance between points and their partners; regularity of the polygon's sides.
6. Sample when the entropy of the models is high.
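The rotation-by-π/ρ abstraction of step 4 can be sketched by direct search over candidate symmetry orders. This is a simplification of the geometric-hashing approach: `infer_rho` is a hypothetical helper, and the test points are ideal 2ρ-gon vertices rather than noisy samples.

```python
import numpy as np

def infer_rho(points, center, max_rho=8, tol=1e-6):
    """Largest even m such that rotation by 2*pi/m maps the point set onto
    itself; the Jordan block size is then rho = m / 2."""
    z = np.asarray(points) - center          # work in the complex plane
    for m in range(2 * max_rho, 1, -1):
        if m % 2:
            continue                         # a 2*rho-gon has even symmetry order
        w = z * np.exp(2j * np.pi / m)
        # every rotated point must land on some original point
        if all(np.min(np.abs(z - wi)) < tol for wi in w):
            return m // 2
    return None

# Ideal samples: vertices of a regular hexagon around eigenvalue 7
lam, rho, r = 7.0, 3, 1e-3
samples = lam + r * np.exp(1j * np.pi * np.arange(2 * rho) / rho)
# infer_rho(samples, lam) recovers rho = 3 (rotation by 60 degrees)
```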
Results
• 10 matrices; 4–10 perturbation levels; 6–8 samples each round.
• Vary the number of models generated by varying the congruence tolerance.
• Three sample-collection policies:
1. Collect at the same level: 1.0–2.7 rounds.
2. Collect at the next higher level: better when policy #1 uses a low level.
3. Collect at the same level until models begin "hallucinating": better for Brunet-type matrices.
• Symmetry quickly eliminates bad models.
• No real advantage to varying the perturbation level (estimates are independent, irrespective of level).
• Only a small amount of computation is required for a high-confidence assessment of the Jordan form.
Some Related Work
• F. Chaitin-Chatelin and V. Frayssé: graphical analysis of scientific computations (spectral portraits).
• A. Edelman and Y. Ma: Jordan perturbation phenomena.
• X. Huang and F. Zhao: correspondence in weather data iso-contours.
• Much work in vision on computing and tracking correspondence.
• D.A. Cohn, Z. Ghahramani, and M.I. Jordan: active learning.
Discussion
• The correspondence mechanism within Spatial Aggregation leverages hierarchical spatial objects and relationships.
• First systematic algorithms for performing complete imagistic analyses (not relying on human visual inspection) of matrix eigenstructure.
• Efficient, focused sampling and iterative model evaluation until high confidence is obtained.
• Overcome noise and sparsity by utilizing locality and continuity to identify mutually-reinforcing interpretations.
• Many thanks to reviewers!
• Funding: CBK (NSF IIS-0237654) and NR (NSF EIA-9974956, EIA-9984317, and EIA-0103660).