Gaussian Process Models of Spatial Aggregation Algorithms


  1. Gaussian Process Models of Spatial Aggregation Algorithms
     Naren Ramakrishnan, Virginia Tech Computer Science, http://people.cs.vt.edu/~ramakris/
     Chris Bailey-Kellogg, Purdue Computer Sciences, http://www.cs.purdue.edu/homes/cbk/

  2. Big Picture
     • Spatial Aggregation: a generic mechanism for spatial data mining, parameterized by domain knowledge.
     • Gaussian Processes: a generic framework for spatial statistical modeling, parameterized by covariance structure.
     • SA+GP: model the mining mechanism for meta-level reasoning, e.g. targeting samples and characterizing sensitivity to parameters and inputs.
     [Figure: the SA hierarchy — an Input Field is Localized into a neighborhood graph (N-graph) of spatial aggregate objects, Classified into equivalence classes, and Redescribed as Higher-Level Objects, repeating up to an Abstract Description; Interpolate and Sample move back down to resolve Ambiguities.]

  3. Example: Wireless System Configuration
     • Optimize performance (e.g. signal-to-noise ratio, bit error probability) of a wireless system configuration (e.g. distance between antennae).
     • Simulate across a range of configurations (hours to days per simulation).
     [Figure: aggregated structures in configuration space; axes SNR1 (dB) and SNR2 (dB), each from 10 to 40.]
     • Aggregate structures in configuration space; analyze the structures to characterize performance.
     • In the shaded region, 99% confidence that average error is acceptable.
     • Configurations in the upper right are less sensitive to power imbalance (region width).

  4. General Features
     Problem: spatial data mining in data-scarce physical domains
     • Expensive data collection: much implicit but little explicit data.
     • Control over data collection.
     • Available physical knowledge — continuity, locality, symmetry, etc.
     Approach: multi-level qualitative analysis
     • Exploit domain knowledge to uncover qualitative structures in data.
     • Sample optimization driven by model selection — maximize expected information gain, minimize expense, etc.
     • Decisions explainable in terms of problem structures & physical knowledge.

  5. Mining Mechanism: Spatial Aggregation (SA)
     Local operations for finding multi-level structures in spatial data.
     • Input: a numerical field. Ex: weather maps, numerical simulation output.
     • Output: a high-level description of structure, behavior, and design. Ex: fronts, stability regions in dynamical systems.
     • Bridge quantitative ↔ qualitative via increasingly abstract structural descriptions.
     • Key domain knowledge: locality in domain, similarity in feature.
     [Figure: the SA hierarchy diagram again — Input Field up to Abstract Description via Localize / Classify / Redescribe.]

  6. Spatial Aggregation Example
     Goal: find flows in a vector field (e.g. wind velocity, temperature gradient).
     (a) Input
     (b) Localize (distance < r)
     (c) Test similarity (angle < θ)
     (d) Select successor (score: d · distance + angle)

  7. (e) Select predecessor (score: d · distance + angle)
     (f) Redescribe (points ↦ curves)
     (g) Bundle curves by higher-level locality, similarity
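
     Steps (a)–(f) reduce to a few local operations. Below is a minimal illustrative Python sketch, not the authors' SAL implementation; the function name find_flows and the exact successor-selection rule are assumptions made for illustration.

        import numpy as np

        def find_flows(pts, angles, r, theta, d):
            """pts: (n, 2) array of positions; angles: (n,) vector directions in radians."""
            n = len(pts)
            succ = {}
            for i in range(n):
                best, best_score = None, np.inf
                for j in range(n):
                    if i == j:
                        continue
                    dist = np.linalg.norm(pts[i] - pts[j])
                    if dist >= r:                              # (b) localize: too far apart
                        continue
                    # wrapped angular difference in [0, pi]
                    dang = abs(np.angle(np.exp(1j * (angles[i] - angles[j]))))
                    if dang >= theta:                          # (c) classify: dissimilar direction
                        continue
                    score = d * dist + dang                    # (d)/(e) combined distance/angle score
                    if score < best_score:
                        best, best_score = j, score
                if best is not None:
                    succ[i] = best                             # best successor for point i
            curves, seen = [], set()                           # (f) redescribe: chain links into curves
            for i in set(range(n)) - set(succ.values()):       # start from points with no predecessor
                curve = []
                while i is not None and i not in seen:
                    seen.add(i)
                    curve.append(i)
                    i = succ.get(i)
                if len(curve) > 1:
                    curves.append(curve)
            return curves                                      # (g) would then bundle these curves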

  8. Reasoning About SA Applications
     • Sensitivity to input?
     • Sensitivity to parameters (locality, similarity metrics)?
     • Optimization of additional samples?
     Approach: a probabilistic model of spatial relationships, in terms of Gaussian Processes.
     [Figure: the SA hierarchy ↔ a corresponding probabilistic model.]

  9. Gaussian Processes: Intuition
     • 1D version of the vector flow analysis:
     [Figure: gradient (−3 to 3) vs. x (0 to 20).]
     Qualitative structure: same-direction flow.
     • Regression: given angles at some sample points, predict at new, unobserved points.
     [Figure: vector angle (values or distributions, radians) vs. x.]
     Gaussian conditional distribution; the covariance structure captures locality.
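
     As a concrete illustration of that regression step, here is a minimal numpy sketch, assuming a zero-mean prior with the squared-exponential covariance introduced on the kriging slide; the sample positions and angle values are made up for illustration.

        import numpy as np

        def k(a, b, rho=1.0):
            """Covariance R(x_i, x_j) = exp(-rho * |x_i - x_j|^2)."""
            return np.exp(-rho * (a[:, None] - b[None, :]) ** 2)

        def gp_predict(x, y, xs, rho=1.0, jitter=1e-8):
            K = k(x, x, rho) + jitter * np.eye(len(x))     # covariance among samples
            Ks = k(xs, x, rho)                             # covariance new-vs-samples
            mean = Ks @ np.linalg.solve(K, y)              # Gaussian conditional mean
            var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
            return mean, var                               # predictive mean and variance

        x = np.array([1.0, 4.0, 7.0, 12.0, 16.0])          # sample positions (illustrative)
        y = np.array([0.3, 0.5, 0.4, -2.8, -2.9])          # observed angles, radians (illustrative)
        xs = np.linspace(0, 20, 101)                       # prediction grid
        mean, var = gp_predict(x, y, xs)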

  10. • Classification: apply a logistic (higher-D: softmax) function to estimate a latent variable representing class.
      [Figure: the 1D gradient field (left) is mapped through the logistic function (center) to class probabilities in [0, 1] (right).]
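
      The squashing step itself is one line; a sketch, assuming a latent value (or GP posterior mean) f is already in hand:

        import numpy as np

        def logistic(f):
            """Map a latent GP value to a class probability in (0, 1)."""
            return 1.0 / (1.0 + np.exp(-np.asarray(f)))

        # latent values near -3 and +3 map to probabilities near 0 and 1:
        print(logistic([-3.0, 0.0, 3.0]))   # -> [0.047..., 0.5, 0.952...]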

  11. GP as Spatial Interpolation (Kriging)
      • Given a set of observations {(x₁, y₁), …, (x_k, y_k)} (vector angles at positions), we want to model y = f(x).
      • Possible form: f(x) = α + Z(x).
      • Model Z with a Gaussian: mean 0, covariance σ²R.
      • Key: the structure of R captures neighborhood relationships among samples.
        Ex: R(xᵢ, xⱼ) = e^(−ρ |xᵢ − xⱼ|²)
      [Figure: the correlation function for ρ = 0.1 and ρ = 1.]
      • Note: exact interpolation at the data points.

  12. • Optimize the parameters given the observations, to estimate f′. Ex: minimize mean squared error E{(f′ − f)²}:
          max_ρ  −(k/2)(ln σ̂² + ln |R|)
        where R is the k × k symmetric matrix built from the correlation function R(·, ·).
      • One-D optimization is straightforward; higher-D requires MCMC.
      • Once optimized, prediction for x_{k+1} is easy, based on correlation to the samples:
          f̂′(x_{k+1}) = α̂ + rᵀ(x_{k+1}) R⁻¹ (y − α̂ I_k)
        where r is the correlation vector for x_{k+1} vs. the sample points, and α̂ estimates α:
          α̂ = (I_kᵀ R⁻¹ I_k)⁻¹ I_kᵀ R⁻¹ y.
      • The estimate's variance is then
          σ̂² = (y − α̂ I_k)ᵀ R⁻¹ (y − α̂ I_k) / k.
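
      These formulas reduce to a few matrix solves. A minimal numpy sketch, with ρ held fixed for simplicity (the likelihood maximization over ρ above would sit on top of this):

        import numpy as np

        def krige(x, y, xnew, rho=1.0):
            """Kriging predictor following the slide's formulas, for fixed rho."""
            k = len(x)
            R = np.exp(-rho * (x[:, None] - x[None, :]) ** 2)   # k x k correlation matrix
            ones = np.ones(k)                                    # I_k on the slide
            Rinv_y = np.linalg.solve(R, y)
            Rinv_1 = np.linalg.solve(R, ones)
            alpha = (ones @ Rinv_y) / (ones @ Rinv_1)            # alpha-hat (generalized LS mean)
            resid = y - alpha * ones                             # y - alpha-hat * I_k
            sigma2 = (resid @ np.linalg.solve(R, resid)) / k     # sigma-hat^2
            r = np.exp(-rho * (xnew - x) ** 2)                   # correlation vector for x_{k+1}
            fhat = alpha + r @ np.linalg.solve(R, resid)         # f-hat'(x_{k+1})
            return fhat, sigma2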

  13. Gaussian Processes in General
      [Figure: sample functions drawn from a GP prior.]
      Keys:
      • Bayesian modeling, with the prior placed directly on function space.
      • Generalize the Gaussian distribution over finite vectors to one over functions, using mean and covariance functions.
      • Fully specified by distributions on finite sample sets, so computation still reduces to nice matrix operations.
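
      That last point is what makes the figure's prior draws cheap to produce: evaluate the mean and covariance functions on a finite grid, then one Cholesky factorization turns i.i.d. normals into function draws. A sketch, assuming a unit squared-exponential covariance:

        import numpy as np

        # Draw sample functions from a GP prior on a grid: the process is fully
        # specified by its mean and covariance functions at finitely many points.
        x = np.linspace(-3, 3, 200)
        mean = np.zeros_like(x)
        cov = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
        cov += 1e-6 * np.eye(len(x))                 # jitter for numerical stability
        L = np.linalg.cholesky(cov)
        draws = mean + (L @ np.random.randn(len(x), 3)).T   # 3 draws, each a "function"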

  14. Related Work
      • Rasmussen: unifying framework for multivariate regression.
      • Williams and Barber: classification.
      • MacKay: pattern recognition.
      • Neal: model for neural networks.
      • Sacks: modeling deterministic computer experiments with stochastic processes.

  15. Multi-Layer GP
      • SAL programs repeatedly aggregate/classify/redescribe, up an abstraction hierarchy.
        ↦ a sequence of GP models, each with its own covariance; superpose for a composite model.
      • Input data field: an interpolated surrogate for the sparse samples.
      • Locality (neighborhood graph — “close enough”) modeled by
          R(x⁽ᵏ⁾, x⁽ˡ⁾) = Σ_{i=1}^{n} ζᵢ e^(−ρᵢ |xᵢ⁽ᵏ⁾ − xᵢ⁽ˡ⁾|^η)
      • Similarity in feature (equivalence predicate — “good-direction flow”) is only applicable when combined with locality.
        ⇒ Combined hyperparameters for position and direction. A hierarchical prior allows determination of their relative importance.
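
      A sketch of this combined covariance, with one (ζᵢ, ρᵢ) pair per input dimension so that position and direction can carry different weights. The numerical values below are placeholders, not estimated hyperparameters (the case study estimates them with Neal's software):

        import numpy as np

        def R(xk, xl, zeta, rho, eta=2.0):
            """R(x^(k), x^(l)) = sum_i zeta_i * exp(-rho_i * |x_i^(k) - x_i^(l)|^eta)."""
            diff = np.abs(np.asarray(xk, float) - np.asarray(xl, float))
            return np.sum(zeta * np.exp(-rho * diff ** eta))

        # e.g. 3 inputs: (x, y) position plus vector direction (placeholder values)
        zeta = np.array([1.0, 1.0, 0.5])   # relative importance of each dimension
        rho = np.array([0.8, 0.8, 2.0])    # per-dimension decay rates
        print(R([0.0, 0.0, 0.1], [0.2, 0.1, 0.3], zeta, rho))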

  16. Case Study: Pocket Identification
      Abstract wireless problem with de Boor’s “pocket” function:
          α(X) = cos( Σ_{i=1}^{n} 2ⁱ (1 + xᵢ / (2|xᵢ|)) )
          δ(X) = ‖X − 0.5 I‖
          p(X) = α(X) (1 − δ²(X) (3 − 2 δ(X))) + 1
      [Figure: surface plot of p over [−1, 1]².]
      Goal: identify the number & locations of pockets (not function approximation), with a minimal number of samples.
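
      Translated directly from the definitions above (a sketch; the name pocket is illustrative, I is the all-ones vector, and X must have nonzero components since xᵢ/|xᵢ| is the sign of xᵢ):

        import numpy as np

        def pocket(X):
            """De Boor's pocket function p(X), per the formulas above."""
            X = np.asarray(X, dtype=float)
            n = len(X)
            i = np.arange(1, n + 1)
            alpha = np.cos(np.sum(2.0 ** i * (1.0 + X / (2.0 * np.abs(X)))))
            delta = np.linalg.norm(X - 0.5 * np.ones(n))     # distance from 0.5 * I
            return alpha * (1.0 - delta ** 2 * (3.0 - 2.0 * delta)) + 1.0

        print(pocket([0.5, 0.5]))   # evaluate at the center point 0.5 * I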

  17. SAL Pocket Finding
      [Figure: SAL flow analysis of the pocket function’s gradient field, over [−1, 1]².]

  18. Test
      Vary the parameters (close-enough distance r, similar-enough angle θ, weight d for combining distance and angle):
        r ∈ {1, √2, 1.5, √3, 2}
        θ ∈ {0.7, 0.8, 0.85, 0.9, 0.95}
        d ∈ {0.01, 0.02, 0.03, 0.04, 0.05}
      Construct the GP (i.e. estimate the covariance terms) for the flow classes using Neal’s software, with hybrid Monte Carlo.

  19. Number of Pockets
      • d had little effect in this field, due to symmetry.
      • Averaged over d, at varying (r, θ):
      [Figure: # pockets (0 to 140) vs. r ∈ {1, 1.414, 1.5, 1.732, 2}, one curve per θ ∈ {0.70, 0.80, 0.85, 0.90, 0.95}.]
      • Abrupt jump at θ = 0.95 — stringent vector similarity.

  20. Covariance Contributions
      [Figure: covariance contributions ρ_x and ρ_y vs. r ∈ {1, 1.414, 1.5, 1.732, 2}, one curve per θ as above.]
      • Basically symmetric.
      • Increase quadratically with the number of pockets — predictions can’t stray “too far.”
      • The characteristic length, 1/ρ, decreases with the number of pockets — identified pockets occupy less of the space.

  21. Discussion
      • Model qualitative spatial data mining with a stochastic process framework, summarizing the transformation from input to high-level abstractions.
      • The probabilistic basis allows sample optimization, studies of parameter sensitivity, and reasoning about algorithm applicability.
      • Next steps: combined modeling of sensitivity to input and parameters.
      • Thanks to Feng Zhao (PARC) and Layne T. Watson (Virginia Tech).
      • Funding: NR (NSF EIA-9974956, EIA-9984317, and EIA-0103660) and CBK (NSF IIS-0237654).
