Visual Data Mining for Quantized, Spatial Data



  1. Visual Data Mining for Quantized, Spatial Data • Amy Braverman, Jet Propulsion Laboratory, California Institute of Technology, Mail Stop 169-237, 4800 Oak Grove Drive, Pasadena, CA 91109-8099 • Amy.Braverman@jpl.nasa.gov

  2. Outline 1. Motivation. 2. Approach. 3. AIRS data collection. 4. Quantization. 5. Visual data mining (I). 6. Visual data mining (II). 7. Hierarchical Quantization. 8. Visual data mining (III). 9. Summary.

  3. Motivation • Earth Observing System satellites return “massive” data volumes. • Traditional approach to data exploration: produce maps of one degree averages and standard deviations for each parameter of interest. • Good news: this is easy, practical, and everybody understands it. • Bad news: the method throws away almost all of the distributional information in the data, including covariance and higher-order statistics. • Need: to “mine” the data, i.e., to ask how characteristics of joint distributions change in time and space and across resolutions, and to characterize forcings and feedbacks.

  4. Approach • New approach: produce an estimate of the joint (empirical) probability distribution of the variables of interest within each one degree grid cell. • Use a clustering algorithm such as K-means to partition the data into groups, and represent each group by its centroid and (normalized) membership count. • The collection of all 180 x 360 = 64,800 grid cell distribution estimates is a proxy for the original data. • How to find relationships? We need to visualize multivariate relationships while maintaining spatial context.
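The per-cell summary described on this slide can be sketched as follows. This is a minimal illustration using plain Lloyd-style K-means, not necessarily the algorithm used for the AIRS analysis; the function names `cell_index` and `kmeans_summary`, the iteration cap, and the random initialization are all assumptions made for the example.

```python
import math
import random


def cell_index(lat, lon):
    """Map latitude/longitude in degrees to a (row, col) one degree grid cell."""
    row = min(int(math.floor(lat + 90.0)), 179)   # 180 latitude bands
    col = min(int(math.floor(lon + 180.0)), 359)  # 360 longitude bands
    return row, col


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans_summary(points, k, iters=50, seed=0):
    """Quantize one grid cell's observations (hypothetical helper).

    Returns (centroids, weights); weights are normalized membership counts.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's group.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sqdist(p, centroids[c]))
            groups[j].append(p)
        # Update step: each centroid becomes the mean of its group.
        new = []
        for j in range(k):
            if groups[j]:
                g = groups[j]
                new.append([sum(p[d] for p in g) / len(g) for d in range(dim)])
            else:
                new.append(centroids[j])  # an empty group keeps its old centroid
        if new == centroids:
            break
        centroids = new
    weights = [len(g) / len(points) for g in groups]
    return centroids, weights
```

Running `kmeans_summary` once per occupied cell would give the 64,800-cell proxy for the original data: a short list of centroids and weights in place of every raw observation.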

  5. AIRS Data Collection [Animation not available.]

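  6. Quantization • Representatives and counts for clusters $k = 1, \dots, K$: $y_k = \frac{1}{N_k} \sum_{n=1}^{N} x_n \, 1[x_n \in k]$, with $N_k = \sum_{n=1}^{N} 1[x_n \in k]$. • $Y = E(X \mid Y)$ (!) • Figure: AIRS granules, 2250 km (135 footprints) by 1500 km (90 footprints). The observations $x_1, x_2, \dots, x_N$ falling in a 1 degree lat/lon cell (geographic space) are represented in the high-dimensional data space by $y_1, y_2, \dots, y_K$ with counts $N_1, N_2, \dots, N_K$.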
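The formulas for the representatives and counts translate directly into code. The sketch below assumes a cluster label is already known for each observation; the function name `representatives` and the toy data are made up for illustration. It also shows one consequence of $Y = E(X \mid Y)$: the count-weighted mean of the representatives equals the mean of the raw observations.

```python
def representatives(xs, labels, K):
    """Compute y_k (cluster means) and N_k (cluster counts) from labels."""
    dim = len(xs[0])
    counts = [0] * K
    sums = [[0.0] * dim for _ in range(K)]
    for x, k in zip(xs, labels):
        counts[k] += 1  # N_k: number of observations in cluster k
        for d in range(dim):
            sums[k][d] += x[d]
    # y_k is undefined for an empty cluster; represent it as None.
    ys = [[s / counts[k] for s in sums[k]] if counts[k] else None
          for k in range(K)]
    return ys, counts


# Toy data: three 2-channel observations in two clusters (illustrative only).
xs = [(1.0, 2.0), (3.0, 4.0), (10.0, 0.0)]
ys, counts = representatives(xs, [0, 0, 1], 2)

# Count-weighted mean of the representatives vs. mean of the raw data.
n = sum(counts)
mean_y = [sum(c * y[d] for y, c in zip(ys, counts)) / n for d in range(2)]
mean_x = [sum(x[d] for x in xs) / len(xs) for d in range(2)]
```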

  7. Visual Data Mining (I) • Data: 11 AIRS channels observed over 3 days (July 20-22, 2002). • Compare joint distributions among grid cells: • Are the grid cell data homogeneous or heterogeneous? • What physical processes account for the shapes of the representatives and the distribution? • What physical processes might account for differences between grid cells? • Are there “outliers”?

  8. Visual Data Mining (II) • Data in this region: 10,498 clusters representing 60,681 observations. • Can we summarize the whole region as one?

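  9. Hierarchical Quantization • Distortions: $\delta(X_j, Y_j) = E\|X_j - Y_j\|^2$, $\delta(X, Y) = E\|X - Y\|^2$, $\delta(Y, W) = E\|Y - W\|^2$. • Decomposition: $\delta(X, W) = \delta(Y, W) + \sum_{j=1}^{4} \delta(X_j, Y_j) \, P(V = j)$. • Definitions: $X = \sum_{j=1}^{4} I(V = j) \, X_j$, $Y = \sum_{j=1}^{4} Y_j \, I(V = j)$, $Y_j = E(X_j \mid Y_j)$, $W = E(Y \mid W)$, $P(V = j) = N_j / \sum_{i=1}^{4} N_i$. • Figure: four 1 degree cells with data $X_1, \dots, X_4$ are quantized to $Y_1, \dots, Y_4$, which are combined into $Y$ and quantized again to $W$.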
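The distortion decomposition on this slide can be checked numerically. The sketch below assumes the cluster memberships at both levels are given (it does not run a clustering algorithm), and the names `mean`, `sqdist`, and `check_decomposition` and the toy data are invented for the example. Representatives at each level are taken to be (count-weighted) means, which is what $Y_j = E(X_j \mid Y_j)$ and $W = E(Y \mid W)$ require.

```python
def mean(points, weights=None):
    """(Weighted) mean of a list of equal-length vectors."""
    weights = weights or [1.0] * len(points)
    total = sum(weights)
    dim = len(points[0])
    return tuple(sum(w * p[d] for p, w in zip(points, weights)) / total
                 for d in range(dim))


def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def check_decomposition(cells, coarse_labels):
    """cells[j] is a list of clusters, each a list of points in cell j.
    coarse_labels[j][k] is the coarse cluster given to cluster k of cell j.
    Returns (delta_xw, delta_yw, first_level_term)."""
    n_total = sum(len(c) for cell in cells for c in cell)
    first = 0.0   # sum_j delta(X_j, Y_j) * P(V = j)
    reps = []     # (y, count, coarse label, member points)
    for j, cell in enumerate(cells):
        n_j = sum(len(c) for c in cell)
        delta_j = 0.0
        for k, c in enumerate(cell):
            y = mean(c)
            delta_j += sum(sqdist(x, y) for x in c)
            reps.append((y, len(c), coarse_labels[j][k], c))
        first += (delta_j / n_j) * (n_j / n_total)
    # Second level: W is the count-weighted mean of the Y's in each coarse cluster.
    w = {}
    for m in {r[2] for r in reps}:
        members = [r for r in reps if r[2] == m]
        w[m] = mean([r[0] for r in members], [r[1] for r in members])
    delta_yw = sum(n * sqdist(y, w[m]) for y, n, m, _ in reps) / n_total
    delta_xw = sum(sqdist(x, w[m]) for _, _, m, c in reps for x in c) / n_total
    return delta_xw, delta_yw, first


# Four toy cells with hand-picked clusters and coarse labels (illustrative only).
cells = [
    [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0)]],
    [[(1.0, 1.0), (1.0, 3.0)]],
    [[(9.0, 1.0), (11.0, 1.0)], [(0.0, 4.0)]],
    [[(10.0, 2.0)]],
]
delta_xw, delta_yw, first = check_decomposition(cells, [[0, 1], [0], [1, 0], [1]])
```

Here `delta_xw` should agree with `delta_yw + first` up to floating-point error. The coarse summary can therefore be built from the cell-level centroids and counts alone, with the total distortion still accounted for.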

  10. Visual Data Mining (III) More questions: • How do the distributions change as you move from east to west? (Suggested approach: subdivide the region into western half and eastern half. Summarize separately and compare to each other and summary of the whole. Subdivide again, etc.) North to south? • What other regions are similar to this one? Are they the ones we expect based on physics? Does spatial resolution matter for answering the question? If so, how? • Where are the regions of high complexity (variability or distribution entropy)? Do the physics support this? • How does the regression of channel 1 on channel 2 change spatially? • From where do the clusters come?
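Two of these questions can be approached directly from the quantized summaries. The sketch below is one possible reading, not the method used in the presentation: it measures complexity as the entropy of a cell's normalized cluster weights, and approximates the regression of one channel on another by a count-weighted least-squares slope through the centroids. The function names and index conventions are assumptions. A slope computed from centroids ignores within-cluster scatter, so it only approximates the slope from the raw data.

```python
import math


def entropy(weights):
    """Shannon entropy (in nats) of normalized cluster weights."""
    return -sum(p * math.log(p) for p in weights if p > 0)


def weighted_slope(centroids, counts, response=0, predictor=1):
    """Count-weighted least-squares slope of one channel on another.

    `response` and `predictor` are positions within each centroid vector.
    """
    n = sum(counts)
    mx = sum(c * y[predictor] for y, c in zip(centroids, counts)) / n
    my = sum(c * y[response] for y, c in zip(centroids, counts)) / n
    cov = sum(c * (y[predictor] - mx) * (y[response] - my)
              for y, c in zip(centroids, counts)) / n
    var = sum(c * (y[predictor] - mx) ** 2
              for y, c in zip(centroids, counts)) / n
    return cov / var


# Toy cell summary: the first channel equals 2 * second channel + 1 exactly.
centroids = [(1.0, 0.0), (3.0, 1.0), (5.0, 2.0)]
counts = [1, 2, 1]
slope = weighted_slope(centroids, counts)
h = entropy([c / sum(counts) for c in counts])
```

Mapping `h` or `slope` over all grid cells would give one spatial view of where the distributions are most complex and how the relationship between the two channels varies.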

  11. Summary • Accept coarser spatial resolution (one degree) to achieve replication and estimate distributions. • Explore quantized data interactively by comparing distributions at different levels of aggregation and in different locations (and times). • We are mining the data, not making inferences. No spatial statistical models. • AIRS data will be available at http://daac.gsfc.nasa.gov/atmodyn/airs/index.html. • More information about AIRS: http://www-airs.jpl.nasa.gov.
