I590 Interactive Visual Analytics Week 13 | Nov 16, 2016 Filtering and Aggregation Models in Visual Analytics Khairi Reda | redak@iu.edu School of Informatics & Computing, IUPUI
http://www.michelecoscia.com/wp-content/uploads/2012/08/demon2.png
Filtering & Aggregation • Too much data can overwhelm the visualization • Sometimes we need to show fewer data points • Filter: eliminate irrelevant items • Aggregate: group similar items
Filter • Any function that partitions the data into two sets based on attributes • Larger / smaller than X • Within a specified geographic extent • Noisy / significant readings • Filtering can also be applied to attributes, as opposed to the data points themselves Based on a slide by Alex Lex
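A filter in this sense can be sketched as a predicate that splits items into a kept set and a dropped set. The example below is a minimal illustration, not taken from the slides: the `partition` helper, the `readings` records, and the threshold of 25 are all assumptions made for the example.

```python
def partition(items, keep):
    """Split items into (kept, dropped) according to the predicate `keep`."""
    kept, dropped = [], []
    for item in items:
        (kept if keep(item) else dropped).append(item)
    return kept, dropped

# Hypothetical readings; the field names are assumptions for illustration.
readings = [
    {"id": 1, "value": 12.0},
    {"id": 2, "value": 47.5},
    {"id": 3, "value": 30.1},
]

# A "larger than X" filter with X = 25.
kept, dropped = partition(readings, lambda r: r["value"] > 25)
```

The same helper works for any attribute-based test (a geographic bounding box, a noise threshold) by swapping the predicate.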
Filtering with Dynamic Queries Shneiderman
Filtering with menus
Scented Widgets • Provide cues (scent) to users to aid in filtering and exploration • Usually come in the form of small visual representations bound to interface elements Willett 2007, Via Alex Lex
Interactive Legends • Provide filtering controls from the legend Riche 2010, Via Alex Lex
Aggregation
Histogram • Aggregate items into bins • Display the number of items (i.e., frequency) in each bin
Histogram • The number of bins can affect the shape of the histogram (Figure: distribution of passengers by age, 10 bins vs. 20 bins) Based on a slide by Alex Lex
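The effect of the bin count can be reproduced with a few lines of equal-width binning. This is a sketch under stated assumptions: the `histogram` helper and the `ages` list are made up for illustration and are not the passenger data shown on the slide.

```python
def histogram(values, num_bins):
    """Count values in num_bins equal-width bins spanning [min, max].

    Assumes the values are not all identical (otherwise the bin width is zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value goes into the last bin instead of a new one.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical ages, for illustration only.
ages = [2, 4, 18, 19, 21, 22, 24, 25, 29, 30, 31, 35, 36, 40, 42, 58, 60, 62]

coarse = histogram(ages, 3)  # one broad peak
fine = histogram(ages, 6)    # more structure becomes visible
print(coarse)
print(fine)
```

Both calls count the same items; only the grouping changes, which is why the apparent shape of the distribution depends on the bin count.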
Density plots http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/plotting_distributions.html
Box plots (aka box-and-whisker plots) • First quartile (Q1): splits off the lowest 25% of the data • Median: splits the data in half • Third quartile (Q3): splits off the highest 25% of the data http://image.mathcaptain.com/cms/images/106/box-plot.png
Box plots (aka box-and-whisker plots) • An alternative to drawing the whiskers at the min/max is to scale them by the interquartile range (IQR = Q3 - Q1) Wikipedia
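The quantities behind a box plot can be computed with Python's `statistics.quantiles`. The sketch below assumes the common convention of whiskers reaching to the most extreme points within 1.5 × IQR of the box; the slide only says the whiskers are scaled by the IQR, so the factor 1.5, the `box_plot_stats` helper, and the sample data are assumptions.

```python
import statistics

def box_plot_stats(values, whisker_scale=1.5):
    """Quartiles, IQR-scaled whiskers, and outliers for a box plot.

    whisker_scale=1.5 is a common convention, assumed here for illustration.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence = q1 - whisker_scale * iqr
    hi_fence = q3 + whisker_scale * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme data points inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < lo_fence or v > hi_fence],
    }

# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
stats = box_plot_stats(data)
print(stats)
```

With min/max whiskers the upper whisker would stretch all the way to 40; with IQR-scaled whiskers that point is drawn separately as an outlier.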
One box plot, four distributions http://stat.mq.edu.au/wp-content/uploads/2014/05/Can_the_Box_Plot_be_Improved.pdf
Distribution, error bars, and box plots Streit & Gehlenborg, PoV, Nature Methods, 2014 Via Alex Lex
Violin plots http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/plotting_distributions.html
Heatmaps • Aggregate 2D points into 2D bins
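Aggregating 2D points into 2D bins amounts to a two-dimensional histogram; the resulting counts are what a heatmap maps to color. The following is a minimal sketch in which the `bin_2d` helper, the grid size, and the sample points are assumptions for illustration.

```python
def bin_2d(points, x_range, y_range, nx, ny):
    """Count (x, y) points in an ny-by-nx grid; grid[row][col] holds a count."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the plotted extent
        # Points on the upper edge are clamped into the last bin.
        col = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        row = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        grid[row][col] += 1
    return grid

# Hypothetical scatterplot points; the last one lies outside the extent.
points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.8), (0.7, 0.9), (0.6, 0.2), (1.5, 0.5)]
grid = bin_2d(points, (0.0, 1.0), (0.0, 1.0), nx=2, ny=2)
print(grid)
```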
Heatmaps (for scatterplots)
Spatial Aggregation Changing the boundaries / structure of the aggregation bins yields different results Based on a slide by Alex Lex
Spatial Aggregation Gerrymandering Based on a slide by Alex Lex
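The sensitivity to bin boundaries can be shown with a toy districting example. Everything here is hypothetical: nine voters on a line, two parties, and two made-up district plans. The votes are identical in both plans; only the boundaries differ.

```python
from collections import Counter

def district_winners(votes, districts):
    """Return the majority party in each district (a list of voter indices)."""
    return [Counter(votes[i] for i in d).most_common(1)[0][0] for d in districts]

# Hypothetical electorate: 4 votes for A, 5 votes for B.
votes = ["A", "A", "B", "A", "A", "B", "B", "B", "B"]

# Two hypothetical ways of drawing three contiguous districts.
plan_1 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
plan_2 = [[0, 1, 2, 3], [4, 5, 6], [7, 8]]

winners_1 = district_winners(votes, plan_1)
winners_2 = district_winners(votes, plan_2)
print(winners_1, winners_2)
```

Under the first plan A carries two of three districts despite having fewer votes overall; under the second plan B carries two.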
Clustering • Classification of items into “similar” bins • Typically based on a similarity measure • Euclidean distance, Pearson correlation, etc. • Many different clustering algorithms, each with strengths and weaknesses • K-Means • Hierarchical clustering
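The choice of similarity measure matters because different measures can disagree about which items are alike. The sketch below implements the two measures named above; the vectors `p` and `q` are made up so that they are far apart in Euclidean terms yet perfectly correlated.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_correlation(a, b):
    """Pearson correlation coefficient; assumes neither vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical attribute vectors: q has the same profile as p at 10x the scale.
p = [1.0, 2.0, 3.0, 4.0]
q = [10.0, 20.0, 30.0, 40.0]

distance = euclidean_distance(p, q)
correlation = pearson_correlation(p, q)
print(distance, correlation)
```

A distance-based clustering would separate these two items, while a correlation-based one would group them together.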
K-Means • Pick K starting points as centroids. These will eventually define the clusters • Calculate the distance from every point to each centroid, assigning the point to the closest centroid • Update each centroid to the average of its cluster’s members • Repeat until the assignments stop changing
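The steps above can be sketched as a short loop. This is a minimal illustration rather than a production implementation: the `k_means` function, the sample points, and the choice to pass the starting centroids explicitly (instead of sampling them at random, as is common) are assumptions made to keep the example deterministic.

```python
import math

def k_means(points, initial_centroids, max_iters=100):
    """Basic K-Means loop: assign points, update centroids, repeat until stable."""
    centroids = list(initial_centroids)
    k = len(centroids)
    assignment = None
    for _ in range(max_iters):
        # Assign each point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical 2D points forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = k_means(points, initial_centroids=[points[0], points[3]])
print(centroids, labels)
```

Different starting centroids can lead to different final clusters, which is one reason implementations often run the algorithm several times.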
K-Means Limitations • Have to pick K • Assumptions about the data: roughly “circular” clusters of equal size http://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means
K-Means Limitations http://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means
Dimensionality Reduction • High-dimensional data: large number of attributes • Dimensionality reduction: reduce the number of dimensions (attributes) while keeping as much variation as possible (Table: items A, B, C, … as rows; attributes Attr 1 through Attr 11, … as columns)
Dimensionality Reduction • Principal component analysis • Multidimensional scaling • And other techniques… (Table: items A, B, C, … as rows; attributes Attr 1 through Attr 11, … as columns)
Principal Component Analysis (PCA) • Find a new set of dimensions (axes) that explains the majority of the variance in the data • Order the new dimensions by variance • The first principal component accounts for the most variance
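One way to find the first principal component is to take the dominant eigenvector of the covariance matrix. The sketch below uses power iteration for that, which is only one of several possible methods and is chosen here because it needs no external libraries. The `first_principal_component` function and the sample data (points lying roughly along y = 2x) are assumptions for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it, total variance).

    Uses power iteration on the sample covariance matrix. Assumes the start
    vector of all ones is not orthogonal to the dominant direction.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data projected onto v (the Rayleigh quotient v^T C v).
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, variance, total_variance

# Hypothetical 2D data lying close to the line y = 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, variance, total_variance = first_principal_component(data)
print(direction, variance / total_variance)
```

For this data the first component points roughly along the line y = 2x and captures nearly all of the variance, so a single axis would summarize the two attributes well.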
Principal Component Analysis (PCA) http://setosa.io/ev/principal-component-analysis/
Multidimensional scaling (MDS) • Project the high-dimensional space onto a much lower-dimensional space (e.g., 2D) • Relies on similarity between points (usually requires computing the similarity between every pair of points) • Non-linear transformation: more difficult to interpret than PCA, but can preserve structure better in some cases
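Metric MDS is often framed as searching for low-dimensional positions whose pairwise distances match the original ones as closely as possible, with the mismatch measured by a "stress" value. The sketch below only evaluates a raw stress score for candidate layouts; it does not perform the optimization itself. The `raw_stress` helper, the 3D items, and both 2D layouts are assumptions for illustration.

```python
import math
from itertools import combinations

def raw_stress(high_dim, low_dim):
    """Sum of squared differences between original and projected pairwise distances."""
    return sum(
        (math.dist(high_dim[i], high_dim[j]) - math.dist(low_dim[i], low_dim[j])) ** 2
        for i, j in combinations(range(len(high_dim)), 2)
    )

# Hypothetical 3D items that happen to lie in a plane.
items = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 0.0, 0.0)]

# Two candidate 2D layouts: one preserves every pairwise distance, one does not.
layout_good = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
layout_bad = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

stress_good = raw_stress(items, layout_good)
stress_bad = raw_stress(items, layout_bad)
print(stress_good, stress_bad)
```

Note that the score needs the distance between every pair of items, which is why the cost grows quickly with the number of points.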
Models in Visual Analytics Adapted from: http://slideplayer.com/slide/4659134/ and from Remo Chang, 2010
Models in Visual Analytics • Abstractions of how visualization works: • Provide a way of talking about how humans interact with visualizations • Language for describing different parts of the visual analytic process • Every model is (overly) simplified: beware!
Terminology / Assumptions • Sensemaking: the act of processing incomplete information in order to improve one’s understanding of a situation and/or to make decisions • A person’s decision making is bounded by [1] • incomplete information • the amount of time they have to decide • the finite processing power of their brain • Mental model: an abstracted version of the real world that is more tractable [1] H. Simon 1957. “A Behavioral Model of Rational Choice”
Models in Visual Analytics
Information Visualization Reference Model Card, Mackinlay, and Shneiderman. Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann, 1999, p. 17
Van Wijk’s Model D = Data V = Visualization S = Specification I = Image P = Perception K = Knowledge E = Exploration Van Wijk, J. “The value of visualization”, 2005
Keim’s Visual Analytics Model Keim, D et al. “Visual Analytics: Definition, process, and challenges”, 2008
Pirolli and Card Sensemaking model Pirolli, P and Card, S. “The sense making process and leverage points for analyst technology as identified through cognitive task analysis”, 2005
Next week Class canceled - Happy Thanksgiving! Week 15 - Nov 30 Time series and temporal data Inference and uncertainty in visualization