
I590 Interactive Visual Analytics
Week 13 | Nov 16, 2016
Filtering and Aggregation
Models in Visual Analytics
Khairi Reda | redak@iu.edu
School of Informatics & Computing, IUPUI



  • http://www.michelecoscia.com/wp-content/uploads/2012/08/demon2.png

  • Filtering & Aggregation
    • Too much data can overwhelm the visualization
    • Sometimes we need to show fewer data points
    • Filter: eliminate irrelevant items
    • Aggregate: group similar items

  • Filter
    • Any function that partitions the data into two sets based on attributes, e.g.:
      • Larger / smaller than X
      • Within a specified geographic extent
      • Noisy / significant readings
    • Filtering can also be applied to attributes, as opposed to the data points themselves
    Based on a slide by Alex Lex
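
The filter definition above can be sketched as a predicate that splits records into two sets. The record fields (`value`, `lat`, `lon`), the threshold, and the bounding box are hypothetical, chosen only to illustrate the kinds of filters on the slide; the last helper shows filtering applied to attributes rather than to items.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def partition(items: Iterable[T], keep: Callable[[T], bool]) -> tuple[list[T], list[T]]:
    """Split items into (kept, removed) according to a filter predicate."""
    kept: list[T] = []
    removed: list[T] = []
    for item in items:
        (kept if keep(item) else removed).append(item)
    return kept, removed


def within_extent(r: dict, lat_min: float, lat_max: float, lon_min: float, lon_max: float) -> bool:
    """Hypothetical geographic filter: is the record inside a lat/lon bounding box?"""
    return lat_min <= r["lat"] <= lat_max and lon_min <= r["lon"] <= lon_max


def filter_attributes(record: dict, keep: set[str]) -> dict:
    """Filtering applied to attributes: drop every field not listed in `keep`."""
    return {k: v for k, v in record.items() if k in keep}


# Hypothetical sensor readings
readings = [
    {"id": "a", "value": 12.0, "lat": 39.8, "lon": -86.2},
    {"id": "b", "value": 3.5, "lat": 41.9, "lon": -87.6},
    {"id": "c", "value": 25.1, "lat": 39.7, "lon": -86.1},
]

large, small = partition(readings, lambda r: r["value"] > 10)  # larger / smaller than X
local, remote = partition(readings, lambda r: within_extent(r, 39.0, 40.5, -87.0, -85.5))
slim = [filter_attributes(r, {"id", "value"}) for r in readings]
```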

  • Filtering with Dynamic Queries (Shneiderman)

  • Filtering with menus

  • Scented Widgets
    • Provide cues ("scent") to users to aid in filtering and exploration
    • Usually come in the form of small visual representations attached to interface elements
    Willett 2007, via Alex Lex
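
One simple form of scent is to annotate each menu option with how many items would remain if it were selected, given the other active filters, drawn as a small bar. The sketch below assumes hypothetical passenger records with `cls` and `age` fields and renders the bars as text; it illustrates the idea and is not Willett et al.'s implementation.

```python
from collections import Counter
from typing import Callable, Iterable


def menu_scent(items: Iterable[dict], attribute: str, options: list[str],
               active_filters: list[Callable[[dict], bool]]) -> dict[str, int]:
    """Count, per menu option, the items that pass all other active filters."""
    visible = [it for it in items if all(f(it) for f in active_filters)]
    counts = Counter(it[attribute] for it in visible)
    return {opt: counts.get(opt, 0) for opt in options}  # options with no items show 0


def render_menu(scent: dict[str, int], width: int = 10) -> list[str]:
    """Render each option with a text bar proportional to its count."""
    peak = max(scent.values(), default=0)
    lines = []
    for option, n in scent.items():
        bar = "#" * (round(width * n / peak) if peak else 0)
        lines.append(f"{option:<8} {bar} {n}")
    return lines


passengers = [  # hypothetical records
    {"cls": "first", "age": 34},
    {"cls": "second", "age": 8},
    {"cls": "third", "age": 22},
    {"cls": "third", "age": 41},
    {"cls": "first", "age": 5},
]

# Scent for a "class" menu while an "adults only" filter is active
scent = menu_scent(passengers, "cls", ["first", "second", "third"], [lambda p: p["age"] >= 18])
for line in render_menu(scent):
    print(line)
```

Showing a zero next to "second" tells the user, before clicking, that this choice would leave nothing on screen.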

  • Interactive Legends
    • Provide filtering controls directly in the legend
    Riche 2010, via Alex Lex

  • Aggregation

  • Histogram
    • Aggregate items into bins
    • Display the number of items (i.e., frequency) in each bin

  • Histogram
    • The number of bins can affect the shape of the histogram
    [Figure: distribution of passengers by age, shown with 10 bins and with 20 bins]
    Based on a slide by Alex Lex
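
The binning described on these two slides can be sketched with equal-width bins. The `ages` values are hypothetical (not the passenger data in the figure); computing counts with 10 and with 20 bins over the same range shows how the bin count alone changes the shape.

```python
def histogram(values, num_bins, lo=None, hi=None):
    """Count values in `num_bins` equal-width bins spanning [lo, hi]."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if v < lo or v > hi:
            continue  # outside the plotted range
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, num_bins - 1)] += 1  # the maximum value falls in the last bin
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


ages = [2, 5, 14, 19, 22, 23, 27, 30, 31, 35, 38, 45, 52, 61, 70]  # hypothetical

_, counts10 = histogram(ages, 10, lo=0, hi=80)
_, counts20 = histogram(ages, 20, lo=0, hi=80)
print(counts10)  # [2, 1, 3, 3, 2, 1, 1, 1, 1, 0]
print(counts20)
```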

  • Density plots http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/plotting_distributions.html
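
A density plot like the one linked above is typically a kernel density estimate: a smooth curve formed by placing a small kernel on every sample and summing the kernels. The sketch below assumes a Gaussian kernel and the normal-reference rule of thumb (1.06 · σ · n^(-1/5)) for the bandwidth; plotting libraries expose the bandwidth as a parameter, so treat this choice as an illustrative default. The samples are hypothetical.

```python
import math
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule of thumb: h = 1.06 * stdev * n^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the probability density at x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centered on that sample
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


samples = [1.2, 1.9, 2.1, 2.4, 3.3, 5.0, 5.2, 5.9]  # hypothetical measurements
f = gaussian_kde(samples, rule_of_thumb_bandwidth(samples))
curve = [(x / 10, f(x / 10)) for x in range(0, 71)]  # points to draw from 0.0 to 7.0
```

A larger bandwidth smooths away detail, much like using fewer histogram bins; a smaller one exposes noise.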

  • Box plots (aka box-and-whisker plots)
    • First quartile (Q1): splits off the lowest 25% of the data
    • Median: splits the data in half
    • Third quartile (Q3): splits off the highest 25% of the data
    http://image.mathcaptain.com/cms/images/106/box-plot.png

  • Box plots (aka box-and-whisker plots)
    • An alternative to extending the whiskers to the min/max is to scale them by the interquartile range (IQR = Q3 − Q1), commonly 1.5 × IQR beyond the quartiles
    Wikipedia
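
The quantities on these two slides can be computed with Python's `statistics.quantiles`. Several quartile conventions exist; this sketch assumes the `"inclusive"` method and Tukey's common choice of fences at 1.5 × IQR, with whiskers reaching the most extreme data points inside the fences and anything beyond flagged as an outlier. The data is hypothetical.

```python
import statistics


def box_plot_stats(values, whisker_scale=1.5):
    """Quartiles, IQR-scaled whiskers, and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_scale * iqr
    high_fence = q3 + whisker_scale * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # whiskers stop at the most extreme data points within the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])  # hypothetical data
print(stats["q1"], stats["median"], stats["q3"], stats["outliers"])  # 3.0 5.0 7.0 [100]
```

With min/max whiskers, the single value 100 would stretch the upper whisker across most of the plot; the IQR rule draws it as an isolated outlier instead.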

  • One box plot, four distributions http://stat.mq.edu.au/wp-content/uploads/2014/05/Can_the_Box_Plot_be_Improved.pdf

  • Distribution, error bars, and box plots
    Streit & Gehlenborg, Points of View, Nature Methods, 2014, via Alex Lex

  • Violin plots http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/plotting_distributions.html

  • Heatmaps
    • Aggregate 2D points into 2D bins

  • Heatmaps (for scatterplots)
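
Aggregating a scatterplot into a heatmap amounts to 2D binning followed by mapping each bin's count to a color. The sketch below assumes a rectangular grid over given x/y ranges and, to stay dependency-free, stands in for a color scale with a hypothetical ramp of text characters. The points are hypothetical.

```python
def bin_2d(points, x_range, y_range, nx, ny):
    """Count points in an nx-by-ny grid; grid[row][col], row 0 at the bottom (smallest y)."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # ignore points outside the plotted extent
        col = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        row = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        grid[row][col] += 1
    return grid


def render(grid, ramp=" .:*#"):
    """Map counts to characters (denser = darker); return the top row first."""
    peak = max(max(row) for row in grid) or 1
    steps = len(ramp) - 1
    return ["".join(ramp[round(c / peak * steps)] for c in row) for row in reversed(grid)]


points = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9), (0.5, 0.5), (1.0, 1.0)]  # hypothetical
grid = bin_2d(points, (0, 1), (0, 1), nx=2, ny=2)
print(grid)          # [[2, 0], [0, 3]]
print(render(grid))  # [' #', '* ']
```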

  • Spatial Aggregation
    • Changing the boundaries / structure of the aggregation bins can yield different results
    Based on a slide by Alex Lex

  • Spatial Aggregation
    • Example: gerrymandering
    Based on a slide by Alex Lex
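
A toy version of the gerrymandering example shows how the same votes, aggregated under different boundaries, produce different outcomes. The 15 precincts (9 leaning toward party A, 6 toward party B) and both districting plans are hypothetical. The sketch also ignores geography entirely, whereas a real plan would have to keep districts contiguous.

```python
from collections import Counter


def seats_won(votes, plan):
    """Aggregate precinct winners into districts and count the seats each party wins."""
    used = sorted(i for district in plan for i in district)
    if used != list(range(len(votes))):
        raise ValueError("every precinct must belong to exactly one district")
    seats = Counter()
    for district in plan:
        winner, _ = Counter(votes[i] for i in district).most_common(1)[0]
        seats[winner] += 1  # odd-sized districts with two parties cannot tie
    return dict(seats)


votes = list("AAAAAAAAABBBBBB")  # precincts 0-8 lean A, 9-14 lean B (60% / 40%)

# Plan 1: spread B's precincts across districts so A wins most seats
plan_1 = [[0, 1, 9], [2, 3, 10], [4, 5, 11], [6, 7, 12], [8, 13, 14]]
# Plan 2: pack A into two districts and spread A's remaining precincts thinly
plan_2 = [[0, 9, 10], [1, 11, 12], [2, 13, 14], [3, 4, 5], [6, 7, 8]]

print(seats_won(votes, plan_1))  # {'A': 4, 'B': 1}
print(seats_won(votes, plan_2))  # {'B': 3, 'A': 2}
```

In both plans party A's vote share is 60%, yet the aggregated result flips from a 4-1 A majority to a 3-2 B majority.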

  • Clustering
    • Grouping items into bins of "similar" items
    • Typically based on a similarity measure, e.g., Euclidean distance or Pearson correlation
    • Many different clustering algorithms exist, each with its own strengths and weaknesses, e.g.:
      • K-Means
      • Hierarchical clustering
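
The two similarity measures named on the slide can be written directly from their definitions. The example vectors are hypothetical; they show why the choice matters, since two profiles with the same shape but different magnitudes are far apart in Euclidean distance yet perfectly correlated.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length attribute vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation in [-1, 1]; 1 - r is often used as a distance."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (spread_a * spread_b)


low = [1, 2, 3, 4]        # hypothetical item profiles across four attributes
high = [10, 20, 30, 40]   # same shape, 10x the magnitude
flipped = [4, 3, 2, 1]

print(euclidean_distance(low, high))      # large: about 49.3
print(pearson_correlation(low, high))     # about 1.0 (same shape)
print(pearson_correlation(low, flipped))  # about -1.0 (opposite shape)
```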

  • K-Means
    • Pick K starting points as the initial centroids; each centroid will eventually define one cluster
    • Calculate the distance from every point to each centroid, and assign each point to its closest centroid
    • Update each centroid to the average of its cluster's members
    • Repeat the assignment and update steps until the assignments stop changing
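
The steps above map onto a short loop. This is a minimal sketch of the standard (Lloyd's) algorithm, assuming Euclidean distance, random initialization from the data points, and convergence once assignments stop changing; the two-blob data is hypothetical. In practice a library implementation such as scikit-learn's `KMeans` is the usual choice, since it offers smarter initialization (k-means++) and optional multiple restarts.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of numbers) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: K starting points as centroids
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # step 4: stop repeating once no assignment changes
        labels = new_labels
        # Step 3: move each centroid to the average of its cluster's members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # hypothetical 2D data
centroids, labels = k_means(points, k=2)
```

Changing `seed` changes the starting centroids; on less well-separated data this can change the final clusters, which is one reason libraries run several restarts.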

  • K-Means Limitations
    • Have to pick K in advance
    • Makes assumptions about the data: roughly "circular" clusters of equal size
    http://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means

  • K-Means Limitations http://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means

  • Dimensionality Reduction
    • High-dimensional data: large number of attributes
    • Dimensionality reduction: reduce the number of dimensions (attributes) while keeping as much variation as possible
    [Figure: data table with items A, B, C, … as rows and attributes Attr 1 to Attr 11, … as columns]

  • Dimensionality Reduction
    • Principal component analysis
    • Multidimensional scaling
    • And other techniques…
    [Figure: items × attributes data table (items A, B, C, …; attributes Attr 1 to Attr 11, …)]

  • Principal Component Analysis (PCA)
    • Find a new set of orthogonal dimensions (axes) that explains the majority of the variance in the data
    • Order the new dimensions by the variance they explain
    • The first principal component accounts for the most variance
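
The steps above can be sketched without external libraries: center the data, form the covariance matrix, then find its eigenvectors (the principal components) in order of eigenvalue (the variance each explains). This sketch assumes power iteration with deflation and distinct eigenvalues; in practice `numpy.linalg.eigh` or scikit-learn's `PCA` would be the usual choice. The perfectly collinear example data is hypothetical.

```python
import math
import random


def pca(rows, num_components, iters=500, seed=0):
    """Return (components, variances, projected rows) for the top principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(num_components):
        # Power iteration: repeatedly multiply by cov to find its dominant eigenvector
        v = [rng.uniform(-1, 1) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left in any direction
            v = [x / norm for x in w]
        variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        variances.append(variance)
        # Deflation: remove this direction so the next pass finds the next component
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(d)] for i in range(d)]

    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components] for c in centered]
    return components, variances, projected


rows = [(x, 2 * x) for x in range(-5, 6)]  # hypothetical points on the line y = 2x
components, variances, projected = pca(rows, num_components=2)
print(variances)  # first about 55.0, second about 0.0: one axis explains the variance
```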

  • Principal Component Analysis (PCA) http://setosa.io/ev/principal-component-analysis/

  • Multidimensional scaling (MDS)
    • Project the high-dimensional data into a much lower-dimensional space (e.g., 2D)
    • Relies on the similarity between points (usually requires computing the similarity between every pair of points)
    • Often a non-linear transformation: more difficult to interpret than PCA, but can preserve structure better in some cases
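
One common way to compute metric MDS is SMACOF, which iteratively moves the low-dimensional points to reduce stress, the mismatch between their distances and the original pairwise dissimilarities; scikit-learn's `sklearn.manifold.MDS` is built on it. The sketch below is a bare-bones, unweighted version assuming Euclidean dissimilarities and random initialization; the four-attribute items are hypothetical. Each update (the Guttman transform) cannot increase stress, and the returned history makes that visible.

```python
import math
import random


def stress(X, D):
    """Raw stress: squared mismatch between embedded and target distances."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - D[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))


def smacof_mds(D, dim=2, iters=200, seed=0):
    """Embed n items with dissimilarity matrix D into `dim` dimensions."""
    n = len(D)
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    history = [stress(X, D)]
    for _ in range(iters):
        # Build B(X): B[i][j] = -D[i][j] / d_ij(X) off the diagonal, rows summing to zero
        B = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    d_ij = math.dist(X[i], X[j])
                    B[i][j] = -D[i][j] / d_ij if d_ij > 0 else 0.0
            B[i][i] = -sum(B[i])
        # Guttman transform: X <- (1/n) * B(X) @ X
        X = [[sum(B[i][k] * X[k][c] for k in range(n)) / n for c in range(dim)] for i in range(n)]
        history.append(stress(X, D))
    return X, history


items = [(1, 1, 0, 0), (1, 0.9, 0, 0.1), (0, 0, 1, 1), (0.1, 0, 1, 0.9), (0.5, 0.5, 0.5, 0.5)]  # hypothetical
D = [[math.dist(a, b) for b in items] for a in items]  # every pairwise dissimilarity
X, history = smacof_mds(D)
```

The pairwise matrix `D` is the costly part the slide mentions: it grows with the square of the number of items.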

  • Models in Visual Analytics
    Adapted from: http://slideplayer.com/slide/4659134/ and from Remco Chang, 2010

  • Models in Visual Analytics
    • Abstractions of how visualization works:
      • Provide a way of talking about how humans interact with visualizations
      • Provide a language for describing different parts of the visual analytics process
    • Every model is (overly) simplified: beware!

  • Terminology / Assumptions
    • Sensemaking: the act of processing incomplete information in order to improve one’s understanding of a situation and/or to make decisions
    • A person’s decision making is bounded by [1]:
      • incomplete information
      • the amount of time they have to decide
      • the finite processing power of their brain
    • Mental model: an abstracted, more tractable version of the real world
    [1] H. Simon 1957. “A Behavioral Model of Rational Choice”

  • Models in Visual Analytics

  • Information Visualization Reference Model
    Card, Mackinlay, and Shneiderman. Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann, 1999, p. 17

  • Van Wijk’s Model
    • D = Data, V = Visualization, S = Specification, I = Image, P = Perception, K = Knowledge, E = Exploration
    Van Wijk, J. “The value of visualization”, 2005
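
Van Wijk's paper ties these symbols together as functions of time t. Roughly, the image is generated from the data and the current specification, knowledge grows through perception of the image, and exploration driven by that knowledge updates the specification. The relations below are a sketch of that formulation, with K_0 and S_0 denoting the initial knowledge and specification; consult the paper for its exact notation.

```latex
\begin{aligned}
I(t) &= V(D, S, t) \\
\frac{dK}{dt} &= P(I, K) \\
\frac{dS}{dt} &= E(K) \\
K(t) &= K_0 + \int_0^t P\big(I(\tau), K(\tau)\big)\, d\tau \\
S(t) &= S_0 + \int_0^t E\big(K(\tau)\big)\, d\tau
\end{aligned}
```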

  • Keim’s Visual Analytics Model
    Keim, D. et al. “Visual Analytics: Definition, process, and challenges”, 2008

  • Pirolli and Card Sensemaking Model
    Pirolli, P. and Card, S. “The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis”, 2005

  • Next week: class canceled. Happy Thanksgiving!
  • Week 15, Nov 30
    • Time series and temporal data
    • Inference and uncertainty in visualization