Information Theory in Visual Analytics. Min Chen, Professor of Scientific Visualization, Oxford e-Research Centre, University of Oxford (min.chen@oerc.ox.ac.uk), including recent work in collaboration with Amos Golan, American University, USA. The 41st CREST Open Workshop, UCL, London, 27-28 April 2015. http://www.bslhands4u.com/fingerspelling/4545036827 http://www.infoplease.com/ipa/A0200808.html
Three Visualization Subsystems
A General Communication System: Source -> Encoder (Transmitter) -> Channel -> Decoder (Receiver) -> Destination, carrying message M -> signal S -> signal S' -> message M', subject to noise N.
vis-encoder: Visual Source -> raw data D -> Filtering -> information I -> Mapping -> geometry & labels G -> Rendering -> image V, subject to noise N.
vis-channel: image V -> Displaying -> optical signal S -> Optical Transmission -> optical signal S' -> Viewing -> image V', subject to noise N.
vis-decoder: image V' -> Perception -> information I' -> Cognition -> knowledge K -> Destination, subject to noise N.
Existing Uses of Information Theory
Applications: data processing; view optimization; glyph design; ...
Theoretical framework: measuring visualization capacity and related quantities; explaining phenomena in visualization processes; defining laws (mathematically-validated guidelines); defining algorithm- or data-driven metrics; confirming the significance of visual analytics.
Example: Visual Multiplexing
[Figure: a vis-encoder multiplexes (MUX) channels c_1, c_2, c_3, c_4, ..., c_k at location p in spatial domain D and temporal domain T; a vis-link (consisting of many vis-channels) carries them, together with other signals and noise, to a vis-decoder that demultiplexes (DEMUX) them into information about X at p.]
Location p can be associated with X in the source data or determined by a spatial mapping. X can be a data record or a set of partially encoded visual attributes. Perceived information may include estimated values and relationships with data conveyed by other signals.
M. Chen et al., “Visual multiplexing,” Computer Graphics Forum, 2014
Data Processing Inequality
[Diagram: X -> Process 1 -> Y -> Process 2 -> Z, with source distribution $p(x)$, Process 1 given by $p(y|x)$ and Process 2 given by $p(z|y)$; $I(X;Y)$ is measured across Process 1 and $I(Y;Z)$ across Process 2.]
$p(x, y, z) = p(x)\,p(y|x)\,p(z|y)$
$I(X;Y) \ge I(X;Z)$
“No clever manipulation of data can improve the inferences that can be made from the data” [Cover and Thomas, 2006]
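The inequality can be checked numerically on a small Markov chain. The sketch below is illustrative only: the binary alphabet and the flip probabilities (10% for Process 1, 20% for Process 2) are assumed values, not taken from the slides.

```python
import math
from itertools import product


def mutual_information(joint):
    """Mutual information in bits from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


# Assumed toy distributions (hypothetical, for illustration only).
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}  # Process 1
p_z_given_y = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}  # Process 2

# Build the joint distributions of (X, Y) and (X, Z) from the chain
# p(x, y, z) = p(x) p(y|x) p(z|y).
joint_xy, joint_xz = {}, {}
for x, y, z in product((0, 1), repeat=3):
    p = p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    joint_xy[(x, y)] = joint_xy.get((x, y), 0.0) + p
    joint_xz[(x, z)] = joint_xz.get((x, z), 0.0) + p

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
print(f"I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")
```

Because Z depends on X only through Y, the second noisy process can only lose information about X, so `i_xz` cannot exceed `i_xy`.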
Data Processing Inequality: Big Data Input?
[Diagram: Big Data -> Process 1 -> Process 2 -> ... -> Process L-1 -> Process L -> Decision, passing through alphabets $Z_1, Z_2, Z_3, \ldots, Z_{L-1}, Z_L, Z_{L+1}$. Input entropy $H(Z_1)$; output entropy $H(Z_{L+1})$; mutual information $I(Z_1; Z_{L+1})$.]
DPI is not Ubiquitous
[Diagram: X -> Process 1 -> Y -> Process 2 -> Z, with $p(x, y, z) = p(x)\,p(y|x)\,p(z|y)$ and $I(X;Y) \ge I(X;Z)$.]
Markov chain conditions: (1) closed coupling: (X, Y), (Y, Z); (2) X and Z are conditionally independent.
What if one of the conditions is broken?
[Diagram: the same chain with interaction $U_1$ feeding Process 1 and interaction $U_2$ feeding Process 2.]
[Diagram: the same chain with domain knowledge about X feeding in; the relation between $I(X;Y)$ and $I(X;Z)$ is no longer guaranteed.]
In visual analytics, both conditions are usually broken.
M. Chen and H. Jänicke, “An information-theoretic framework for visualization,” IEEE Transactions on Visualization and Computer Graphics, 2010
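A small counterexample makes the point concrete. In the sketch below (a hypothetical construction, not taken from the slides), X is a 2-bit value, Process 1 keeps only the low bit as Y, and Process 2 has side knowledge U of the high bit of X, so X and Z are no longer conditionally independent given Y.

```python
import math


def mutual_information(joint):
    """Mutual information in bits from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


joint_xy, joint_xz = {}, {}
for x in range(4):          # X uniform over {0, 1, 2, 3} (assumed)
    p = 0.25
    y = x % 2               # Process 1 keeps only the low bit
    u = x // 2              # domain knowledge about X: the high bit
    z = y + 2 * u           # Process 2 combines Y with that knowledge
    joint_xy[(x, y)] = joint_xy.get((x, y), 0.0) + p
    joint_xz[(x, z)] = joint_xz.get((x, z), 0.0) + p

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
print(f"I(X;Y) = {i_xy} bits, I(X;Z) = {i_xz} bits")  # 1.0 and 2.0
```

Here Process 2 recovers X exactly, so I(X;Z) exceeds I(X;Y). This does not contradict the data processing inequality; it only shows that the Markov chain condition does not hold once extra knowledge enters the process.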
Soft Knowledge in Decision Space
[Diagram: Big Data -> Process 1 -> Process 2 -> ... -> Process L-1 -> Process L -> Decision, passing through alphabets $Z_1, Z_2, Z_3, \ldots, Z_{L-1}, Z_L, Z_{L+1}$.]
All possible decisions under different conditions: a) totally data-driven; b) totally instinct-driven; c) data-informed; d) due to unknown or uncontrollable factors.
$x \in X$ is a piece of soft knowledge. Entropy $H(X)$; entropy $H(Z_1)$; mutual information $I(Z_1; X)$.
An Example Data Analysis and Visualization Process
Legend: M = machine process, H = human process; each $Z_i$ is an alphabet.
$Z_1$ Raw Data: r time series, 1 hour long at 5-second resolution; 720 data points each series; $2^{32}$ valid values each point; $H_{max} = 23040r$.
$Z_2$ Aggregated Data: r time series at 1-minute resolution; 60 data points; $2^{32}$ valid values; $H_{max} = 1920r$.
Branch 1:
$Z_3$ Time Series Plots: r time series; 60 data points; 128 valid values; $H_{max} = 420r$.
$Z_4$ Feature Recognition: r time series; 10 features; 8 valid values; $H_{max} = 30r$.
Branch 2:
$Z_5$ Correlation Indices: $r(r-1)/2$ data points; $2^{30.7}$ valid values; $H_{max} \approx 15r(r-1)$.
$Z_6$ Graph Visualization: $r(r-1)/2$ connections; 5 valid values; $H_{max} \approx 1.16r(r-1)$.
$Z_7$ Decision: r decisions; 3 valid values each (e.g., buy, sell, hold); $H_{max} \approx 1.58r$.
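The $H_{max}$ figures follow from treating every data point as uniformly distributed over its valid values, so that the maximum entropy is the number of points times $\log_2$ of the number of values. A minimal sketch of that arithmetic (the function name `h_max` is ours, not from the slides):

```python
import math


def h_max(num_points, num_values):
    """Maximum entropy in bits of num_points letters, each uniform over num_values."""
    return num_points * math.log2(num_values)


# Coefficients of r (bits per time series or per decision).
per_r = {
    "Z1 raw data": h_max(720, 2**32),           # 720 points, 32 bits each
    "Z2 aggregated data": h_max(60, 2**32),     # 60 points, 32 bits each
    "Z3 time series plots": h_max(60, 128),     # 60 points, 7 bits each
    "Z4 feature recognition": h_max(10, 8),     # 10 features, 3 bits each
    "Z7 decision": h_max(1, 3),                 # one 3-way decision
}

# Coefficients of r(r-1): there are r(r-1)/2 pairs, hence the factor 0.5.
per_pair = {
    "Z5 correlation indices": 0.5 * 30.7,       # log2(2^30.7) = 30.7 bits per pair
    "Z6 graph visualization": h_max(0.5, 5),    # 5 valid values per connection
}

for name, bits in per_r.items():
    print(f"{name}: H_max = {bits:.2f} r")
for name, bits in per_pair.items():
    print(f"{name}: H_max = {bits:.2f} r(r-1)")
```

The exact coefficients for $Z_5$, $Z_6$ and $Z_7$ are 15.35, about 1.16 and about 1.58, which the slide rounds.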
A Sequential Workflow and Two Basic Metrics
[Diagram: Data -> Process 1 -> ... -> Process s -> ... -> Process L -> Decision, passing through alphabets $Z_1, Z_2, \ldots, Z_s, Z_{s+1}, \ldots, Z_L, Z_{L+1}$.]
The s-th Function (Process):
Alphabetic Compression Ratio (ACR):
A Reverse “Guessing” Process:
Potential Distortion Ratio (PDR):
M. Chen and A. Golan, “What may visualization processes optimize?,” under review, 2015
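The two metrics can be illustrated with a toy process. The sketch below rests on assumptions that are not stated in this text: it takes ACR to be the ratio of output entropy to input entropy, $H(Z_{s+1}) / H(Z_s)$, and PDR to be the Kullback-Leibler divergence of the reverse-guessed distribution from the original one, divided by $H(Z_s)$. The alphabets, probabilities and the symbol `z_guess` are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def kl_divergence(q, p):
    """D_KL(q || p) in bits for two distributions over the same letters."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)


# Assumed input alphabet Z_s with four letters.
z_s = [0.4, 0.1, 0.3, 0.2]

# Assumed process: merge letters {0, 1} and {2, 3} into two output letters.
z_next = [z_s[0] + z_s[1], z_s[2] + z_s[3]]

# Assumed reverse guess: split each output letter evenly over its sources.
z_guess = [z_next[0] / 2, z_next[0] / 2, z_next[1] / 2, z_next[1] / 2]

acr = entropy(z_next) / entropy(z_s)             # assumed ACR definition
pdr = kl_divergence(z_guess, z_s) / entropy(z_s)  # assumed PDR definition
print(f"ACR = {acr:.3f}, PDR = {pdr:.3f}")
```

Under these assumptions, a lower ACR means stronger compression of the alphabet, while a higher PDR means that a viewer guessing backwards from the output is more likely to reconstruct a distorted picture of the input.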
Cost-Benefit Ratio
[Diagram: Data -> Process 1 -> ... -> Process s -> ... -> Process L -> Decision, passing through alphabets $Z_1, Z_2, \ldots, Z_s, Z_{s+1}, \ldots, Z_L, Z_{L+1}$.]
Effectual Compression Ratio (ECR):
Incremental Cost-Benefit Ratio (ICBR):
Cost can be measured in energy, time, money, etc.
M. Chen and A. Golan, “What may visualization processes optimize?,” under review, 2015
Four Levels of Visualization
1. Disseminative Level (This is A!) A presentational aid for disseminating information or insight to others. The creator does not expect to gain much new knowledge.
2. Observational / Operational Level (What, when, where?) An operational aid that enables intuitive and/or speedy observation of captured data. Often part of routine operations. Confirmatory observation, anomaly detection, etc.
3. Analytical Level (Does A relate to B? Why?) An investigative aid for examining and understanding complex relationships (e.g., correlation, causality, contradiction). Evaluating hypotheses, models, methods, algorithms and systems.
4. Model-developmental Level (How does A lead to B?) A developmental aid for improving existing models, methods, algorithms and systems, as well as for creating new ones.
Levels 1, 2, 3
[Diagram: workflows W1, W2, W3 and W4, built from Data, Model, V, H and M components.]
Legend: V_D: Disseminative Visualization; V_O: Observational Visualization; V_A: Analytical Visualization; V: visual mapping with interaction (optional); H: human perception, cognition, and action; M: predefined machine processing, or dynamically-modifiable machine processing.
When will workflow W3 work, and when will it not?
Level 4
[Diagram: workflows W3, W5 and W6, built from Data, Model, V, H and M components.]
When workflow W3 does not work well, then ...
Legend: V_M: Model-developmental Visualization; V_D: Disseminative Visualization; V_O: Observational Visualization; V: visual mapping with interaction (optional); H: human perception, cognition, and action; M: predefined machine processing, or dynamically-modifiable machine processing.
Example: Level 1
Data: 64 bytes (byte 1 to byte 64), each taking a value from 0 to 255.
Entropy of Data Alphabet, summing over the 64 bytes t and the 256 values i: $H(Z) = -\sum_{t}\sum_{i=0}^{255} \frac{1}{256}\log_2\frac{1}{256} = 512$
Binary Pixel Plot: bits 7 to 0 of each byte; minimal 64 pixels by minimal 8 pixels; at 4x4 pixels per bit, $2^{13}$ bits.
Time Series Plot: horizontal axis 0 to 64, vertical axis 0 to 256; minimal 256x64 pixels ($2^{14}$ bits).
The more compact, the better?
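The figures on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes that every byte is uniformly distributed over its 256 values and that display capacity is counted at one bit per pixel; both are our reading of the slide rather than statements in it.

```python
import math

NUM_BYTES = 64
VALUES_PER_BYTE = 256
BITS_PER_BYTE = 8

# Entropy of the data alphabet: sum over all bytes and all byte values,
# each value having probability 1/256 (assumed uniform).
p = 1 / VALUES_PER_BYTE
entropy_bits = -sum(p * math.log2(p)
                    for _ in range(NUM_BYTES)
                    for _ in range(VALUES_PER_BYTE))

# Binary pixel plot: one cell per bit, drawn at 4x4 pixels per bit.
binary_plot_pixels = NUM_BYTES * BITS_PER_BYTE * 4 * 4

# Time series plot: at least 256 pixels (values) by 64 pixels (bytes).
time_series_pixels = 256 * 64

print(entropy_bits)                   # 512.0
print(math.log2(binary_plot_pixels))  # 13.0
print(math.log2(time_series_pixels))  # 14.0
```

Both plots use far more display capacity than the 512 bits of data they encode, and the less compact time series plot uses twice as much as the binary pixel plot, which is what the closing question on the slide invites the audience to weigh.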
Example: Level 2. Real-time or offline annotation results in a huge spreadsheet of events. Legg et al., “MatchPad: interactive glyph-based visualization for real-time sports performance analysis,” Computer Graphics Forum, 2012
Example: Level 3. D. Oelke, D. Spretke, A. Stoffel, D. Keim, “Visual readability analysis: How to make your writings easier to read,” IEEE VAST, 2010
Example: Level 4
Expression Recognition: humans are very good at it; machine vision is far behind; limited understanding.
Data: video; feature changes; time series.
Challenges: a lot of features; a lot of ways of measuring features; non-uniform temporal behavior.
Tam et al., “Visualization of time-series data in parameter space for understanding facial dynamics,” Computer Graphics Forum, 2011
Parallel Coordinates. Multi-dimensional data visualization. [Figure: plots with axes X and Y.]
Interactive Visualization: Formulating Decisions
Telephone. In the 1870s, Bell travelled around to give demos ‘in concert halls, where full orchestras and choruses played “America” and “Auld Lang Syne” into his gadgetry.’ Around 1880, Queen Victoria installed a pair of telephones at Windsor and Buckingham Palace. Primary source: J. Gleick, book, 2012