
Information Theory in Visual Analytics, Min Chen



  1. Information Theory in Visual Analytics
     - Min Chen, Professor of Scientific Visualization, Oxford e-Research Centre, University of Oxford (min.chen@oerc.ox.ac.uk)
     - Including recent work in collaboration with Amos Golan, American University, USA
     - The 41st CREST Open Workshop, UCL, London, 27-28 April 2015
     - Links on slide: http://www.bslhands4u.com/fingerspelling/4545036827 and http://www.infoplease.com/ipa/A0200808.html

  2. Three Visualization Subsystems
     - vis-encoder: Source → raw data (D) → Filtering → information (I) → Visual Mapping → geometry & labels (G) → Rendering → image (V)
     - vis-channel: image (V) → Displaying → optical signal (S) → Optical Transmission → optical signal (S') → Viewing → image (V')
     - vis-decoder: image (V') → Perception → information (I') → Cognition → knowledge (K) → Destination
     - A General Communication System: Source → message (M) → Encoder (Transmitter) → signal (S) → Channel → signal (S') → Decoder (Receiver) → message (M') → Destination
     - N marks where noise enters a stage.

  3. Existing Uses of Information Theory
     - Data processing
        - View optimization
        - Glyph design
        - ...
     - Theoretical framework
        - Measuring visualization capacity, and related quantities
        - Explaining phenomena in visualization processes
        - Defining laws (mathematically-validated guidelines)
        - Defining algorithm- or data-driven metrics
        - Confirming the significance of visual analytics

  4. Example: Visual Multiplexing
     - Location p can be associated with X in the source data or determined by a spatial mapping.
     - X can be a data record or a set of partially encoded visual attributes.
     - Perceived information may include estimated values and relationships with data conveyed by other signals.
     - [Figure: channels c_1, c_2, c_3, c_4, ..., c_k at location p in spatial domain D and temporal domain T pass through MUX (vis-encoder), a vis-link consisting of many vis-channels and subject to other signals and noise, and DEMUX (vis-decoder), yielding information about X at p.]
     - M. Chen et al., "Visual multiplexing," Computer Graphics Forum, 2014

  5. Data Processing Inequality
     - Markov chain: X → Process 1 → Y → Process 2 → Z, where X is drawn from $p(x)$, Process 1 is described by $p(y|x)$ and Process 2 by $p(z|y)$
     - Joint distribution: $p(x, y, z) = p(x)\,p(y|x)\,p(z|y)$
     - Mutual information across each process: $I(X;Y)$ and $I(Y;Z)$
     - Inequality: $I(X;Y) \ge I(X;Z)$
     - "No clever manipulation of data can improve the inferences that can be made from the data" [Cover and Thomas, 2006]
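The inequality on this slide can be checked numerically on a small example. The sketch below is illustrative only: the uniform binary source and the two binary symmetric channels (flip probabilities 0.1 and 0.2) are assumptions chosen for the demonstration, not values from the talk, and the helper names are hypothetical.

```python
import math
from itertools import product


def mutual_information(joint):
    """Mutual information I(A;B) in bits from a dict {(a, b): probability}."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def marginal(joint, keep):
    """Sum out every variable except the index positions listed in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[k] for k in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def binary_symmetric_channel(flip):
    """Conditional table p(out | in) that flips a bit with probability `flip`."""
    return {(i, o): (flip if i != o else 1.0 - flip)
            for i, o in product((0, 1), repeat=2)}


# Assumed toy distributions (not from the slides).
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = binary_symmetric_channel(0.1)  # Process 1
p_z_given_y = binary_symmetric_channel(0.2)  # Process 2

# Markov chain: p(x, y, z) = p(x) p(y|x) p(z|y)
joint_xyz = {
    (x, y, z): p_x[x] * p_y_given_x[(x, y)] * p_z_given_y[(y, z)]
    for x, y, z in product((0, 1), repeat=3)
}

i_xy = mutual_information(marginal(joint_xyz, (0, 1)))
i_yz = mutual_information(marginal(joint_xyz, (1, 2)))
i_xz = mutual_information(marginal(joint_xyz, (0, 2)))

print(f"I(X;Y) = {i_xy:.4f} bits")
print(f"I(Y;Z) = {i_yz:.4f} bits")
print(f"I(X;Z) = {i_xz:.4f} bits")
print("DPI holds:", i_xy >= i_xz)
```

Because Z is produced from Y alone, whatever Process 2 does can only lose information about X, so `i_xz` never exceeds `i_xy` in this setup.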

  6. Data Processing Inequality: Big Data Input?
     - Workflow: Big Data (alphabet $Z_1$) → Process 1 → $Z_2$ → Process 2 → $Z_3$ → ... → $Z_{L-1}$ → Process L-1 → $Z_L$ → Process L → Decision (alphabet $Z_{L+1}$)
     - Quantities shown: entropy $H(Z_1)$ of the input, entropy $H(Z_{L+1})$ of the decision, and mutual information $I(Z_1; Z_{L+1})$ between them

  7. DPI is not Ubiquitous
     - Markov chain: X → Process 1 → Y → Process 2 → Z, with $p(x, y, z) = p(x)\,p(y|x)\,p(z|y)$ and $I(X;Y) \ge I(X;Z)$
     - Markov chain conditions
        - Closed coupling: (X, Y), (Y, Z)
        - X and Z are conditionally independent
     - What if one of the conditions is broken?
        - Interactions $U_1$ and $U_2$ feed into Process 1 and Process 2.
        - Domain knowledge about X is available to the processes.
     - In visual analytics, both conditions are usually broken, so $I(X;Y) \ge I(X;Z)$ need not hold.
     - M. Chen and H. Jänicke, "An information-theoretic framework for visualization," IEEE Transactions on Visualization and Computer Graphics, 2010
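A small numerical sketch can show how domain knowledge about X breaks the inequality. Everything here is an illustrative assumption rather than a model from the talk: X is a uniform bit, Process 1 flips it with probability 0.25, and Process 2 outputs X with probability `knowledge` (a hypothetical parameter standing in for domain knowledge) and otherwise copies Y.

```python
import math
from itertools import product


def mutual_information(joint):
    """Mutual information I(A;B) in bits from a dict {(a, b): probability}."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def marginal(joint, keep):
    """Sum out every variable except the index positions listed in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[k] for k in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def information_with_knowledge(knowledge, flip=0.25):
    """Return (I(X;Y), I(X;Z)) when Process 2 also depends on X directly.

    Assumed toy model: p(z | x, y) = knowledge * [z == x] + (1 - knowledge) * [z == y].
    With knowledge = 0 this reduces to the Markov chain X -> Y -> Z with Z = Y.
    """
    joint = {}
    for x, y, z in product((0, 1), repeat=3):
        p_y = flip if y != x else 1.0 - flip
        p_z = knowledge * (z == x) + (1.0 - knowledge) * (z == y)
        joint[(x, y, z)] = 0.5 * p_y * p_z
    i_xy = mutual_information(marginal(joint, (0, 1)))
    i_xz = mutual_information(marginal(joint, (0, 2)))
    return i_xy, i_xz


for k in (0.0, 0.5, 0.95):
    i_xy, i_xz = information_with_knowledge(k)
    print(f"knowledge={k:.2f}  I(X;Y)={i_xy:.4f}  I(X;Z)={i_xz:.4f}")
```

With `knowledge = 0` the two quantities coincide; as soon as Process 2 can draw on X through a path that bypasses Y, I(X;Z) can exceed I(X;Y).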

  8. Soft Knowledge in Decision Space
     - Workflow: Big Data (alphabet $Z_1$) → Process 1 → $Z_2$ → Process 2 → $Z_3$ → ... → $Z_{L-1}$ → Process L-1 → $Z_L$ → Process L → Decision (alphabet $Z_{L+1}$)
     - All possible decisions under different conditions:
        - a) totally data-driven
        - b) totally instinct-driven
        - c) data-informed
        - d) due to unknown or uncontrollable factors
     - $x \in X$ is a piece of soft knowledge
     - Quantities shown: entropy $H(X)$, entropy $H(Z_1)$, and mutual information $I(Z_1; X)$

  9. An Example Data Analysis and Visualization Process
     - $Z_1$ Raw Data (1 hour long at 5-second resolution): r time series, 720 data points each, $2^{32}$ valid values per point; $H_{max} = 23040r$
     - $Z_2$ Aggregated Data (at 1-minute resolution): r time series, 60 data points, $2^{32}$ valid values; $H_{max} = 1920r$
     - $Z_3$ Time Series Plots: r time series, 60 data points, 128 valid values; $H_{max} = 420r$
     - $Z_4$ Feature Recognition: r time series, 10 features, 8 valid values; $H_{max} = 30r$
     - $Z_5$ Correlation Indices: r(r-1)/2 data points, $2^{30.7}$ valid values; $H_{max} \approx 15r(r-1)$
     - $Z_6$ Graph Visualization: r(r-1)/2 connections, 5 valid values; $H_{max} \approx 1.16r(r-1)$
     - $Z_7$ Decision: r decisions, 3 valid values each (e.g., buy, sell, hold); $H_{max} \approx 1.58r$
     - Legend: each process between alphabets is marked M (machine) or H (human).
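The maximum entropies on this slide follow from one rule: an alphabet with n independent items, each taking one of v equally likely values, has $H_{max} = n \log_2 v$ bits. The sketch below recomputes the figures for an assumed r = 10; the function names and the choice of r are illustrative, not part of the talk.

```python
import math


def h_max(n_items, n_values):
    """Maximum entropy in bits of n_items items with n_values equally likely values each."""
    return n_items * math.log2(n_values)


def example_alphabets(r):
    """Maximum entropy of each alphabet in the slide's example, for r time series."""
    pairs = r * (r - 1) / 2  # number of pairwise correlations / connections
    return {
        "Z1 raw data": h_max(720 * r, 2 ** 32),
        "Z2 aggregated data": h_max(60 * r, 2 ** 32),
        "Z3 time series plots": h_max(60 * r, 128),
        "Z4 feature recognition": h_max(10 * r, 8),
        "Z5 correlation indices": h_max(pairs, 2 ** 30.7),
        "Z6 graph visualization": h_max(pairs, 5),
        "Z7 decision": h_max(r, 3),
    }


r = 10  # assumed number of time series, for illustration only
for name, bits in example_alphabets(r).items():
    print(f"{name:24s} H_max = {bits:10.1f} bits")
```

The per-series coefficients on the slide are these values divided by r (or by r(r-1) for the pairwise alphabets), for example $720 \times 32 = 23040$ for the raw data and $\log_2 3 \approx 1.58$ for the decision.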

  10. A Sequential Workflow and Two Basic Metrics
     - Workflow: Data (alphabet $Z_1$) → Process 1 → $Z_2$ → ... → $Z_s$ → Process s → $Z_{s+1}$ → ... → $Z_L$ → Process L → Decision (alphabet $Z_{L+1}$)
     - Definitions given on the slide (formulas not preserved in this transcript):
        - The s-th Function (Process)
        - Alphabetic Compression Ratio (ACR)
        - A Reverse "Guessing" Process
        - Potential Distortion Ratio (PDR)
     - M. Chen and A. Golan, "What may visualization processes optimize?," under review, 2015

  11. Cost-Benefit Ratio
     - Workflow: Data (alphabet $Z_1$) → Process 1 → $Z_2$ → ... → $Z_s$ → Process s → $Z_{s+1}$ → ... → $Z_L$ → Process L → Decision (alphabet $Z_{L+1}$)
     - Definitions given on the slide (formulas not preserved in this transcript):
        - Effectual Compression Ratio (ECR)
        - Incremental Cost-Benefit Ratio (ICBR)
     - Cost can be measured in energy, time, money, etc.
     - M. Chen and A. Golan, "What may visualization processes optimize?," under review, 2015

  12. Four Levels of Visualization
     1. Disseminative Level (This is A!)
        - A presentational aid for disseminating information or insight to others.
        - The creator does not expect to gain much new knowledge.
     2. Observational / Operational Level (What, when, where?)
        - An operational aid that enables intuitive and/or speedy observation of captured data. Often part of routine operations.
        - Confirmatory observation, anomaly detection, etc.
     3. Analytical Level (Does A relate to B? Why?)
        - An investigative aid for examining and understanding complex relationships (e.g., correlation, causality, contradiction).
        - Evaluating hypotheses, models, methods, algorithms and systems.
     4. Model-developmental Level (How does A lead to B?)
        - A developmental aid for improving existing models, methods, algorithms and systems, as well as the creation of new ones.

  13. Levels 1, 2, 3
     - $V_D$: Disseminative Visualization
     - $V_O$: Observational Visualization
     - $V_A$: Analytical Visualization
     - When will workflow $W_3$ work, and when will it not?
     - [Figure: workflows $W_1$ to $W_4$ linking Data and Model through V, H and M components. Legend: V = visual mapping with interaction (optional); H = human perception, cognition, and action; M = predefined machine processing, or dynamically-modifiable machine processing.]

  14. Level 4
     - When workflow $W_3$ does not work well, then ...
     - $V_M$: Model-developmental Visualization
     - [Figure: workflow $W_3$ together with workflows $W_5$ and $W_6$, linking Data and Model through V, H and M components. Legend: V = visual mapping with interaction (optional); H = human perception, cognition, and action; M = predefined machine processing, or dynamically-modifiable machine processing.]

  15. Example: Level 1
     - Data: 64 bytes (byte 1 to byte 64, bits 7 to 0)
     - Entropy of the data alphabet: $H(Z) = -\sum_{t=1}^{64} \sum_{i=0}^{255} \frac{1}{256} \log_2 \frac{1}{256} = 512$
     - Binary pixel plot: 4x4 pixels per bit, $2^{13}$ bits
     - Time series plot: minimal 256x64 pixels ($2^{14}$ bits)
     - The more compact, the better?
     - [Figure: binary pixel plot of the 64 bytes, and a time series plot with horizontal axis 0 to 64 and vertical axis 0 to 256.]
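The numbers on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes, as the entropy formula does, that each of the 64 bytes is uniformly distributed over 256 values, and that each display pixel is counted as one bit; the variable names are illustrative.

```python
import math

N_BYTES = 64           # data items t = 1..64
VALUES_PER_BYTE = 256  # values i = 0..255, assumed equally likely

# Entropy of the data alphabet: -sum_t sum_i (1/256) * log2(1/256)
p = 1.0 / VALUES_PER_BYTE
data_entropy_bits = -sum(
    p * math.log2(p)
    for _t in range(N_BYTES)
    for _i in range(VALUES_PER_BYTE)
)

# Binary pixel plot: every data bit is drawn as a 4x4 block of pixels.
pixel_plot_bits = int(data_entropy_bits) * 4 * 4

# Time series plot: at least 256 pixels for the value range, 64 for the bytes.
time_series_bits = 256 * 64

print(f"data alphabet:     {data_entropy_bits:.0f} bits")
print(f"binary pixel plot: {pixel_plot_bits} bits = 2^{int(math.log2(pixel_plot_bits))}")
print(f"time series plot:  {time_series_bits} bits = 2^{int(math.log2(time_series_bits))}")
```

Under these assumptions the time series plot uses twice the display capacity of the binary pixel plot for the same 512 bits of data, which is the contrast behind the slide's closing question.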

  16. Example: Level 2
     - Real-time or offline annotation results in a huge spreadsheet of events
     - Legg et al., "MatchPad: interactive glyph-based visualization for real-time sports performance analysis," Computer Graphics Forum, 2012

  17. Example: Level 3
     - D. Oelke, D. Spretke, A. Stoffel, D. Keim, "Visual readability analysis: How to make your writings easier to read," IEEE VAST, 2010

  18. Example: Level 4
     - Expression Recognition
        - Humans are very good at it
        - Machine vision is far behind
        - Limited understanding
     - Data
        - Video
        - Feature changes
        - Time series
     - Challenges
        - A lot of features
        - A lot of ways of measuring features
        - Non-uniform temporal behavior
     - Tam et al., "Visualization of time-series data in parameter space for understanding facial dynamics," Computer Graphics Forum, 2011

  19. Parallel Coordinates
     - Multi-dimensional data visualization
     - [Figure: plots with axes X and Y.]

  20. Interactive Visualization: Formulating Decisions

  21. Telephone
     - In the 1870s, Bell travelled around to give demos 'in concert halls, where full orchestras and choruses played "America" and "Auld Lang Syne" into his gadgetry.'
     - Around 1880, Queen Victoria installed a pair of telephones at Windsor and Buckingham Palace.
     - Primary source: J. Gleick, book, 2012
