Visual Analytics and Information Retrieval Giuseppe Santucci Dipartimento di Informatica e Sistemistica Sapienza Università di Roma santucci@dis.uniroma1.it
Who am I? (University of Rome is so big…)
• VisDis and the Database & User Interface groups are two tightly connected research groups at the Department of Computer and System Science (32 full professors, 19 associate professors, and 13 assistant professors) of the Rome Faculty of Engineering & ICT
• The background of the VisDis and Database/Interface groups covers:
  – Visual Information Access
  – Data quality
  – Data integration
  – User Centered Design
  – Usability and Accessibility
  – Infovis evaluation
  – Visual quality metrics
  – Visual Analytics
    • Data sampling
    • Density map optimization
  – Information Retrieval (& VA)
Fire 2012, Kolkata - VA & IR - Giuseppe Santucci - 19 December 2012
Outline
• Information Visualization
  – Definitions
  – Main issues
• Data overload
  – Visual Analytics
  – Visual Analytics challenges
• One methodological example
• VA and Information Retrieval
• Demo
Information Visualization? • Old stuff…
Visualization for Problem Solving • Mystery: what is causing a cholera epidemic in London in 1854?
Visualization for Problem Solving
Illustration by Dr. John Snow (1854). Dots indicate the locations of deaths; Xs indicate the locations of water pumps.
[From Visual Explanations by Edward Tufte, Graphics Press, 1997]
Visualization for Problem Solving
Dr. Snow deduced that the cholera epidemic was caused by a contaminated water pump! Closing that pump quickly solved the problem.
B.T.W., workers at the nearby brewery were noted to be relatively free of cholera…
The actual John Snow pub in London, close to the water pump!
Visualization for Explaining
What happened during Napoleon’s Russian campaign?
Charles Joseph Minard’s map (1869)
Visualization for Making Decisions
Traveling in London by Underground: how can I get to Queen’s Park from Victoria station?
London Underground Map 1927
Harry Beck’s idea
• Real position (when traveling underground) does not matter
• Only station sequences matter, together with their connections
• Beck proposed a “distorted” map
• Today practically all the underground maps in the world follow Beck’s approach
• He received only a small payment (London Underground was not sure about the idea)
• Still true right now: infovis people do not become rich…
• Likely that holds for VA and IR as well
London Underground Map 1990s
Moving to the present time
• What is modern Information Visualization?
• First of all, what is Visualization?
• Visualize: to form a mental model or mental image of something
• It is a cognitive activity and it has nothing to do with computers
What is Information Visualization?
Information visualization is the use of computer-supported, interactive, visual representations of abstract data to amplify cognition. [Card et al. ’99]
Information visualization!
1. Infovis is perfect for exploration, when we don’t know exactly what to look at. It supports vague goals.
2. Infovis is perfect to explain complex data and to support decisions.
• Other approaches to data analysis
  – Statistics: strong verification, but does not support exploration and vague goals
  – Data mining: actionable and reliable, but black box, not interactive, question-response style
  – Visual Analytics (formerly Visual Data Mining) is trying to join the two worlds
…computer-supported and interactive
• Computer-supported
  – Yes, we use computers, but we must always remember that a cognitive activity is involved in the process
• Interactive
  – To exploit the full power of Infovis techniques, interaction is mandatory
Interaction example
• Agronomists are experimenting with 7 treatments (anti-parasite, fertilizer, etc.) on 10 different crops (corn, tomatoes, etc.)
• A black square indicates success
• Does this visualization help?
[Figure: matrix of crops (rows 1 to 10) by treatments (columns A to G)]
Interaction example
• Let’s rearrange the rows
[Figure: the same crops-by-treatments matrix before and after rearranging. Original order: rows 1 to 10, columns A B C D E F G. Rearranged order: rows 1, 3, 8, 2, 6, 10, 4, 7, 9, 5, columns A D C E G B F]
(There are 10! possible row orderings; VA can help…)
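The rearrangement idea can be sketched in a few lines of Python. The slide's actual success matrix is not recoverable from the text, so the 4 × 4 matrix below is hypothetical, and sorting rows and columns by their 0/1 pattern is just one simple heuristic chosen for illustration (not necessarily the method behind the slide). The names `reorder`, `matrix`, `crops` and `treatments` are assumptions.

```python
from math import factorial


def reorder(matrix, row_labels, col_labels):
    """Group similar rows and columns of a 0/1 matrix by sorting on their patterns."""
    n_rows = len(matrix)
    # Columns with identical success patterns become adjacent.
    col_order = sorted(
        range(len(col_labels)),
        key=lambda c: [matrix[r][c] for r in range(n_rows)],
        reverse=True,
    )
    # Rows are then sorted on their pattern in the new column order.
    row_order = sorted(
        range(n_rows),
        key=lambda r: [matrix[r][c] for c in col_order],
        reverse=True,
    )
    new_matrix = [[matrix[r][c] for c in col_order] for r in row_order]
    return (
        new_matrix,
        [row_labels[r] for r in row_order],
        [col_labels[c] for c in col_order],
    )


# Hypothetical data: 1 = treatment succeeded on that crop.
matrix = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
crops = [1, 2, 3, 4]
treatments = ["A", "B", "C", "D"]

new_matrix, new_crops, new_treatments = reorder(matrix, crops, treatments)
for label, row in zip(new_crops, new_matrix):
    print(label, "".join("#" if cell else "." for cell in row))

# Brute force over all row orderings of 10 crops is impractical by hand:
print(factorial(10), "possible row orderings for 10 crops")
```

After reordering, the scattered squares collapse into two blocks, which is the kind of pattern the rearranged slide makes visible.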
…it is about abstract data
• Abstract data
  – Information visualization deals with images that do not refer to a physical situation. In other words, it is NOT scientific visualization/geographic visualization
• Scientific visualization primarily relates to and represents something physical or geometric
• Examples
  – Air flow over a wing
  – Weather over the USA
  – Torrents inside a tornado
  – Organs in the human body
  – Molecular bonding…
Scientific/geographic visualization
Earthquake intensity
…abstract data
• Items that do not have a direct physical/visual correspondence
• Examples: sport statistics, stock trends, query results, software data, IR metrics, etc.
• Items are represented in a 2D/3D space using their numerical characteristics (attributes)
• The visualization is useful for analysis and decision-making (not just for fun or colors)
• E.g.: postal parcels
  – Shipping date
  – Volume
  – Weight
  – Sender country
  – Receiver country
  – …
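As a minimal sketch of how abstract items end up in a 2D space, the Python fragment below maps two parcel attributes (volume and weight) to screen coordinates by linear scaling. The `Parcel` class, the units, the canvas size and the three sample parcels are all assumptions made for illustration; they are not data from the talk.

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    volume: float  # hypothetical unit: litres
    weight: float  # hypothetical unit: kilograms


def scale(value, lo, hi, size):
    """Linearly map a value in [lo, hi] to a coordinate in [0, size]."""
    if hi == lo:
        return size / 2  # degenerate range: put everything in the middle
    return (value - lo) / (hi - lo) * size


def to_points(parcels, width=800, height=600):
    """Return one (x, y) pixel position per parcel: x from volume, y from weight."""
    volumes = [p.volume for p in parcels]
    weights = [p.weight for p in parcels]
    points = []
    for p in parcels:
        x = scale(p.volume, min(volumes), max(volumes), width)
        # Screen y grows downward, so flip it to keep heavier parcels higher up.
        y = height - scale(p.weight, min(weights), max(weights), height)
        points.append((x, y))
    return points


parcels = [Parcel(1.0, 2.0), Parcel(3.0, 4.0), Parcel(2.0, 3.0)]
points = to_points(parcels)
print(points)
```

Any other pair of numeric attributes could drive the two axes; the choice of which attributes to map is the actual design decision.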
Abstract data
A 2D scatterplot showing about 200,000 postal parcels
Mixed visualization Byte traffic into the ANS/NSFNET T3 backbone in 1993
Amplify cognition using human vision
• Highest-bandwidth human sense
• Fast, parallel
• Pattern recognition
• Extends memory and cognitive capacity
• People think visually (“I see…” also means “I understand” in most languages)
• Amplify cognition
• Pre-attentive (we use only the eyes, not the brain)
• Two quick examples (4 seconds each)
Three simple questions
The quick answers
One (very) simple question
• How many 3s are here?
• You have 4 seconds…
458757626808609928083982698028
747976296262867897187743671947
746588786758967329667287682085
So?
• Time was not enough?
• You can do that in less than 0.2 seconds!
• Let’s try a different visualization…
• Color is pre-attentive (it pops out)
• No cognitive effort is required
• Many of these issues are already well understood
• Most people ignore them...
• Bells and whistles are not enough
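The "different visualization" for the digit-counting question can be approximated in a terminal. The sketch below wraps every 3 in an ANSI escape sequence so that it prints in bold red; it assumes a terminal that honors ANSI color codes, and the helper names `highlight` and `count` are hypothetical.

```python
ROWS = [
    "458757626808609928083982698028",
    "747976296262867897187743671947",
    "746588786758967329667287682085",
]

RED = "\033[1;31m"  # ANSI: bold red (assumes an ANSI-capable terminal)
RESET = "\033[0m"   # ANSI: back to the default style


def highlight(rows, target="3"):
    """Return the rows with every occurrence of target wrapped in red."""
    return [row.replace(target, f"{RED}{target}{RESET}") for row in rows]


def count(rows, target="3"):
    """Count occurrences of target the slow, sequential way."""
    return sum(row.count(target) for row in rows)


for line in highlight(ROWS):
    print(line)
print("Number of 3s:", count(ROWS))
```

With the color cue, the targets are located at a glance instead of by scanning each digit in turn.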
Canonical steps in Infovis – STEP 1
DATA → Internal Representation
• Data domains: Mathematics, Sport, Physics, Chemistry, Literature, History, Art, Geography
• Encoding of values
  – Univariate data
  – Bivariate data
  – Trivariate data
  – Multidimensional data
• Encoding of relationships
  – Temporal data
  – Maps & Diagrams
  – Graphs/Trees
  – Data streams
Canonical steps in Infovis – STEP 2
Internal Representation → Presentation
• Space limitations
  – Scrolling
  – Overview + details
  – Distortion
  – Suppression
  – Zoom & pan
  – Semantic zoom
• Time limitation
• Perceptual issues
• Cognitive issues
Problem solved!
We have (∼) agreed and (∼) mature solutions for the Representation and Presentation of a large variety of data.
So I’m done! Questions?
Data size and complexity!
• 100 million FedEx transactions per day
• 150 million VISA credit card transactions per day
• 300 million long-distance AT&T calls per day
• 50 billion e-mails per day
• 600 billion IP packets per day
• 1 trillion (10^12) web pages (according to Google), corresponding to about 3 petabytes of data
• Google processes 20 petabytes of data per day
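To get a feeling for these figures, the short Python calculation below converts the daily volumes quoted above into per-second rates and derives the average page size implied by the web-page figure. It only rearranges the numbers on the slide; the decimal definition of a petabyte (10^15 bytes) and the name `per_second` are assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(per_day):
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# Daily volumes as quoted on the slide.
daily_volumes = {
    "FedEx transactions": 100e6,
    "VISA transactions": 150e6,
    "AT&T long-distance calls": 300e6,
    "e-mails": 50e9,
    "IP packets": 600e9,
}

for name, volume in daily_volumes.items():
    print(f"{name}: about {per_second(volume):,.0f} per second")

# Assumption: 1 petabyte = 10**15 bytes (decimal definition).
PETABYTE = 10**15
bytes_per_page = 3 * PETABYTE / 10**12
print(f"Implied average page size: {bytes_per_page:,.0f} bytes")
```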