Approaches, Applications, and Research Challenges Tobias Schreck - PowerPoint PPT Presentation

Visual Search and Analysis in Textual and Non-Textual Document Repositories
Approaches, Applications, and Research Challenges
Tobias Schreck
Visual Analytics Group, Computer and Information Science, University of Konstanz, Germany
CLEF 2012: Conference and Labs of the Evaluation Forum 2012, 19.09.2012

1. Need for Search and Analysis in Large Data
Technological progress
– Acquisition, production, storage
– Data integration, data mining
→ Large and increasing amounts of data
Data-intensive application domains
– Business
– Research
– Engineering
Need for new technologies
– "… to unite the seemingly conflicting requirements of scalability and usability in making sense of the data" [VisMaster 2010]
Information Overload
– Share of digital information: 2000: 25%; 2002: 50% (beginning of the digital age); 2007: 94% (300 exabytes)
– Estimated growth rates (1986-2007): Storage: 23%; Network: 28%; Compute: 56%
– Source: Science, according to [F&L 3/2011]

1. Data Examples
Textual Data Repositories
– Digital Libraries
– Web
– Social Media
Non-textual Data Repositories
– Image repositories
– 3D Object repositories
– Data repositories
Figures: Digital Libraries; Customer Reviews (Amazon.com); www.facebook.com; www.twitter.com; Victoria State Library Image Collection (http://www.slv.vic.gov.au/); Sloan Digital Sky Survey (http://www.sdss.org/); PROBADO3D Archive (http://www.probado.de/3d.html)

1. How to Make Use of Large Data Repositories?
Searching
– Find information entities of interest
– Reuse, comparison
– Based on specification of queries
Analyzing
– Find structures and abstractions ("understand" the data set as a whole)
– Check hypotheses
– Make interesting, actionable observations
Interdependence
– Cycles of searching and analyzing

1. Visual Search and Analysis
Visual representation of the search and analysis process [Shneiderman 1996]
Goals of Visual Information Systems
– Intuitive access, direct manipulation
– Leverage human visual perception
– Encourage exploration
Classic visual search systems
– Filmfinder [Ahlberg and Shneiderman 1994]
– Time Searcher [Hochheiser and Shneiderman 2004]
Classic visual analysis systems
– Spire/In-Spire [Wise et al 1995]
– Visual decision tree construction and analysis [Teoh and Ma 2003]
Figures: [Ahlberg and Shneiderman 1994], [Wise et al 1995]

Propositions of this Talk
1. Emerging large, complex data sources pose new challenges to Information Retrieval and Understanding
2. Visual-interactive methods are useful to support retrieval and data understanding
3. Promising research opportunities exist at the intersection of visualization, information retrieval, and evaluation

Outline
1. Introduction
2. Overview Visualization for Large Text
   2.1 Feature-based Text Visualization
   2.2 Attribute-based Text Visualization
   2.3 Visual Document Summarization
   2.4 Geo-referenced Micro Blogging Text
3. Visual Search in Non-Textual Data
4. Promising Research Opportunities
5. Conclusions

2.1 Sentiment Analysis
• Opinion score derived from adjectives, nouns, and verbs
• Identifies positive and negative sections
→ Overview over large document corpora
→ Find articles which suit the mood of the reader
[Keim, Mansmann et al., 2008]
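The opinion-score idea can be illustrated with a small lexicon-based sketch. The word lists and the function names `opinion_score` and `score_sections` are hypothetical and invented for this illustration; the slide does not state which lexicon or scoring formula [Keim, Mansmann et al., 2008] use.

```python
import re

# Hypothetical toy lexicons; a real system would rely on a curated opinion lexicon.
POSITIVE = {"good", "great", "win", "success", "improve"}
NEGATIVE = {"bad", "crisis", "fail", "loss", "attack"}

def opinion_score(text):
    """Return (#positive - #negative) / #opinion words, in [-1, 1]; 0.0 if none found."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

def score_sections(document):
    """Score each blank-line-separated section so positive and negative parts stand out."""
    sections = [s for s in document.split("\n\n") if s.strip()]
    return [(opinion_score(s), s) for s in sections]
```

Scoring per section rather than per document is what makes it possible to colour individual parts of an article in an overview display.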

2.1 Sentiment Analysis: News Overview

2.1 Pixel-based Approach
Feature: average sentence length
[Oelke et al., 2008]
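A minimal sketch of how such a feature could feed a pixel-based display: compute the average sentence length per text block and map it to an intensity value, one pixel per block. The naive sentence splitting, the `max_len` normalisation constant, and both function names are assumptions made here, not details from [Oelke et al., 2008].

```python
import re

def average_sentence_length(text):
    """Mean number of words per sentence; sentences are split naively on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)

def to_pixel_values(blocks, max_len=40.0):
    """Map each text block to a 0-255 intensity proportional to its feature value.

    max_len is a hypothetical normalisation constant: blocks at or above it saturate at 255.
    """
    return [min(255, round(255 * average_sentence_length(b) / max_len)) for b in blocks]
```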

2.1 Readability Features
[Oelke, Spretke et al., 2010]

2.1 Readability Features: Vocabulary Difficulty of 2009 German Election Programs
Feature: Vocabulary Difficulty
Panels: Die Linke, Piraten
[Oelke, Spretke et al., 2010]

2.2 Attribute-based: Story, Character Complexity
Panels: King’s IT, Rowling’s Harry Potter
[Wanner, Fuchs et al., 2011]

2.2 Attribute-based: Visual Review Analysis
• User opinions abundantly available
– Forums, Blogs
– E-commerce
– …
• Many application possibilities
– Product reviews for customers
– Market analysis
– Customer relationship management
Figure: Amazon customer reviews (amazon.com)

2.2 Attribute-based: Visual Review Analysis
• Basic method
– Identify product attributes
– Identify positive/negative opinions
– Calculate weighted attribute vector
• Visual comparison of sets of reviews
– Glyph matrix approach
– Cluster analysis
• Applied to printer product reviews
Example attributes: cartridge, paper, price, printer, scanner, software, tray
Example values: 0 -1 0 +1 0 +1
[Oelke, Hao et al., 2009]
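The three steps of the basic method can be sketched as follows. The opinion word lists, the proximity window, and the clipping to -1/0/+1 are assumptions chosen for illustration; the slide does not specify how [Oelke, Hao et al., 2009] weight the vector, so this is not their scheme.

```python
import re

# Attribute names are taken from the slide; the opinion lexicons are hypothetical.
ATTRIBUTES = ["cartridge", "paper", "price", "printer", "scanner", "software", "tray"]
POSITIVE = {"good", "great", "fast", "cheap"}
NEGATIVE = {"bad", "slow", "expensive", "jams"}

def attribute_vector(review, window=2):
    """One entry per attribute: opinion words within `window` tokens, clipped to -1/0/+1."""
    tokens = re.findall(r"[a-z]+", review.lower())
    vector = []
    for attr in ATTRIBUTES:
        score = 0
        for i, tok in enumerate(tokens):
            if tok != attr:
                continue
            nearby = tokens[max(0, i - window): i + window + 1]
            score += sum(w in POSITIVE for w in nearby)
            score -= sum(w in NEGATIVE for w in nearby)
        vector.append(max(-1, min(1, score)))
    return vector
```

Vectors of this shape can be placed side by side as glyph rows or passed to a clustering step, which is what the comparison of review sets builds on.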

2.2 Attribute-based: Visual Review Analysis
[Oelke, Hao et al., 2009]

2.2 Attribute-based: Customer Segmentation
[Oelke, Hao et al., 2009]

2.3 Visual Content Overviewing
• Visual abstract for scientific articles
– Extraction of important figures and keywords
– Layout of elements in generalized word cloud
• Overviewing
• Navigation
• Comparison
[Strobelt, Oelke et al., 2009]

2.3 Visual Content Overviewing
[Strobelt, Oelke et al., 2009]

2.3 Visual Content Overviewing
[Strobelt, Oelke et al., 2009]

2.4 Georeferenced Microblogging Text
• Microblogging Text (e.g., Twitter)
– Short text messages
– Time stamp
– GPS position
• Potential analytic use
– Trend analysis
– Marketing, Reputation monitoring
– Situational awareness for civil defense or crisis management
Example messages: "Nice view, all fine …"; "Stuck in a jam after traffic accident …"
Map source: [www.google.com]

2.4 SensePlace2 Tool
[MacEachren, Jaiswal et al., 2011]

2.4 VAST Micro Blogging Challenge
• VAST Challenge 2011 [http://hcil.cs.umd.edu/localphp/hcil/vast11/]
– Fictitious city including street network and POIs
– 1 million microblogging messages for 20 days incl. spatial position
– Fictitious hidden epidemic scenario
• Task
– Find possible epidemics and their characteristics

2.4 VAST Micro Blogging Challenge
[Bertini, Buchmüller et al., 2011]

2.4 Concentration on Bridges

2.4 Concentration in Hospitals

2.4 Message Distribution (19.05.) – Filtered for Symptom Keywords
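A keyword-filtered message distribution of this kind could be computed roughly as follows. The message record layout (`day`, `text`, `x`, `y`), the symptom keyword list, and the grid cell size are all hypothetical; the actual VAST Challenge data format and the keywords used by [Bertini, Buchmüller et al., 2011] are not given on the slides.

```python
import re
from collections import Counter

SYMPTOMS = {"fever", "cough", "chills", "nausea"}  # hypothetical keyword list

def mentions_symptom(text):
    """True if the message text contains at least one symptom keyword."""
    return bool(SYMPTOMS & set(re.findall(r"[a-z]+", text.lower())))

def filter_messages(messages, day):
    """Keep messages of the given day that mention a symptom."""
    return [m for m in messages if m["day"] == day and mentions_symptom(m["text"])]

def cell_counts(messages, cell_size=1.0):
    """Count messages per square grid cell of side `cell_size` (map units)."""
    return Counter(
        (int(m["x"] // cell_size), int(m["y"] // cell_size)) for m in messages
    )
```

Cells with unusually high counts would then be candidates for the kind of spatial concentrations shown on the preceding slides.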

2.4 VAST Micro Blogging Challenge

Remainder of this Talk
1. Introduction
2. Overview Visualization for Large Text
   2.1 Feature-based Text Visualization
   2.2 Attribute-based Text Visualization
   2.3 Visual Document Summarization
   2.4 Geo-referenced Micro Blogging Text
3. Visual Search in Non-Textual Data
   3.1 Sketch-based 3D Object Retrieval
   3.2 Retrieval in Bivariate Measurement Data
4. Promising Research Opportunities
5. Conclusions

3. Visual Search in Non-Textual Data
Multitude of complex document types
– Images
– Video
– 3D Objects
– Multivariate Research Data
– Etc.
Research questions to address
– Similarity functions?
– Query types to support?
– How to evaluate?
Figures: PROBADO3D Archive [http://www.probado.de/3d.html]; Victoria State Library Image Collection (http://www.slv.vic.gov.au/); Sloan Digital Sky Survey (http://www.sdss.org/)

3.1 Query-by-Example and Sketch-Based Retrieval
Problems:
1. How to compare structurally different views?
2. How to evaluate different sketching styles?

3.1 Gradient Features, Suggestive Contours
[DeCarlo et al., 2003] [Yoon et al., 2010]

3.1 Sketch-Based 3D Object Retrieval
14-class subset of the Princeton Shape Benchmark [Shilane et al 2004]
Collection of 20 user sketches per class
Evaluation of retrieval performance (per class, given user sketch)
[Yoon et al., 2010]

3.1 SHREC’12 Track: Sketch-Based 3D Retrieval
[SHREC 2012 Sketch-based 3D Retrieval Track]

3.1 Large-Scale Sketch Benchmark
Crowd-sourced approach of [Eitz et al., 2012a]
• 20,000 sketches from 1300 users
• 250 representative object categories
• Basis for improved benchmarking study [Eitz et al., 2012b]
Recognition experiment [Eitz et al., 2012a]
• Avg. human accuracy: 73%
• Avg. automatic accuracy: 56%

3.2 Visual Search in Bivariate (Research) Data
• Jim Gray’s Fourth Paradigm and emerging research data repositories [Hey, Tansley, Tolle 2009]
• Prominent type of quantitative data: bivariate and multivariate data
• Common visual representation
– Scatter plot
– Scatter plot matrix
• Content-based support for visual search and analysis in this data?
Figure: [Pangaea]

3.2 Regressional Feature Vector for Comparing Scatter Plots
Perform regressions (linear, square, log, …)
Form feature vectors
• Goodness of fit scores
• Coefficient parameters
[Scherer, Bernard et al., 2011]
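The two steps above (fit several regression models, then concatenate goodness-of-fit scores and coefficients) can be sketched in plain Python. As a simplification chosen here, each model is fitted as a straight line in a transformed x (x, x², log x), goodness of fit is taken to be R², and plots are compared by Euclidean distance; the exact models, scores, and distance used in [Scherer, Bernard et al., 2011] may differ, and all function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    return a, b, r2

# Each model is a line in a transformed x (an assumption of this sketch).
TRANSFORMS = [
    ("linear", lambda x: x),
    ("square", lambda x: x * x),
    ("log", lambda x: math.log(x)),  # requires x > 0
]

def regression_features(xs, ys):
    """Concatenate (r2, a, b) of every model into one feature vector."""
    features = []
    for _name, transform in TRANSFORMS:
        a, b, r2 = fit_line([transform(x) for x in xs], ys)
        features.extend([r2, a, b])
    return features

def distance(f, g):
    """Euclidean distance between feature vectors; smaller means more similar plots."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(f, g)))
```

Under this sketch, query by example amounts to computing `regression_features` for the query plot and ranking all stored plots by `distance`. In practice the coefficient entries would likely need normalisation so that they do not dominate the R² entries.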

3.2 Search and Analysis Application
Figure labels: query by example; cluster; altitude vs PPPP (pressure hPa); sort by similarity to f(x)=e^-x; spatial reference of data sets
[Scherer, Bernard et al., 2011]

3.2 A Benchmark for Earth Observation Data
• But how to create a benchmark data set for automatic evaluation?
• Input data [Pangaea]
– BSRN earth observation data (radiation, temperature, etc.) for 40 stations
– 24,700 bivariate plots generated
• Tobler’s First Law of Geography for Similarity Class Formation: Position x Month x Parameter
– 18x6 Longitude/Latitude grid
– Month of year
– Parameters of measurement (pressure, temp, alt, CO2, O3, …)
→ 1608 similarity classes
• Evaluation of nine feature vectors
– Retrieval precision
– Timing
[Scherer, v. Landesberger et al., 2012]
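Tobler's First Law states that near things are more related than distant things, which motivates treating plots from the same grid cell, month, and parameter as similar. A minimal sketch of such a class key and of a precision measure follows. Equal-width grid cells (20° of longitude by 30° of latitude), the key layout, and the use of precision at k are assumptions made here; the slide only gives the 18x6 grid and names retrieval precision as a criterion.

```python
def similarity_class(lon, lat, month, parameter, lon_cells=18, lat_cells=6):
    """Class key (grid column, grid row, month, parameter).

    Equal-width cells over [-180, 180] x [-90, 90] are an assumption of this sketch.
    """
    col = min(int((lon + 180.0) / (360.0 / lon_cells)), lon_cells - 1)
    row = min(int((lat + 90.0) / (180.0 / lat_cells)), lat_cells - 1)
    return (col, row, month, parameter)

def precision_at_k(query_class, ranked_classes, k):
    """Fraction of the top-k retrieved plots that share the query's similarity class."""
    top = ranked_classes[:k]
    return sum(c == query_class for c in top) / len(top) if top else 0.0
```

With class labels derived this way, each feature vector can be scored automatically: rank all plots by similarity to a query plot and check how many of the top results fall into the query's class.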
