Combining Text and Image Processing in an Automatic Image - PowerPoint PPT Presentation

Combining Text and Image Processing in an Automatic Image Annotation System
Iulian Ilieș (SHSS, Jacobs University)
Joint work with Arne Jacobs, Otthein Herzog (TZI, Universität Bremen), and Adalbert Wilhelm (SHSS, Jacobs University)
Supported by the Deutsche Forschungsgemeinschaft (DFG)

Overview
- Motivation and approach
- Current work:
  - Framework of concept propagation
  - Data and algorithms employed
  - Comparison of different classifiers
  - Effect of visual vocabulary size
- Summary and outlook

Motivation
- Continuously increasing quantity of image data available on the Internet, which necessitates efficient classification and indexing methods for easy access and usage
- Existing methods, especially mainstream ones, do not exploit all available information:
  - Text-based search, using file names and/or captions
  - Pure visual search, relying only on image features
  - Semantic search, via image understanding techniques

Approach
- Combine the advantages of these different viewpoints into an integrated framework, which would allow the classification of images using keywords, features, or both
- Focus on the construction of a dual-layered linkage scheme between images, based on the co-occurrence of keywords, and on similarities between visual features
- Define visual words, and associate them with keywords

Framework
[Diagram with components: Images, Visual concept detector, Clustering algorithm, Visual words (prototype features); Captions, Textual concept detector, Keywords; Classifier]

Concept propagation
- Directly transfer the associations with keywords from captions to related images, and further to the visual features found in these images
- For each visual word, average across the visual features that have it as prototype, and contrast the obtained value with the corresponding global average
- These operations can be performed in reversed order!
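The two propagation steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `propagate`, the dictionary layout, and the toy identifiers (`img1`, `f1`, `person_a`) are hypothetical and not taken from the slides, and associations are assumed to be binary (keyword present in the caption or not).

```python
from collections import defaultdict

def propagate(image_keywords, image_features, feature_word, keyword):
    """Return {visual_word: (within_word_mean, global_mean)} for one keyword.

    image_keywords: {image_id: set of keywords found in its caption}
    image_features: {image_id: list of feature ids extracted from the image}
    feature_word:   {feature_id: id of the visual word (prototype) it maps to}
    """
    # Step 1: every feature inherits the 0/1 keyword association of its image.
    feature_score = {}
    for image_id, features in image_features.items():
        has_keyword = 1.0 if keyword in image_keywords.get(image_id, set()) else 0.0
        for feature_id in features:
            feature_score[feature_id] = has_keyword

    # Step 2: average the scores within each visual word and over all features.
    per_word = defaultdict(list)
    for feature_id, score in feature_score.items():
        per_word[feature_word[feature_id]].append(score)
    global_mean = sum(feature_score.values()) / len(feature_score)
    return {word: (sum(s) / len(s), global_mean) for word, s in per_word.items()}

# Toy data (hypothetical): three captioned images, five features, two visual words.
image_keywords = {"img1": {"person_a"}, "img2": set(), "img3": {"person_a"}}
image_features = {"img1": ["f1", "f2"], "img2": ["f3", "f4"], "img3": ["f5"]}
feature_word = {"f1": 0, "f2": 1, "f3": 1, "f4": 1, "f5": 0}
result = propagate(image_keywords, image_features, feature_word, "person_a")
```

In the toy data, visual word 0 only occurs in images captioned with the keyword, so its within-word mean lies above the global mean, while visual word 1 falls below it. How the two averages are contrasted is left open here.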

Classifier ¡ Images ¡ Visual ¡words ¡ Visual ¡features ¡ (clusters) ¡ Cap6ons ¡ Image-‑concept ¡ Feature-‑concept ¡ Cluster-‑concept ¡ associa6ons ¡ associa6ons ¡ associa6ons ¡ Image-‑concept ¡ Feature-‑concept ¡ Training ¡ associa6ons ¡ associa6ons ¡ Tes6ng ¡ Test ¡images ¡ Visual ¡features ¡

Data employed
- Images and related text (e.g. captions, titles) harvested from news websites
- Strongly structured articles that can be parsed automatically

Concept detectors
- Specialized keyword detector:
  - Person names extracted from captions by a named entity recognizer (NER; Drozdzynski et al. 2004), complemented by manual annotations
- Generic visual feature detector:
  - Interest-point descriptors extracted from images by the SIFT algorithm (Lowe 1999), clustered into a vocabulary of visual words (Sivic & Zisserman 2003)
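Once a vocabulary exists, each descriptor is represented by its nearest prototype. The sketch below shows that assignment step and the resulting per-image histogram of visual words. It assumes Euclidean distance and uses 2-dimensional toy vectors in place of real SIFT descriptors (which are 128-dimensional); the function names and data are hypothetical.

```python
import math
from collections import Counter

def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary prototype closest to the descriptor (Euclidean)."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))

def visual_word_histogram(descriptors, vocabulary):
    """Count how many descriptors of one image fall on each visual word."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]

# Toy vocabulary of three prototypes and four descriptors from one image.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.2), (4.8, 5.1), (1.1, 0.8)]
histogram = visual_word_histogram(descriptors, vocabulary)
```

Here two of the four descriptors land on the second prototype, so `histogram` is `[1, 2, 1]`.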

Data set
- Approx. 1000 images (some duplicated) and associated captions, harvested from German news websites
- Over 50 different person names detected in the captions by the NER algorithm:
  - 81% precision and 87% recall vs. ground-truth
- Approx. 175,000 interest point descriptors extracted from the images with the SIFT algorithm

Current experiments
- Used a standard classification procedure:
  - Partitioned the data set into 6 stratified subsets: 5 cross-validation sets, and a test-only set
  - Trained with respect to the F1-measure (the harmonic average of precision and recall)
  - Using the simplex search algorithm of Lagarias et al. (1998) for objective function maximization
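As a rough illustration of two ingredients of this procedure, the sketch below computes the F1-measure and deals labelled items into six stratified subsets. The round-robin stratification is an assumption made for the example (the slides do not say how the subsets were built), the names and toy labels are hypothetical, and the simplex search itself is not shown.

```python
from collections import defaultdict

def f1_score(precision, recall):
    """Harmonic average of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def stratified_partition(labels, n_subsets):
    """Deal the items of each label round-robin into n_subsets index lists."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    subsets = [[] for _ in range(n_subsets)]
    position = 0
    for label in sorted(by_label):
        for index in by_label[label]:
            subsets[position % n_subsets].append(index)
            position += 1
    return subsets

# Toy labels: 12 items of class "a" and 6 of class "b", split into 6 subsets.
labels = ["a"] * 12 + ["b"] * 6
subsets = stratified_partition(labels, 6)
cross_validation_sets, test_only_set = subsets[:5], subsets[5]
```

Each subset in the toy example receives two items of class "a" and one of class "b", so the class proportions are preserved. As a separate arithmetic check, `f1_score(0.81, 0.87)` evaluates to roughly 0.84 for the precision and recall quoted for the NER step.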

Transfer functions
- Defined several methods for calculating association probabilities between keywords and visual prototypes:
  - Use the significance of the chi-square test contrasting the within-cluster (-prototype) and global averages
  - Apply a sigmoid function to the ratio of these averages
  - Apply a sigmoid to the logarithm of the ratio
  - Simply truncate the ratio to an interval centered at or near 1, and then map to the unit interval
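The three ratio-based transfer functions might look as follows. This is a sketch only: the parameter names (`slope`, `low`, `high`), their default values, and the choice to center the ratio sigmoid at 1 are assumptions for illustration, and the chi-square variant is omitted because the slides do not specify how the test is set up. The global average is assumed to be positive.

```python
import math

def sigmoid(x, slope=1.0, center=0.0):
    """Logistic function mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))

def sigmoid_of_ratio(cluster_mean, global_mean, slope=4.0):
    """Sigmoid of the ratio, centered at 1 (assumed) so equal averages give 0.5."""
    return sigmoid(cluster_mean / global_mean, slope=slope, center=1.0)

def sigmoid_of_log_ratio(cluster_mean, global_mean, slope=4.0):
    """Sigmoid of log(ratio); a zero within-cluster average maps to 0."""
    ratio = cluster_mean / global_mean
    if ratio <= 0:
        return 0.0
    return sigmoid(math.log(ratio), slope=slope)

def truncated_ratio(cluster_mean, global_mean, low=0.5, high=1.5):
    """Clip the ratio to [low, high], then map linearly to [0, 1]."""
    ratio = cluster_mean / global_mean
    clipped = min(max(ratio, low), high)
    return (clipped - low) / (high - low)

# A visual word whose within-cluster average equals the global average
# carries no evidence either way under all three functions.
neutral = [f(0.6, 0.6) for f in (sigmoid_of_ratio, sigmoid_of_log_ratio, truncated_ratio)]
```

With these defaults, `neutral` is `[0.5, 0.5, 0.5]`; averages above the global one push the value toward 1, averages below it toward 0.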

Experiment 1 - classifying procedures
- Used visual vocabularies of 100 words (clusters), obtained with the k-means algorithm
- Tested the four methods for calculating the degrees of association between visual prototypes and keywords
- Tested three training strategies: for each keyword separately, globally, and with predefined parameters
- Trained using ground-truth or caption-based associations

Experiment 1 - results
- Minor differences between the four averaging methods
- Best results obtained when using ground-truth data, and training each concept separately:
  - F1-score of 56% at training and 34% at testing

Experiment 2 - vocabulary size
- Different clustering algorithms and numbers of clusters:
  - K-means with 100 clusters (6 hrs)
  - K-medians with 100 clusters (10 hrs)
  - TwoStep (SPSS algorithm for large data sets) with 100, 500, 1000, and 2000 clusters (10 min to 2 hrs)
- Using caption-based data only (realistic setting), and training each concept separately (best performance)
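For orientation, a minimal version of k-means (Lloyd's algorithm) is sketched below on toy 2-dimensional points. It is not the implementation used in the experiments: the function name, the fixed iteration count, and the random initialization from data points are assumptions, and k-medians and TwoStep are not shown.

```python
import math
import random

def k_means(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster is empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers

# Two well-separated toy groups; the returned centers are the group means.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centers = sorted(k_means(points, k=2))
```

Every iteration compares each point with each center, so the cost grows with the product of the number of descriptors and the number of clusters, which is one reason large vocabularies are expensive to build this way.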

Experiment 2 - results
- Performance increased with the number of clusters, with close to perfect training at approximately 2000 clusters
- (Data did not have enough variance to produce more clusters with the default settings for TwoStep)

Experiment 3
- Repeated the first experiment (testing different classifiers) at the optimal vocabulary size:
  - Significantly improved results, with F1-scores on the test images of 65% to 71% and close to perfect training

Experiment 3 - further results
- Best performance using ground-truth data, training each concept separately: F1-score of 71% on test images
- No difference between training each concept separately and training globally when using the captions as source data or measuring the performance on test images
- The impact of training data (ground-truth vs. caption-based) is significantly reduced on testing images

Some examples
[Figure: example images, grouped into Training and Testing]
