Computer vision technologies for visual knowledge enrichment Miriam Redi, Research Scientist
Images are powerful tools for communication and knowledge sharing
Wikimedia spaces are still missing many images! Around 1/10 articles in 95% of items in
One source of Visual Knowledge
Computer Vision
How do we bridge the visual knowledge gap?
● Computer vision can help contributors
  ○ Organize images
  ○ Describe images
  ○ Search and find images
How do we fill the gap responsibly? Knowledge equity: As a social movement, we will focus our efforts on the knowledge and communities that have been left out by structures of power and privilege. We will welcome people from every background to build strong and diverse communities. We will break down the social, political, and technical barriers preventing people from accessing and contributing to free knowledge.
RESPONSIBLE Computer Vision: unbiased (against stereotypes), representative (multicultural), empowering (vs exclusive)
Unbiased Machine Vision Bias: learning from “human” data
VERY Large
Biased Machine Vision Object detection
Unbiased Machine Vision Interpretable Vision Algorithms: Automatically Checking for Stereotypes in Datasets
Miriam Redi, Nikhil Rasiwasia, Alejandro Jaimes, Gaurav Aggarwal. The Beauty of Capturing Faces, Rating the Quality of Digital Portraits. Face and Gesture Recognition, FG 2015, Ljubljana, Slovenia
Unbiased Machine Vision Computational Portrait Aesthetics
● Computational Aesthetics: detecting beautiful images
● Computational Portrait Aesthetics: detecting beautiful images of faces (NOT beautiful faces)
Unbiased Machine Vision Computational Portrait Aesthetics
Data: hundreds of thousands of photographs annotated in terms of quality by photographers
Interpretable visual feature extraction
● Inspired by portrait photography
● Expanded with demographics
[Figure: EYE SHARPNESS features]
Unbiased Machine Vision Computational Portrait Aesthetics: Correlation between quality scores and individual features. The dataset is NOT biased! :)
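A minimal sketch of the kind of check this slide describes: correlate each interpretable feature with the quality scores and see whether the demographic features carry any signal. The slide does not say which correlation measure was used; Pearson correlation is assumed here, and the feature names and toy values are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear signal
    return cov / sqrt(var_x * var_y)


def feature_correlations(features, quality):
    """Map each feature name to its correlation with the quality scores."""
    return {name: pearson(values, quality) for name, values in features.items()}


# Toy data (hypothetical): six portraits with one quality score each.
quality = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "eye_sharpness": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # photographic feature
    "demographic_flag": [1, 0, 0, 0, 0, 1],           # binary demographic feature
}

correlations = feature_correlations(features, quality)
for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

In this toy setup a demographic feature with a correlation near zero is what "the dataset is not biased" would look like; a large absolute value would flag a dimension worth inspecting.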
Unbiased Machine Vision How do we define bias? How can we identify and operationalize such bias dimensions?
Representation: learning from unevenly distributed data
Underrepresentative Machine Vision Culture biases
Representative Machine Vision Some solutions
Representative Machine Vision Multicultural Machine Vision Tools: reflecting visual definitions and preferences of people around the world
Representative Machine Vision We have sentiment detectors for images… reflecting the sentiment perception of small groups of (western) people. Can we make image sentiment classifiers multicultural?
Brendan Jou, Tao Chen, Nikolaos Pappas, Miriam Redi, Mercan Topkara, Shih-Fu Chang. Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology. ACM Multimedia 2015, Brisbane, Australia
DATA, CODE, DEMO: mvso.columbia.edu
Representative Machine Vision Multilingual Visual Sentiment Ontology
● 7.6M images, 12 languages, 16K ANPs
● [Figure: emotion keywords [Plutchik 1980] in 12 languages; Flickr crawling; language-specific adjective-noun pair (ANP) discovery (healthy breakfast, health coffee, ...); sentiment annotations through crowdsourcing]
Representative Machine Vision Multilingual Visual Sentiment Ontology
● Cultural insights based on semantically related concepts
● Each cluster reveals
  ○ Wording variation
  ○ Sentiment variation
  ○ Visual content variation
Representative Machine Vision Multilingual Visual Sentiment Ontology Cross-Lingual Sentiment Prediction
[Figure: language-specific sentiment predictors for ES, FR, ZH, DE, EN, IT; cross-lingual sentiment prediction]
Representative Machine Vision Multilingual Visual Sentiment Ontology Cross-Lingual Sentiment Prediction: comparing different models to understand similarities and differences between language communities.
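One way to picture the comparison above: train one predictor per language, apply each predictor to every other language's data, and read the resulting accuracy matrix. The sketch below swaps in a tiny nearest-centroid classifier on invented 2-D features in place of the visual sentiment models used in the MVSO work; all data, function names, and labels are hypothetical.

```python
def train_centroids(samples):
    """Nearest-centroid 'predictor': average the feature vectors of each label."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to vec (squared Euclidean)."""
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def accuracy(centroids, samples):
    hits = sum(predict(centroids, vec) == label for vec, label in samples)
    return hits / len(samples)


def cross_lingual_matrix(data_by_language):
    """matrix[src][tgt] = accuracy of the src-language predictor on tgt data."""
    models = {lang: train_centroids(s) for lang, s in data_by_language.items()}
    return {
        src: {tgt: accuracy(models[src], data_by_language[tgt]) for tgt in data_by_language}
        for src in models
    }


# Toy, invented data: EN and IT express sentiment along the first feature axis,
# ZH along the second one.
data = {
    "EN": [([1.0, 0.2], "positive"), ([0.8, -0.1], "positive"),
           ([-1.0, 0.1], "negative"), ([-0.9, -0.2], "negative")],
    "IT": [([0.9, 0.3], "positive"), ([0.7, 0.0], "positive"),
           ([-0.8, 0.2], "negative"), ([-1.1, -0.1], "negative")],
    "ZH": [([0.1, 1.0], "positive"), ([-0.2, 0.9], "positive"),
           ([0.2, -1.0], "negative"), ([-0.1, -0.8], "negative")],
}

matrix = cross_lingual_matrix(data)
for src, row in matrix.items():
    print(src, row)
```

Rows with high off-diagonal accuracy suggest language communities whose visual sentiment is similar; low values suggest the predictors do not transfer.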
Representative Machine Vision What is really “representative”? What is the tradeoff between complexity and representativeness?
Empowerment: the algorithm in the human loop
Photo by Franck V. on Unsplash
Empowering Machine Vision Finding Representative Images for Human Knowledge in a collaborative environment.
Empowering Machine Vision Wikidata Visual Thinking Wikidata is an international and thus multilingual project. While English is the default interface language, the project is intended to be used by, and useful for, users of every language with MediaWiki internationalization support.
Empowering Machine Vision Missing images in Wikidata PEOPLE ALL SPECIES
Empowering Machine Vision Typical scenario: users willing to add images to Wikidata might need to search manually, using different tools, for the right image among millions of free-licensed images.
[Figure: Item without P18 (‘Has image’) -> manual search through millions of free-licensed images -> manual selection and evaluation]
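As an illustration of where such a workflow could start, the sketch below builds a SPARQL query for items that lack a P18 statement and the corresponding request URL for the public Wikidata Query Service. The choice of class (Q5, human), the limit, and the helper names are assumptions made for the example; the slides do not say how the items were selected. The code only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def items_without_image_query(class_qid: str = "Q5", limit: int = 10) -> str:
    """SPARQL for instances of class_qid (default Q5, human) with no P18 (image)."""
    return (
        "SELECT ?item WHERE { "
        f"?item wdt:P31 wd:{class_qid} . "
        "FILTER NOT EXISTS { ?item wdt:P18 ?image } "
        "} "
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = items_without_image_query()
url = build_request_url(query)
print(query)
print(url)
```

Fetching the URL with any HTTP client returns candidate items; broad queries of this shape can be slow on the live service, so a small limit is used here.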
Empowering Machine Vision Solving the problem of visual enrichment in a collaborative fashion, leveraging the wisdom of the community
[Figure: Item without P18 (‘Has image’) -> discovering related images from different open sources -> ranking images according to Relevance and Quality, automatically inferred from community curation -> manual selection and evaluation]
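The ranking step can be sketched as follows. The slides do not state how relevance and quality are combined, so the weighted sum, the `quality_weight` parameter, and the candidate names below are assumptions for illustration only; the final choice still belongs to the human editor.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    relevance: float  # assumed to be in [0, 1], e.g. from the discovery step
    quality: float    # assumed to be in [0, 1], e.g. from a quality model


def rank_candidates(candidates, quality_weight=0.5):
    """Sort candidates by a weighted sum of relevance and quality (best first)."""
    def score(c):
        return (1 - quality_weight) * c.relevance + quality_weight * c.quality
    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidates for one item without an image.
candidates = [
    Candidate("File:A.jpg", relevance=0.9, quality=0.2),
    Candidate("File:B.jpg", relevance=0.8, quality=0.9),
    Candidate("File:C.jpg", relevance=0.3, quality=0.95),
]

ranked = rank_candidates(candidates)
for c in ranked:
    print(c.title, c.relevance, c.quality)
```

Setting `quality_weight=0` falls back to a pure relevance ranking, which makes the effect of the quality signal easy to inspect.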
Empowering Machine Vision Automatically discovering RELEVANT Free-Licensed images from linked pages, Commons search, Flickr search
So much user-generated visual information... How do we prioritize it?
Empowering Machine Vision QUALITY: is the image of high photographic quality? Not all relevant images are actually ‘good’ images.
[Figure: a High Quality example and a Lower Quality example. Photos: Jee & Rani Nature Photography on Commons; Vinayaraj on Commons]
Modeling the wisdom of the community: to surface high-quality pictures, we train a model using data curated by the Wikimedia community.
High Quality: 160K Quality Commons images. Lower Quality: 160K Random Commons images.
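A minimal sketch of how such a balanced training set could be assembled: community-curated quality images become positives and an equally sized random sample of other Commons images becomes negatives. Excluding curated images from the random pool, the fixed seed, and the file names are assumptions for the example, not details given in the slides.

```python
import random


def build_balanced_dataset(quality_titles, all_titles, seed=0):
    """Label curated images 1 and an equally sized random sample of others 0."""
    rng = random.Random(seed)
    quality_set = set(quality_titles)
    # Assumption: negatives are drawn only from images not marked as quality.
    pool = [t for t in all_titles if t not in quality_set]
    negatives = rng.sample(pool, k=min(len(quality_set), len(pool)))
    examples = [(t, 1) for t in sorted(quality_set)] + [(t, 0) for t in negatives]
    rng.shuffle(examples)
    return examples


# Hypothetical file names standing in for Commons images.
all_titles = [f"File_{i}.jpg" for i in range(50)]
quality_titles = all_titles[:5]

dataset = build_balanced_dataset(quality_titles, all_titles)
positives = [t for t, label in dataset if label == 1]
negatives = [t for t, label in dataset if label == 0]
print(len(positives), len(negatives))
```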
Empowering Machine Vision FRAMEWORK: Convolutional Neural Network, Google Inception-v3 [1], with outputs High / Low. Drawing: Aphex34 on Commons
[1] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Empowering Machine Vision Example results
Empowering Machine Vision Community tools for disseminating image recommendations https://tools.wmflabs.org/wikidata-game/distributed/#game=49&opt=%7B%22type%22%3A%22flower%22%7D
Representative Machine Vision Is this really empowering? How do we measure disruptions introduced by algorithms?
Summary
● Unbiased: Interpretable models to detect data stereotypes
● Representative: Culture-specific models that reflect different visual worlds
● Empowering: Helping communities with visual knowledge enrichment
Thank you!
Challenges
● Bias: How to identify bias dimensions?
  ○ How do we evaluate bias with the same communities?
● Representation: How to make really representative, universal models? What does that mean?
  ○ Representation vs quality; representative image search results
● Empowerment: How to use machine learning technologies in harmony with democratic, collaborative processes?
● More! Are we missing something?
  ○ Privacy of people in pictures
  ○ Data re-use, what are people doing with images?