How Can I Help?: Zero-Shot Multi-Modal Automation with QA

Oct 14, 2022


Michael Du, Sam Masling, Nancy Xu
The Average American Spends 6 hrs/day on the Internet
- Imagine an agent that automated some of those tasks, so we spent less time online.
- Virtual Personal Assistants (VPAs), e.g. Alexa, Google Assistant, Siri, Cortana, and Bixby, are unable to cover the long tail of user requests.
- Programming by Demonstration systems allow us to demonstrate new skills to agents by:
  1. Prompting the user to provide a natural language utterance that refers to the skill.
  2. Asking the user to demonstrate the skill in the browser.
  3. Capturing and naming the relevant variables and the sequence of clicks.
  4. Saving the demonstration so it can be called by name in the future.
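The four Programming by Demonstration steps above can be sketched as a small data model. This is a minimal illustration, not the design of any system named in the slides: the `Step`, `Skill`, and `SkillLibrary` names, the `{var}` placeholder convention, and the selectors in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded browser action (hypothetical schema)."""
    action: str       # e.g. "click" or "type"
    selector: str     # element locator captured during the demonstration
    value: str = ""   # literal text, or a "{var}" placeholder for a variable


@dataclass
class Skill:
    utterance: str                                       # step 1: name given by the user
    steps: list[Step] = field(default_factory=list)      # step 2: demonstrated actions
    variables: list[str] = field(default_factory=list)   # step 3: named variables

    def replay(self, **bindings: str) -> list[Step]:
        """Return the recorded steps with variable placeholders filled in."""
        missing = [v for v in self.variables if v not in bindings]
        if missing:
            raise ValueError(f"missing variables: {missing}")
        return [Step(s.action, s.selector, s.value.format(**bindings))
                for s in self.steps]


class SkillLibrary:
    """Step 4: store demonstrations so they can be called by name later."""

    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}

    def save(self, skill: Skill) -> None:
        self._skills[skill.utterance.lower()] = skill

    def call(self, utterance: str, **bindings: str) -> list[Step]:
        return self._skills[utterance.lower()].replay(**bindings)
```

For example, saving `Skill("Search flights", [Step("type", "#origin", "{origin}"), Step("click", "#search")], ["origin"])` and then calling `library.call("search flights", origin="SF")` returns the two steps with the first value bound to `"SF"`. Note that the skill is tied to the recorded selectors, which is exactly the brittleness the slides go on to criticize.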
Programming Dialogue Agents on the Web is Hard
1. They require the end user to demonstrate the full space of possible browser actions => time-consuming + incomplete.
2. CSS selectors are brittle.
3. Skills are not generalizable to new domains or sites.
4. Training dialogue systems is non-trivial.
VASTA, SkillBot
What if you could generate an agent from any website? Like a human reading a website: no extensive demonstration needed.
Web Elements Serve 3 Main Purposes: Inform / Request / Act
[Figure labels: CONTENT, SLOT (x4), ACTION]
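One way to picture the Inform / Request / Act split is a tag-based heuristic over raw HTML. The sketch below uses only Python's standard `html.parser`; the tag sets and the rule for choosing an element's name are assumptions made for illustration, not the classifier the authors planned to build.

```python
from html.parser import HTMLParser

# Assumed heuristics: form fields request information, buttons and links act,
# and any remaining visible text informs.
REQUEST_TAGS = {"input", "select", "textarea"}
ACT_TAGS = {"button", "a"}
ACT_INPUT_TYPES = {"submit", "button", "reset"}


class PurposeClassifier(HTMLParser):
    """Label elements as 'request' (slot), 'act' (action) or 'inform' (content)."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: list[tuple[str, str]] = []
        self._act_depth = 0  # > 0 while inside a <button> or <a>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("aria-label") or a.get("name") or a.get("id") or tag
        if tag == "input" and a.get("type") in ACT_INPUT_TYPES:
            self.labels.append(("act", name))
        elif tag in REQUEST_TAGS:
            self.labels.append(("request", name))
        elif tag in ACT_TAGS:
            self.labels.append(("act", name))
            self._act_depth += 1

    def handle_endtag(self, tag):
        if tag in ACT_TAGS and self._act_depth:
            self._act_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Text inside a button or link belongs to the action, not to content.
        if text and not self._act_depth:
            self.labels.append(("inform", text))


def classify(html: str) -> list[tuple[str, str]]:
    parser = PurposeClassifier()
    parser.feed(html)
    parser.close()
    return parser.labels
```

Calling `classify('<p>Find flights</p><input aria-label="Where from?"><button id="search">Search</button>')` yields one `inform`, one `request`, and one `act` entry. A real page would need richer signals (ARIA roles, layout, visibility) than this tag lookup.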
HTML-Induced Questions (with language models?) + UI Grammar Templates
[Figure labels: CONTENT; slot questions "Where from?", "Where to?", "When to leave?", "# travelers?"; ACTION]
Paraphrases of "Where from?":
- Where are you flying from?
- Where are you departing from?
- What is the departure city?
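A template-based version of question induction might look like the following. The templates, the `induce_questions` name, and the rule that action elements map to yes/no questions are illustrative assumptions; the slides leave open whether questions come from templates, a language model, or both.

```python
# Hypothetical UI grammar templates for slot (request) elements.
SLOT_TEMPLATES = [
    "What is the {label}?",
    "Which {label} do you want?",
]


def induce_questions(kind: str, label: str) -> list[str]:
    """Turn an element's visible label into candidate questions.

    kind  -- "request" for a slot field, "act" for a button or link
    label -- text taken from the element (e.g. its ARIA label)
    """
    label = label.strip()
    if kind == "act":
        # Actions become yes/no questions.
        return [f"Do you want to {label.lower()}?"]
    if label.endswith("?"):
        # The label is already phrased as a question; use it as-is.
        return [label]
    return [t.format(label=label.lower()) for t in SLOT_TEMPLATES]
```

For instance, `induce_questions("request", "Departure city")` produces "What is the departure city?" and "Which departure city do you want?", while `induce_questions("act", "Search")` produces "Do you want to search?". Paraphrasing these outputs is one way to widen coverage of how users actually phrase requests.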
Zero-Shot Slot Filling + Navigation as Question-Answering
User utterance: "Please help me book a flight from SF to JFK departing on Oct 30, 2020."
The NLU answers each slot question (Where from? Where to? When to leave? # travelers?) against the utterance, e.g. "Where from?" -> "SF".
[Figure labels: CONTENT, SLOT, NLU, ACTION]
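Framing slot filling as question answering means asking every slot question against the user utterance and keeping whatever span comes back. The sketch below shows only that control flow. `toy_answerer` is a regex stand-in written for this example, not a QA model; in practice it would be replaced by an extractive QA model with the same `(question, context) -> answer` shape.

```python
import re
from typing import Callable, Optional

# Any function mapping (question, context) to an answer span or None.
Answerer = Callable[[str, str], Optional[str]]


def toy_answerer(question: str, context: str) -> Optional[str]:
    """Regex stand-in for an extractive QA model (assumption for this demo)."""
    patterns = {
        "where from?": r"\bfrom\s+(\w+)",
        "where to?": r"\bto\s+(\w+)",
        "when to leave?": r"\bdeparting on\s+([A-Z][a-z]{2} \d{1,2}, \d{4})",
    }
    pattern = patterns.get(question.lower())
    if pattern is None:
        return None
    match = re.search(pattern, context)
    return match.group(1) if match else None


def fill_slots(utterance: str, questions: list[str],
               answer: Answerer) -> dict[str, Optional[str]]:
    """Ask every slot question against the utterance."""
    return {q: answer(q, utterance) for q in questions}


def unanswered(slots: dict[str, Optional[str]]) -> list[str]:
    """Questions the agent still has to ask the user."""
    return [q for q, a in slots.items() if a is None]


utterance = "Please help me book a flight from SF to JFK departing on Oct 30, 2020."
questions = ["Where from?", "Where to?", "When to leave?", "# travelers?"]
slots = fill_slots(utterance, questions, toy_answerer)
```

With the utterance from the slide, `slots` maps the first three questions to "SF", "JFK", and "Oct 30, 2020", and `unanswered(slots)` returns `["# travelers?"]`, which the agent would ask as a follow-up turn.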
Demo: SiteBot, a multi-modal conversational interface. Book a flight by navigating through Google -> OneBox via a Chrome extension chatbot. Powered by QA NLU + induced questions.
Project Timeline:
- Week 4: Build a simple Puppeteer agent that comprehends a user utterance -> executes multi-modal automation for Google.
- Week 5-6: Study web structure + classify element types. Create question templates w/ ARIA etc. Also experiment with learning questions automatically from HTML with GPT-3 / language models. BoolQA models for actions (or CoQA) + ExQA on content.
- Week 7: Fine-tune Q&A models on synthetic training data generated by UI grammars + paraphrasing. Collect test data (user utterance + slots) on 10 websites using Mechanical Turk.
- Week 8: Build a Chrome extension interface within the Puppeteer browser for chatting with the agent.
- Week 9: Validate results on test data. Compare the zero-shot QA technique against known benchmarks for slot filling etc.
- Week 10: Leeway. Presentation. Paper. Etc.
- Week 10+ (Reach): Identify the necessary slots for actions to seed multi-turn dialogue.
