Modeling Intention in Email: Speech Acts, Information Leaks and User Ranking Methods
Vitor R. Carvalho, Carnegie Mellon University
William Cohen, Ramnath Balasubramanyan, Tom Mitchell, Jon Elsas

Outline
1. Motivation
2. Email Speech Acts
   • Modeling textual intention in email messages
3. Intelligent Email Addressing
   • Preventing information leaks
   • Ranking potential recipients
   • Cut Once, a Mozilla Thunderbird extension
4. Fine-tuning Ranking Models
   • Ranking in two optimization steps

Why Email
• The most successful e-communication application.
• Great tool to collaborate, especially across different time zones.
• Very cheap, fast, convenient and robust. It just works.
• Increasingly popular [Shipley & Schwalbe, 2007]
   • The Clinton administration left 32 million emails to the National Archives
   • The Bush administration: more than 100 million in 2009 (expected)
• Visible impact
   • Office workers in the U.S. spend at least 25% of the day on email, not counting handheld use

Hard to manage
• People get overwhelmed. [Dabbish & Kraut, CSCW-2006]
   • Costly interruptions [Bellotti et al., HCI-2005]
   • Serious impacts on work productivity
   • Increasingly difficult to manage requests, negotiate shared tasks and keep track of different commitments
• People make horrible mistakes.
   • “I accidentally sent that message to the wrong person”
   • “Oops, I forgot to CC you his final offer”
   • “Oops, did I just hit reply-to-all?”

Outline
1. Motivation
2. Email Speech Acts
   • Modeling textual intention in email messages
3. Intelligent Email Addressing
   • Preventing information leaks
   • Ranking potential recipients
   • Cut Once, a Mozilla Thunderbird extension
4. Fine-tuning Ranking Models
   • Ranking in two optimization steps

Example

From: Benjamin Han
To: Vitor Carvalho
Subject: LTI Student Research Symposium

Hey Vitor,
When exactly is the LTI SRS submission deadline?   [Request - Information]
Also, don’t forget to ask Eric about the SRS webpage.   [Reminder - Action/Task]
Thanks.
Ben

• Prioritize email by “intention”
• Help keep track of your tasks: pending requests, commitments, reminders, answers, etc.
• Better integration with to-do lists

[Screenshot: “Add Task” dialog. Task: follow up on “request for screen shots” by ___ days before one of the detected dates: “next Wed” (12/5/07), “end of the week” (11/30/07), “Sunday” (12/2/07), or other. Highlighted in the message: the Request and the Time/date.]

Classifying Email into Acts [Cohen, Carvalho & Mitchell, EMNLP-04]
• An Act is described as a verb-noun pair (e.g., propose meeting, request information)
   • Not all pairs make sense
• One single email message may contain multiple acts
• Try to describe commonly observed behaviors, rather than all possible speech acts in English
• Also include non-linguistic usage of email (e.g., delivery of files)
[Figure: taxonomy of email acts. Verbs: Commissive, Directive, Request, Commit, Propose, Deliver, Amend. Nouns: Activity, Ongoing, Event, Meeting, Delivery, Data, Opinion, Other.]

Data & Features
• Data: Carnegie Mellon MBA students competition
   • Semester-long project for CMU MBA students. Total of 277 students, divided in 50 teams (4 to 6 students/team). Rich in task negotiation.
   • 1700+ messages (from 5 teams) were manually labeled. One of the teams was double labeled, and the inter-annotator agreement ranges from 0.72 to 0.83 (Kappa) for the most frequent acts.
• Features:
   – N-grams: 1-gram, 2-gram, 3-gram, 4-gram and 5-gram
   – Pre-processing
      • Remove signature files, quoted lines (in-reply-to) [Jangada package]
      • Entity normalization and substitution patterns:
         “Sunday”…“Monday” → [day]
         [number]:[number] → [hour]
         “me, her, him, us or them” → [me]
         “after, before, or during” → [time], etc.
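The pre-processing and n-gram steps above can be sketched roughly as follows. This is a minimal illustration, not the Jangada or Ciranda code: the regular expressions, the function names (`normalize`, `tokenize`, `ngrams`) and the tokenization rule are assumptions chosen to mirror the substitution examples on the slide.

```python
import re

# Hypothetical substitution patterns, modeled on the slide's examples.
DAYS = r"\b(?:sunday|monday|tuesday|wednesday|thursday|friday|saturday)\b"
PATTERNS = [
    (re.compile(r"\b\d{1,2}:\d{2}\b"), "[hour]"),            # [number]:[number] -> [hour]
    (re.compile(DAYS), "[day]"),                             # day names -> [day]
    (re.compile(r"\b(?:me|her|him|us|them)\b"), "[me]"),     # object pronouns -> [me]
    (re.compile(r"\b(?:after|before|during)\b"), "[time]"),  # temporal words -> [time]
]

def normalize(text):
    """Lowercase the text and apply each substitution pattern in order."""
    text = text.lower()
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

def tokenize(text):
    """Split into placeholder tokens such as [day] and plain words."""
    return re.findall(r"\[[a-z]+\]|[a-z0-9']+", text)

def ngrams(tokens, max_n=5):
    """Return all 1-grams up to max_n-grams as space-joined strings."""
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats

tokens = tokenize(normalize("Call me before Sunday at 10:30"))
features = ngrams(tokens)
```

Replacing specific days, times and pronouns with placeholders lets an n-gram such as "[time] [day]" generalize across messages that mention different dates.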

Error Rate for Various Acts [Carvalho & Cohen, HLT-ACTS-06] [Cohen, Carvalho & Mitchell, EMNLP-04]
[Figure: precision (y-axis) vs. recall (x-axis) curves comparing 1g (1716 msgs) with 1g+2g+3g+PreProcess]
5-fold cross-validation over 1716 emails, SVM with linear kernel

Best features (selected by Information Gain)
Ciranda: Java package for Email Speech Act Classification
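As a rough illustration of ranking features by Information Gain, the sketch below scores a binary "feature present / absent" split against the act labels. The toy messages, labels and function names are invented for the example; this is not code from the Ciranda package.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(docs, labels, feature):
    """Entropy reduction from splitting on presence of `feature`.

    `docs` is a list of feature sets, one per message.
    """
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    conditional = 0.0
    for subset in (present, absent):
        if subset:
            conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional

# Toy data (hypothetical): feature sets and their act labels.
docs = [{"could", "you"}, {"could", "send"}, {"attached", "file"}, {"see", "attached"}]
labels = ["Request", "Request", "Deliver", "Deliver"]

vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary,
                key=lambda f: information_gain(docs, labels, f),
                reverse=True)
```

Features that split the labels cleanly (here "could" and "attached") come first in `ranked`; features seen in only one message score lower.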

Idea: Predicting Acts from Surrounding Acts [Carvalho & Cohen, SIGIR-05]
• Strong correlation between previous and next message’s acts
• Act has little or no correlation with other acts of same message
• Both Context and Content have predictive value for email act classification
• Context: collective classification problem
[Figure: example of email thread sequence, with messages labeled Request, Propose, Deliver and Commit]

Collective Classification with Dependency Networks (DN) [Carvalho & Cohen, SIGIR-05]
• In DNs, the full joint probability distribution is approximated with a set of conditional distributions that can be learned independently. The conditional probabilities are calculated for each node given its Markov blanket. [Heckerman et al., JMLR-00]

  $\Pr(X) = \prod_i \Pr(X_i \mid \mathrm{Blanket}(X_i))$

• Inference: temperature-driven Gibbs sampling [Neville & Jensen, JMLR-07]
[Figure: parent message, current message and child message, each with act nodes such as Request, Deliver, Commit, …]
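To make the inference step concrete, here is a minimal sketch of Gibbs sampling over a thread treated as a chain, where each message carries one binary act label and its blanket is its parent and child message. Everything specific is an assumption for illustration: the logistic form of the conditional, the weight `w_neighbor`, the content scores, and the reading of "temperature-driven" as an annealing schedule that sharpens each conditional as the temperature drops.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conditional(i, labels, content_scores, w_neighbor):
    """Pr(X_i = 1 | blanket): content evidence plus parent/child labels."""
    z = content_scores[i]
    for j in (i - 1, i + 1):  # parent and child message in the thread
        if 0 <= j < len(labels):
            z += w_neighbor * (1 if labels[j] else -1)
    return sigmoid(z)

def annealed_gibbs(content_scores, w_neighbor=0.5, sweeps=200,
                   t_start=2.0, t_end=0.1, seed=0):
    """Resample each label from its tempered conditional, cooling over sweeps."""
    rng = random.Random(seed)
    labels = [s > 0 for s in content_scores]  # initialize from content only
    for sweep in range(sweeps):
        # Geometric cooling from t_start down to t_end.
        frac = sweep / max(sweeps - 1, 1)
        t = t_start * (t_end / t_start) ** frac
        for i in range(len(labels)):
            p = conditional(i, labels, content_scores, w_neighbor)
            a, b = p ** (1.0 / t), (1.0 - p) ** (1.0 / t)
            labels[i] = rng.random() < a / (a + b)
    return labels

# Hypothetical content scores for a three-message thread.
result = annealed_gibbs([4.0, -4.0, 4.0])
```

Because each conditional depends only on a node's blanket, the per-node models can be trained separately and combined only at inference time, which is the practical appeal of dependency networks noted on the slide.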

Act by Act Comparative Results
• Modest improvements over the baseline
• Only on acts related to negotiation: Request, Commit, Propose, Meet, Commissive, etc.
• “Sparse” links

Kappa values (%)
Act           Collective   Baseline
dData         43.44        44.98
Deliver       38.69        42.01
Propose       40.72        36.84
Request       49.55        47.25
Directive     58.37        58.27
Meeting       52.42        47.81
Commit        32.77        30.74
Commissive    42.55        37.66

Kappa values with and without collective classification, averaged over four team test sets in the leave-one-team-out experiment.
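For reference, a Kappa value corrects raw agreement for the agreement expected by chance. The sketch below computes Cohen's kappa for two label sequences; it assumes that this is the Kappa variant behind the reported numbers, which the slides do not state, and the example labels are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement.

    Undefined (division by zero) when both sequences use one identical label.
    """
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical gold labels vs. predictions for one act (1 = act present).
gold = [1, 1, 0, 0, 1, 0, 1, 0]
pred = [1, 1, 0, 0, 1, 0, 0, 1]
kappa = cohen_kappa(gold, pred)
```

Here 6 of 8 labels agree (0.75 observed) against 0.5 expected by chance, so kappa is 0.5, or 50% on the scale used above.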

Applications of Email Acts
• Iterative Learning of Email Tasks and Email Acts [Kushmerick & Khousainov, IJCAI-05]
• Predicting Social Roles and Group Leadership [Leuski, SIGIR-04] [Carvalho, Wu & Cohen, CEAS-07]
• Detecting Focus on Threaded Discussions [Feng et al., HLT/NAACL-06]
• Automatically Classifying Emails into Activities [Dredze, Lau & Kushmerick, IUI-06]

Outline
1. Motivation √
2. Email Speech Acts √
   • Modeling textual intention in email messages
3. Intelligent Email Addressing
   • Preventing information leaks
   • Ranking potential recipients
4. Fine-tuning User Ranking Models
   • Ranking in two optimization steps

http://www.sophos.com/

Preventing Email Info Leaks [Carvalho & Cohen, SDM-07]
Email Leak: email accidentally sent to wrong person
1. Similar first or last names, aliases, etc.
2. Aggressive auto-completion of email addresses
3. Typos
4. Keyboard settings
Disastrous consequences: expensive lawsuits, brand reputation damage, negotiation setbacks, etc.
No labeled data
• Who would give me this kind of data?

Preventing Email Info Leaks [Carvalho & Cohen, SDM-07]
• Method
1. Create simulated/artificial email recipients
2. Build model for (msg.recipients): train classifier on real data to detect synthetically created outliers (added to the true recipient list).
   • Features: textual (subject, body), network features (frequencies, co-occurrences, etc.)
3. Detect outlier and warn user based on confidence.
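The three steps of the method can be sketched on toy data as follows. This is a deliberately simplified stand-in: instead of a trained classifier over textual and network features, it scores each recipient by average co-occurrence with the other recipients in previously sent mail. The addresses, function names and scoring rule are all hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sent_messages):
    """Count how often each pair of addresses shared a recipient list."""
    counts = Counter()
    for recipients in sent_messages:
        for pair in combinations(sorted(set(recipients)), 2):
            counts[pair] += 1
    return counts

def simulate_leak(recipients, address_book, rng):
    """Step 1: add one address from outside the true recipient list."""
    outsiders = [a for a in address_book if a not in recipients]
    return list(recipients) + [rng.choice(outsiders)]

def recipient_score(candidate, recipients, counts):
    """Step 2 (simplified): mean co-occurrence with the other recipients."""
    others = [r for r in recipients if r != candidate]
    if not others:
        return 0.0
    return sum(counts[tuple(sorted((candidate, o)))] for o in others) / len(others)

def most_suspicious(recipients, counts):
    """Step 3: the lowest-scoring recipient is the one to warn about."""
    return min(recipients, key=lambda r: recipient_score(r, recipients, counts))

# Hypothetical sent-mail history and address book.
sent = [["ann", "bob", "cat"], ["ann", "bob"], ["ann", "cat"], ["dan", "eve"]]
address_book = ["ann", "bob", "cat", "dan", "eve"]
counts = cooccurrence_counts(sent)

leaked = simulate_leak(["ann", "bob"], address_book, random.Random(0))
flagged = most_suspicious(["ann", "bob", "dan"], counts)
```

Generating leaks synthetically sidesteps the lack of labeled data: every real message supplies a known-good recipient list, and the injected address supplies a known outlier to train and evaluate against.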
