

Modeling Intention in Email: Speech Acts, Information Leaks and User Ranking Methods. Vitor R. Carvalho, Carnegie Mellon University. William Cohen, Ramnath Balasubramanyan, Tom Mitchell, Jon Elsas.


  1. Modeling Intention in Email: Speech Acts, Information Leaks and User Ranking Methods
     Vitor R. Carvalho, Carnegie Mellon University
     William Cohen, Ramnath Balasubramanyan, Tom Mitchell, Jon Elsas

  2. Outline
     1. Motivation
     2. Email Speech Acts
        - Modeling textual intention in email messages
     3. Intelligent Email Addressing
        - Preventing information leaks
        - Ranking potential recipients
        - Cut Once: a Mozilla Thunderbird extension
     4. Fine-tuning Ranking Models
        - Ranking in two optimization steps

  3. Why Email
     - The most successful e-communication application.
     - Great tool to collaborate, especially in different time zones.
     - Very cheap, fast, convenient and robust. It just works.
     - Increasingly popular [Shipley & Schwalbe, 2007]
       - Clinton adm. left 32 million emails to the National Archives
       - Bush adm.: more than 100 million in 2009 (expected)
     - Visible impact
       - Office workers in the U.S. spend at least 25% of the day on email, not counting handheld use

  4. Hard to manage
     - People get overwhelmed. [Dabbish & Kraut, CSCW-2006] [Belloti et al., HCI-2005]
       - Costly interruptions
       - Serious impacts on work productivity
       - Increasingly difficult to manage requests, negotiate shared tasks and keep track of different commitments
     - People make horrible mistakes.
       - "I accidentally sent that message to the wrong person"
       - "Oops, I forgot to CC you his final offer"
       - "Oops, did I just hit reply-to-all?"

  5. Outline
     1. Motivation
     2. Email Speech Acts
        - Modeling textual intention in email messages
     3. Intelligent Email Addressing
        - Preventing information leaks
        - Ranking potential recipients
        - Cut Once: a Mozilla Thunderbird extension
     4. Fine-tuning Ranking Models
        - Ranking in two optimization steps

  6. Example
     From: Benjamin Han
     To: Vitor Carvalho
     Subject: LTI Student Research Symposium
     Hey Vitor
     When exactly is the LTI SRS submission deadline? [Request - Information]
     Also, don't forget to ask Eric about the SRS webpage. [Reminder - Action/Task]
     Thanks.
     Ben
     - Prioritize email by "intention"
     - Help keep track of your tasks: pending requests, commitments, reminders, answers, etc.
     - Better integration with to-do lists

  7. Add Task: follow up on "request for screen shots" by [2] days before:
     - "next Wed" (12/5/07)
     - "end of the week" (11/30/07)
     - "Sunday" (12/2/07)
     - other
     Highlighted fields: Request, Time/date

  8. Classifying Email into Acts [Cohen, Carvalho & Mitchell, EMNLP-04]
     - An Act is described as a verb-noun pair (e.g., propose meeting, request information). Not all pairs make sense.
     - One single email message may contain multiple acts.
     - Try to describe commonly observed behaviors, rather than all possible speech acts in English.
     - Also include non-linguistic usage of email (e.g. delivery of files).
     [Figure: act taxonomy. Verbs: Commissive, Directive, Request, Commit, Propose, Deliver, Amend. Nouns: Delivery, Activity, Data, Ongoing, Opinion, Event, Meeting, Other.]

  9. Data & Features
     - Data: Carnegie Mellon MBA students competition
       - Semester-long project for CMU MBA students. Total of 277 students, divided in 50 teams (4 to 6 students/team). Rich in task negotiation.
       - 1700+ messages (from 5 teams) were manually labeled. One of the teams was double labeled, and the inter-annotator agreement ranges from 0.72 to 0.83 (Kappa) for the most frequent acts.
     - Features:
       - N-grams: 1-gram, 2-gram, 3-gram, 4-gram and 5-gram
       - Pre-processing
         - Remove signature files, quoted lines (in-reply-to) [Jangada package]
         - Entity normalization and substitution patterns:
           - "Sunday"..."Monday" → [day], [number]:[number] → [hour]
           - "me, her, him, us or them" → [me]
           - "after, before, or during" → [time], etc.
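The pre-processing and n-gram features listed above can be illustrated with a minimal sketch. The regular expressions below are hypothetical patterns modeled on the slide's examples, not the authors' actual pattern list, and the function names `normalize` and `ngrams` are assumptions made for illustration. Signature and quoted-line removal (the Jangada step) is not shown.

```python
import re

# Hypothetical substitution patterns modeled on the slide's examples;
# the authors' actual pattern list is not given in the source.
PATTERNS = [
    (re.compile(r"\b(?:mon|tues|wednes|thurs|fri|satur|sun)day\b", re.I), "[day]"),
    (re.compile(r"\b\d+:\d+\b"), "[hour]"),
    (re.compile(r"\b(?:me|her|him|us|them)\b", re.I), "[me]"),
    (re.compile(r"\b(?:after|before|during)\b", re.I), "[time]"),
]

def normalize(text):
    """Replace days, clock times, pronouns and time prepositions with placeholder tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

def ngrams(text, max_n=5):
    """Return all word n-grams (n = 1..max_n) of the normalized, lowercased text."""
    tokens = normalize(text).lower().split()
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats
```

For instance, `normalize("Call me before Sunday at 10:30")` yields `"Call [me] [time] [day] at [hour]"`, so messages that differ only in the specific day or time map to the same features.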

  10. Error Rate for Various Acts [Carvalho & Cohen, HLT-ACTS-06] [Cohen, Carvalho & Mitchell, EMNLP-04]
      [Figure: precision (y-axis) vs. recall (x-axis) curves for two feature sets: 1g (1716 msgs) and 1g+2g+3g+PreProcess.]
      5-fold cross-validation over 1716 emails, SVM with linear kernel.

  11. Best features (selected by Information Gain)
      Ciranda: Java package for Email Speech Act Classification
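As a reminder of what ranking features by Information Gain involves, here is a minimal sketch for one binary feature (n-gram present or absent) against one binary act label. It shows the standard definition, entropy of the labels minus the entropy remaining after splitting on the feature. It is not taken from the Ciranda package, and the function names are illustrative.

```python
import math

def entropy(pos, neg):
    """Binary entropy (in bits) of a set with `pos` positive and `neg` negative items."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(has_feature, labels):
    """Information gain of a boolean feature with respect to 0/1 labels."""
    n = len(labels)
    pos = sum(labels)
    h_before = entropy(pos, n - pos)
    h_after = 0.0
    for value in (True, False):
        subset = [y for f, y in zip(has_feature, labels) if f == value]
        if subset:
            s_pos = sum(subset)
            h_after += len(subset) / n * entropy(s_pos, len(subset) - s_pos)
    return h_before - h_after
```

A feature that splits the messages perfectly by label scores 1 bit on a balanced set. A feature that is independent of the label scores 0.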

  12. Idea: Predicting Acts from Surrounding Acts [Carvalho & Cohen, SIGIR-05]
      - Strong correlation between previous and next message's acts
      - Act has little or no correlation with other acts of same message
      - Both Context and Content have predictive value for email act classification
      - Context: Collective classification problem
      [Figure: example of an email thread sequence, with messages labeled Request, Propose, Deliver, Commit.]

  13. Collective Classification with Dependency Networks (DN) [Carvalho & Cohen, SIGIR-05]
      - In DNs, the full joint probability distribution is approximated with a set of conditional distributions that can be learned independently. The conditional probabilities are calculated for each node given its Markov blanket: $\Pr(X) = \prod_i \Pr(X_i \mid \mathrm{Blanket}(X_i))$ [Heckerman et al., JMLR-00]
      - Inference: Temperature-driven Gibbs sampling [Neville & Jensen, JMLR-07]
      [Figure: Parent Message, Current Message and Child Message nodes, each carrying acts such as Request, Commit, Deliver.]
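The inference step can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the authors' procedure. It labels a single act along a linear thread. Each node's conditional given its blanket is a hypothetical logistic function of a per-message content score plus a coupling term for its neighbors' current labels. "Temperature-driven" is read here as annealed Gibbs sampling with geometric cooling. The function name and all parameters are invented for illustration.

```python
import math
import random

def gibbs_thread_labels(content_logit, coupling, sweeps=50,
                        t_start=2.0, t_end=0.1, seed=0):
    """Annealed Gibbs sampling of 0/1 act labels along a linear email thread.

    content_logit[i] is a hypothetical content-only score for message i;
    `coupling` rewards agreement with the previous and next message.
    """
    rng = random.Random(seed)
    n = len(content_logit)
    labels = [rng.randint(0, 1) for _ in range(n)]
    for s in range(sweeps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        frac = s / max(sweeps - 1, 1)
        temp = t_start * (t_end / t_start) ** frac
        for i in range(n):
            neighbors = [labels[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # Each neighbor contributes +coupling if labeled 1, -coupling if 0.
            logit = content_logit[i] + coupling * sum(2 * v - 1 for v in neighbors)
            # Dividing by the temperature sharpens the conditional as temp drops.
            p_one = 1.0 / (1.0 + math.exp(-logit / temp))
            labels[i] = 1 if rng.random() < p_one else 0
    return labels
```

At high temperature the sampler explores freely. As the temperature drops, each node settles on the label favored by its content score and its neighbors.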

  14. Act by Act Comparative Results
      Kappa values (%) with and without collective classification, averaged over four team test sets in the leave-one-team-out experiment.
      - dData: baseline 44.98, collective 43.44
      - Deliver: baseline 42.01, collective 38.69
      - Propose: baseline 36.84, collective 40.72
      - Request: baseline 47.25, collective 49.55
      - Directive: baseline 58.27, collective 58.37
      - Meeting: baseline 47.81, collective 52.42
      - Commit: baseline 30.74, collective 32.77
      - Commissive: baseline 37.66, collective 42.55
      Modest improvements over the baseline, only on acts related to negotiation: Request, Commit, Propose, Meet, Commissive, etc.
      Chart annotation: "Sparse" links.
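For readers unfamiliar with the Kappa statistic used in these results, a minimal sketch of Cohen's kappa for two label sequences follows. The slides do not say which kappa variant was computed, so treating it as Cohen's kappa is an assumption. The function name is illustrative.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both labelers assigned labels independently.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

With `labels_a = [1, 1, 0, 0]` and `labels_b = [1, 0, 0, 0]`, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5 (50% on the scale used in the chart).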

  15. Applications of Email Acts
      - Iterative Learning of Email Tasks and Email Acts [Kushmerick & Khousainov, IJCAI-05]
      - Predicting Social Roles and Group Leadership [Leusky, SIGIR-04] [Carvalho, Wu & Cohen, CEAS-07]
      - Detecting Focus on Threaded Discussions [Feng et al., HLT/NAACL-06]
      - Automatically Classifying Emails into Activities [Dredze, Lau & Kushmerick, IUI-06]

  16. Outline
      1. Motivation √
      2. Email Speech Acts √
         - Modeling textual intention in email messages
      3. Intelligent Email Addressing
         - Preventing information leaks
         - Ranking potential recipients
      4. Fine-tuning User Ranking Models
         - Ranking in two optimization steps

  20. http://www.sophos.com/

  21. Preventing Email Info Leaks [Carvalho & Cohen, SDM-07]
      - Email Leak: email accidentally sent to wrong person
        1. Similar first or last names, aliases, etc.
        2. Aggressive auto-completion of email addresses
        3. Typos
        4. Keyboard settings
      - Disastrous consequences: expensive law suits, brand reputation damage, negotiation setbacks, etc.
      - No labeled data: who would give me this kind of data?

  22. Preventing Email Info Leaks [Carvalho & Cohen, SDM-07]
      - Method
        1. Create simulated/artificial email recipients
        2. Build model for (msg.recipients): train classifier on real data to detect synthetically created outliers (added to the true recipient list).
           - Features: textual (subject, body), network features (frequencies, co-occurrences, etc).
        3. Detect outlier and warn user based on confidence.
      - Leak causes addressed:
        1. Similar first or last names, aliases, etc.
        2. Aggressive auto-completion of email addresses
        3. Typos
        4. Keyboard settings
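Step 1 of the method, creating simulated leak recipients so that a classifier has labeled outliers to learn from, can be sketched as below. The slide does not say how the artificial recipient is chosen. Drawing it uniformly at random from the sender's address book is an assumption made here, and `add_simulated_leak` is a hypothetical helper name.

```python
import random

def add_simulated_leak(true_recipients, address_book, rng):
    """Return (recipient, is_leak) training pairs for one message.

    True recipients are labeled 0. One address drawn from the rest of the
    address book is appended as a simulated leak and labeled 1.
    Uniform random selection is an assumption, not the authors' stated criterion.
    """
    candidates = sorted(set(address_book) - set(true_recipients))
    if not candidates:
        return None  # nobody left to use as a simulated leak
    leak = rng.choice(candidates)
    examples = [(r, 0) for r in true_recipients]
    examples.append((leak, 1))
    return examples
```

Each resulting pair would then be turned into textual and network features (step 2), and at send time the classifier's confidence on each real recipient decides whether to warn the user (step 3).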
