
slide-1
SLIDE 1

Modeling Intention in Email:

Speech Acts, Information Leaks and User Ranking Methods

Vitor R. Carvalho

Carnegie Mellon University

William Cohen

Tom Mitchell, Jon Elsas, Ramnath Balasubramanyan

slide-2
SLIDE 2

Outline

1. Motivation

2. Email Speech Acts

  • Modeling textual intention in email messages

3. Intelligent Email Addressing

  • Preventing information leaks
  • Ranking potential recipients
  • Cut Once – a Mozilla Thunderbird extension

4. Fine-tuning Ranking Models

  • Ranking in two optimization steps

slide-3
SLIDE 3

Why Email

The most successful e-communication application.

Great tool to collaborate, especially across different time zones. Very cheap, fast, convenient and robust. It just works.

Increasingly popular

  • The Clinton administration left 32 million emails to the National Archives [Shipley & Schwalbe, 2007]
  • The Bush administration: more than 100 million expected in 2009

Visible impact

  • Office workers in the U.S. spend at least 25% of the day on email – not counting handheld use

slide-4
SLIDE 4

Hard to manage

  • People get overwhelmed.
  • Costly interruptions
  • Serious impacts on work productivity

[Dabbish & Kraut, CSCW-2006] [Belloti et al., HCI-2005]

  • Increasingly difficult to manage requests, negotiate shared tasks and keep track of different commitments
  • People make horrible mistakes.
  • “I accidentally sent that message to the wrong person”
  • “Oops, I forgot to CC you his final offer”
  • “Oops, did I just hit reply-to-all?”

slide-5
SLIDE 5

Outline

1. Motivation

2. Email Speech Acts

  • Modeling textual intention in email messages

3. Intelligent Email Addressing

  • Preventing information leaks
  • Ranking potential recipients
  • Cut Once – a Mozilla Thunderbird extension

4. Fine-tuning Ranking Models

  • Ranking in two optimization steps

slide-6
SLIDE 6

Example

From: Benjamin Han
To: Vitor Carvalho
Subject: LTI Student Research Symposium

Hey Vitor,
When exactly is the LTI SRS submission deadline?
Also, don’t forget to ask Eric about the SRS webpage.
Thanks. Ben

[Annotations on the message: Request – Information; Reminder – Action/Task]

Prioritize email by “intention”. Help keep track of your tasks:

  • pending requests, commitments, reminders, answers, etc.

Better integration with to-do lists

slide-7
SLIDE 7

[Screenshot: task-creation UI – “Add Task: follow up on ‘request for screen shots’ by ___ days before”, with extracted date candidates “next Wed” (12/5/07), “end of the week” (11/30/07), “Sunday” (12/2/07) and “other”; annotated entities: Request, Time/date]

slide-8
SLIDE 8

Classifying Email into Acts

[Cohen, Carvalho & Mitchell, EMNLP-04]

  • An Act is described as a verb-noun pair (e.g., propose meeting, request information). Not all pairs make sense.
  • One single email message may contain multiple acts.
  • Try to describe commonly observed behaviors, rather than all possible speech acts in English.
  • Also include non-linguistic usage of email (e.g., delivery of files).

[Taxonomy diagram – Verbs: Commissive (Deliver, Commit) and Directive (Request, Propose, Amend); Nouns: Activity (Ongoing, Event/Meeting, Other) and Delivery (Opinion, Data)]

slide-9
SLIDE 9

Data & Features

Data: Carnegie Mellon MBA students competition

  • Semester-long project for CMU MBA students. Total of 277 students, divided in 50 teams (4 to 6 students/team). Rich in task negotiation.
  • 1700+ messages (from 5 teams) were manually labeled. One of the teams was double labeled, and the inter-annotator agreement ranges from 0.72 to 0.83 (Kappa) for the most frequent acts.

Features:

  • N-grams: 1-gram, 2-gram, 3-gram, 4-gram and 5-gram
  • Pre-Processing
    – Remove signature files and quoted lines (in-reply-to) [Jangada package]
    – Entity normalization and substitution patterns:
      “Sunday”…“Monday” → [day], [number]:[number] → [hour], “me, her, him, us or them” → [me], “after, before, or during” → [time], etc.
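The substitution patterns above can be sketched with simple regular expressions. The patterns below are illustrative stand-ins for the slide's examples, not the exact rules of the Jangada pipeline:

```python
import re

# Illustrative entity-normalization patterns (a subset of those on the slide).
SUBSTITUTIONS = [
    (re.compile(r"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b", re.I), "[day]"),
    (re.compile(r"\b\d{1,2}:\d{2}\b"), "[hour]"),
    (re.compile(r"\b(me|her|him|us|them)\b", re.I), "[me]"),
    (re.compile(r"\b(after|before|during)\b", re.I), "[time]"),
]

def normalize(text: str) -> str:
    """Replace concrete entities with placeholder tokens before n-gram extraction."""
    for pattern, token in SUBSTITUTIONS:
        text = pattern.sub(token, text)
    return text

print(normalize("Can we meet Sunday after 10:30?"))  # Can we meet [day] [time] [hour]?
```

Normalizing before extracting n-grams lets features like “when is [day]” generalize across messages that mention different concrete dates and times.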
slide-10
SLIDE 10

Error Rate for Various Acts

[Carvalho & Cohen, HLT-ACTS-06] [Cohen, Carvalho & Mitchell, EMNLP-04]

[Precision–recall plot comparing 1-gram features vs. 1g+2g+3g+PreProcess features]

5-fold cross-validation over 1716 emails, SVM with linear kernel

slide-11
SLIDE 11

Best features

(selected by Information Gain)

Ciranda:

Java package for Email Speech Act Classification


slide-12
SLIDE 12

Idea: Predicting Acts from Surrounding Acts

[Carvalho & Cohen, SIGIR-05]

  • Strong correlation between previous and next message’s acts
  • An act has little or no correlation with other acts of the same message
  • Both Context and Content have predictive value for email act classification

[Example of an email thread sequence, with acts such as Request/Propose followed by Deliver/Request/Commit and then Deliver/Commit]

Context: Collective classification problem

slide-13
SLIDE 13

Collective Classification with Dependency Networks (DN)

[Carvalho & Cohen, SIGIR-05]

  • In DNs, the full joint probability distribution is approximated with a set of conditional distributions that can be learned independently. The conditional probabilities are calculated for each node given its Markov blanket:

    Pr(X) ≈ ∏_i Pr(X_i | Blanket(X_i))

[Diagram: act nodes of the parent, current and child messages connected in a network]

[Heckerman et al., JMLR-00] [Neville & Jensen, JMLR-07]

Inference: Temperature-driven Gibbs sampling
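As an illustration of the Gibbs-sampling inference, the toy sketch below repeatedly resamples each message's act conditioned on a simplified, parent-only Markov blanket. The conditional table is invented for the example and is not the learned model from the paper:

```python
import random

# Toy collective classification over a thread of messages via Gibbs sampling.
ACTS = ["Request", "Deliver", "Commit"]

# Invented conditional distribution: Pr(act | parent message's act).
COND = {
    "Request": {"Request": 0.1, "Deliver": 0.6, "Commit": 0.3},
    "Deliver": {"Request": 0.4, "Deliver": 0.2, "Commit": 0.4},
    "Commit":  {"Request": 0.5, "Deliver": 0.4, "Commit": 0.1},
    None:      {"Request": 0.5, "Deliver": 0.3, "Commit": 0.2},  # thread root
}

def gibbs(thread_len: int, iters: int = 200, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    labels = [rng.choice(ACTS) for _ in range(thread_len)]
    for _ in range(iters):
        for i in range(thread_len):
            parent = labels[i - 1] if i > 0 else None
            dist = COND[parent]  # simplified blanket: only the parent's act
            r, acc = rng.random(), 0.0
            for act in ACTS:
                acc += dist[act]
                if r < acc:
                    labels[i] = act
                    break
    return labels

print(gibbs(4))
```

The real model conditions on both parent and child messages (the full Markov blanket) and uses temperature-driven sampling; this sketch only shows the resampling loop structure.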

slide-14
SLIDE 14

Act-by-Act Comparative Results

  • Modest improvements over the baseline
  • Only on acts related to negotiation: Request, Commit, Propose, Meeting, Commissive, etc.
  • “Sparse” links

[Bar chart: Kappa values (%) per act (Deliver, Propose, Request, Meeting, Directive, Commissive, Commit), Baseline vs. Collective]

Kappa values with and without collective classification, averaged over four team test sets in the leave-one-team-out experiment.

slide-15
SLIDE 15

Applications of Email Acts

  • Iterative Learning of Email Tasks and Email Acts [Kushmerick & Khousainov, IJCAI-05]
  • Predicting Social Roles and Group Leadership [Leusky, SIGIR-04] [Carvalho, Wu & Cohen, CEAS-07]
  • Detecting Focus on Threaded Discussions [Feng et al., HLT/NAACL-06]
  • Automatically Classifying Emails into Activities [Dredze, Lau & Kushmerick, IUI-06]

slide-16
SLIDE 16

Outline

1. Motivation √

2. Email Speech Acts √

  • Modeling textual intention in email messages

3. Intelligent Email Addressing

  • Preventing information leaks
  • Ranking potential recipients

4. Fine-tuning User Ranking Models

  • Ranking in two optimization steps

slide-17
SLIDE 17


slide-18
SLIDE 18


slide-19
SLIDE 19


slide-20
SLIDE 20

http://www.sophos.com/

slide-21
SLIDE 21

Preventing Email Info Leaks

[Carvalho & Cohen, SDM-07]

Email Leak: email accidentally sent to the wrong person

  • No labeled data. Who would give me this kind of data?

Common causes:

  1. Similar first or last names, aliases, etc.
  2. Aggressive auto-completion of email addresses
  3. Typos
  4. Keyboard settings

Disastrous consequences: expensive lawsuits, brand reputation damage, negotiation setbacks, etc.

slide-22
SLIDE 22

Preventing Email Info Leaks

[Carvalho & Cohen, SDM-07]

Method:

  1. Create simulated/artificial email recipients.
  2. Build a model for (msg, recipients): train a classifier on real data to detect synthetically created outliers (added to the true recipient list). Features: textual (subject, body) and network features (frequencies, co-occurrences, etc.).
  3. Detect the outlier and warn the user based on confidence.
slide-23
SLIDE 23

Simulating Email Leaks

  • Several options:
    – Frequent typos, same/similar last names, identical/similar first names, aggressive auto-completion of addresses, etc.
  • We adopted the 3g-address criteria:
    – On each trial, one of the msg recipients is randomly chosen and an outlier is generated according to:

[Flowchart with α / 1−α branches: select an address sharing a 3-gram with the chosen recipient (e.g., Marina.wang@enron.com), generating a random email address NOT in the Address Book if none exists; Else: randomly select an address book entry]
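A minimal sketch of this simulation step, with hypothetical details (the α branch and the exact 3g-address procedure are simplified away): pick a true recipient and return a confusable address-book entry sharing a character 3-gram, falling back to any other entry.

```python
import random

def trigrams(s: str) -> set[str]:
    """Character 3-grams of an address, used as a crude confusability signal."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def simulate_leak(recipients: list[str], address_book: list[str], seed: int = 0) -> str:
    """Illustrative leak simulation: choose a true recipient, then return an
    address-book entry that shares a 3-gram with it (a plausible 'wrong person')."""
    rng = random.Random(seed)
    chosen = rng.choice(recipients)
    similar = [a for a in address_book
               if a not in recipients and trigrams(a) & trigrams(chosen)]
    pool = similar if similar else [a for a in address_book if a not in recipients]
    return rng.choice(pool)

book = ["marina.wang@enron.com", "martin.wang@enron.com", "kay.mann@enron.com"]
leak = simulate_leak(["marina.wang@enron.com"], book)
print(leak)  # one of the confusable addresses from the book
```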

slide-24
SLIDE 24

Data and Baselines

  • Enron email dataset, with a realistic setting
    – For each user, the ~10% most recent sent messages were used as test
    – Some basic preprocessing
  • Baseline methods (textual similarity; common baselines in IR):
    – Rocchio/TFIDF Centroid [1971]: create a “TfIdf centroid” for each user in the Address Book. For testing, rank according to cosine similarity between the test msg and each centroid.
    – Knn-30 [Yang & Chute, 1994]: given a test msg, get the 30 most similar msgs in the training set. Rank according to the “sum of similarities” of a given user on the 30-msg set.
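A toy sketch of the centroid baseline, simplified to raw term-frequency vectors rather than full TF-IDF weighting (names and data are invented):

```python
import math
from collections import Counter

def vec(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(msgs: list[str]) -> Counter:
    """Sum of the term vectors of all messages exchanged with one user."""
    c = Counter()
    for m in msgs:
        c.update(vec(m))
    return c

def rank_recipients(test_msg: str, sent: dict[str, list[str]]) -> list[str]:
    """Rank each address-book user by cosine(test msg, that user's centroid)."""
    q = vec(test_msg)
    return sorted(sent, key=lambda u: cosine(q, centroid(sent[u])), reverse=True)

sent = {
    "trader@enron.com": ["gas price forward curve", "price curve update"],
    "lawyer@enron.com": ["contract draft review", "review the contract terms"],
}
print(rank_recipients("please review this contract", sent))
```

A message about contract review ranks the user whose past traffic is about contracts first; the Knn-30 baseline replaces the per-user centroid with similarity sums over the 30 nearest training messages.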

slide-25
SLIDE 25

Enron Data Preprocessing

  • ISI version of Enron
    – Remove repeated messages and inconsistencies
  • Disambiguate main Enron addresses
    – List provided by Corrada-Emmanuel from UMass
  • Bag-of-words
    – Messages were represented as the union of the BOW of body and BOW of subject
    – Some stop words removed
  • Self-addressed messages were removed

slide-26
SLIDE 26

Leak Results 1

[Bar chart: average accuracy in 10 trials for Random, Rocchio and KNN-30; on each trial, a different set of outliers is generated]

slide-27
SLIDE 27

Using Network Features

1. Frequency features
   – Number of received, sent and sent+received messages (from this user)
2. Co-Occurrence features
   – Number of times a user co-occurred with all other recipients
3. Auto features
   – For each recipient R, find Rm (= the address with max score from the 3g-address list of R), then use score(R) − score(Rm) as a feature
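The frequency and co-occurrence features can be sketched as simple counts over a user's sent messages (field names are illustrative; the paper's exact definitions may differ in detail):

```python
from collections import Counter
from itertools import combinations

def network_features(sent_msgs: list[list[str]]) -> dict[str, dict[str, float]]:
    """sent_msgs: recipient lists of one user's sent messages."""
    sent_count = Counter(r for msg in sent_msgs for r in msg)
    cooc = Counter()
    for msg in sent_msgs:
        # count each unordered recipient pair once per message
        for a, b in combinations(sorted(set(msg)), 2):
            cooc[(a, b)] += 1
    feats = {}
    for r in sent_count:
        # co-occurrence of r with every other recipient, summed
        total_cooc = sum(c for pair, c in cooc.items() if r in pair)
        feats[r] = {"sent": sent_count[r], "cooc": total_cooc}
    return feats

msgs = [["ann@x.com", "bob@x.com"], ["ann@x.com"], ["ann@x.com", "bob@x.com", "cal@x.com"]]
print(network_features(msgs)["ann@x.com"])  # {'sent': 3, 'cooc': 3}
```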

slide-28
SLIDE 28

Using Network Features

Combine the network features with the text-only scores using perceptron-based reranking, trained on simulated leaks.

[Diagram: network features and the text-based feature feeding the reranker]

slide-29
SLIDE 29

Email Leak Results

[Carvalho & Cohen, SDM-07]

[Bar chart – leak-prediction accuracy per method: Random, TfIdf, Knn30, Knn30+Frequency, Knn30+Cooccur1, Knn30+Cooccur_to, Knn30+All Above; values range from 0.406 (Random) up to 0.814]

slide-30
SLIDE 30

Finding Real Leaks in Enron

  • How can we find them?
    – Grep for “mistake”, “sorry” or “accident”
      “Sorry. Sent this to you by mistake.”, “I accidentally sent you this reminder”
    – Note: must be from one of the Enron users
  • Found 2 good cases:
    1. Message germany-c/sent/930: the message has 20 recipients, the leak is alex.perkins@
    2. kitchen-l/sent items/497: it has 44 recipients, the leak is rita.wynne@
  • Prediction results:
    – The proposed algorithm was able to find these two leaks

slide-31
SLIDE 31

Not the only problem when addressing emails…

slide-32
SLIDE 32

Sometimes people just… forget an intended recipient

[Carvalho & Cohen, ECIR-2008]

  • Particularly in large organizations, it is not uncommon to forget to CC an important collaborator: a manager, a colleague, a contractor, an intern, etc.
  • More frequent than expected (from the Enron Collection):
    – At least 9.27% of the users have forgotten to add a desired email recipient.
    – At least 20.52% of the users were not included as recipients (even though they were intended recipients) in at least one received message.
  • Cost of errors in task management can be high: communication delays, missed deadlines, wasted opportunities, costly misunderstandings, task delays

slide-33
SLIDE 33

Data and Features

  • Two ranking problems:
    – Predicting TO+CC+BCC
    – Predicting CC+BCC
  • Easy to obtain labeled data
  • Features
    – Textual: Rocchio (TfIdf) and KNN
    – Network (from email headers)
      • Frequency: # messages received and/or sent (from/to this user)
      • Recency: how often a particular user was addressed in the last 100 msgs
      • Co-Occurrence: number of times a user co-occurred with all other recipients. “Co-occur” means two recipients were addressed in the same message in the training set.
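The recency feature can be sketched as a count over a sliding window of the most recent sent messages (the window size of 100 is from the slide; the counting details are illustrative):

```python
def recency_scores(sent_msgs: list[list[str]], window: int = 100) -> dict[str, int]:
    """sent_msgs: recipient lists, oldest first.
    Count how often each candidate appears in the last `window` messages."""
    scores: dict[str, int] = {}
    for msg in sent_msgs[-window:]:
        for r in set(msg):
            scores[r] = scores.get(r, 0) + 1
    return scores

msgs = [["ann@x.com"], ["bob@x.com"], ["ann@x.com", "cal@x.com"]]
print(recency_scores(msgs, window=2))  # only the last two messages count
```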

slide-34
SLIDE 34

Email Recipient Recommendation

[Carvalho & Cohen, ECIR-08]

[Bar chart: MAP on the TOCCBCC and CCBCC tasks for Frequency, Recency, TFIDF, KNN and Perceptron; 36 Enron users, 44000+ queries, avg ~1267 queries/user]

slide-35
SLIDE 35

Rank Aggregation (Data Fusion)

Rankings are combined by Reciprocal Rank:

RR_q(d) = 1 / rank_q(d),   score(d) = Σ_{q ∈ Rankings} RR_q(d)
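The combination rule above can be sketched directly; each base ranker contributes 1/rank(d) and candidates are sorted by the summed score (the ranked lists are invented for the example):

```python
def fuse(rankings: list[list[str]]) -> list[str]:
    """Reciprocal-rank fusion: score(d) = sum over rankings of 1/rank(d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for pos, d in enumerate(ranking, start=1):
            scores[d] = scores.get(d, 0.0) + 1.0 / pos
    return sorted(scores, key=scores.get, reverse=True)

freq = ["ann", "bob", "cal"]    # ranking from the Frequency feature
knn = ["bob", "ann", "cal"]     # ranking from KNN
tfidf = ["bob", "cal", "ann"]   # ranking from TFIDF
print(fuse([freq, knn, tfidf])) # ['bob', 'ann', 'cal']
```

A candidate ranked highly by several base rankers ("bob" here) dominates one ranked first by only a single ranker.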

slide-36
SLIDE 36

Rank Aggregation Results

[Bar chart: MAP for Freq, Rec, M1-uc, M2-uc, TFIDF, KNN and Fusion on the TOCCBCC (thread) and CCBCC (thread) tasks]

slide-37
SLIDE 37

Intelligent Email Auto-completion

[Carvalho & Cohen, ECIR-08]

[Plots: auto-completion performance on the TOCCBCC and CCBCC tasks]

slide-38
SLIDE 38

Intelligent Email Auto-completion


slide-39
SLIDE 39

Mozilla Thunderbird plug-in (Cut Once)

[Screenshot – Leak warnings: hit × to remove a recipient; Suggestions: hit + to add; Timer: msg is sent after 10 sec by default]

slide-40
SLIDE 40

Mozilla Thunderbird extension (Cut Once)

  • Interested? Just google: “mozilla extension carnegie mellon” or “email leak carnegie mellon”
  • From the Mozilla website: 64 active daily users
  • User study using Cut Once:
    – Strong TFIDF preference
    – Write-then-address behavior (instead of address-then-write)

slide-41
SLIDE 41

Outline

1. Motivation √

2. Email Speech Acts √

  • Modeling textual intention in email messages

3. Intelligent Email Addressing √

  • Preventing information leaks
  • Ranking potential recipients

4. Fine-tuning User Ranking Models

  • Ranking in two optimization steps

slide-42
SLIDE 42

Email Recipient Recommendation

[Bar chart, repeated from earlier: MAP on the TOCCBCC and CCBCC tasks for Frequency, Recency, TFIDF, KNN and Perceptron; 36 Enron users]

slide-43
SLIDE 43

Learning to Rank

  • Can we do better ranking?
    – Learning to Rank: machine learning to improve ranking
    – Recently proposed feature-based ranking methods:
      • RankSVM [Joachims, KDD-02]
      • ListNet [Cao et al., ICML-07]
      • RankBoost [Freund et al., 2003]
      • Perceptron Variations [Elsas, Carvalho & Carbonell, WSDM-08] – online, scalable
    – Learning to rank in 2 optimization steps
      • Pairwise-based ranking framework (like many of the above)
slide-44
SLIDE 44

Pairwise-based Ranking

Goal: induce a ranking function f(d) such that

f(d_i) > f(d_j) ⇔ d_i ≻ d_j

We assume a linear function f:

f(d_i) = ⟨w, d_i⟩ = w_1 x_{1i} + w_2 x_{2i} + … + w_m x_{mi}

Constraints:

d_i ≻ d_j ⇔ ⟨w, d_i − d_j⟩ > 0

[Diagram: query q with documents d1 … dT in rank order; each document is a feature vector, e.g. d_6 = (x_{16}, x_{26}, …, x_{m6})]
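The constraints above turn ranking into classification over difference vectors: each preferred/non-preferred pair contributes one instance d_R − d_NR that the model should score positive. A toy sketch (the feature values are invented):

```python
def pairwise_instances(docs: list[tuple[tuple[float, ...], bool]]) -> list[tuple[float, ...]]:
    """docs: (feature_vector, relevant?) pairs for one query.
    Returns the difference vectors d_R - d_NR used as training instances."""
    rel = [f for f, r in docs if r]
    nonrel = [f for f, r in docs if not r]
    return [tuple(a - b for a, b in zip(dr, dn)) for dr in rel for dn in nonrel]

docs = [((2.0, 0.0), True), ((1.0, 1.0), False), ((0.0, 1.0), False)]
print(pairwise_instances(docs))  # [(1.0, -1.0), (2.0, -1.0)]
```

This construction is also why a single labeling error is costly: one mislabeled document produces one bad difference vector per document of the opposite label.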

slide-45
SLIDE 45

Pairwise-based Ranking

  • Advantages
    1. Most classification methods can be easily adapted to the ranking problem.
    2. This framework can be generalized to any graded relevance levels (e.g., definitely relevant, somewhat relevant, non-relevant).
    3. In many practical scenarios, it is easier to obtain large amounts of pairwise preference data [Joachims, 2002].
    4. Also, there is evidence that pairwise preference judgment is easier for assessors [Carterette, 2008].

slide-46
SLIDE 46

Pairwise-based Ranking

  • Disadvantages
    – One single human labeling error creates many outliers, since pairs of documents of different labels are used as instances in the learning scheme.
    – Discrimination of multi-level labeling schemes (1-2, 2-3, versus 1-5).
    – In real labeled ranking datasets, many of the documents are unjudged and typically considered non-relevant by pairwise learning algorithms.

slide-47
SLIDE 47

Method 1: Ranking with Perceptrons

[Collins, 2002; Gao et al., 2005] [Elsas, Carvalho & Carbonell, 2008]

  • Nice convergence and mistake bounds
    – bound on the number of misranks
  • Online, fast and scalable
  • Many variants
    – Voting, averaging, committee, pocket, etc.
  • General update rule (on a misranked pair):

    W^{t+1} = W^t + [d_R − d_NR]

  • Here: averaged version of the perceptron
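A minimal sketch of this update rule with averaging; the toy pairs stand in for the recipient feature vectors described earlier:

```python
def rank_perceptron(pairs, dim: int, epochs: int = 10) -> list[float]:
    """pairs: (d_R, d_NR) tuples with d_R preferred over d_NR.
    On each misranked pair, w <- w + (d_R - d_NR); averaged weights are returned."""
    w = [0.0] * dim
    avg = [0.0] * dim
    for _ in range(epochs):
        for d_r, d_nr in pairs:
            diff = [a - b for a, b in zip(d_r, d_nr)]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:  # misrank: update
                w = [wi + di for wi, di in zip(w, diff)]
            avg = [ai + wi for ai, wi in zip(avg, w)]  # accumulate for averaging
    n = epochs * len(pairs)
    return [ai / n for ai in avg]

pairs = [((2.0, 0.0), (0.0, 1.0)), ((1.0, 0.0), (0.0, 2.0))]
w = rank_perceptron(pairs, dim=2)
print(w)
```

Averaging the weight vectors across all updates (rather than keeping only the final one) is what makes the online perceptron competitive with batch methods here.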

slide-48
SLIDE 48

Method 2: RankSVM

[Joachims, KDD-02] [Herbrich et al., 2000]

min_w  L_ranksvm = (1/2)‖w‖² + C Σ_i ε_i

subject to  ⟨w, d_R − d_NR⟩ ≥ 1 − ε_i,  ε_i ≥ 0,  ∀(d_R, d_NR) ∈ RP

Equivalent to:

min_w  L_ranksvm = Σ_{RP} [1 − ⟨w, d_R − d_NR⟩]_+ + λ‖w‖²,   where λ = 1/(2C)

  • Minimizing the number of misranks (hinge-loss approximation)
  • Equivalent to maximizing AUC
  • Lower bound on MAP, precision@K, MRR, etc. [Elsas, Carvalho & Carbonell, WSDM-08]

slide-49
SLIDE 49

Loss Function

[Plot: loss as a function of x = ⟨w, d_R − d_NR⟩]

slide-50
SLIDE 50

Loss Function

[Plot, animation step: loss as a function of x = ⟨w, d_R − d_NR⟩]

slide-51
SLIDE 51

Loss Function

1 − 1/(1 + e^{−σx}) = e^{−σx}/(1 + e^{−σx}) = 1 − sigmoid(σx)

Robust to outliers

[Plot: the sigmoid loss as a function of x = ⟨w, d_R − d_NR⟩]

slide-52
SLIDE 52

Fine-tuning Ranking Models

Base ranking model → Sigmoid fine-tuning → Final model

  • Base ranker: e.g., RankSVM, Perceptron, ListNet, etc.
  • Then minimize (a very close approximation of) the empirical error: the number of misranks
  • Non-convex, but robust to outliers (label noise)

slide-53
SLIDE 53

Learning

  • SigmoidRank Loss:

    sigmoid(x) = 1/(1 + e^{−σx})

    min_w  L_SigmoidRank = Σ_{RP} [1 − sigmoid(⟨w, d_R − d_NR⟩)] + λ‖w‖²

  • Learning with Gradient Descent:

    w^{(k+1)} = w^{(k)} + Δw^{(k)},   Δw^{(k)} = −η ∇_w L_SigmoidRank(w^{(k)})

    ∇_w L_SigmoidRank(w) = −σ Σ_{RP} sigmoid(⟨w, d_R − d_NR⟩) [1 − sigmoid(⟨w, d_R − d_NR⟩)] (d_R − d_NR) + 2λw
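The gradient-descent loop above can be sketched as follows; the hyperparameter values (σ, η, λ) are illustrative and the starting weights stand in for a base ranker's output:

```python
import math

def sigmoid(x: float, sigma: float = 1.0) -> float:
    return 1.0 / (1.0 + math.exp(-sigma * x))

def finetune(pairs, w, sigma=1.0, eta=0.5, lam=0.01, epochs=50) -> list[float]:
    """pairs: (d_R, d_NR) tuples; w: initial weights from a base ranker.
    Gradient descent on the SigmoidRank loss sum(1 - sigmoid(<w, dR-dNR>)) + lam*|w|^2."""
    w = list(w)
    for _ in range(epochs):
        grad = [2 * lam * wi for wi in w]  # gradient of the regularizer
        for d_r, d_nr in pairs:
            diff = [a - b for a, b in zip(d_r, d_nr)]
            s = sigmoid(sum(wi * di for wi, di in zip(w, diff)), sigma)
            for j, dj in enumerate(diff):
                grad[j] -= sigma * s * (1 - s) * dj  # -sigma * s(1-s) * (dR - dNR)
        w = [wi - eta * gj for wi, gj in zip(w, grad)]
    return w

pairs = [((2.0, 0.0), (0.0, 1.0)), ((1.0, 0.0), (0.0, 2.0))]
w = finetune(pairs, w=[0.1, 0.1])
print(w)
```

Because the gradient weight s(1 − s) vanishes for pairs scored far on either side, confidently misranked outliers contribute little, which is the robustness argument on the previous slides.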

slide-54
SLIDE 54

Email Recipient Results

[Bar chart: MAP on TOCCBCC and CCBCC for Frequency, Recency, TFIDF, KNN, Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet and ListNet+Sigmoid; 36 Enron users, 44000+ queries, avg ~1267 queries/user; relative gains from sigmoid fine-tuning between 0.09% and 13.2%, with significance from p < 0.01 to p = 0.74]

slide-55
SLIDE 55

Email Recipient Results

[Bar chart: AUC on TOCCBCC and CCBCC for Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet and ListNet+Sigmoid; 36 Enron users; relative gains between 0.67% and 26.7%, most with p < 0.01]

slide-56
SLIDE 56

Email Recipient Results

[Bar chart: R-Precision on TOCCBCC and CCBCC for Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet and ListNet+Sigmoid; 36 Enron users]

slide-57
SLIDE 57

Email Recipient Results

[Scatter plots: per-user MAP on the CCBCC task, base ranker (x-axis) vs. base ranker + sigmoid fine-tuning (y-axis), for Perceptron, RankSVM and ListNet; 36 Enron users; points above the diagonal indicate improvement]

slide-58
SLIDE 58

Set Expansion (SEAL) Results

[Wang & Cohen, ICDM-2007]

[Bar chart: MAP on SEAL-1, SEAL-2 and SEAL-3 for Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet and ListNet+Sigmoid; 18 features, ~120/60 train/test splits, ~half relevant]

slide-59
SLIDE 59

Letor Results

[Liu et al., SIGIR-LR4IR 2007]

[Bar chart: MAP on Ohsumed, Trec3 and Trec4 for Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet and ListNet+Sigmoid; #queries/#features: 106/25, 50/44, 75/44]

slide-60
SLIDE 60

Learning Curve

[Plot: AUC (train) vs. gradient-descent epoch for perceptron+sigmoid, rankSVM+sigmoid, ListNet+sigmoid and Random+sigmoid; a few steps to convergence given a good starting point]

TOCCBCC, Enron user lokay-m

slide-61
SLIDE 61

Conclusions

  • Email acts
    – Managing/tracking commits, requests, etc. (semi-)automatically
  • Preventing users’ mistakes
    – Email leaks (accidentally adding non-intended recipients)
    – Recipient prediction (forgetting intended recipients)
    – Mozilla Thunderbird extension
  • Ranking in two optimization steps
    – Robust to outliers (when compared to convex losses)
    – Closer approximation to minimizing the number of misranks (empirical risk minimization framework)
    – Fine-tunes any base learner in a few steps, given a good starting point
slide-62
SLIDE 62

Related Work 1

  • Email acts:
    – Speech Act Theory [Austin, 1962; Searle, 1969]
    – Email classification: spam, folder, etc.
    – Dialog Acts for Speech Recognition, Machine Translation, and other dialog-based systems [Stolcke et al., 2000] [Levin et al., 03]
      • Typically, 1 act per utterance (or sentence) and more fine-grained taxonomies, with a larger number of acts
  • Email is a new domain
    – Winograd’s Coordinator (1987)
      • users manually annotated email with intent
    – Related applications:
      • Focus message in threads/discussions [Feng et al., 2006], action-item discovery [Bennett & Carbonell, 2005], activity classification [Dredze et al., 2006], task-focused email summary [Corsten-Oliver et al., 2004], predicting social roles [Leusky, 2004], etc.

slide-63
SLIDE 63

Related Work 2

  • Email Leak
    – [Boufaden et al., 2005]: proposed a privacy enforcement system to monitor specific privacy breaches (student names, student grades, IDs)
  • Recipient Recommendation
    – [Pal & McCallum, 2006], [Dredze et al., 2008]: CC prediction problem, recipient prediction based on summary keywords
  • Expert Search in Email
    – [Dom et al., 2003], [Campbell et al., 2003], [Balog & de Rijke, 2006], [Balog et al., 2006], [Soboroff, Craswell, de Vries (TREC-Enterprise 2005-06-07…)]

slide-64
SLIDE 64

Related Work 3

  • Ranking in two-optimization steps
    – [Perez-Cruz et al., 2003]: similar idea in the SVM-classification context (Empirical Risk Minimization)
    – [Xu, Crammer & Schuurman, 2006], [Krause & Singer, 2004], [Zhan & Shen, 2005], etc.
  • SVM robust to outliers and label noise
    – [Collobert et al., 2006], [Liu et al., 2005]: convexity tradeoff

slide-65
SLIDE 65

Thank you.