Collaborative Social Network Discovery from Online Communications
Chris Diehl
USMA-ARI Network Science Workshop
Collaboration with Lise Getoor and Galileo Namata, University of Maryland – College Park

The Question
• Organizations today utilize a number of communication channels
  • Email, Instant Messaging, Text Messaging, Wikis, Blogs
• Given access to an organization’s online communications, how does one infer relationship and role types within the organization from the data?

Example message:
To: j.smith@enron.com
From: j.doe@enron.com
Subject: Re: trade
My friend John says ….

Data Attributes
• Structured Data (Metadata)
  • Sender and recipient(s), datetime
  • Can identify patterns of communication from metadata
  • Metadata provides no relationship context
• Unstructured Data (Content)
  • Message subject and body, attachments
  • Content may provide relationship and role information
  • Additional context may be needed to clarify the message
• Goal is to exploit the complementary cues offered by the metadata and content

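The split between metadata and content can be illustrated with a minimal sketch using Python's standard `email` package. The message reuses the example addresses and subject from the earlier slide; the `Date` header value and the `split_message` helper are assumptions added purely for illustration.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Illustrative message; the Date value is made up for this sketch.
RAW = """\
From: j.doe@enron.com
To: j.smith@enron.com
Date: Mon, 01 Jan 2001 09:00:00 +0000
Subject: Re: trade

My friend John says ...
"""

def split_message(raw):
    """Separate a raw message into structured metadata and unstructured content."""
    msg = message_from_string(raw)
    metadata = {
        # Who communicated with whom, and when
        "sender": getaddresses([msg.get("From", "")])[0][1],
        "recipients": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "datetime": parsedate_to_datetime(msg["Date"]),
    }
    content = {
        # Free text that may carry relationship and role cues
        "subject": msg.get("Subject", ""),
        "body": msg.get_payload().strip(),
    }
    return metadata, content

metadata, content = split_message(RAW)
print(metadata["sender"], "->", metadata["recipients"])
print(content["subject"], "|", content["body"])
```

The metadata dictionary is enough to build traffic patterns, while the content dictionary holds the text a relationship classifier would need.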
Identifying Key Actors – A Motivating Example

From: Jennifer Fraser
Subject: john arnold bid for 20,000? true? and when do you plan on selling them?

From: John Arnold
exaggerations...word travels everywhere doesnt it? how'd you hear?

From: Jennifer Fraser
johnny johhny johnny-- there is no secrecy when one is the king of ng .. your brokers have the biggest moves in the world…

Representations: Data and Network

Data: Communication (Hyper)Graph
  Nodes: Network References
  Edges: Communication Events
  [Figure: HP Labs Communication Graph (Adamic and Adar, 2003)]

Network: Network (Hyper)Graph
  Nodes: Entities
  Edges: Social Relationships

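The two representations can be sketched as simple data structures. This is a minimal illustration, not the authors' implementation: the class name `CommunicationEvent`, the helper `dyadic_traffic`, and the relationship label are assumptions. The addresses and subjects come from emails quoted elsewhere in the deck.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class CommunicationEvent:
    """One hyperedge: a message linking a sender to all of its recipients."""
    sender: str
    recipients: frozenset
    subject: str

    def nodes(self):
        # All network references (addresses) touched by this event
        return frozenset({self.sender}) | self.recipients

# Communication hypergraph: nodes are network references, edges are events
events = [
    CommunicationEvent(
        "christian.yoder@enron.com",
        frozenset({"elizabeth.sager@enron.com", "genia.fitzgerald@enron.com"}),
        "Happiness",
    ),
    CommunicationEvent(
        "sara.shackleton@enron.com",
        frozenset({"tana.jones@enron.com"}),
        "Hedge Funds",
    ),
]

def dyadic_traffic(events):
    """Project hyperedges onto directed (sender, recipient) pairs and count them."""
    counts = Counter()
    for event in events:
        for recipient in event.recipients:
            counts[(event.sender, recipient)] += 1
    return counts

# Network graph: nodes are resolved entities, edges are typed social relationships.
# The label below is a placeholder for illustration, not a finding from the data.
relationships = {
    frozenset({"Christian Yoder", "Elizabeth Sager"}): "hypothetical-relationship-type",
}

print(dyadic_traffic(events))
```

Keeping the hyperedge intact preserves who was addressed together, which the dyadic projection discards.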
Collaborative Social Network Discovery
[Diagram labels: Communication Graph; Incremental Machine Learning from Context; Entity Resolution; Relationship Identification; Validated Network]

Entity Resolution: InfoVis Co-Author Network Fragment
[Figure: network fragment before and after entity resolution]

D-Dupe: An Interactive Tool for Entity Resolution
http://www.cs.umd.edu/projects/linqs/ddupe

Entity Resolution: Name and Network References
• Every individual has two classes of references
  • Network References
  • Name References
• To define an individual’s identity and draw broader connections across emails, we need to first associate name and network references

Example message:
Datetime: 2001-01-23 09:45:00
Sender: sara.shackleton@enron.com
Recipients: tana.jones@enron.com
Subject: Hedge Funds
Tana: Other than your email attached, have you had other discussions with Mark or credit about hedge funds? Sara

Reference: C. P. Diehl, L. Getoor, G. Namata, "Name Reference Resolution in Organizational Email Archives," SIAM Data Mining 2006

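To make the association step concrete, here is a deliberately naive sketch that matches a name reference against the address tokens of a message's participants. It is a simple string heuristic assumed for illustration, not the method of the cited paper; the function names `name_tokens` and `candidate_addresses` are hypothetical.

```python
def name_tokens(address):
    """Split the local part of an address such as 'tana.jones@enron.com' into name tokens."""
    local_part = address.split("@", 1)[0]
    # Drop empty or single-letter fragments such as the 'e' in 'e..haedicke'
    return [token for token in local_part.lower().split(".") if len(token) > 1]

def candidate_addresses(name_reference, addresses):
    """Return the network references whose tokens contain the given name reference."""
    reference = name_reference.lower()
    return [address for address in addresses if reference in name_tokens(address)]

# Participants of the example message on this slide
participants = ["sara.shackleton@enron.com", "tana.jones@enron.com"]

for name in ["Tana", "Sara", "Mark"]:
    print(name, "->", candidate_addresses(name, participants))
```

"Tana" and "Sara" resolve within the message itself, while "Mark" matches no participant, so resolving it requires context beyond this one email.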
Context Challenges

Datetime: 2000-06-19 09:32:00
Sender: tana.jones@enron.com
Recipients: marie.heard@enron.com
Subject: Just a tease!!!
Wouldn't you like to know which of the two Susan's gave her notice today

Datetime: 2001-02-28 09:52:00
Sender: liz.taylor@enron.com
Recipients: john.arnold@enron.com
Subject: Greg's Bill
Johnny, What does Greg owe you for the champagne? Is it $896.00? Liz

Relationship Identification - Incremental Ego Network Exploration

Evidence Discovery
From: Christian Yoder [christian.yoder@enron.com]
To: Elizabeth Sager [elizabeth.sager@enron.com], Genia Fitzgerald [genia.fitzgerald@enron.com]
Subject: Happiness
Happiness is looking at the new legal org chart (which Jan just now dropped on my desk). I always approach these dry documents as though they were trigrams resulting from throwing the coins and consulting the I-Ching. At the top of the trigram which I find myself listed in I see a single name: Elizabeth Sager, and at the bottom I see the name Genia FitzGerald. ... cgy

Relationship Ranking
Rank  Relationship with Ego (Christian Yoder)
1     Elizabeth Sager
2     Richard Sanders
3     Steve Hall
4     Mark Haedicke
5     Dave Fuller
6     Tracy Ngo

Message Ranking
Rank  Message Subject
1     Happiness
2     System Outage Risk
3     Mark Taylor Visit
4     Question about a deal we did

Reference: C. P. Diehl, G. Namata, L. Getoor, "Relationship Identification for Social Network Discovery," AAAI 2007

Enron Manager-Subordinate Communications Relationships
[Figure: graph of manager-subordinate communication relationships; nodes are labeled with Enron email addresses]

Relationship Identification - Manager-Subordinate Relations
• Preference Learning Approach
  • Supervised learning of relationship ranker
  • Given initial set of labeled ego networks
  • Ranking dyadic relationships
• Traffic-Based Approach
  • Message frequency
  • Number of recipients
  • Exchanges between relationship participants and common recipients
• Content-Based Approach
  • Term frequency vector for set of messages corresponding to the relationship
  • Exploits text from sender to recipient

Approach                                 Mean Reciprocal Rank
Content-Based with Attribute Selection   0.719
Content-Based                            0.660
Traffic-Based                            0.518
Random Selection                         0.211
Worst Case                               0.141

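The content-based feature and the evaluation metric above can be sketched as follows. This is a minimal illustration under stated assumptions: the tokenizer is a plain regular expression, and reciprocal rank is taken at the first relevant relationship in each ego's ranking, which may differ from the exact protocol behind the reported numbers. The ego rankings in the example are made up.

```python
import re
from collections import Counter

def term_frequency_vector(messages):
    """Bag-of-words counts over all messages belonging to one relationship."""
    counts = Counter()
    for text in messages:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0.0 if none is ranked."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(rankings):
    """Average reciprocal rank over (ranked list, relevant set) pairs, one per ego."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in rankings]
    return sum(scores) / len(scores)

# Hypothetical ego networks: each ranks candidate relationships for one ego
rankings = [
    (["a", "b", "c"], {"a"}),       # relevant relationship ranked first
    (["a", "b", "c", "d"], {"b"}),  # ranked second
    (["x", "y", "z", "w"], {"w"}),  # ranked fourth
]

print(term_frequency_vector(["the new legal org chart", "org chart"]))
print(round(mean_reciprocal_rank(rankings), 3))
```

A ranker that always places the true relationship first would score 1.0, which gives a sense of scale for the values in the table.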
Future Directions
• Incremental, Active Learning
  • Relationship-Level and Message-Level Annotations
  • Automated Model Selection
  • Automated Feature Selection
• Visualization
  • Communications Graph Exploration
  • Network Graph Construction
  • Interaction Paradigms
• Unified Workflow for Entity Resolution and Relationship Identification