
Collaborative Social Network Discovery from Online Communications - PowerPoint PPT Presentation

Collaborative Social Network Discovery from Online Communications. Chris Diehl, USMA-ARI Network Science Workshop. Collaboration with Lise Getoor and Galileo Namata, University of Maryland College Park.


  1. Collaborative Social Network Discovery from Online Communications Chris Diehl USMA-ARI Network Science Workshop Collaboration with Lise Getoor and Galileo Namata, University of Maryland – College Park

  2. The Question
     - Organizations today utilize a number of communication channels
       - Email, Instant Messaging, Text Messaging, Wikis, Blogs
     - Given access to an organization’s online communications, how does one infer relationship and role types within the organization from the data?
     Example message:
       To: j.smith@enron.com
       From: j.doe@enron.com
       Subject: Re: trade
       My friend John says ….

  3. Data Attributes
     - Structured Data (Metadata)
       - Sender and recipient(s), datetime
       - Can identify patterns of communication from metadata
       - Metadata provides no relationship context
     - Unstructured Data (Content)
       - Message subject and body, attachments
       - Content may provide relationship and role information
       - Additional context may be needed to clarify the message
     - Goal is to exploit complementary cues offered by the metadata and content
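The split between metadata and content on this slide can be illustrated with a minimal sketch using Python's standard `email` package. The addresses, subject, and body come from the example message on slide 2; the `Date` header and the helper name `split_message` are assumptions made only for illustration, not part of the presentation.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Example message from slide 2; the Date header is a hypothetical value.
RAW = """\
From: j.doe@enron.com
To: j.smith@enron.com
Date: Tue, 23 Jan 2001 09:45:00 -0600
Subject: Re: trade

My friend John says ....
"""


def split_message(raw):
    """Separate structured metadata from unstructured content (hypothetical helper)."""
    msg = message_from_string(raw)
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    metadata = {
        "sender": getaddresses(msg.get_all("From", []))[0][1],
        "recipients": [addr for _, addr in getaddresses(recipient_headers)],
        "datetime": parsedate_to_datetime(msg["Date"]),
    }
    content = {"subject": msg["Subject"], "body": msg.get_payload()}
    return metadata, content


metadata, content = split_message(RAW)
print(metadata["sender"], "->", metadata["recipients"])
print(content["subject"])
```

The metadata alone says who wrote to whom and when; the mention of "John" that hints at a relationship is only visible in the content.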

  4. Identifying Key Actors – A Motivating Example
     From: Jennifer Fraser
       Subject: john arnold bid for 20,000? true? and when do you plan on selling them?
     From: John Arnold
       exaggerations...word travels everywhere doesnt it? how'd you hear?
     From: Jennifer Fraser
       johnny johhny johnny-- there is no secrecy when one is the king of ng .. your brokers have the biggest moves in the world…

  5. Representations: Data and Network
     - Data: Communication (Hyper)Graph
       - Nodes: Network References
       - Edges: Communication Events
     - Network: Network (Hyper)Graph
       - Nodes: Entities
       - Edges: Social Relationships
     Figure: HP Labs Communication Graph (Adamic and Adar, 2003)
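The data-side representation can be sketched as follows, assuming each communication event is stored as a sender plus a tuple of recipients. The class and function names are hypothetical; the addresses are taken from example messages elsewhere in the deck. Each event is a hyperedge over the network references it touches, and a simple projection gives directed sender-to-recipient message counts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CommunicationEvent:
    """One message: a hyperedge joining the sender and all recipients."""
    sender: str
    recipients: tuple


def build_communication_hypergraph(events):
    """Map each network reference (node) to the indices of the events (hyperedges) it appears in."""
    incidence = defaultdict(set)
    for index, event in enumerate(events):
        for reference in (event.sender, *event.recipients):
            incidence[reference].add(index)
    return dict(incidence)


def dyad_counts(events):
    """Project hyperedges onto directed (sender, recipient) pairs with message counts."""
    counts = defaultdict(int)
    for event in events:
        for recipient in event.recipients:
            counts[(event.sender, recipient)] += 1
    return dict(counts)


events = [
    CommunicationEvent("christian.yoder@enron.com",
                       ("elizabeth.sager@enron.com", "genia.fitzgerald@enron.com")),
    CommunicationEvent("sara.shackleton@enron.com", ("tana.jones@enron.com",)),
    CommunicationEvent("tana.jones@enron.com", ("marie.heard@enron.com",)),
]

incidence = build_communication_hypergraph(events)
pairs = dyad_counts(events)
print(sorted(incidence["tana.jones@enron.com"]))
print(pairs[("christian.yoder@enron.com", "elizabeth.sager@enron.com")])
```

Note that the nodes here are still network references (addresses), not resolved entities; moving to the network-side graph requires the entity resolution and relationship identification steps described on the following slides.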

  6. Collaborative Social Network Discovery
     Diagram labels: Communication Graph; Incremental Machine Learning from Context; Entity Resolution; Relationship Identification; Validated Network

  7. Entity Resolution: InfoVis Co-Author Network Fragment
     Figure panels: Before, After

  8. D-Dupe: An Interactive Tool for Entity Resolution http://www.cs.umd.edu/projects/linqs/ddupe

  9. Entity Resolution: Name and Network References
     - Every individual has two classes of references
       - Network References
       - Name References
     - To define an individual’s identity and draw broader connections across emails, we need to first associate name and network references
     Example message:
       Datetime: 2001-01-23 09:45:00
       Sender: sara.shackleton@enron.com
       Recipients: tana.jones@enron.com
       Subject: Hedge Funds
       Tana: Other than your email attached, have you had other discussions with Mark or credit about hedge funds? Sara
     Reference: C. P. Diehl, L. Getoor, G. Namata, "Name Reference Resolution in Organizational Email Archives," SIAM Data Mining 2006
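To make the association problem concrete, here is a naive candidate-generation sketch. It is not the method of the cited paper: it only assumes that a first name mentioned in a message body can be matched against the dot-separated tokens in the local part of an address. The function names are hypothetical, and the address list is a small sample of addresses that appear elsewhere in the deck.

```python
import re


def local_name_tokens(address):
    """Split the local part of an address into lowercase name tokens."""
    local = address.split("@", 1)[0].lower()
    return {token for token in re.split(r"[._]+", local) if token}


def candidate_references(name, addresses):
    """Return the network references whose local part contains the given name token."""
    name = name.lower()
    return sorted(a for a in addresses if name in local_name_tokens(a))


known_addresses = [
    "sara.shackleton@enron.com",
    "tana.jones@enron.com",
    "mark.taylor@enron.com",
    "mark.haedicke@enron.com",
    "mark.greenberg@enron.com",
]

print(candidate_references("Tana", known_addresses))
print(candidate_references("Mark", known_addresses))
```

"Tana" resolves to a single candidate, while "Mark" matches three addresses. String matching alone cannot choose among them, which is why additional context from the communication network is needed.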

  10. Context Challenges
      Example message 1:
        Datetime: 2000-06-19 09:32:00
        Sender: tana.jones@enron.com
        Recipients: marie.heard@enron.com
        Subject: Just a tease!!!
        Wouldn't you like to know which of the two Susan's gave her notice today
      Example message 2:
        Datetime: 2001-02-28 09:52:00
        Sender: liz.taylor@enron.com
        Recipients: john.arnold@enron.com
        Subject: Greg's Bill
        Johnny, What does Greg owe you for the champagne? Is it $896.00? Liz

  11. Relationship Identification - Incremental Ego Network Exploration
      Relationship Ranking (relationship with ego Christian Yoder):
        1. Elizabeth Sager
        2. Richard Sanders
        3. Steve Hall
        4. Mark Haedicke
        5. Dave Fuller
        6. Tracy Ngo
      Message Ranking (message subject):
        1. Happiness
        2. System Outage Risk
        3. Mark Taylor Visit
        4. Question about a deal we did
      Evidence Discovery:
        From: Christian Yoder [christian.yoder@enron.com]
        To: Elizabeth Sager [elizabeth.sager@enron.com], Genia Fitzgerald [genia.fitzgerald@enron.com]
        Subject: Happiness
        Happiness is looking at the new legal org chart (which Jan just now dropped on my desk). I always approach these dry documents as though they were trigrams resulting from throwing the coins and consulting the I-Ching. At the top of the trigram which I find myself listed in I see a single name: Elizabeth Sager, and at the bottom I see the name Genia FitzGerald. ... cgy
      Reference: C. P. Diehl, G. Namata, L. Getoor, "Relationship Identification for Social Network Discovery," AAAI 2007

  12. Enron Manager-Subordinate Communications Relationships
      Figure: network graph. Node labels: barbara.gray@enron.com gerald.nemec@enron.com mark.guzman@enron.com n..gray@enron.com bill.iii@enron.com carol.clair@enron.com leslie.hansen@enron.com chris.gaskill@enron.com janet.moore@enron.com harlan.murphy@enron.com jeffrey.hodge@enron.com christian.yoder@enron.com s..shively@enron.com cheryl.nelson@enron.com brent.hendry@enron.com elizabeth.sager@enron.com david.portz@enron.com sara.shackleton@enron.com vince.kaminski@enron.com susan.bailey@enron.com mark.haedicke@enron.com marie.heard@enron.com kay.mann@enron.com mark.taylor@enron.com alice.wright@enron.com .taylor@enron.com gwyn.koepke@enron.com tana.jones@enron.com sean.crandall@enron.com sheila.tweed@enron.com e..haedicke@enron.com mary.cook@enron.com stephanie.panus@enron.com mike.swerzbin@enron.com robert.badeer@enron.com mark.greenberg@enron.com pinto.leite@enron.com tim.belden@enron.com diana.scholtes@enron.com jean.mrha@enron.com f..calger@enron.com bert.meyers@enron.com louise.kitchen@enron.com tyrell.harrison@enron.com hunter.shively@enron.com bill.williams@enron.com m..presto@enron.com mara.bronstein@enron.com k..allen@enron.com rogers.herndon@enron.com mark.whitt@enron.com ryan.slinger@enron.com mike.grigsby@enron.com barry.tycholiz@enron.com phillip.allen@enron.com lloyd.will@enron.com john.lavorato@enron.com stephanie.miller@enron.com jane.tholt@enron.com t..lucci@enron.com scott.neal@enron.com phil.polsky@enron.com matthew.lenhart@enron.com john.arnold@enron.com dave.fuller@enron.com kimberly.bates@enron.com

  13. Relationship Identification - Manager-Subordinate Relations
      - Preference Learning Approach
        - Supervised learning of relationship ranker
        - Given initial set of labeled ego networks
        - Ranking dyadic relationships
      - Traffic-Based Approach
        - Message frequency
        - Number of recipients
        - Exchanges between relationship participants and common recipients
      - Content-Based Approach
        - Term frequency vector for set of messages corresponding to the relationship
        - Exploits text from sender to recipient
      Results (Mean Reciprocal Rank):
        Content-Based with Attribute Selection   0.719
        Content-Based                            0.660
        Traffic-Based                            0.518
        Random Selection                         0.211
        Worst Case                               0.141
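Two pieces of this slide can be sketched in a few lines: the term frequency vector used by the content-based approach and the mean reciprocal rank used to score the rankers. The sketch assumes that the reciprocal rank for an ego is taken at the first true manager-subordinate relationship in that ego's ranked list, and it uses a simple lowercase regular-expression tokenizer; both are assumptions, as are the function names and the toy labels.

```python
import re
from collections import Counter


def term_frequency_vector(messages):
    """Count terms over all messages belonging to one relationship (assumed tokenizer)."""
    counts = Counter()
    for text in messages:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return counts


def reciprocal_rank(ranked, relevant):
    """Return 1/position of the first relevant item, or 0.0 if none is ranked."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings):
    """Average the reciprocal rank over (ranked_list, relevant_set) pairs, one per ego."""
    rankings = list(rankings)
    if not rankings:
        raise ValueError("at least one ranking is required")
    return sum(reciprocal_rank(r, rel) for r, rel in rankings) / len(rankings)


# Toy data with hypothetical labels: one ranked list per ego network.
rankings = [
    (["a", "b", "c"], {"a"}),  # true relationship ranked first  -> 1.0
    (["p", "q"], {"q"}),       # true relationship ranked second -> 0.5
]
mrr = mean_reciprocal_rank(rankings)
tf = term_frequency_vector(["Happiness is looking at the org chart", "happiness"])
print(mrr)
print(tf["happiness"])
```

A value of 1.0 would mean the true relationship is always ranked first, so the reported 0.719 can be read as the content-based ranker with attribute selection typically placing it at or near the top of each ego's list.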

  14. Future Directions
      - Incremental, Active Learning
      - Relationship-Level and Message-Level Annotations
      - Automated Model Selection
      - Automated Feature Selection
      - Visualization
      - Communications Graph Exploration
      - Network Graph Construction
      - Interaction Paradigms
      - Unified Workflow for Entity Resolution and Relationship Identification
