Summarizing Diverging String Sequences, with Applications to Chain-Letter Petitions

  1. Summarizing Diverging String Sequences, with Applications to Chain-Letter Petitions. Patty Commins (1,2), David Liben-Nowell (1), Tina Liu (1,3), and Kiran Tomlinson (1,4). Affiliations: (1) Department of Computer Science, Carleton College; (2) Department of Mathematics, University of Minnesota; (3) Surescripts; (4) Department of Computer Science, Cornell University. CPM 2020. (1 / 26)

  2. Chain-Letter Petitions. ∼3.5 million emails, ∼170k signers (Chierichetti, Kleinberg, & Liben-Nowell 2011). Sent 20 February 2003; retrieved from the G.W.B. Presidential Library.

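  10. Chain-Letter Petitions. Propagation tree: Alice → Bob → {Carl, Dan}. Resulting signature lists: (Alice, Bob, Carl) and (Alice, Bob, Dan).

  11. Chain-Letter Petitions. The two signature lists on their own: (Alice, Bob, Carl) and (Alice, Bob, Dan).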

  13. Reconstruction. Central question: can we reconstruct the propagation tree from signature lists? From the lists (Alice, Bob, Carl) and (Alice, Bob, Dan), recover the tree Alice → Bob → {Carl, Dan}.
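When lists are copied without errors, one way to picture the reconstruction is as prefix merging: lists that share a prefix share a path in the tree. The sketch below illustrates that idea only and is not the algorithm from the talk; the function name `build_prefix_tree` and the nested-dictionary representation are assumptions made for this example.

```python
def build_prefix_tree(signature_lists):
    """Merge signature lists into a nested-dict prefix tree (trie).

    Assumes the lists were copied without mutations, so a shared prefix
    means shared ancestry. Hypothetical helper, for illustration only.
    """
    root = {}
    for signatures in signature_lists:
        node = root
        for name in signatures:
            # Reuse the child if this prefix was seen before, else create it.
            node = node.setdefault(name, {})
    return root


lists = [["Alice", "Bob", "Carl"], ["Alice", "Bob", "Dan"]]
tree = build_prefix_tree(lists)
print(tree)  # {'Alice': {'Bob': {'Carl': {}, 'Dan': {}}}}
```

Any copy error changes a prefix and sends the list down a separate branch, so exact prefix merging stops working once mutations appear.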

  19. Challenge: Mutations. People are bad at copy-paste.
      1. Substitution: (Alice, Bob, Carl) → (Alice, Eve, Carl)
      2. Insertion: (Alice, Bob, Carl) → (Alice, Bob, Eve, Carl)
      3. Deletion: (Alice, Bob, Carl) → (Alice, Carl)

      Character-level: Carl → Carol, Alice → Alyce. All present in the Iraq War petition (Liben-Nowell & Kleinberg 2008).
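The three list-level mutations can be written as small list operations. This is a minimal sketch that reproduces the slide's examples; the function names and the 0-based `index` convention are assumptions for illustration, not part of the talk.

```python
def substitute_name(signatures, index, new_name):
    """Return a copy with the signature at `index` replaced by `new_name`."""
    return signatures[:index] + [new_name] + signatures[index + 1:]


def insert_name(signatures, index, new_name):
    """Return a copy with `new_name` inserted before position `index`."""
    return signatures[:index] + [new_name] + signatures[index:]


def delete_name(signatures, index):
    """Return a copy with the signature at `index` removed."""
    return signatures[:index] + signatures[index + 1:]


original = ["Alice", "Bob", "Carl"]
print(substitute_name(original, 1, "Eve"))  # ['Alice', 'Eve', 'Carl']
print(insert_name(original, 2, "Eve"))      # ['Alice', 'Bob', 'Eve', 'Carl']
print(delete_name(original, 1))             # ['Alice', 'Carl']
```

Each function returns a new list and leaves `original` untouched, mirroring how a forwarded copy diverges from the message it was copied from.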

  24. Reconstruction with Mutations. Which tree produced the lists (Alice, Carl, Eve), (Alice, Bob, Carol, Dan), and (Alice, Bob, Carl, Frank)? Key chain letter features:
      1. One-ended growth
      2. Divergence
      3. Mutation with inheritance

  31. Summary of Contributions (∗ = see paper)
      1. Formal definition of chain letter reconstruction problem
      2. NP-hardness proof ∗
      3. Efficient optimal solution for two lists
      4. Fixed-parameter tractable: poly-time algorithm for O(1) lists ∗
      5. Fast heuristic for arbitrary number of lists
      6. Experimental evaluation on synthetic data

  34. Related Work
      - Chain letters: Iraq War petition tree structure (Liben-Nowell & Kleinberg 2008; Golub & Jackson 2010; Chierichetti, Kleinberg, & Liben-Nowell 2011); tree reconstruction from plea (Bennett, Li, & Ma 2003)
      - One-ended growth and divergence: trie (De La Briandais 1959; Fredkin 1960); online conversations (Kumar, Mahdian, & McGlohon 2010)
      - Divergence and mutation: molecular phylogenetics (Yang & Rannala 2012); stories, e.g., Little Red Riding Hood (Tehrani 2013)

  35. Outline
      1. Introduction
      2. Problem Definition
      3. Reconstruction Algorithm
      4. Results
      5. Conclusion

  38. Problem Definition, Informally. DSSSP (Diverging String Sequence Summarization Problem): given diverging string sequences, here (Alice, Carl, Eve), (Alice, Bob, Carol, Dan), and (Alice, Bob, Carl, Frank), find the best summary tree, here Alice → Bob → Carl → {Eve, Dan, Frank}.

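  48. Competing Objectives. Input lists: (Alice, Carl, Eve), (Alice, Bob, Carol, Dan), and (Alice, Bob, Carl, Frank).
      - Accurate representation: a tree whose root-to-leaf paths are Alice → Carl → Eve, Alice → Bob → Carol → Dan, and Alice → Bob → Carl → Frank.
      - Minimal redundancy: the tree Alice → Bob → Carl → {Eve, Dan, Frank}.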

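  50. Measuring Representation Accuracy. Input lists: x_1 = (Alice, Carl, Eve), x_2 = (Alice, Bob, Carol, Dan), x_3 = (Alice, Bob, Carl, Frank). Summary tree T: Alice → Bob → Carl → {Eve, Dan, Frank}, with the leaves Eve, Dan, and Frank marked x_1, x_2, and x_3. Label sequences: labelseq_T(x_1) = (Alice, Bob, Carl, Eve); labelseq_T(x_2) = (Alice, Bob, Carl, Dan); labelseq_T(x_3) = (Alice, Bob, Carl, Frank).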
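To make the comparison concrete, the sketch below reads off each label sequence from the summary tree and scores it against the corresponding input list. It assumes a list-level edit distance with unit costs as the measure of disagreement, treating each name as an atomic symbol; the slides shown here do not state the exact cost function, so that choice, the parent-pointer tree encoding, and the names `labelseq` and `edit_distance` are all assumptions for illustration.

```python
def labelseq(parent, label, node):
    """Labels on the path from the root down to `node`."""
    path = []
    while node is not None:
        path.append(label[node])
        node = parent[node]
    return path[::-1]


def edit_distance(a, b):
    """List-level Levenshtein distance with unit-cost edits (an assumption)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # match or substitute
        prev = curr
    return prev[-1]


# Summary tree from the slide: Alice -> Bob -> Carl -> {Eve, Dan, Frank}.
label = {0: "Alice", 1: "Bob", 2: "Carl", 3: "Eve", 4: "Dan", 5: "Frank"}
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 2, 5: 2}

# Input lists x_1, x_2, x_3, keyed by the leaf each one is mapped to.
inputs = {
    3: ["Alice", "Carl", "Eve"],
    4: ["Alice", "Bob", "Carol", "Dan"],
    5: ["Alice", "Bob", "Carl", "Frank"],
}

distances = {leaf: edit_distance(labelseq(parent, label, leaf), observed)
             for leaf, observed in inputs.items()}
print(distances)  # {3: 1, 4: 1, 5: 0}
```

Under these assumptions, x_1 differs from its label sequence by one deletion (Bob), x_2 by one substitution (Carl → Carol), and x_3 matches exactly.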
