Dynamic Processes over Information Networks

Dynamic Processes over Information Networks - PowerPoint PPT Presentation

Dynamic Processes over Information Networks: Representation, Modeling, Learning and Inference. Le Song, College of Computing, Georgia Tech


  1. Idea adoption / disease spread / viral marketing. D → S means S follows D. [Figure: a cascade over the follower network; David adopts at 1:00pm, Sophie at 1:10pm, Bob at 1:15pm, Christine at 1:18pm, Jacob at 1:25pm.]

  2. Scenario I: idea adoption. D → S means S follows D; D is the source, so N_D(t) = 1. Adoption is a terminating process: a user adopts the product only once, and the intensity depends on which followees have already adopted. For Jacob (J), whose followees are Bob (B) and Christine (C):

      λ*_J(t) = α_JB (1 − N_J(t)) N_B(t) + α_JC (1 − N_J(t)) N_C(t)

      where N_u(t) ∈ {0, 1} indicates whether u has adopted by time t, and α_JB, α_JC are the influence weights of J's followees.
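The terminating-process intensity above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation; the users and weights are hypothetical.

```python
def adoption_intensity(alpha, adopted, u):
    """Terminating-process intensity for user u: zero once u has
    adopted (the 1 - N_u(t) factor); otherwise the sum of the
    weights of u's followees that have already adopted."""
    if adopted[u]:
        return 0.0
    return sum(a for v, a in alpha[u].items() if adopted[v])

# Hypothetical weights: Jacob follows Bob and Christine.
alpha = {"Jacob": {"Bob": 0.5, "Christine": 0.3}}
adopted = {"Jacob": False, "Bob": True, "Christine": False}
rate = adoption_intensity(alpha, adopted, "Jacob")  # only Bob has adopted
```

Once Jacob adopts, his intensity drops to zero, which is exactly the "adopt product only once" behavior on the slide.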

  3. Cascades from D in 30 mins. [Figure: one sampled cascade; David 1:00pm, Sophie 1:08pm, Bob 1:17pm, Jacob 1:25pm, Christine 1:48pm.]

  4. Cascades from D in 30 mins. [Figure: two sampled cascades, one starting at 1:00pm (Sophie 1:03pm, Bob 1:17pm, Jacob 1:25pm, Christine 1:48pm) and one at 2:00pm (Sophie 2:08pm, Christine 2:37pm, Bob 2:42pm, Jacob 2:53pm).]

  5. Cascades from D in 30 mins. [Figure: three sampled cascades starting at 1:00pm, 2:00pm, and 7:00pm, each reaching David, Sophie, Bob, Christine, and Jacob in a different order.]

  6. Cascades from D in 30 mins. [Figure: the same three sampled cascades as on the previous slide, shown again as an animation build.]

  7. Cascade data. A cascade is a sequence of (node, time) pairs for a particular piece of news; cascades can start from different sources. Recording only the adoption time of each user, cascade c is a vector t^c = (t_1^c, t_2^c, t_3^c, …, t_n^c) over users 1, …, n.

  8. Dynamic Processes over Information Networks. Modeling: Coevolution

  9. Information diffusion and network coevolution. [Figure: a retweet thread among David, Sophie, Bob, Christine, Jacob, Tina, and Olivia between 1pm and 2:03pm, e.g. 1pm, D: "Cool paper"; 1:35pm, Jacob @B @S @D: "Indeed brilliant"; at 1:45pm Jacob creates a follow link to David.] D → S means S follows D. Farajtabar et al. NIPS 2015

  10. Information diffusion and network coevolution. Each observation is one of two event sequences. Tweet/retweet events (who, whose content, when): (D, D, 1:00), (J, D, 1:35), (B, B, 4:00), (J, B, 4:10), (J, J, 5:00), … Link creation events: (J, D, 1:45), (J, S, 5:25), …

  11. Targeted retweet. D's own initiative is a constant-rate process, λ*_{D,D}(t) = η. Jacob's retweets of D's content form a mutually-exciting process; the intensity is high if his followees retweet D frequently:

      λ*_{J,D}(t) = γ_D A_JB(t) (exp(−t) ⋆ dN_{B,D}(t)) + γ_D A_JC(t) (exp(−t) ⋆ dN_{C,D}(t))

      where A_uv(t) is the (time-varying) follow link from u to v, N_{v,D}(t) counts v's retweets of D's content, and ⋆ denotes convolution.
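A minimal sketch of this mutually-exciting intensity with the exponential kernel. The decay rate, γ, and the event history below are illustrative assumptions, not values from the deck.

```python
import math

def retweet_intensity(gamma, adj, followee_events, t):
    """Mutually-exciting retweet intensity for a (user, source) pair:
    each past retweet of the source by a currently linked followee
    adds an exponentially decaying kick exp(-(t - t_i))."""
    rate = 0.0
    for v, times in followee_events.items():
        if adj.get(v, 0):  # only followees with a live link contribute
            rate += gamma * sum(math.exp(-(t - ti)) for ti in times if ti <= t)
    return rate

# Hypothetical history: Bob retweeted D at t=1 and t=2, Christine never.
lam = retweet_intensity(gamma=0.5,
                        adj={"Bob": 1, "Christine": 1},
                        followee_events={"Bob": [1.0, 2.0], "Christine": []},
                        t=3.0)
```

The intensity rises right after a followee retweets and decays back toward zero, which is the "high if followees retweet frequently" behavior on the slide.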

  12. Information-driven link creation. Whether Jacob creates a link to D is a terminating process (a link is created at most once), driven by J's random exploration and by how often J retweets D:

      σ*_{J,D}(t) = (1 − A_JD(t)) · (μ_J + β_D exp(−t) ⋆ dN_{J,D}(t))

      The (1 − A_JD(t)) factor checks whether the link is already there; the second term is self-exciting: no link yet plus frequent retweets of D's content make link creation likely.

  13. Joint model of retweet + link creation. The diffusion network A(t) ∈ {0,1} and the information-diffusion counts N(t) ∈ {0} ∪ Z_+ coevolve: the terminating link-creation process is driven by diffusion, while the created links alter and support the mutually-exciting diffusion process.

  14. Simulation
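Simulation of these intensity-based models is typically done with Ogata's thinning algorithm. Below is a minimal one-dimensional sketch for a Hawkes process with an exponential kernel; the parameters are illustrative and this is not the deck's multivariate simulator.

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    """Ogata thinning for a 1-D Hawkes process with intensity
    lambda(t) = mu + alpha * sum_i exp(-beta * (t - t_i)).
    Between events the intensity only decays, so the intensity at the
    current time upper-bounds it until the next event."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < T:
        lam_bar = mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)      # candidate from the bounding rate
        if t >= T:
            break
        lam_t = mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:  # accept with prob lambda(t)/lam_bar
            events.append(t)
    return events

events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.0, T=50.0)
```

With alpha/beta < 1 the process is stable; the accepted events cluster in bursts, the signature behavior of self-exciting processes.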

  15. Link-creation parameter controls the network type: β = 0 yields Erdős-Rényi random networks; large β yields scale-free networks.

  16. Shrinking network diameters. The model generates networks with small, shrinking diameter: as small connected components merge, the diameter shrinks.

  17. Cascade patterns: structure. The model generates shorter and fatter cascades as β increases (here γ = 0.2).

  18. Dynamic Processes over Information Networks. Modeling: Collaborative Dynamics

  19. Collaborative dynamics. Users (David, Sophie, Christine, Jacob) interact with products p_1, p_2, p_3, p_4; the parameter matrices (η_up) and (β_up) over user-product pairs are low rank. Each user-product pair follows a self-exciting process, since people tend to go to the same store again and again:

      λ*_{u,p}(t) = η_up + β_up Σ_{t_i ∈ H_up(t)} exp(−(t − t_i))

      where H_up(t) is the history of u's past events on product p.

  20. Dynamic Processes over Information Networks. Learning: Sparse Networks

  21. Hidden diffusion networks. Given only the observed cascades, can we estimate the underlying diffusion network?

  22. Parametrization of the idea adoption model. As before, D is the source (N_D(t) = 1), D → S means S follows D, and adoption is a terminating process (adopt the product only once):

      λ*_J(t) = α_JB (1 − N_J(t)) N_B(t) + α_JC (1 − N_J(t)) N_C(t)

      Collect the unknown incoming weights of each node into a vector, e.g. w_J = (α_JB, α_JC, α_JD, …); learning the network means estimating one such vector per node.

  23. ℓ1-regularized log-likelihood. With λ*(t) = ⟨w, φ*(t)⟩, the likelihood of the events t_1, t_2, …, t_n in [0, T] is Π_i λ*(t_i) · exp(−∫_0^T λ*(τ) dτ), so the penalized objective is

      L(w) − μ‖w‖_1 = Σ_{i=1}^n log⟨w, φ*(t_i)⟩ − ⟨w, Ψ*(T)⟩ − μ‖w‖_1

      where Ψ*(T) = ∫_0^T φ*(τ) dτ.

  24. Soft-thresholding algorithm for the ℓ1-regularized likelihood estimation problem; solve one such problem for each node.

      Set learning rate γ; k = 0; initialize w.
      While k ≤ K, do
          w^{k+1} = [ w^k − γ · ∇_w L(w^k) − μ · γ ]_+
          k = k + 1
      End while

      Here L is the negative log-likelihood and [·]_+ truncates at zero, which enforces the nonnegativity of the weights.
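The loop above is proximal gradient descent (ISTA). A minimal sketch, demonstrated on a simple smooth-plus-ℓ1 toy objective rather than the point-process likelihood; the toy objective and step size are illustrative assumptions.

```python
def soft_threshold(x, tau):
    """[v - tau]_+ elementwise: the projected soft-thresholding step
    for an l1 penalty with a nonnegativity constraint."""
    return [max(v - tau, 0.0) for v in x]

def ista(grad, w0, step, mu, iters):
    """w_{k+1} = [ w_k - step * grad(w_k) - mu * step ]_+"""
    w = list(w0)
    for _ in range(iters):
        w = soft_threshold([wi - step * gi for wi, gi in zip(w, grad(w))],
                           mu * step)
    return w

# Toy problem: minimize 0.5 * (w - 1)^2 + mu * |w| over w >= 0;
# the closed-form solution is max(1 - mu, 0).
w = ista(grad=lambda w: [w[0] - 1.0], w0=[0.0], step=0.5, mu=0.5, iters=100)
```

On the toy problem the iterates contract geometrically to the soft-thresholded solution 0.5, showing how the ℓ1 term zeroes out weak edges in the learned network.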

  25. Statistical guarantees. Recovery conditions:
      - The eigenvalues of the Hessian Q = ∇²_w L are bounded in [C_min, C_max]
      - The gradient is upper bounded: ‖∇_w L‖_∞ ≤ C_1
      - The hazard is lower bounded: min_k w_k ≥ C_2
      - Incoherence condition: ‖Q_{S^c S} Q_{SS}^{-1}‖_∞ ≤ 1 − ε
      These constants depend on the network structure, parameter values, observation window, and source-node distribution. Given n > C_3 · d³ log p cascades and regularization parameter μ ≥ C_4 · ((2 − ε)/ε) · √(log p / n), the network structure can be recovered with probability at least 1 − 2 exp(−C″ μ² n).

  26. Memetracker

  27. Estimated diffusion network: blogs vs. mainstream media. Nan et al. NIPS 2012

  28. Tracking diffusion networks

  29. Dynamic Processes over Information Networks. Learning: Low Rank Collaborative Dynamics

  30. Collaborative dynamics. Same self-exciting user-product model as before (people tend to go to the same store again and again):

      λ*_{u,p}(t) = η_up + β_up Σ_{t_i ∈ H_up(t)} exp(−(t − t_i))

      Regularization: penalize the nuclear norms ‖(η_up)‖_* and ‖(β_up)‖_* to encourage low-rank parameter matrices.

  31. Dynamic Processes over Information Networks. Learning: Generic Algorithm

  32. Concave log-likelihood of an event sequence. With λ*(t) = ⟨w, φ*(t)⟩, the likelihood of t_1, …, t_n in [0, T] is Π_i λ*(t_i) · exp(−∫_0^T λ*(τ) dτ), so the log-likelihood

      L(w) = Σ_{i=1}^n log⟨w, φ*(t_i)⟩ − ⟨w, Ψ*(T)⟩

      is concave in w.

  33. Challenge in the optimization problem. The estimation problem is

      min_{w ∈ R^D_+}  ⟨w, Ψ*(T)⟩ − Σ_{i=1}^n log⟨w, φ*(t_i)⟩ + μ‖w‖_1

      The log terms are non-Lipschitz, so existing first-order methods need O(1/ε²) iterations.

  34. Saddle-point reformulation. Apply the Fenchel dual −log y = max_{u > 0} { −u·y + log u + 1 } to each term −log⟨w, φ*(t_i)⟩, giving (up to a constant)

      min_{w ∈ R^D_+} max_{u_i > 0}  ⟨w, Ψ*(T)⟩ − Σ_{i=1}^n u_i ⟨w, φ*(t_i)⟩ + Σ_{i=1}^n log u_i + μ‖w‖_1

      He et al. arXiv 2016
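The Fenchel identity behind this reformulation is easy to verify directly: log y = min_{u>0} (u·y − log u − 1), with minimizer u* = 1/y. A quick numeric sanity check (the values are illustrative):

```python
import math

def fenchel_term(u, y):
    """Dual objective u*y - log(u) - 1 whose minimum over u > 0
    equals log(y); the minimizer is u* = 1/y (set the derivative
    y - 1/u to zero)."""
    return u * y - math.log(u) - 1.0

y = 2.0
at_min = fenchel_term(1.0 / y, y)   # equals log(2)
elsewhere = fenchel_term(0.3, y)    # any other u gives a larger value
```

Swapping each −log term for this inner maximization leaves only bilinear coupling between w and the dual variables, which is what makes the smoothed first-order scheme on the next slides applicable.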

  35. Proximal gradient on the saddle-point problem, with the bilinear coupling M(w, u) = ⟨w, Ψ*(T)⟩ − Σ_i u_i ⟨w, φ*(t_i)⟩. Given the current (w^k, u^k) and step size δ:

      w^{k+1} = [ w^k − δ ∇_w M(w^k, u^k) − μδ ]_+
      ū_i = u_i^k + δ ∇_{u_i} M(w^k, u^k)
      u_i^{k+1} = ( ū_i + (ū_i² + 4δ)^{1/2} ) / 2

      The last line is the closed-form proximal step for the log u_i terms.

  36. Accelerated proximal gradient: the same proximal updates, but with the gradients of M evaluated at extrapolated points (w̄, ū) built from the last two iterates. This improves the iteration complexity from O(1/ε²) to O(1/ε).

  37. Converges much faster. [Figure: convergence curves, accelerated gradient vs. unaccelerated.]

  38. Dynamic Processes over Information Networks. Inference: Time-Sensitive Recommendation

  39. Collaborative dynamics: inference tasks on the user-product intensities λ*_{u,p}(t).
      Next-item prediction: what item will David buy next? max_p λ*_{D,p}(t).
      Return-time prediction: when will David buy the item? The expected return time is ∫_t^∞ τ f*_{D,p}(τ) dτ, where f* is the conditional density of the next event.
      Nan et al. NIPS 2015
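Next-item prediction reduces to an argmax over the current intensities. A trivial sketch; the product names and intensity values are hypothetical:

```python
def next_item(intensities):
    """Next-item prediction: recommend the product whose current
    user-product intensity lambda*_{u,p}(t) is largest."""
    return max(intensities, key=intensities.get)

# Hypothetical current intensities for David over four products.
pick = next_item({"p1": 0.7, "p2": 1.3, "p3": 0.2, "p4": 0.9})  # -> "p2"
```

Return-time prediction is the harder task, since it needs the integral of τ against the conditional density rather than a single intensity evaluation.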

  40. Music recommendation for Last.fm. Online records of music listening; the time unit is an hour. 1,000 users, 3,000 albums; 20,000 observed pairs, more than 1 million events. [Figures: album prediction and returning-time prediction results.]

  41. Electronic healthcare records. MIMIC II dataset: a collection of de-identified clinical visit records; the time unit is a week. 650 patients and 204 disease codes. [Figures: diagnosis-code prediction and returning-time prediction results.]

  42. Dynamic Processes over Information Networks. Inference: Influence Maximization

  43. Inference in idea adoption. D → S means S follows D.
      Influence estimation: can a piece of news spread to a million users in 1 month? σ(s, T) := E[ Σ_{i∈V} N_i(T) ] for source s.
      Influence maximization: who is the most influential user? max_{s∈V} σ(s, T).
      Source localization: where is the origin of the information? max_{s∈V, t∈[0,T]} of the likelihood of the partial cascade.
      Rodriguez et al. ICML 2012; Nan et al. NIPS 2013; Farajtabar et al. AISTATS 2015

  44. Cascades from D in 1 month. [Figure: three sampled cascades from source David over David, Sophie, Bob, Christine, and Jacob.]

  45. Cascades from D in 1 month. The three samples reach Σ_{i∈V} N_i(T) = 4, 2, and 3 users, so σ(D, T) ≈ (4 + 2 + 3)/3 = 3.
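The averaging on this slide is plain Monte Carlo over sampled cascades. A one-function sketch reproducing the slide's numbers:

```python
def influence_estimate(cascade_sizes):
    """Monte Carlo influence sigma(s, T): the average number of users
    reached across sampled cascades started from the same source s."""
    return sum(cascade_sizes) / len(cascade_sizes)

# The slide's three samples from David reach 4, 2, and 3 users.
sigma_D = influence_estimate([4, 2, 3])  # 3.0
```

The variance of this estimate shrinks as 1/(number of samples), which is why the naive approach on the later slides needs many sampled graphs per candidate source.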

  46. Cascades from B in 1 month. [Figure: three sampled cascades from source Bob.]

  47. Cascades from B in 1 month. The three samples reach Σ_{i∈V} N_i(T) = 4, 2, and 2 users, so σ(B, T) ≈ (4 + 2 + 2)/3 ≈ 2.67.

  48. Find the most influential user: max_{s∈V} σ(s, T). The naive sampling approach pays a cost factor for each of the n sampled graphs.

  49. Find the most influential user (continued): within each sampled graph, every node must be evaluated as a candidate source.

  50. Find the most influential user (continued): each evaluation is a single-source shortest-path computation, giving roughly O(n(|V|² + |E||V|)) total. Quadratic in |V|, so not scalable.

  51. Randomized neighborhood estimation: linear in the number of nodes and edges. Assign each node an i.i.d. label r ~ exp(−r); each node then records the minimum label among the nodes it can reach within the time window. [Figure: labels 2.75, 1.38, 0.29, 1.26, 0.33 on David, Sophie, Bob, Jacob, Christine, yielding minima S_D = 0.29, S_S = 0.29, S_B = 0.29, S_J = 1.26, S_C = 0.33.]

  52. Randomized neighborhood estimation. Repeat with m independent label sets; each node keeps one minimum per set, e.g. S_D = (0.29, 0.23), …, S_C = (0.33, 3.70). Given m i.i.d. samples r ~ e^{−r}, the minimum r* over a set of n nodes is distributed as r* ~ n e^{−n r*}, so the influence is estimated as

      σ(s, T) ≈ (m − 1) / Σ_{j=1}^m S_s(j)
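The size estimator behind this slide can be sanity-checked on a set of known size. This is a generic sketch of the exponential-minimum trick, not the deck's full continuous-time algorithm; the set size and sample counts are illustrative.

```python
import random

def size_estimate(min_labels):
    """If every node draws an i.i.d. Exp(1) label, the minimum label
    over a set of n nodes is Exp(n); from m independent minima the
    set size is estimated as (m - 1) / sum(minima)."""
    return (len(min_labels) - 1) / sum(min_labels)

# Sanity check: a reachable set of known size 50, m = 2000 label sets.
rng = random.Random(0)
true_n = 50
minima = [min(rng.expovariate(1.0) for _ in range(true_n))
          for _ in range(2000)]
est = size_estimate(minima)  # close to 50
```

Because only the minima need to be propagated along edges, one pass per label set costs time linear in the number of nodes and edges, which is the source of the scalability claim.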

  53. Computational complexity: roughly O(n m (|V| + |E|)); for each of the n sampled graphs and each of the m random label sets, one breadth-first label-set search touching every node. Linear in the number of nodes and edges.

  54. Scalability

  55. Ten most influential sites in a month
      Site                        Type of site
      digg.com                    popular news site
      lxer.com                    linux and open source news
      exopolitics.blogs.com       political blog
      mac.softpedia.com           mac news and rumors
      gettheflick.blogspot.com    pictures blog
      urbanplanet.org             urban enthusiasts
      givemeaning.blogspot.com    political blog
      talkgreen.ca                environmental protection blog
      curriki.org                 educational site
      pcworld.com                 technology news

  56. Dynamic Processes over Information Networks. More Advanced Models

  57. Joint models with rich context: audio, text, image, and other simultaneously measured time series alongside the event times 0 < t_1 < t_2 < … < t_n ≤ T. Nan et al. AISTATS 2013; Nan et al. KDD 2015

  58. Spatial-temporal processes: influenza spread, bird migration, crime, smart city.

  59. Continuous-time document streams. Nan et al. KDD 2015

  60. Dirichlet-Hawkes processes: a Dirichlet (Chinese restaurant) process prior combined with recurrent Hawkes processes. The n-th document joins an existing topic k with probability proportional to that topic's triggering intensity, or starts a new topic with probability proportional to the concentration β:

      θ_n | θ_{1:n−1} ∼ Σ_k [ λ_k(t_n) / (Σ_{k′} λ_{k′}(t_n) + β) ] δ_{θ_k} + [ β / (Σ_{k′} λ_{k′}(t_n) + β) ] H_0
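The topic assignment above is a single categorical draw over "existing topics plus one new topic". A minimal sketch, with illustrative intensities (this is just the assignment step, not the full model):

```python
import random

def sample_topic(intensities, beta, rng):
    """Dirichlet-Hawkes assignment: pick existing topic k with
    probability proportional to lambda_k(t_n), or a new topic
    (returned as index len(intensities)) with probability
    proportional to the concentration beta."""
    weights = list(intensities) + [beta]
    r = rng.random() * sum(weights)
    for k, w in enumerate(weights):
        r -= w
        if r <= 0:
            return k
    return len(weights) - 1

# With all existing topic intensities at zero, a new topic is forced.
k = sample_topic([0.0, 0.0], beta=1.0, rng=random.Random(0))  # -> 2
```

Topics whose triggering intensity has decayed to zero effectively stop attracting documents, which is how the model captures the rise and fall of news threads.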

  61. Dark Knight vs. Endeavour. [Figure: the learned triggering kernels and temporal dynamics for the two topics.]

  62. Previous models are parametric; each parametric form encodes our prior knowledge: Poisson process, Hawkes process, self-correcting process, autoregressive conditional duration process. Limitations: the model may be misspecified; it is hard to encode complex features or markers; it is hard to encode the dependence structure. Can we learn a more expressive model of marked temporal point processes?

  63. Recurrent Marked Temporal Point Processes: recurrent neural network + marked temporal point processes. The hidden vector of the RNN learns a nonlinear dependency over both past times and markers; the model outputs a general conditional density for the timing of the next event and a multinomial distribution over the markers.

  64. Experiments: synthetic data. [Figures: intensity-function prediction and time-prediction error against ACD, Hawkes, and self-correcting baselines.]

  65. Experiments: real-world data (NYC Taxi, financial trading, Stack Overflow, MIMIC-II). [Figures: time-prediction and marker-prediction results.]

  66. A unified framework: probabilistic models and learning methods for processes and activity over social and information networks, to understand, predict, and control.
      Representation: 1. intensity function; 2. basic building blocks; 3. superposition
      Modeling: 1. idea adoption; 2. network coevolution; 3. collaborative dynamics
      Learning: 1. sparse hidden diffusion networks; 2. low-rank collaborative dynamics; 3. generic algorithm
      Inference: 1. time-sensitive recommendation; 2. scalable influence estimation

  67. Introduction to PtPack: a C++ multivariate temporal point process package.

  68. Features
      - Learning sparse interdependency structure of continuous-time information diffusions
      - Scalable continuous-time influence estimation and maximization
      - Learning multivariate Hawkes processes with different structural constraints: sparse, low-rank, customized triggering kernels
      - Learning low-rank Hawkes processes for time-sensitive recommendations
      - Efficient simulation of standard multivariate Hawkes processes
      - Learning multivariate self-correcting processes
      - Simulation of customized general temporal point processes
      - Basic residual analysis and model checking of customized temporal point processes
      - Visualization of triggering kernels, intensity functions, and simulated events
