network community detection in practical scenarios

NETWORK COMMUNITY DETECTION IN PRACTICAL SCENARIOS, Lovro Šubelj



  1. NETWORK COMMUNITY DETECTION IN PRACTICAL SCENARIOS Lovro Šubelj, University of Ljubljana, Faculty of Computer and Information Science. ARS ’15

  2. NETWORK COMMUNITY STRUCTURE [Figure: five example networks with node labels. Panels: Karate club, Bottlenose dolphins, American football, Synthetic graph, Random graph.] Communities are cohesive subgroups of sparse networks.

  3. NETWORK COMMUNITY DETECTION • graph partitioning, • hierarchical clustering, • modularity optimization, • statistical inference, • spectral methods, • map equation, • dynamics, etc. Girvan, M. & Newman, M. E. J., P. Natl. Acad. Sci. USA 99, 7821–7826 (2002). Fortunato, S., Phys. Rep. 486, 75–174 (2010).
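
As an illustration of the "modularity optimization" entry, here is a minimal sketch of how the Newman-Girvan modularity Q of a given partition can be computed. It is not taken from the slides: the function name, the edge-list input format and the toy graph (two triangles joined by one edge) are assumptions made for the example, and an optimizer would search over partitions to maximize this score.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping node -> community id.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities of (fraction of internal edges)
    #     minus (expected fraction under random rewiring with fixed degrees).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical toy graph: two triangles joined by a single edge.
TOY_EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
TOY_SPLIT = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
TOY_MERGED = {node: "all" for node in range(6)}

print(modularity(TOY_EDGES, TOY_SPLIT))   # one community per triangle
print(modularity(TOY_EDGES, TOY_MERGED))  # everything in one community
```

On this toy graph the split into two triangles scores higher than the single merged community, which is the behaviour a modularity optimizer relies on.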

  4. LABEL PROPAGATION ALGORITHM [Figure: label propagation illustrated on a network with nodes numbered 1 to 34.] Raghavan, U. N., Albert, R. & Kumara, S., Phys. Rev. E 76, 036106 (2007). Šubelj, L. & Bajec, M., Phys. Rev. E 83, 036103 (2011) etc.
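
The basic idea of label propagation can be sketched as follows: every node starts with its own label and repeatedly adopts the label most common among its neighbours until nothing changes. This is a simplified illustration, not the implementation used in the cited papers. The adjacency-dict input, the `seed` and `max_sweeps` parameters, the toy graph, and the rule of keeping the current label when it is tied for the maximum are all assumptions of this sketch.

```python
import random
from collections import Counter

def label_propagation(adjacency, seed=0, max_sweeps=100):
    """Asynchronous label propagation on an undirected graph.

    adjacency: dict mapping node -> list of neighbouring nodes.
    Returns a dict mapping node -> community label.
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts alone
    nodes = list(adjacency)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # visit nodes in a fresh random order each sweep
        changed = False
        for node in nodes:
            if not adjacency[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nbr] for nbr in adjacency[node])
            top = max(counts.values())
            best = [lab for lab, cnt in counts.items() if cnt == top]
            # Assumption of this sketch: keep the current label when it is
            # tied for the maximum; otherwise break ties at random.
            if labels[node] not in best:
                labels[node] = rng.choice(best)
                changed = True
        if not changed:
            break  # no label changed during a full sweep
    return labels

# Hypothetical toy graph: two disconnected triangles.
TOY_GRAPH = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(TOY_GRAPH))
```

Each sweep only inspects the neighbours of every node, so the cost per sweep is linear in the number of edges.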

  5. LARGE-SCALE COMMUNITY DETECTION Hric, D., Darst, R. K. & Fortunato, S., Phys. Rev. E 90, 062805 (2014).

  6. COMMUNITY DETECTION TASKS [Figure: two panels, EXPLORATORY TASK and PREDICTIVE TASK, each showing a TRAINING SET and a TEST SET.]
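
The slide gives only the panel labels, so the following is one plausible reading of the predictive task rather than the author's exact protocol. It assumes the "majority classification" named in the closing slide: a test node is assigned the ground-truth label most common among the training nodes of its detected community. The function name, the fallback to the overall majority label, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def majority_classification_accuracy(community, train_labels, test_labels):
    """Fraction of test nodes whose label is predicted correctly.

    community: dict node -> detected community id (covers train and test nodes).
    train_labels, test_labels: dict node -> ground-truth cluster.
    """
    votes = defaultdict(Counter)
    for node, label in train_labels.items():
        votes[community[node]][label] += 1
    # Assumed fallback for communities without any training node.
    overall = Counter(train_labels.values()).most_common(1)[0][0]
    correct = 0
    for node, truth in test_labels.items():
        seen = votes.get(community[node])
        predicted = seen.most_common(1)[0][0] if seen else overall
        correct += predicted == truth
    return correct / len(test_labels)

# Hypothetical toy data: six papers in two detected communities.
TOY_COMMUNITY = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 1}
TOY_TRAIN = {"p1": "A", "p2": "A", "p4": "B", "p5": "B"}
TOY_TEST = {"p3": "A", "p6": "A"}

print(majority_classification_accuracy(TOY_COMMUNITY, TOY_TRAIN, TOY_TEST))
```

In the toy data, `p3` sits in a community whose training nodes are all labelled `A` and is classified correctly, while `p6` inherits `B` from its community and is not.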

  7. APS & WIKILEAKS NETWORKS

     DATA       APS                  WikiLeaks
                1893-2013            1966-2010
     NETWORK    citation             reference
                526,527 papers       52,416 cables
                5,989,263 citations  78,506 references
     CLUSTERS   12 journals          3 privacy levels
                301 sections         263 embassies
     SETTING    14 algorithms        26 algorithms
     TRAINING   1893-2012            1966-2009
     TEST       2013 (4%)            2010 (17%)

     Non-overlapping and cohesive ground truth clusters.

  8. APS & WIKILEAKS RESULTS [Figure: three panels. EXPLORATORY TASK: normalized mutual information (0 to 1). PREDICTIVE TASK: classification accuracy (0% to 100%). TASK CORRELATION: Spearman correlation (−1 to 1). Each panel covers 12 journals, 301 sections, 3 privacy levels and 263 embassies.]
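
Normalized mutual information compares a detected partition with the ground truth clusters. The sketch below uses the common normalization 2·I(X;Y) / (H(X) + H(Y)); the slides do not say which normalization was used, so this choice, the function name and the toy label lists are assumptions.

```python
from collections import Counter
from math import log

def normalized_mutual_information(detected, truth):
    """NMI between two partitions given as equal-length label sequences."""
    n = len(detected)
    count_d = Counter(detected)
    count_t = Counter(truth)
    joint = Counter(zip(detected, truth))

    def entropy(counts):
        return -sum(c / n * log(c / n) for c in counts.values())

    # Mutual information from the joint and marginal label counts.
    mutual = sum(
        c / n * log(c * n / (count_d[d] * count_t[t]))
        for (d, t), c in joint.items()
    )
    h_d, h_t = entropy(count_d), entropy(count_t)
    if h_d + h_t == 0:
        return 1.0  # both partitions put every node in a single cluster
    return 2 * mutual / (h_d + h_t)

# Hypothetical toy partitions of four nodes.
print(normalized_mutual_information([0, 0, 1, 1], ["x", "x", "y", "y"]))  # same split, renamed
print(normalized_mutual_information([0, 0, 1, 1], ["x", "y", "x", "y"]))  # unrelated split
```

The score depends only on how nodes are grouped, not on the names of the labels, so a detected partition that matches the ground truth up to renaming scores as a perfect match.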

  9. YOUTUBE, DBLP & JAVA NETWORKS

     DATA       YouTube           DBLP                java
     NETWORK    social            collaboration       software
                39,841 users      317,080 authors     2,378 classes
                224,235 friends.  1,049,866 collabs.  14,619 depends.
     CLUSTERS   12,986 groups     98,326 venues       54 packages
     SETTING    14 algorithms     14 algorithms       26 algorithms
     TRAINING   leave-one-out     leave-one-out       leave-one-out

     Overlapping or non-cohesive ground truth clusters.

  10. YOUTUBE, DBLP & JAVA RESULTS [Figure: three panels. EXPLORATORY TASK: normalized mutual information (0 to 1). PREDICTIVE TASK: classification accuracy (0% to 100%). TASK CORRELATION: Spearman correlation (−1 to 1). Each panel covers 12,986 categories, 98,326 venues and 54 packages.]
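
The "task correlation" panels report a Spearman correlation, presumably between how the algorithms score on the exploratory task and on the predictive task. A minimal sketch of that statistic follows, computed as the Pearson correlation of the ranks with ties sharing their average rank. The function names and the per-algorithm scores are made up for illustration and are not results from the slides.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for i in order[start:end + 1]:
            ranks[i] = shared
        start = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Undefined (division by zero) when either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores of three algorithms on the two tasks (not real results).
TOY_NMI = [0.10, 0.40, 0.70]
TOY_ACCURACY = [0.30, 0.55, 0.80]
print(spearman(TOY_NMI, TOY_ACCURACY))
```

A value near 1 would mean the algorithms rank the same way on both tasks, and a value near −1 that the rankings are reversed.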

  11. COMMUNITY DETECTION IN PRACTICE Take-home message: • community information is useful in practice, • with many clusters, the same algorithms work for both tasks, • with few clusters, different algorithms work for different tasks. Future work: • beyond majority classification, • overlapping and non-cohesive clusters, • descriptive, inferential, causal and mechanistic tasks.

  12. LOVRO ŠUBELJ lovro.subelj@fri.uni-lj.si http://lovro.lpt.fri.uni-lj.si
