A QUEST TO UNDERSTAND THE ORGANISATION OF LARGE RELATIONAL DATA: FROM WILD GOATS TO BITCOINS


  1. A QUEST TO UNDERSTAND THE ORGANISATION OF LARGE RELATIONAL DATA: FROM WILD GOATS TO BITCOINS Rémy Cazabet

  2. INTRODUCTION: THE QUEST

  3. QUEST
     • Data coming from the real world
       ‣ Human/Animal/Natural activity
     • Complex Systems
       ‣ “Many entities in interaction”
       ‣ “The whole is more than the sum of its parts” (…?…)
       ‣ The system is not understandable by reductionism: understanding each part very well is not enough to understand how the system works

  4. QUEST
     • Data coming from the real world
       ‣ Human/Animal/Natural activity
     • Complex Systems
       ‣ “Many entities in interaction”
       ‣ “The whole is more than the sum of its parts” (…?…)
       ‣ The system is not understandable by reductionism: understanding each part very well is not enough to understand how the system works
     Note: why “understand”?
       ‣ Goal in itself (physics, sociology, biology, (CS?)…)
       ‣ Understanding => building good models => predict, detect “exceptions”, …

  5. TOOL: COMPLEX NETWORKS
     • Entities in relations/interaction:
       ‣ Individuals exchange information/money/physical things
       ‣ Genes/Proteins/Cells interact through known or unknown means
       ‣ Web pages/articles/patents… reference each other
       ‣ Individuals/animals/things belong to the same groups/have common traits
       ‣ …
     • => Entities: nodes
     • => Relations: edges
     • With/without attributes (categories, numeric, time, …)
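The mapping from entities to nodes and relations to edges, with optional attributes on both, can be sketched as a small data structure. This is only an illustration, not code from the talk: the `Network` class, the node names and the attribute keys (`kind`, `amount`, `time`) are all hypothetical.

```python
from collections import defaultdict


class Network:
    """Minimal undirected network: nodes and edges may carry attributes."""

    def __init__(self):
        self.node_attrs = {}           # node -> dict of attributes
        self.adj = defaultdict(dict)   # node -> {neighbour: edge attributes}

    def add_node(self, node, **attrs):
        # Create the node if needed, then merge in any new attributes.
        self.node_attrs.setdefault(node, {}).update(attrs)

    def add_edge(self, u, v, **attrs):
        # An edge implicitly creates its endpoints; store it in both directions.
        self.add_node(u)
        self.add_node(v)
        self.adj[u][v] = attrs
        self.adj[v][u] = attrs

    def neighbours(self, node):
        return set(self.adj[node])


# Hypothetical example: individuals exchanging money, with a time attribute.
net = Network()
net.add_node("alice", kind="individual")
net.add_edge("alice", "bob", amount=12.5, time=1)
net.add_edge("alice", "carol", amount=3.0, time=2)
```

Here `net.neighbours("alice")` returns the set of nodes that share an edge with `alice`, and `net.adj["alice"]["bob"]` holds the attributes of that relation.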

  6. TOOL: COMPLEX NETWORKS

  7. TOOL: COMPLEX NETWORKS
     • Networks are interesting for their structure, their organisation
       ‣ Are the neighbours of my neighbours also my neighbours?
       ‣ Are individuals with the same attributes as me more likely to be my neighbours?
       ‣ Are there “dense groups” (communities)?
       ‣ Are some nodes more “strategically” positioned?
       ‣ …
     • Objective: understand/discover/analyse/reproduce this structure
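Some of these structural questions have simple numerical counterparts. The sketch below, on a hypothetical five-node network, computes a local clustering coefficient (are my neighbours linked to each other?), the raw fraction of edges joining nodes with the same attribute, and node degree as a crude stand-in for "strategic" position. The graph, the `group` attribute and the function names are assumptions made for illustration. A real analysis would also compare these values against a null model.

```python
from itertools import combinations

# Hypothetical undirected network as an adjacency dict, plus one attribute.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
group = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y"}


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)


def same_attribute_edge_fraction(adj, attr):
    """Fraction of edges whose two endpoints share the same attribute value."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    same = sum(1 for e in edges if len({attr[n] for n in e}) == 1)
    return same / len(edges)


# Degree as a very rough proxy for a "strategic" position.
highest_degree = max(adj, key=lambda n: len(adj[n]))
```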

  8. CHAPTER ONE : WHAT I’VE DONE

  9. SCIENTIFIC JOURNEY
     • PhD: Toulouse, Dynamic Community Detection in Temporal Networks
     • Postdocs:
       ‣ Tokyo (2y), Understanding cooperation in social media
       ‣ ENS de Lyon (1y), Understanding usages of Bicycle Sharing Systems
       ‣ Paris (1y), Fraud detection in crypto-currencies

  10. IZARDS (WILD GOATS)
      • Social animals
      • 20y of observations
        ‣ (Position/co-location)
      • Persistence of groups?
      • Despite deaths/climate change?
      [Figure: one panel per observation year, 1994 to 1999 and 2001 to 2007]

  11. FACEBOOK
      • Can we discover your “social circles” from your ego-networks?
      • How do you like it?

  12. TRENDING TOPICS

  13. TRENDING TOPICS
      Detected event     Creation date  End date    Release date  Detection delay (days)
      Devil May Cry      02/12/2007     08/08/2008  31/01/2008    -60
      Fable 2            06/12/2008     03/02/2009  18/12/2008    -12
      Gears Of War 2     14/10/2008     29/12/2008  07/11/2008    -24
      Assassin’s Creed   25/01/2008     26/02/2008  31/01/2008    -6
      Soul Calibur IV    07/07/2008     15/11/2008  31/07/2008    -24
      Uncharted          11/11/2007     02/01/2008  16/11/2007    -5
      (Dates are DD/MM/YYYY.)
      [Figure: timeline 2007 to 2009 with keyword labels: METAL GEAR JEU DESARMEMENT METAL GEAR SOLID VIDEO DE JEU]
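The detection delays on this slide are consistent with reading the delay as the creation date of the detected event minus the release date, in days, so that a negative value means the topic was detected before the release. That reading is inferred from the numbers rather than stated on the slide. A short check, with the dates copied from the slide:

```python
from datetime import date

# (event, creation date of the detected topic, release date), from the slide.
events = [
    ("Devil May Cry", date(2007, 12, 2), date(2008, 1, 31)),
    ("Fable 2", date(2008, 12, 6), date(2008, 12, 18)),
    ("Gears Of War 2", date(2008, 10, 14), date(2008, 11, 7)),
    ("Assassin's Creed", date(2008, 1, 25), date(2008, 1, 31)),
    ("Soul Calibur IV", date(2008, 7, 7), date(2008, 7, 31)),
    ("Uncharted", date(2007, 11, 11), date(2007, 11, 16)),
]

# Assumed definition: delay = creation date - release date, in days.
delays = {name: (created - released).days for name, created, released in events}
```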

  14. SPACE-CORRECTED COMMUNITIES Normal community detection

  15. SPACE-CORRECTED COMMUNITIES Spatially corrected communities

  16. USER IDENTIFICATION IN BITCOIN
      Without community detection
      [Figure: ground-truth users against identified address groups (numeric identifiers); labelled services: paytunia, coinbase, easywallet, strongcoin, easycoin, btc faucet, instawallet, flexcoin]

  17. USER IDENTIFICATION IN BITCOIN
      With community detection
      [Figure: columns GT, H4-l2 and H1 linking ground-truth users to identified address groups (numeric identifiers); labelled services: paytunia, easywallet, coinbase, strongcoin, easycoin, btc faucet, flexcoin, instawallet]

  18. DYNAMIC COMMUNITY DETECTION
      • “Community Discovery in Dynamic Networks: A Survey”
      • With Giulio Rossetti (Pisa)
      • 50 methods, 40-60 pages
      • To be (should be) published in ACM Computing Surveys (Slooooow)

  19. TWITTER IN TIME OF CRISIS
      [Figure: normalized retweet count over time (per hour), 6th March to 24th March; series: IS only, AMP only, Mixed]

  20. MASSIVE PEER COOPERATION PROCESSES

  21. MASSIVE PEER COOPERATION PROCESSES

  22. MASSIVE PEER COOPERATION PROCESSES

  23. MASSIVE PEER COOPERATION PROCESSES

  24. MASSIVE PEER COOPERATION PROCESSES
      [Figures: cumulative frequency against user rank (1e+01 to 1e+05) for the simple and complex variants, by type (creation, views, references); heatmap with colour key over video categories; fraction of famous videos per user. Video categories: SINGING, CG3D, DANCE, NOCATEGORY, MASHUPS, MUSIC, MAD, MUSICALPERFORMANCE, MOVIE, ORIGINALMUSIC, ANIMATION, VOICE, OTHER, PICTURE, VOCALOIDVOICE. Other labels: LI, GI, AGG, BB, “Exploiting”, “2nd category”]

  25. TEMPORAL PROFILES EVOLUTION
      • NMF: extract temporal profiles
      [Figure: four weekly profiles, x-axis hour of day (0 to 21) for each day from Monday to Sunday; panels labelled “Commercial”, “Work”, “Bars-Restaurants (?)”, “Leisure”]
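As a rough illustration of how NMF (non-negative matrix factorisation) can extract temporal profiles, the sketch below factors a small non-negative matrix V into W times H with the standard Lee-Seung multiplicative updates. The data layout is an assumption: rows are taken to be locations, columns time slots, so that rows of H play the role of temporal profiles and W gives each location's mix of profiles. The toy values and function names are hypothetical, and the slide does not say which NMF variant or library was used.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: five locations mixing two underlying profiles
# over four time slots.
work = [10.0, 1.0, 8.0, 0.0]
leisure = [0.0, 3.0, 5.0, 9.0]
mix = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.8, 0.2), (0.2, 0.8)]
v = [[a * x + b * y for x, y in zip(work, leisure)] for a, b in mix]
w, h = nmf(v, rank=2)
```

The factorisation is not unique and depends on the random initialisation, so the recovered rows of `h` need not match `work` and `leisure` exactly, even when the reconstruction error is small.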

  26. [Map figure, panel (c) TPU3; labelled areas: Main commercial street, Main train station, Main city Mall, Main campuses of universities]

  27. CHAPTER 2 : WHAT I’M DOING NOW

  28. CHAPTER 2 : WHAT I’M DOING NOW (Struggling)

  29. CHAPTER 2 : WHAT I’M DOING NOW (Struggling) (Trying to get funding)

  30. DYNAMIC COMMUNITY DETECTION: EMPIRICAL EVALUATION
      • Survey: classification, qualitative comparison
      • Empirical evaluation => strengths, weaknesses, …

  31. CHAPTER 3: WHAT’S NEXT

  32. WHAT’S NEXT
      • I’m open to all opportunities
      • There are “theoretical” questions I would like to explore:
        ‣ Community detection vs. clustering
        ‣ Automatically finding the best network model
          - Communities?
          - Spatial?
          - Embedding? (many works now in ML/Data Mining)
          - Core-periphery?
          - …
          - => Multi-criteria analysis/optimisation: model cost (information theory) vs. model accuracy

  33. THANK YOU ! QUESTIONS WELCOME
