Analysis of wide area user mobility patterns



  1. Analysis of wide area user mobility patterns
     Kevin Simler*, Steven E. Czerwinski†, Anthony Joseph
     UC Berkeley, 2004/12/02
     (* Now at MIT; † Now at Google)

  2. Motivation
     - We want to understand user behavior
       - In order to design better systems
       - In order to generate synthetic traces
       - In order to model user behavior
     - How can we capture user presence in the wide area?

  5. Motivation
     - We want to understand user behavior
       - In order to design better systems
       - In order to generate synthetic traces
       - In order to model user behavior
     - How can we capture user presence in the wide area?
       - web, IM, …, e-mail

  6. Why e-mail?
     - E-mail is a widely used service
     - User typically checks e-mail first
     - Berkeley provides IMAP + web front end
     - Any Internet connection → e-mail access
     - E-mail reflects users’ Internet presence

  7. Outline
     - Background
     - Analysis and results
     - User modeling
     - Future work
     - Summary

  8. Trace characteristics
     - 31 days (May 2003)
     - Server from UC Berkeley EECS dept.
     - Regular IMAP plus web front end
     - 1004 active users, primarily:
       - Professors
       - Graduate students
       - Support staff
     - Tracked across different service providers

  9. Building on previous work
     - Wireless campus studies
       - Mobility on a campus
       - Single service provider with homogeneous users
       - Tang & Baker, Kotz & Essien, Balazinska & Castro
     - Metricom WLAN
       - Mobility across/between cities
       - Single service provider with more diverse users
       - Tang & Baker

  10. Trace data
      - Each entry in the trace includes:
        - Timestamp (seconds)
        - Request type (login, close, select, etc.)
        - Username
        - IP address
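
A trace entry with these four fields could be represented as below. This is a minimal sketch: the slides do not give the on-disk layout, so the whitespace-separated `timestamp request username ip` line format and the names `TraceEntry` and `parse_entry` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    timestamp: int  # seconds, as stated on the slide
    request: str    # e.g. "login", "close", "select"
    username: str
    ip: str


def parse_entry(line: str) -> TraceEntry:
    """Parse one line of the ASSUMED 'timestamp request username ip' layout."""
    ts, request, username, ip = line.split()
    return TraceEntry(int(ts), request.lower(), username, ip)
```

For example, `parse_entry("1051772400 LOGIN alice 192.0.2.7")` yields an entry whose `request` is `"login"`.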

  11. Preprocessing
      - We want user behavior
      - Trace records client application behavior
        - Outlook, Eudora, Thunderbird, etc.
      - Primary difference:
        - Client polls for new e-mail at regular intervals
        - Fixed period per client, variable across clients

  12. We filter client polling using a Fourier transform
      - Client connections from a single user
      - [Diagram: timeline of client connections, each running from login to logout]

  13. We filter client polling using a Fourier transform
      - Use a Fourier transform to identify polling period p.
      - [Diagram: the same timeline with the spacing p marked between connections]

  14. We filter client polling using a Fourier transform
      - Identify sequence separated by p. Remove all but the first connection.
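
The polling filter can be sketched as follows. The slides name the technique but not its details, so everything here is an assumption: connection start times are binned into one-minute slots, the strongest non-zero frequency is taken as the polling frequency, and a connection is treated as a poll when it follows the previous connection by roughly one period. A naive discrete Fourier transform is used so the example needs only the standard library; an FFT routine would produce the same spectrum faster. The names `polling_period` and `drop_polls`, the bin width, and the tolerance are hypothetical.

```python
import cmath


def polling_period(times, bin_seconds=60):
    """Estimate the dominant period (in seconds) of connection timestamps."""
    start = min(times)
    n = (max(times) - start) // bin_seconds + 1
    signal = [0.0] * n
    for t in times:
        signal[(t - start) // bin_seconds] += 1.0
    mean = sum(signal) / n
    best_k, best_mag = None, 0.0
    # Scan positive frequencies; the strongest one is taken as the polling rate.
    for k in range(1, n // 2 + 1):
        coeff = sum((x - mean) * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return None if best_k is None else n * bin_seconds / best_k


def drop_polls(times, period, tolerance=30):
    """Keep a connection unless it follows the previous one by about `period`."""
    kept, prev = [], None
    for t in sorted(times):
        if prev is None or abs((t - prev) - period) > tolerance:
            kept.append(t)
        prev = t
    return kept
```

On a clean five-minute polling run this keeps only the first connection. A harmonic of the true period can win the peak search on noisy data, and user connections interleaved with polls break the simple gap test, so a real filter would need more care than this sketch shows.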

  15. We filter client polling using a Fourier transform
      - Clump connections into user sessions
      - [Diagram: connection timeline with a "> 15 minute gap" marked between clumps]

  16. We filter client polling using a Fourier transform
      - [Diagram: the timeline divided into two user sessions]

  17. We filter client polling using a Fourier transform
      - Now we have (roughly) a trace of user behavior
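
The clumping step can be sketched as below, assuming that the "> 15 minute gap" label means a new session starts whenever two consecutive connections are more than 15 minutes apart. The function name `clump_sessions` is hypothetical, and timestamps are taken to be in seconds as in the trace.

```python
def clump_sessions(times, gap=15 * 60):
    """Group timestamps into sessions, splitting wherever the gap exceeds `gap` seconds."""
    groups = []
    for t in sorted(times):
        if groups and t - groups[-1][-1] <= gap:
            groups[-1].append(t)  # close enough to the previous connection
        else:
            groups.append([t])    # gap too large: start a new session
    return groups
```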

  18. Outline
      - Background
      - Trace analysis
        - Defining location
        - Daily mobility
        - Monthly mobility
        - Session activity
      - User modeling
      - Future work
      - Summary

  19. Defining network location
      - Connection used to access the Internet
        - E.g. a dialup ISP, campus wireless network
      - Approximated by a combination of:
        - Authoritative DNS server
        - AS number
        - Subnet
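
One way to turn this definition into a comparable key is sketched below. The slides do not say how the subnet is delimited or how the DNS server and AS number are looked up, so the /24 prefix length is an assumption and the DNS server and AS number are simply passed in by the caller. `Location` and `location_of` are hypothetical names.

```python
import ipaddress
from typing import NamedTuple, Optional


class Location(NamedTuple):
    dns_server: Optional[str]
    as_number: Optional[int]
    subnet: str


def location_of(ip, dns_server=None, as_number=None, prefix_len=24):
    """Build a location key; the /24 default prefix is an assumption."""
    # strict=False masks off the host bits instead of raising an error.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return Location(dns_server, as_number, str(network))
```

Two addresses in the same /24 with the same DNS server and AS number then compare equal, so the key can be used directly in sets and dictionaries.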

  20. How mobile are users each day?
      - [Bar chart: fraction of user-days (y-axis, 0 to 0.6) against number of locations (x-axis, 0 to 3)]

  21. How mobile are users each day?
      - [Same bar chart]
      - 50% of user-days involve logging in from only 1 location

  22. How mobile are users each day?
      - [Same bar chart]
      - 15% of user-days involve logging in from 2 locations

  23. How mobile are users each day?
      - [Same bar chart]
      - Upshot: On any given day, users are not highly mobile
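
The histogram behind this chart could be computed as sketched below. The input shape is an assumption: each session is reduced to a `(username, day, location)` tuple, and the full lists of users and days are supplied so that user-days with no login fall into the zero bucket. `location_histogram` is a hypothetical name.

```python
from collections import Counter


def location_histogram(sessions, users, days):
    """Return {number_of_locations: fraction_of_user_days}."""
    # Start every user-day with an empty set so days without logins count as 0.
    seen = {(u, d): set() for u in users for d in days}
    for user, day, location in sessions:
        seen.setdefault((user, day), set()).add(location)
    counts = Counter(len(locs) for locs in seen.values())
    total = len(seen)
    return {n: counts[n] / total for n in sorted(counts)}
```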

  24. How mobile are users in 31 days?
      - How many unique subnets do they visit?
      - How many unique AS #s do they visit?
      - Let’s look at a graph…

  25. How mobile are users in 31 days?
      - [CDF plot: cumulative fraction of users (y-axis, 0 to 1) against # clusters (x-axis, 0 to 14), one curve for subnets and one for AS #s]

  26. How mobile are users in 31 days?
      - [Same CDF plot]
      - 80% of users log in from 8 or fewer unique subnets

  27. How mobile are users in 31 days?
      - [Same CDF plot]
      - 90% of users log in from 3 or fewer unique AS numbers

  28. How mobile are users in 31 days?
      - [Same CDF plot]
      - Upshot: Again, most users are not highly mobile
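
A statement such as "80% of users log in from 8 or fewer unique subnets" is one point on an empirical CDF. The sketch below shows how such a point could be computed, assuming the trace has been reduced to `(username, cluster_id)` pairs where the cluster is a subnet or an AS number. Both function names are hypothetical.

```python
from collections import defaultdict


def unique_counts(records):
    """records: iterable of (username, cluster_id). Returns {username: distinct clusters}."""
    seen = defaultdict(set)
    for user, cluster in records:
        seen[user].add(cluster)
    return {user: len(clusters) for user, clusters in seen.items()}


def cumulative_fraction(counts, threshold):
    """Fraction of users whose count is at most `threshold`."""
    values = list(counts.values())
    return sum(1 for v in values if v <= threshold) / len(values)
```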

  29. User activity at a location
      - [Bar chart: fraction of visits (y-axis, 0 to 0.7) against # sessions (x-axis: 1, 2, 3, 4+)]

  30. User activity at a location
      - [Same bar chart]
      - 60% of visits to a location result in only 1 session

  31. User activity at a location
      - [Same bar chart]
      - 20% of visits to a location result in exactly 2 sessions

  32. User activity at a location
      - [Same bar chart]
      - Upshot: Users access their e-mail once or twice per visit.
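
The slides do not define a "visit". The sketch below assumes a visit is a maximal run of consecutive sessions from the same location and buckets the run lengths the way the chart does (1, 2, 3, 4+). `sessions_per_visit` is a hypothetical name, and the input is one user's session locations in time order (non-empty).

```python
from collections import Counter
from itertools import groupby


def sessions_per_visit(session_locations):
    """Return the fraction of visits having 1, 2, 3, or 4+ sessions."""
    # groupby yields one group per run of identical consecutive locations.
    run_lengths = [len(list(run)) for _, run in groupby(session_locations)]
    buckets = Counter("4+" if n >= 4 else str(n) for n in run_lengths)
    total = len(run_lengths)
    return {label: buckets[label] / total for label in ("1", "2", "3", "4+")}
```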

  33. Outline
      - Background
      - Trace analysis
      - User modeling
        - Categorizing users
        - Model structure
        - Training and testing
      - Future work
      - Summary

  34. Categorizing users
      - Based on number of primary locations
      - For a given user, a primary location is:
        - One where the user spends >5% of the time
      - Categories
        - Users with 1 primary location
        - Users with 2 primary locations
        - Users with 3+ primary locations
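
The categorization rule can be sketched as follows. "The time" is read here as the user's total time across all locations, which is an assumption, as are the names `primary_locations` and `category` and the input shape (a mapping from location to seconds spent there).

```python
def primary_locations(time_at_location, threshold=0.05):
    """Locations where the user spends more than `threshold` of the total time."""
    total = sum(time_at_location.values())
    return sorted(loc for loc, secs in time_at_location.items()
                  if secs / total > threshold)


def category(time_at_location):
    """Label a user by number of primary locations, capping at '3+'."""
    n = len(primary_locations(time_at_location))
    return "3+" if n >= 3 else str(n)
```

A user with 600 s at home, 380 s at work, and 20 s at a cafe has two primary locations, because the cafe accounts for only 2% of the total.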

  35. Structure of our models
      - One model for each category
      - Two-tiered Markov model
        - High-level states represent user’s location
        - Low-level states represent user’s activity
      - Both MMs are 1st order

  36. Model structure for category 2
      - 2 primary locations + 1 traveling state
      - [Diagram: states "primary 1", "primary 2", and "traveling"]

  37. Model structure for category 2
      - 2 primary locations + 1 traveling state
      - [Same diagram, highlighting the high-level (location) states]

  38. Model structure for category 2
      - 2 primary locations + 1 traveling state
      - [Same diagram, highlighting the low-level (session) states, i.e. Logged-In and Logged-Out]

  39. Training
      - We have all the information
        - Which locations are primary
        - Where the user is, at any time
        - When the user is logged in/out
      - Simple to compute transition probabilities
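
Because every state is observed, the transition probabilities of a first-order Markov model reduce to normalized transition counts. The sketch below shows that computation for a single state sequence. How the talk discretizes time into steps is not stated, so the input is assumed to be an already-discretized list of state labels, and `transition_probabilities` is a hypothetical name.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Maximum-likelihood first-order transition probabilities from one sequence."""
    pair_counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        pair_counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(counter.values()) for nxt, n in counter.items()}
        for current, counter in pair_counts.items()
    }
```

The same function would apply to either tier: a sequence of location states (primary 1, primary 2, traveling) or a sequence of session states (logged in, logged out).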

  40. Testing methodology
      - Create synthetic trace
      - Choose metrics to measure a trace
      - Compare real trace with synthetic trace
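
Creating a synthetic trace from a first-order Markov model amounts to repeatedly sampling the next state from the current state's transition row. The sketch below does this for one tier only. The transition table format (`{state: {next_state: probability}}`), the function name `generate`, and the seeding parameter are assumptions for illustration.

```python
import random


def generate(transitions, start, length, seed=None):
    """Sample a state sequence of `length` steps from a transition table."""
    rng = random.Random(seed)  # local generator so runs can be reproduced
    state, trace = start, [start]
    for _ in range(length - 1):
        successors = list(transitions[state])
        weights = [transitions[state][s] for s in successors]
        state = rng.choices(successors, weights=weights)[0]
        trace.append(state)
    return trace
```

Passing a fixed `seed` makes the synthetic trace repeatable, which helps when comparing metrics across runs.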

  41. Testing one metric
      - # of sessions between visits to primary
      - Each user visits his primary
        - leaves to visit other locations
        - then comes back to his primary
      - Every time this happens, record the number of other locations
      - There will be a CDF for the entire trace (real or synthetic)
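
The metric can be sketched as below. The slide's heading mentions sessions while its body says to record the number of other locations; this sketch follows the body and, as a further assumption, counts distinct locations per excursion. The function name `excursions` is hypothetical, and the input is one user's sequence of visited locations in time order.

```python
def excursions(location_sequence, primary):
    """Distinct non-primary locations seen between consecutive visits to `primary`."""
    counts, away, seen_primary = [], set(), False
    for loc in location_sequence:
        if loc == primary:
            if seen_primary and away:
                counts.append(len(away))  # the user has returned from an excursion
            away, seen_primary = set(), True
        elif seen_primary:
            away.add(loc)  # only count locations visited after leaving the primary
    return counts
```

The list returned for each user can be pooled across the trace, real or synthetic, to build the CDF that the slide describes.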

  42. Testing results

  43. Outline
      - Background
      - Trace analysis
      - User modeling
      - Future work
      - Summary

  44. Using the results
      - Synthetic traces can help test systems
      - User behavior has implications for design
        - E.g. focus resources on primary locations
      - Model can predict user behavior on-the-fly
        - E.g. to cache, or not to cache?

  45. As technology changes…
      - Blackberries
        - More physical locations
        - Shorter, more frequent sessions
        - Still, primary locations will be important
      - Wireless LAN hotspots
        - More network locations

  46. Outline
      - Background
      - Trace analysis
      - User modeling
      - Future work
      - Summary

  47. Summary – what we’ve done
      - Obtained a trace from an e-mail server
      - Filtered out client polling
      - Analyzed trace of user behavior
      - Modeled categories of users with tiered MM
      - Generated synthetic traces

  48. Summary – user behavior
      - Most users log in from 1 or 2 locations
      - But a few users are highly mobile
      - Users access e-mail infrequently, but for long periods of time

  49. Thank you
      - Quick clarifying questions?
