
LinkedIn: Network Updates Uncovered - PowerPoint PPT Presentation



  1. LinkedIn: Network Updates Uncovered – Ruslan Belkin, Sean Dawson

  2. Agenda • Quick Tour • Requirements (User Experience / Infrastructure) • Service API • Internal Architecture • Applications (e.g., Twitter Integration, Email Delivery) • Measuring Performance • Shameless self-promotion

  3. The Stack • Environment: 90% Java, 5% Groovy, 2% Scala, 2% Ruby, 1% C++ • Containers: Tomcat, Jetty • Data Layer: Oracle, MySQL, Voldemort, Lucene, Memcache • Offline Processing: Hadoop • Queuing: ActiveMQ • Frameworks: Spring

  4. The Numbers • Updates Created: 35M / week • Update Emails: 14M / week • Service Calls: 20M / day (230 / second)
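
As a quick sanity check on those rates (my arithmetic, not from the slides): 20M service calls per day spread over 86,400 seconds is roughly 20,000,000 / 86,400 ≈ 231 calls per second, consistent with the quoted 230/second; 35M updates per week works out to about 58 new updates created per second.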

  5. Stream View

  6. Connection View

  7. Profile

  8. Groups

  9. Mobile

  10. Email – NUS email digest screenshot

  11. HP without NUS

  12. Expectations – User Experience • Multiple presentation views • Comments on updates • Aggregation of noisy updates • Partner Integration • Easy to add new updates to the system • Handles I18N and other dynamic contexts • Long data retention

  13. Expectations – Infrastructure • Large number of connections, followers and groups • High request volume + low latency • Random distribution lists • Black/white lists, A/B testing, etc. • Tenured storage of update history • Tracking of click-through rates, impressions • Supports real-time, aggregated data/statistics • Cost-effective to operate

  14. Historical Note • Legacy “network update” feature (homepage circa 2007) was a mixed bag of detached services • Neither consistent nor scalable • Tightly coupled to our Inbox • Migration plan: • Introduce API, unify all disparate service calls • Add event-driven activity tracking with DB backend • Build out the product • Optimize!

  15. Network Updates Service – Overview

  16. Service API – Data Model
      <updates>
        <NCON>
          <connection>
            <id>2</id>
            <firstName>Chris</firstName>
            <lastName>Yee</lastName>
          </connection>
        </NCON>
      </updates>
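
A minimal sketch of how one such update (the NCON, "new connection" type shown above) could be modeled and serialized to that XML shape; the class and its fields are my own illustration, not LinkedIn's actual code.

      // Hypothetical model for the NCON (new connection) update shown above;
      // class and method names are illustrative, not LinkedIn's.
      public final class NewConnectionUpdate {
          private final long connectionId;
          private final String firstName;
          private final String lastName;

          public NewConnectionUpdate(long connectionId, String firstName, String lastName) {
              this.connectionId = connectionId;
              this.firstName = firstName;
              this.lastName = lastName;
          }

          // Renders the update in the <updates>/<NCON>/<connection> shape above.
          public String toXml() {
              return "<updates><NCON><connection>"
                   + "<id>" + connectionId + "</id>"
                   + "<firstName>" + firstName + "</firstName>"
                   + "<lastName>" + lastName + "</lastName>"
                   + "</connection></NCON></updates>";
          }
      }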

  17. Service API – Post
      // Post a profile update from member 1214 into member 1213's feed.
      NetworkUpdatesNotificationService service = getNetworkUpdatesNotificationService();
      ProfileUpdateInfo profileUpdate = createProfileUpdate();
      Set<NetworkUpdateDestination> destinations = Sets.newHashSet(
          NetworkUpdateDestinations.newMemberFeedDestination(1213));
      NetworkUpdateSource source = new NetworkUpdateMemberSource(1214);
      Date updateDate = getClock().currentDate();
      service.submitNetworkUpdate(source, destinations, updateDate, profileUpdate);

  18. Service API – Retrieve
      // Fetch up to 5 profile updates from the last 7 days on member 1213's channel.
      NetworkUpdatesService service = getNetworkUpdatesService();
      NetworkUpdateChannel channel = NetworkUpdateChannels.newMemberChannel(1213);
      UpdateQueryCriteria query = createDefaultQuery().
          setRequestedTypes(NetworkUpdateType.PROFILE_UPDATE).
          setMaxNumberOfUpdates(5).
          setCutoffDate(ClockUtils.add(currentDate, -7));
      NetworkUpdateContext context = NetworkUpdateContextImpl.createWebappContext();
      NetworkUpdatesSummaryResult result = service.getNetworkUpdatesSummary(channel, query, context);

  19. System at a glance

  20. Data Collection – Challenges • How do we efficiently support collection in a dense social network? • Requirement to retrieve the feed fast • But – there are a lot of events from a lot of members and sources • And – there are multiplier effects

  21. Option 1: Push Architecture (Inbox) • Each member has an inbox of notifications received from their connections/followees • N writes per update (where N may be very large) • Very fast to read • Difficult to scale, but useful for private or targeted notifications to individual users

  22. Option 1: Push Architecture (Inbox)
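
A schematic sketch (my simplification, using in-memory maps instead of a real store) of the fan-out-on-write behavior described in Option 1: each update is copied into every recipient's inbox, so writes cost N but reads are a single lookup.

      import java.util.ArrayDeque;
      import java.util.ArrayList;
      import java.util.Deque;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      // Fan-out-on-write sketch: N writes per update, 1 read per feed request.
      // In-memory structures stand in for the real partitioned, persistent inboxes.
      public final class PushInboxes {
          private final Map<Long, Deque<String>> inboxes = new HashMap<>();

          // N writes: one copy of the update per connection/follower.
          public void publish(String update, List<Long> recipientIds) {
              for (long memberId : recipientIds) {
                  inboxes.computeIfAbsent(memberId, id -> new ArrayDeque<>()).addFirst(update);
              }
          }

          // 1 read: the member's feed is already materialized in their inbox.
          public List<String> readFeed(long memberId, int limit) {
              List<String> feed = new ArrayList<>();
              for (String update : inboxes.getOrDefault(memberId, new ArrayDeque<>())) {
                  if (feed.size() == limit) break;
                  feed.add(update);
              }
              return feed;
          }
      }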

  23. Option 2: Pull Architecture • Each member has an “Activity Space” that contains their actions on LinkedIn • 1 write per update (no broadcast) • Requires up to N reads to collect N streams • Can we optimize to minimize the number of reads? - Not all N members have updates to satisfy the query - Not all updates can/need to be displayed on the screen - Some members are more important than others - Some updates are more important than others - Recent updates generally are more important than older ones

  24. Pull Architecture – Writing Updates

  25. Pull Architecture – Reading Updates
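
A matching sketch of Option 2 under the same simplifying assumptions: one write into the author's own activity space, and a feed assembled at read time by pulling and merging up to N connections' spaces.

      import java.util.ArrayList;
      import java.util.Comparator;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      // Fan-in-on-read sketch: 1 write per update, up to N reads per feed request.
      // The member-filtering step described later keeps N reads from being the norm.
      public final class PullActivitySpaces {
          public static final class Update {
              final long authorId;
              final long timestamp;
              final String body;

              Update(long authorId, long timestamp, String body) {
                  this.authorId = authorId;
                  this.timestamp = timestamp;
                  this.body = body;
              }
          }

          private final Map<Long, List<Update>> spaces = new HashMap<>();

          // 1 write: the update lives only in its author's activity space.
          public void post(Update update) {
              spaces.computeIfAbsent(update.authorId, id -> new ArrayList<>()).add(update);
          }

          // Up to N reads: gather the connections' spaces and keep the newest items.
          public List<Update> readFeed(List<Long> connectionIds, int limit) {
              List<Update> merged = new ArrayList<>();
              for (long memberId : connectionIds) {
                  merged.addAll(spaces.getOrDefault(memberId, List.of()));
              }
              merged.sort(Comparator.comparingLong((Update u) -> u.timestamp).reversed());
              return merged.subList(0, Math.min(limit, merged.size()));
          }
      }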

  26. Storage Model • L1: Temporal • Oracle • Combined CLOB / varchar storage • Optimistic locking • 1 read to update, 1 write (merge) to update • Size bound by number of updates and retention policy • L2: Tenured • Accessed less frequently • Simple key-value storage is sufficient (each update has a unique ID) • Oracle/Voldemort
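
A hedged sketch of the two-tier read path this storage model implies: check the bounded temporal (L1) store first, then fall back to the tenured (L2) key-value store addressed by the update's unique ID. The store interfaces here are assumptions, not the actual data-access layer.

      import java.util.Optional;

      // Two-tier read path sketch; both store interfaces are hypothetical.
      interface TemporalStore { Optional<String> findRecent(long updateId); } // L1: recent updates (Oracle CLOB/varchar)
      interface TenuredStore { Optional<String> get(long updateId); }         // L2: older updates (Oracle/Voldemort key-value)

      final class UpdateStorage {
          private final TemporalStore l1;
          private final TenuredStore l2;

          UpdateStorage(TemporalStore l1, TenuredStore l2) {
              this.l1 = l1;
              this.l2 = l2;
          }

          // Prefer the size-bounded L1 store; fall back to tenured L2 on a miss.
          String read(long updateId) {
              return l1.findRecent(updateId)
                       .or(() -> l2.get(updateId))
                       .orElseThrow(() -> new IllegalStateException("unknown update " + updateId));
          }
      }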

  27. Member Filtering • Need to avoid fetching N feeds (too expensive) • Filter contains an in-memory summary of user activity • Needs to be concise but representative • Partitioned by member across a number of machines • Filter only returns false positives, never false negatives • Easy to measure the heuristic: of the N members selected, how many actually had good content? • Tradeoff between size of summary and filtering power

  28. Member Filtering
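
A minimal sketch of the kind of in-memory activity summary described above. The Bloom-filter-style bit set gives exactly the stated property (false positives possible, false negatives impossible); the hashing and sizing are my assumptions.

      import java.util.BitSet;

      // Per-partition activity summary sketch: answering "may this member have
      // updates of this type?" can wrongly say yes (false positive) but never
      // wrongly say no (no false negatives). Sizing and hashing are illustrative.
      final class MemberActivityFilter {
          private final BitSet bits;
          private final int size;

          MemberActivityFilter(int size) {
              this.size = size;
              this.bits = new BitSet(size);
          }

          private int slot(long memberId, int updateType) {
              long h = memberId * 31L + updateType;
              return (int) Math.floorMod(h, (long) size);
          }

          // Record that a member produced an update of the given type.
          void recordActivity(long memberId, int updateType) {
              bits.set(slot(memberId, updateType));
          }

          // True means "may have matching activity"; false means "definitely not",
          // so that member's activity space does not need to be fetched at all.
          boolean mayHaveActivity(long memberId, int updateType) {
              return bits.get(slot(memberId, updateType));
          }
      }

A feed query would consult this filter for each of the viewer's connections and only pull the activity spaces of members that may have matching updates.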

  29. Commenting • Users can create discussions around updates • Discussion lives in our forum service • Denormalize a discussion summary onto the tenured update, resolve first/last comments on retrieval • Full discussion can be retrieved dynamically
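
A small sketch of the denormalization mentioned above, under assumed field names: the tenured update carries only a compact discussion summary, and the first/last comments (or the full thread) are resolved from the forum service when the update is displayed.

      // Hypothetical compact summary stored alongside the tenured update; the full
      // discussion stays in the forum service and is fetched on demand.
      final class DiscussionSummary {
          final long discussionId;    // key into the forum service
          final int commentCount;
          final long firstCommentId;  // resolved to text at retrieval time
          final long lastCommentId;

          DiscussionSummary(long discussionId, int commentCount, long firstCommentId, long lastCommentId) {
              this.discussionId = discussionId;
              this.commentCount = commentCount;
              this.firstCommentId = firstCommentId;
              this.lastCommentId = lastCommentId;
          }
      }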

  30. Twitter Sync • Partnership with Twitter • Bi-directional flow of status updates • Export status updates, import tweets • Users register their Twitter account • Authorize via OAuth

  31. Twitter Sync – Overview

  32. Email Delivery • Multiple concurrent email-generating tasks • Each task has a non-overlapping ID range generator, allowing parallelization without duplicate work • Controlled by a task scheduler • Sets delivery time • Controls task execution status, suspend/resume, etc. • Caches common content so it is not re-requested • Tasks deliver content to the Notifier, which packages the content into an email via the JSP engine • Email is then delivered to SMTP relays

  33. Email Delivery

  34. Email Delivery
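
A simplified sketch of the non-overlapping ID ranges mentioned in the email-delivery slide, so concurrent tasks can run in parallel without processing the same members twice; the even split is an assumption, and the real scheduler also handles delivery time and suspend/resume.

      import java.util.ArrayList;
      import java.util.List;

      // Illustrative split of a member-ID space into non-overlapping ranges,
      // one per concurrent email-generation task.
      final class IdRangePartitioner {
          static final class Range {
              final long startInclusive;
              final long endExclusive;
              Range(long startInclusive, long endExclusive) {
                  this.startInclusive = startInclusive;
                  this.endExclusive = endExclusive;
              }
          }

          static List<Range> partition(long minId, long maxId, int taskCount) {
              List<Range> ranges = new ArrayList<>(taskCount);
              long span = maxId - minId;
              for (int task = 0; task < taskCount; task++) {
                  long start = minId + span * task / taskCount;
                  long end = minId + span * (task + 1) / taskCount;
                  ranges.add(new Range(start, end)); // ranges tile [minId, maxId) with no overlap
              }
              return ranges;
          }
      }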

  35. What else? Brute force methods for scaling: • Shard databases • Memcache everything • Parallelize everything • User-initiated write operations are asynchronous when possible
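
One concrete illustration of the "shard databases" point (my example, not LinkedIn's actual routing): pick a shard deterministically from the member ID so a member's data always lands in the same place.

      // Illustrative shard routing; the modulo scheme is an assumption for the sketch.
      final class ShardRouter {
          private final int shardCount;

          ShardRouter(int shardCount) {
              this.shardCount = shardCount;
          }

          // The same member ID always maps to the same shard.
          int shardFor(long memberId) {
              return (int) Math.floorMod(memberId, (long) shardCount);
          }
      }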

  36. Know your numbers • Bottlenecks are often not where you think they are • Profile often • Measure actual performance regularly • Monitor your systems • Pay attention to response time vs transaction rate • Expect failures
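
A trivial sketch of what "measure actual performance regularly" can look like in code: wrap a call, record latency and outcome, and feed that to monitoring so response time can be watched against transaction rate. The recorder interface is hypothetical.

      import java.util.function.Supplier;

      // Minimal timing wrapper; MetricsRecorder is a hypothetical sink that would
      // feed the monitoring system in practice.
      interface MetricsRecorder { void record(String callName, long elapsedNanos, boolean success); }

      final class Timed {
          static <T> T call(String callName, MetricsRecorder metrics, Supplier<T> body) {
              long start = System.nanoTime();
              boolean success = false;
              try {
                  T result = body.get();
                  success = true;
                  return result;
              } finally {
                  metrics.record(callName, System.nanoTime() - start, success);
              }
          }
      }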

  37. Measuring Performance
