
An Internet-wide Distributed System for Data-stream Processing

Gabriel Parmer, Richard West, Xin Qi, Gerald Fry, and Yuting Zhang. Computer Science, Boston University, Boston, MA. gabep1@cs.bu.edu


  1. An Internet-wide Distributed System for Data-stream Processing
     Gabriel Parmer, Richard West, Xin Qi, Gerald Fry, and Yuting Zhang
     Computer Science, Boston University, Boston, MA
     gabep1@cs.bu.edu

  2. Introduction
     • Internet growth has stimulated development of data- rather than CPU-intensive applications
       • e.g., streaming media delivery, interactive distance learning, webcasting (e.g., SHOUTcast)
     • Peer-to-peer (P2P) systems are now popular
       • They can efficiently locate data, but are not used to deliver it
     • To date, limited work on scalable delivery and processing of data streams
       • Especially when these streams have QoS constraints
     • Aim: build an Internet-wide distributed system for delivery and processing of data streams, considering QoS throughout
       • Implement a logical network of end-systems
       • Support multiple channels connecting publishers to 1000s of subscribers with individual QoS constraints

  3. A Data-stream Processing Network
     [Figure labels: video sensors (publishers), overlay network, intermediate nodes, static subscribers, wireless access point, mobile subscriber]

  4. Properties of k-ary n-cubes
     • $M = k^n$ nodes in the graph
     • If $k = 2$, degree of each node is $n$
     • If $k > 2$, degree of each node is $2n$
     • Worst-case hop count between nodes: $n \lfloor k/2 \rfloor$
     • Average case path length: $A(k,n) = n \lfloor k^2/4 \rfloor \frac{1}{k}$
     • Optimal dimensionality: $n = \ln M$
       • Minimizes $A(k,n)$ for given k and n
     [Figure: physical view (hosts A to H attached to routers R1 and R2 over weighted links) and logical view (the same hosts placed at the 3-bit identifiers [000] to [111], with weighted logical links)]
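The closed-form properties on this slide can be checked numerically. The following is a minimal sketch, assuming nodes are identified by n base-k digits and that each dimension wraps around as a ring; the function names are hypothetical and not taken from the presentation. The average here is taken over all M nodes, including the source itself.

```python
from itertools import product


def torus_distance(a, b, k):
    """Hop count between two node IDs (tuples of n base-k digits)."""
    # In each dimension, take the shorter way around the ring of size k.
    return sum(min((x - y) % k, (y - x) % k) for x, y in zip(a, b))


def degree(k, n):
    """Node degree: n when k = 2 (both ring directions coincide), else 2n."""
    return n if k == 2 else 2 * n


def worst_case_hops(k, n):
    """Closed form n * floor(k / 2)."""
    return n * (k // 2)


def average_hops(k, n):
    """Closed form A(k, n) = n * floor(k^2 / 4) * (1 / k)."""
    return n * (k * k // 4) / k


def brute_force(k, n):
    """Measure worst-case and average hop count from one node to all M nodes.

    The topology is symmetric, so a single source node is representative.
    """
    nodes = list(product(range(k), repeat=n))
    origin = nodes[0]
    dists = [torus_distance(origin, v, k) for v in nodes]
    return max(dists), sum(dists) / len(dists)
```

For a small case such as k = 4 and n = 3 (M = 64 nodes), `brute_force(4, 3)` can be compared directly against `worst_case_hops(4, 3)` and `average_hops(4, 3)`.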

  5. QoS considerations in k-ary n-cubes
     • Methods for considering QoS
       • Routing algorithms
         • Ordered Dimensional Routing (ODR)
         • Random Ordering of Dimensions (Random)
         • Proximity-based Greedy Routing (Greedy)
       • Dynamic node re-assignment
         • Subscribers can exchange their logical identifier with nodes that are closer to the publisher of their data stream
         • Fewer hops from publishers to subscribers on average
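The three routing policies named on this slide can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes node IDs are tuples of n base-k digits, that every hop changes one digit by one position around its ring, and that "proximity" is supplied by a caller-provided `link_cost(a, b)` function (a hypothetical stand-in for measured delay between two end-systems). All function names are assumptions.

```python
import random


def step(node, dim, k, target):
    """Move one hop along `dim` toward the target digit, the shorter way round."""
    forward = (target[dim] - node[dim]) % k
    delta = 1 if forward <= k - forward else -1
    nxt = list(node)
    nxt[dim] = (node[dim] + delta) % k
    return tuple(nxt)


def route_by_dimension(src, dst, k, order):
    """Resolve dimensions one at a time, in the given order."""
    path, node = [src], src
    for dim in order:
        while node[dim] != dst[dim]:
            node = step(node, dim, k, dst)
            path.append(node)
    return path


def odr(src, dst, k):
    """Ordered Dimensional Routing: always resolve dimension 0, then 1, ..."""
    return route_by_dimension(src, dst, k, range(len(src)))


def random_order(src, dst, k, rng=random):
    """Random Ordering of Dimensions: resolve dimensions in a shuffled order."""
    order = list(range(len(src)))
    rng.shuffle(order)
    return route_by_dimension(src, dst, k, order)


def greedy(src, dst, k, link_cost):
    """Greedy: at each hop, among the neighbors that bring the message closer
    to dst, pick the one with the lowest link cost (assumed proximity metric)."""
    path, node = [src], src
    while node != dst:
        candidates = [step(node, d, k, dst)
                      for d in range(len(node)) if node[d] != dst[d]]
        node = min(candidates, key=lambda nxt: link_cost(node, nxt))
        path.append(node)
    return path
```

All three policies in this sketch take the same number of logical hops; they differ only in which physical links those hops traverse, which is where a proximity-aware choice can reduce delay.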

  6. Optimizations via routing
     [Figure: cumulative % of subscribers (y-axis, 0 to 100) versus delay penalty relative to unicast (x-axis, 1 to 512, log scale), for ODR, Random, and Greedy routing on 2x16 and 16x4 topologies]
     • Greedy routing up to 40% better

  7. End-system Architecture
     [Figure labels: Publisher, Intermediate, Subscriber; User Level: App process, Sandbox Region with SPAs (e.g., routing agents), overlay management, resource monitoring; Kernel Level; Control / Data Channels]
     • Modify COTS systems to support efficient and predictable execution of data-stream processing agents (SPAs)
     • Must consider QoS throughout, not only at the network level
     • User-level sandboxing for efficient SPAs:
       • Provide an efficient method for isolating and executing extensions
       • Provide an efficient method for passing data between user level and the network interface (e.g., by using DMA)

  8. User-level Sandbox Implementation
     • Modify address spaces of all processes to contain one or more shared pages of virtual addresses
       • Normally inaccessible at user level
     • Kernel upcalls to execute sandbox extensions
       • This action also flips the protection bits so sandboxed extensions always execute at user level, thus protecting the kernel
     • Can avoid address-space context-switching costs when executing extensions because they exist in all address spaces
     [Figure: processes P1, P2, ..., Pn at user level, each with a process-private address space and mapped data; below them a sandbox region (shared virtual address space) holding the SPA for P2 and the SPA for Pn; kernel level underneath. Caption: kernel events make the sandbox region user-level accessible]
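The mechanism on this slide can be illustrated with a toy model. This is only a conceptual simulation under stated assumptions, not the real implementation, which manipulates page-table protection bits inside the kernel: the classes `SandboxRegion` and `Kernel`, the `user_accessible` flag, and the context-switch counter are all hypothetical names introduced for illustration.

```python
class SandboxRegion:
    """Toy stand-in for the shared pages mapped into every address space."""

    def __init__(self):
        self.extensions = {}          # pid -> SPA callable registered by that process
        self.user_accessible = False  # models the page protection bits

    def register(self, pid, spa):
        self.extensions[pid] = spa

    def upcall(self, pid, data):
        """Kernel event: open the region to user level, run the SPA, close it."""
        self.user_accessible = True
        try:
            return self.extensions[pid](data)
        finally:
            # Restore protection so the region is normally inaccessible.
            self.user_accessible = False


class Kernel:
    """Toy kernel that counts address-space context switches."""

    def __init__(self):
        self.current_pid = None
        self.context_switches = 0
        self.sandbox = SandboxRegion()

    def switch_to(self, pid):
        if pid != self.current_pid:
            self.current_pid = pid
            self.context_switches += 1

    def deliver_in_process(self, pid, handler, data):
        """Conventional path: switch to the owning process, then run its handler."""
        self.switch_to(pid)
        return handler(data)

    def deliver_via_sandbox(self, pid, data):
        """Sandbox path: the SPA is mapped everywhere, so no switch is needed."""
        return self.sandbox.upcall(pid, data)
```

In this model, delivering data to the SPA of P2 while P1 is running leaves `context_switches` unchanged, whereas the conventional path increments it.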

  9. SPA predictable execution support
     • User-level networking stack in sandbox
       • Interacts with the NIC via DMA
       • Can execute and process at interrupt time because the sandbox is resident in every address space
     • Elimination of extra copies allows for greater efficiency
     • Interrupt-time execution allows isolation and predictability

  10. Conclusions
     • Use ideas from overlay routing and user-level sandboxing to implement an Internet-wide distributed system
       • Provide efficient support for app-specific services and scalable data delivery
     • QoS is important throughout the entire system and should be considered at the network level as well as the end-host level
