
Approche Algorithmique des Systèmes Distribués (AASR)



  1. Approche Algorithmique des Systèmes Distribués (AASR). Guillaume Pierre, guillaume.pierre@irisa.fr. Based on a set of slides by Maarten van Steen, VU Amsterdam, Dept. Computer Science. 02: Architectures

  2. Contents. Chapter 01: Introduction; 02: Architectures; 03: Processes; 04: Communication; 05: Naming; 06: Synchronization; 07: Consistency & Replication; 08: Fault Tolerance; 09: Security

  3. Architectures. Architectural styles; Software architectures; Architectures versus middleware; Self-management in distributed systems

  4. Architectural styles. Basic idea: Organize into logically different components, and distribute those components over the various machines. [Figure: (a) Layer 1 through Layer N with request and response flows between layers; (b) objects linked by method calls.] (a) The layered style is used for client-server systems; (b) the object-based style is used for distributed object systems.

  5. Architectural styles. Observation: Decoupling processes in space ("anonymous") and also in time ("asynchronous") has led to alternative styles. [Figure: (a) components that publish and subscribe through an event bus, with event delivery; (b) components that publish to and subscribe to a shared (persistent) data space, with notification and data delivery.] (a) Publish/subscribe [decoupled in space]; (b) shared data space [decoupled in space and time].
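
The publish/subscribe style on this slide can be illustrated with a minimal in-process sketch. The `EventBus` class and its method names are hypothetical, not taken from the slides. Publishers and subscribers only share a topic name (decoupling in space), but an event published before anyone subscribes is lost, which is exactly the decoupling in time that a shared persistent data space would add.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical event bus: components know topics, not each other."""

    def __init__(self):
        # topic name -> list of callbacks registered for that topic
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver the event to current subscribers; return how many were notified."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(event)
        return len(callbacks)


received = []
bus = EventBus()
bus.subscribe("temperature", received.append)
bus.publish("temperature", 21.5)          # delivered to the one subscriber
missed = bus.publish("humidity", 40)      # nobody subscribed yet: the event is dropped
```

A shared data space would instead store the `humidity` event so that a component arriving later could still read it.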

  6. Centralized architectures: basic client-server model. Characteristics: there are processes offering services (servers); there are processes that use services (clients); clients and servers can be on different machines; clients follow a request/reply model when using services. [Figure: timeline in which the client sends a request and waits for the result while the server provides the service and sends the reply.]

  7. Application layering: traditional three-layered view. The user-interface layer contains units for an application's user interface. The processing layer contains the functions of an application, i.e., without specific data. The data layer contains the data that a client wants to manipulate through the application components. Observation: This layering is found in many distributed information systems, using traditional database technology and accompanying applications.

  8. Application layering. [Figure: three levels. User-interface level: the user interface sends a keyword expression and receives an HTML page containing a list. Processing level: a query generator, a ranking algorithm, and an HTML generator, which exchange database queries, Web page titles with meta-information, and a ranked list of page titles. Data level: a database with Web pages.]

  9. Multi-tiered architectures. Single-tiered: dumb terminal/mainframe configuration. Two-tiered: client/single-server configuration. Three-tiered: each layer on a separate machine. Traditional two-tiered configurations: [Figure: five alternatives, (a) to (e), for dividing the user interface, application, and database between the client machine and the server machine.]

  10. Decentralized architectures. Observation: In the last couple of years we have been seeing tremendous growth in peer-to-peer systems. Structured P2P: nodes are organized following a specific distributed data structure. Unstructured P2P: nodes have randomly selected neighbors. Hybrid P2P: some nodes are appointed special functions in a well-organized fashion. Note: In virtually all cases, we are dealing with overlay networks: data is routed over connections set up between the nodes (cf. application-level multicasting).

  11. Structured P2P systems. Basic idea: Organize the nodes in a structured overlay network such as a logical ring or a hypercube, and make specific nodes responsible for services based only on their ID. [Figure: sixteen nodes with 4-bit IDs, 0000 to 1111, arranged in a structured overlay.] Note: The system provides an operation LOOKUP(key) that efficiently routes the lookup request to the associated node.
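
As a rough illustration of LOOKUP(key), here is a minimal sketch for a logical ring. The slide does not say which node is responsible for a key, so the rule below is an assumption (the successor rule used by Chord-style rings): the key belongs to the first node whose ID is greater than or equal to the key, wrapping around. The function works on a global list of IDs for clarity; a real overlay would route the request hop by hop using only local neighbor tables.

```python
from bisect import bisect_left


def lookup(key, node_ids):
    """Return the ID of the node responsible for `key` on a logical ring.

    Assumed rule (not stated on the slide): the first node whose ID is
    >= key is responsible, wrapping around to the smallest ID.
    """
    ring = sorted(node_ids)
    index = bisect_left(ring, key)
    return ring[index % len(ring)]


# Hypothetical 4-bit node IDs; only some of the 16 possible IDs are in use.
nodes = [0b0001, 0b0100, 0b1001, 0b1110]
owner = lookup(0b0011, nodes)   # key 3 falls between nodes 1 and 4
```

Because responsibility depends only on IDs, every node that runs the same rule reaches the same answer without any central directory.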

  12. Unstructured P2P systems. Essence: Many unstructured P2P systems are organized as a random overlay: two nodes are linked with probability p. Observation: We can no longer look up information deterministically, but have to resort to searching. Flooding: node u sends a lookup query to all of its neighbors. A neighbor responds, or forwards (floods) the request. There are many variations: limited flooding (a maximal number of forwarding steps) and probabilistic flooding (flood only with a certain probability). Random walk: node u randomly selects a neighbor v. If v has the answer, it replies; otherwise v randomly selects one of its neighbors. Variation: parallel random walk. Works well with replicated data.
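
The two search strategies above can be sketched on a small in-memory graph. This is a simulation under stated assumptions, not a network implementation: the overlay is a dictionary mapping each node to its neighbor list, `has_item` is a hypothetical predicate telling whether a node holds the answer, and both function names are made up for the example.

```python
import random
from collections import deque


def limited_flood(graph, start, has_item, max_hops):
    """Flood the query outward, forwarding it at most `max_hops` times."""
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if has_item(node):
            return node
        if hops == max_hops:
            continue  # hop budget exhausted: this node does not forward
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return None


def random_walk(graph, start, has_item, max_steps, rng):
    """Follow one randomly chosen neighbor per step, up to `max_steps` steps."""
    node = start
    for _ in range(max_steps):
        if has_item(node):
            return node
        node = rng.choice(graph[node])
    return node if has_item(node) else None


overlay = {
    "u": ["a", "b"],
    "a": ["u", "c"],
    "b": ["u"],
    "c": ["a", "d"],
    "d": ["c"],
}
holds_answer = lambda node: node == "d"
found_near = limited_flood(overlay, "u", holds_answer, max_hops=2)  # "d" is 3 hops away
found_far = limited_flood(overlay, "u", holds_answer, max_hops=3)
walk_result = random_walk(overlay, "u", holds_answer, 50, random.Random(0))
```

Flooding reaches every node within the hop limit at the cost of many messages, while the random walk sends one message per step but may need many steps or fail to find the item at all.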

  13. Superpeers. Observation: Sometimes it helps to select a few nodes to do specific work: superpeers. [Figure: weak peers attached to superpeers, which form an overlay network of superpeers.] Examples: peers maintaining an index (for search); peers monitoring the state of the network; peers able to set up connections.

  14. Hybrid architectures: client-server combined with P2P. Example: edge-server architectures, which are often used for Content Delivery Networks. [Figure: clients, a content provider, ISPs, and an enterprise network connected to the core Internet, with edge servers at the boundary.]

  15. Exercises. In a structured overlay network, messages are routed according to the topology of the overlay. What is an important disadvantage of this approach? Not every node in a superpeer network should become a superpeer. What are reasonable requirements that a superpeer should meet?

  16-19. The problem with centralized architectures (four slides share this title; no other text survives)

  20. BitTorrent. Designed for the transfer of large files to many clients. Based on swarming: a server sends different parts of a file to different clients, and the clients exchange chunks with one another. Terminology: one session = distribution of a single (large) file; seeder = a node that has the whole file; leecher = a node still downloading the file. Elements: an ordinary web server; a torrent file (a static meta-info file); a tracker; a seeder (an initial client with the complete file); on the end-user side, a web browser plus a BitTorrent client.

  21. The torrent file contains: the tracker address (IP + port); the number of bytes per chunk; the number of chunks; for each chunk, the SHA-1 hash of its content, which helps validate the correctness of downloaded chunks.
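
The per-chunk hash check can be sketched with Python's standard `hashlib`. This is only an illustration of the idea: the helper names are hypothetical, the hashes are kept in a plain list, and the real torrent file encoding is not reproduced here.

```python
import hashlib


def split_into_chunks(data, chunk_size):
    """Cut the file content into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def make_chunk_hashes(data, chunk_size):
    """What the publisher would precompute: one SHA-1 digest per chunk."""
    return [hashlib.sha1(chunk).digest() for chunk in split_into_chunks(data, chunk_size)]


def chunk_is_valid(index, chunk, expected_hashes):
    """What a downloader checks before accepting chunk number `index`."""
    return hashlib.sha1(chunk).digest() == expected_hashes[index]


content = b"abcdefghij"
hashes = make_chunk_hashes(content, chunk_size=4)     # chunks: abcd, efgh, ij
good = chunk_is_valid(0, b"abcd", hashes)
corrupted = chunk_is_valid(1, b"efgX", hashes)
```

Since each chunk is checked independently, a peer can accept chunks from untrusted sources and discard only the ones that fail the check.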

  22. Joining a BitTorrent session

  23. Connection states. On each side, a connection maintains two variables. Interested: you have a chunk that I want; this allows a peer to know its possible clients for upload. Choked: I don't want to send you data at this time; possible reasons: I have found faster peers, you did not or cannot reciprocate enough, ...

  24. Which missing chunk should we fetch first? Simple strategy: random selection. Choose at random among the chunks available in the peer set; randomness ensures diversity. Biased strategy: peers apply the rarest-first policy. Choose the least-represented missing chunk in the peer set; rare chunks can more easily be traded with others; this maximizes the minimum number of copies of any given chunk in each peer set. BitTorrent uses the rarest-first policy, except for newcomers, which use random selection to quickly obtain a first block.
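
The chunk selection policy can be sketched as follows. The function name, the `bootstrap` flag for newcomers, and the data layout (a set of chunk indices per peer) are assumptions made for the example. Ties between equally rare chunks are broken at random here, which the slide does not specify.

```python
import random
from collections import Counter


def pick_chunk(my_chunks, peer_chunks, rng, bootstrap=False):
    """Choose the next chunk index to request, or None if nothing useful is on offer.

    my_chunks:   set of chunk indices we already hold
    peer_chunks: dict mapping each peer in the peer set to its set of chunk indices
    bootstrap:   True for a newcomer, which picks at random to get a first chunk fast
    """
    copies = Counter()
    for chunks in peer_chunks.values():
        copies.update(chunks - my_chunks)   # count only chunks we are still missing
    if not copies:
        return None
    if bootstrap:
        return rng.choice(sorted(copies))
    rarest = min(copies.values())
    candidates = sorted(c for c, n in copies.items() if n == rarest)
    return rng.choice(candidates)           # assumed tie-break: random among the rarest


peers = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {0, 3}}
next_chunk = pick_chunk({0, 3}, peers, random.Random(0))   # chunk 2 has one copy, chunk 1 has two
```

Fetching chunk 2 first raises its copy count in the peer set from one to two, which is the "maximize the minimum number of copies" effect described on the slide.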

  25. Peer selection policy. Serving too many peers simultaneously is not efficient: BitTorrent serves a few (around 4 or 5) hosts in parallel and splits the available outgoing bandwidth equally between these connections. Which hosts to serve? Seeders' policy: the ones that offer the best download rates. Leechers' policy: the ones that also serve us (tit for tat). Choke the remaining peers. Can there be any better hosts? Reconsider choking/unchoking every 10 sec (long enough for TCP to reach steady state). Optimistically unchoke a random peer every 30 sec to give another host a chance to provide better service.
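
A leecher's choking decision can be sketched as one function call per round. This is a simplification under stated assumptions: the function name and parameters are hypothetical, and the regular re-evaluation (every 10 sec) and the optimistic unchoke (every 30 sec) are folded into a single round instead of running on separate timers.

```python
import random


def choose_unchoked(download_rates, interested, slots, rng):
    """Return the set of peers to unchoke; every other peer stays choked.

    download_rates: peer -> rate at which that peer recently served us (tit for tat)
    interested:     peers that currently want data from us
    slots:          total upload slots; one is kept for the optimistic unchoke
    """
    by_rate = sorted(interested, key=lambda p: download_rates.get(p, 0), reverse=True)
    regular = by_rate[:max(slots - 1, 0)]   # reward the peers that serve us best
    rest = by_rate[len(regular):]
    unchoked = set(regular)
    if rest and slots > 0:
        unchoked.add(rng.choice(rest))      # optimistic unchoke: try one other peer
    return unchoked


rates = {"a": 50, "b": 30, "c": 10, "d": 0}
unchoked = choose_unchoked(rates, ["a", "b", "c", "d"], slots=3, rng=random.Random(0))
```

The optimistic slot is what lets a newcomer such as `d`, which has nothing to offer yet, receive its first chunks and start reciprocating.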

  26. Exercises. Consider a BitTorrent system in which each node has an outgoing bandwidth capacity B_out and an incoming bandwidth capacity B_in. Some of these nodes (called seeds) voluntarily offer files to be downloaded by others. We assume that each peer can contact at most one seed at a time. What is the maximum download capacity that a BitTorrent peer can have? BitTorrent uses a policy similar to tit-for-tat. Give a technical argument why the strict application of this policy would be a bad idea. BitTorrent users may want to cheat the protocol: imagine a strategy that allows BitTorrent users to download content faster. Does this strategy harm the overall system?

  27. Architectures versus middleware. Problem: In many cases, distributed systems/applications are developed according to a specific architectural style. The chosen style may not be optimal in all cases ⇒ need to (dynamically) adapt the behavior of the middleware. Interceptors: intercept the usual flow of control when invoking a remote object.

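  28. Interceptors. [Figure: the client application makes the intercepted call B.do_something(value) on the application stub. A request-level interceptor sits between the stub and the object middleware, which receives the nonintercepted call invoke(B, &do_something, value). A message-level interceptor sits between the object middleware and the local OS, which receives send([B, "do_something", value]) and forwards it to object B.]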
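
A request-level interceptor can be sketched in a few lines. Everything below is hypothetical and runs in a single process: `invoke` stands in for the middleware's generic invoke(B, &do_something, value), and replication is chosen only as an example of what an interceptor might do with a call, since the slide does not say which behavior is being adapted.

```python
class ReplicaB:
    """Stand-in for one replica of the remote object B."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def do_something(self, value):
        self.calls.append(value)
        return f"{self.name}:{value}"


def invoke(target, method_name, *args):
    """Stand-in for the middleware's generic invocation primitive."""
    return getattr(target, method_name)(*args)


class ReplicatingInterceptor:
    """Request-level interceptor: turns one call into one invoke per replica."""

    def __init__(self, replicas):
        self._replicas = replicas

    def __getattr__(self, method_name):
        # Called only for names not found on the interceptor itself,
        # i.e. for the methods the client believes it is calling on B.
        def intercepted(*args):
            results = [invoke(replica, method_name, *args) for replica in self._replicas]
            return results[0]   # assumed policy: return the first replica's answer
        return intercepted


b1, b2 = ReplicaB("b1"), ReplicaB("b2")
B = ReplicatingInterceptor([b1, b2])
answer = B.do_something(7)   # the client code is unaware of the replication
```

The client still writes `B.do_something(value)`, so the adaptation happens without changing the application, which is the point of intercepting the usual flow of control.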
