CS 754 Advanced Distributed Systems Overview
Intro
Samer Al-Kiswany
• PhD, UBC, 2013
• Postdoc, U. Wisconsin-Madison
• 5 internships at Microsoft Research labs, IBM Research, NEC Labs, and Argonne National Labs
Research Interests
Storage and file systems, operating systems, distributed systems, cloud computing, data processing engines, and high-performance computing.
Computation Landscape
Distributed Software Systems
• Provide easy-to-use abstractions
• Hide complexities
• Handle failures
• Efficiently use resources
• Leverage emerging technologies
Course Overview
How to build systems that are:
• High-performance
• Scalable
• Reliable
• Secure
• Easy to manage
• Useful
Reality: a very hard and complex task.
But: what is hard about it?
Communication
• UDP
• TCP
• Messaging or pub/sub systems
• Remote procedure calls (RPC) and remote method invocation (RMI)
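To make the RPC bullet concrete, here is a minimal sketch using Python's standard-library XML-RPC modules. The procedure name `add`, the helper `demo_rpc`, and the use of the loopback address are assumptions made for illustration; the course does not prescribe a particular RPC framework.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """Procedure that executes on the server side (illustrative)."""
    return a + b


def demo_rpc():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    port = server.server_address[1]

    # Serve requests in a background thread so the client can run here.
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        # The proxy marshals the arguments, sends them over HTTP on TCP,
        # and unmarshals the reply, so the call reads like a local one.
        proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}")
        return proxy.add(2, 3)
    finally:
        server.shutdown()
        server.server_close()
        worker.join()


if __name__ == "__main__":
    print(demo_rpc())
```

The client code never touches sockets directly, which is the abstraction RPC offers over raw UDP or TCP. The remote call can still fail in ways a local call cannot, for example if the server is unreachable.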
Fault Tolerance
Failure model: partial failure.
Goal: continue running correctly (maybe slower).
Fault tolerance main questions:
• Where to recover? End-to-end principle: keep the network core simple and fast; application features (e.g., reliability, security) reside in the end nodes, not in the network.
• Which state to recover to? Depends on the application (e.g., a bank vs. Facebook).
• When to recover? Eagerly, lazily, or when needed?
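As a rough illustration of the end-to-end principle, the sketch below places the reliability check in the end nodes: the sender computes a digest, the receiver verifies it, and the sender retransmits on a mismatch. The names `make_flaky_channel` and `transfer` are hypothetical, the "network" is a simulated function that corrupts the first few sends, and the digest is assumed to reach the receiver intact.

```python
import hashlib


def make_flaky_channel(failures):
    """Hypothetical unreliable network: corrupts the first `failures` sends."""
    state = {"remaining": failures}

    def send(payload: bytes) -> bytes:
        if state["remaining"] > 0 and payload:
            state["remaining"] -= 1
            # Flip the bits of the first byte to simulate corruption in transit.
            return bytes([payload[0] ^ 0xFF]) + payload[1:]
        return payload

    return send


def transfer(data: bytes, channel, max_attempts=5):
    """End-to-end check: the endpoints, not the network, verify correctness."""
    expected = hashlib.sha256(data).hexdigest()  # computed by the sender
    for attempt in range(1, max_attempts + 1):
        received = channel(data)
        # The receiver recomputes the digest and compares it with the sender's.
        if hashlib.sha256(received).hexdigest() == expected:
            return received, attempt
        # Mismatch: recover by retransmitting from the end node.
    raise RuntimeError("transfer failed after retries")


if __name__ == "__main__":
    data, attempts = transfer(b"hello", make_flaky_channel(2))
    print(data, attempts)
```

The channel itself stays simple and does no checking; whether to retry eagerly, as here, or to defer recovery is the "when to recover" question and depends on the application.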
Concurrency
Modern systems are fundamentally concurrent.
Goal: utilize multiple levels of concurrency (data center, cluster, node, multiple CPUs, CPU cores, and accelerators) to build faster systems.
Challenge: correctness (consistency).
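The correctness challenge already shows up at the smallest scale, several threads on one node. The sketch below is an illustrative example, not course material: the `Counter` class and `run_workers` helper are hypothetical names, and a lock makes the read-modify-write on the shared counter atomic so that no updates are lost.

```python
import threading


class Counter:
    """Shared counter whose increment is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write must be atomic; the lock serializes it.
        with self._lock:
            self.value += 1


def run_workers(num_threads, increments_per_thread):
    counter = Counter()

    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(4, 10_000))
```

Without the lock, concurrent increments may overwrite each other and the final count can come out lower than expected. The same tension between speed and consistency reappears at the cluster and data center levels, where locks give way to protocols such as consensus.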
Topics
• Distributed middleware
• Fault tolerance
• Consensus
• Storage systems
• Scalability
• Scheduling
• Security
• Data processing engines
• Case studies of production systems
Course Format
• Lecture-based
• 3 mini projects
• Assignments
Lectures are a mix of:
• Core algorithms and techniques
• Case studies from core systems at Google, Facebook, and Amazon
Course Focus and Objectives
How to build high-performance, scalable, reliable, secure, easy-to-manage, and useful systems.
Objectives
• Gain a deep theoretical background
• Gain hands-on experience
• Gain research experience
• Start your career in systems research with confidence, or gain skills that are in high demand in industry
https://cs.uwaterloo.ca/~alkiswan/Classes/CS754-20F