
Data Management Systems, Fall Semester 2020, Gustavo Alonso



  1. Data Management Systems, Fall Semester 2020: Administrative Introduction
     Gustavo Alonso, Institute of Computing Platforms, Department of Computer Science, ETH Zürich

  2. About us
     • Systems Group (systems.ethz.ch)
       • 5 faculty
       • 12 senior researchers
       • 32 PhD students
       • Covers a wide range of system-related topics, including data processing and data management
       • Ample experience with database system design, including products and open source
     • Prof. Gustavo Alonso
       • Data Management
       • Distributed Systems
       • Cloud and Data Center Architecture
     • Dario Korolija
       • Hardware Acceleration
     • Dimitris Koutsoukos
       • Data processing & ML
     • Michal Wawrzoniak
       • Systems & Serverless

  3. About the course
     • Lectures (CAB G11)
       • Wednesday 10:00 – 12:00
       • Fridays 08:00 – 09:00
     • Exercises (Zoom)
       • Fridays 09:00 – 10:00
     • Exam during examination season (January)
       • Written (potentially Moodle)
     • Web page: https://systems.ethz.ch/education/courses/2020-fall/data-management-systems.html
     • Moodle: https://moodle-app2.let.ethz.ch/course/view.php?id=13391
     • Stream/recording: https://video.ethz.ch/live/lectures.html

  4. Course organization
     • Lectures will be in the classroom, streamed live, and recorded
     • Exercise sessions will be held over Zoom and recorded
     • Occasional presentation/talk in HG E 1.2 (streamed and recorded)
     • Attendance in CAB G11:
       • Capacity is 95 seats
       • Enrollment > Capacity
       • We need to know who will be attending in person and who online
       • Please do not show up unannounced

  5. Course Materials
     • Unfortunately, there is no textbook for what we are going to cover
       • Part of it can be found in standard database textbooks (next slide)
       • Part of it can be found online in manuals and product guides
       • Part of it is in articles and specialized books
     • Reading material and references will be provided in the slides and on the web pages
     • Reading assignments (plus the slides and what is covered in the lectures) constitute the basis for the material that can be asked at the exam

  6. References (database basics)
     • Check the Bachelor level course (Data Management and Databases) as a general reference and basic material:
       • http://www.ds3lab.com/dmdb-2020/
     • Database Systems: The Complete Book; Garcia-Molina, Ullman, Widom; Prentice Hall
     • Database System Concepts; Silberschatz, Korth, Sudarshan; McGraw-Hill

  7. Exercises/Homework
     • Due to the safety measures imposed and the possibility that access to ETH could be restricted at short notice, this semester we will not have a project or practical component in the course
     • We will focus instead on the design and on the research literature
       • More emphasis on understanding the architectures and how everything fits together
       • More algorithms
     • To be fair and to avoid misunderstandings: the course will not necessarily be easier => reading and designing instead of programming
     • We will regularly publish exercises and homework for you to test your knowledge and practice for the exam

  8. Reading material
     • We will provide pointers to material that is either publicly accessible or accessible through the ETH library (you will need to access it from ETH’s network)
     • Please read the material as we go. Leaving everything for the end and trying to read it all just before the exam will not work
     • We are studying systems: there are many dependencies, and components depend on other components and concepts. If you do not stay up to date with the course, within a couple of weeks it will become difficult to understand what we are talking about
     • The lectures will refer back to previous lectures often

  9. Motivation for the course
     • Data has become a precious commodity
     • Data management has become a crucial component in IT
     • Data management concepts, and how to deal with large data collections in an efficient and effective manner, are fundamental knowledge any computer scientist should have
     • The course will provide a broad perspective on data management systems, from traditional relational database engines to modern cloud data processing architectures

  10. A brief history of data management
     • Tabulation of data and indexing have existed for many centuries
     • The modern era of data management started in the late 60’s and early 70’s with the relational model
     • Reference: Edgar F. Codd, CACM, Volume 13, Number 6, June 1970

  11. Before the relational model
     • Initially tapes, which only allowed sequential access (what today we would call a scan)
     • Hard disks enabled random access, leading to new models (network and hierarchical)
       • Hierarchical: entities and relationships are organized as a tree. It is very inflexible, as it supports only 1-to-many relationships
       • Network: entities and relationships are organized as a graph
     • Both were eventually replaced by the relational model in databases
     • The hierarchical model is still in use in specialized systems and some data representations (XML)

  12. After the relational model
     • Race to implement the relational model, which was seen as clearly superior
       • IBM
       • Oracle
       • Ingres (Postgres)
     • Many decades of effort optimizing, tuning, and extending relational database engines
     • In the 80’s, SQL becomes a standard; databases start using the available networking to become distributed and parallel
     • In the 90’s, data warehouses, analytics, large scale databases

  13. In this century
     • 2000’s: Internet era: stored procedures, cluster scale databases, start of MapReduce frameworks in the cloud, NoSQL systems to support Internet scale workloads
     • 2010’s:
       • Back to basics: adding SQL and relational features to the systems of the 2000’s (Google’s Spanner, Spark, NoSQL), multi-core, main memory databases, OLAP+OLTP systems
       • Diversification: graph databases
     • 2020’s: Hardware acceleration, cloud native engines, very large scales, machine learning, …

  14. History in perspective
     • Software systems have a very long life:
       • Important to understand where they come from and what problem they were solving
       • Adapted to new technologies and hardware as time goes on
     • We will cover not only the mechanisms but also the motivation behind the designs, the initial problem they addressed, and how they have changed as technology evolves
     • In computer science and IT, ideas tend to be recycled every X years instead of being forgotten …

  15. What you need to know
     • Basic knowledge of databases
       • Data Modeling and Databases (bachelor course, D-INFK)
       • SQL
     • Basic knowledge of computer architecture and systems
       • Virtual memory
     • Basic programming
       • Data structures and algorithms
       • Systems programming

  16. Objectives of the course
     • Cover all key aspects of data management systems
       • Storage
       • Optimization
       • Architectures
       • Concurrency control
       • Algorithms and data structures
       • Modern approaches (data centers and cloud)
     • Provide a solid understanding of how systems work and the design decisions behind the architectures
     • Provide the vocabulary and system understanding to be able to engage in the design of data management systems

  17. What you should be able to do at the end
     • Be familiar with data management systems architecture
     • Understand the trade-offs in the designs and what works when
     • Understand the workloads and applications
     • Understand the architectural differences between server-based and data center/cloud-based systems
     • Read product manuals and research papers describing the architecture of database systems with a solid understanding of what is being done and why
     • Be able to put data management in the context of large IT projects
