CSE 232A Graduate Database Systems: About Paper Reviews (Arun Kumar)


  1. CSE 232A Graduate Database Systems
  Arun Kumar
  About Paper Reviews

  2. Goal of Peer Review in Research
  ❖ Gatekeeping for quality of publication venue
  ❖ Collation of technical knowledge in the field
  ❖ Identify/support emerging research problems/areas
  ❖ Recognize/reward technical novelty, creativity, depth
  ❖ Provide constructively critical feedback to authors
  ❖ Appreciate strong efforts of authors

  3. Goal of Paper Reviews in 232A
  ❖ Teach how to read cutting-edge research papers with a “critical thinking” mindset
  ❖ Teach how to appreciate/evaluate emerging ideas in an objective, honest, and balanced manner
  ❖ Make you take the paper readings seriously! :)
  ❖ Perhaps try to identify research gaps and extensions?

  4. Sample paper to review from my 291
  Project Bismarck, ACM SIGMOD 2012 (Topic: DB for ML)

  5. My 3-line summary
  ❖ (Setting) Integration of ML procedures with RDBMSs becoming popular for large-scale analytics over RDBMS-resident data without moving/copying data
  ❖ (Problem) Redesigning every individual ML procedure for in-RDBMS execution from scratch is a long, tedious, and wasteful development process
  ❖ (Approach) This paper proposes a unified abstraction and software architecture for a large class of ML procedures based on incremental gradient descent (IGD) that is implementable using the existing common RDBMS abstraction of a user-defined aggregate (UDA)
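The approach bullet above, expressing IGD through a UDA, can be sketched in a few lines. This is a minimal illustration assuming the usual initialize/transition/terminate UDA interface and a logistic-regression loss; the function names, data, and step size are hypothetical, not the paper's actual code.

```python
import numpy as np

def initialize(dim, step_size=0.1):
    """Set up the aggregation state: the model vector and the step size."""
    return {"w": np.zeros(dim), "eta": step_size}

def transition(state, x, y):
    """Consume one tuple (features x, label y in {-1, +1}): take one
    incremental gradient step on the logistic loss for this example."""
    w, eta = state["w"], state["eta"]
    margin = y * np.dot(w, x)
    # Gradient of log(1 + exp(-y * w.x)) with respect to w
    grad = -y * x / (1.0 + np.exp(margin))
    state["w"] = w - eta * grad
    return state

def terminate(state):
    """Return the trained model after one pass (epoch) over the table."""
    return state["w"]

# Simulate the RDBMS driving the aggregate over a tiny two-row table.
rows = [(np.array([1.0, 2.0]), 1), (np.array([-1.0, -1.5]), -1)]
state = initialize(dim=2)
for x, y in rows:
    state = transition(state, x, y)
model = terminate(state)
```

Because the whole epoch is just an aggregation over tuples, the RDBMS's existing scan and aggregate machinery does the heavy lifting, which is the core of the paper's architecture.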

  6. Sample good summary from S1
  ❖ Each RDBMS has its own tools for ML problems. Usually they have different tools for different ML algorithms, which makes them difficult to maintain and causes tons of development overhead.
  ❖ Since most ML techniques can be represented as algorithms solving a convex programming problem, i.e., minimizing some convex cost function, it is possible to use one single architecture to unify all of them.
  ❖ The authors propose a unified architecture based on IGD and UDAs that allows developers to adapt it to different ML problems with little development overhead.
  ❖ The authors also propose a modified reservoir sampling technique called MRS.
  ❖ What is more, the authors study the influence of data ordering and parallelize BISMARCK.
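The summary mentions a modified reservoir sampling technique (MRS). The textbook baseline it builds on, one-pass uniform sampling of k items from a stream of unknown length, can be sketched as below; note this is the classic Algorithm R, not the paper's MRS variant itself.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return a uniform random sample of k items from `stream` in one pass,
    without knowing the stream's length in advance."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # uniform index in [0, i]
            if j < k:
                reservoir[j] = item  # keep item with probability k / (i + 1)
    return reservoir

sample = reservoir_sample(range(1000), k=10)
```

Shuffling matters here because IGD's convergence is sensitive to data ordering, which is also why the paper studies ordering effects.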

  7. Sample good strong points from S1
  ❖ Generality. A single framework solves multiple problems, making maintenance and development easier. Code reuse is drastically improved: one optimization for BISMARCK means optimizations for ALL.
  ❖ UDA-based. It is very easy to re-implement for different RDBMSes.
  ❖ Efficiency. It is only a little slower than a common aggregation, and it outperforms many of the built-in tools provided by RDBMSes.

  8. Sample good weak points from S1
  ❖ Generality means loss of specialty. Using IGD for all convex problems may mean that some ML techniques could be solved more efficiently by other, problem-specific techniques. This is the tradeoff.
  ❖ The limitations of IGD are also limitations of BISMARCK: it is internally sequential and hard to tune. Problems that cannot be solved by IGD cannot use BISMARCK.
  ❖ RDBMSes have more to offer. BISMARCK only utilizes the UDA feature of databases. There is still room for optimization, especially for distributed databases.

  9. Other good strong points from class
  ❖ New optimization strategies can be tested using Bismarck without having to make changes for all analytics techniques.
  ❖ The paper honestly studies its overhead and thoroughly compares the results of integration in three different RDBMSes.
  ❖ The experiments are compelling. The use of wall-clock-time measurements, and benchmarks against native UDA speeds, presents a strong case.
  ❖ The organization of the paper is very helpful, so that a reader with little knowledge of this area can read it and grasp the main concepts. The authors use real-world examples where necessary to explain the concepts, which is helpful.

  10. Other good weak points from class
  ❖ The theoretical justification of why IGD is essentially commutative and algebraic lacked depth. The claim that averaging models trained on different segments of the data would lead to convergence seemed dubious.
  ❖ Another limitation is that this architecture is designed for a single-node RDBMS. Currently, more applications move to cloud services and use distributed databases or frameworks such as Hadoop and Spark.
  ❖ The assumption that the data is static might not hold in a production environment, and Bismarck has no provision to support online learning.
  ❖ Strong assumption that the state (model parameters) fits in RAM.
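The model-averaging scheme questioned in the first bullet can be made concrete with a toy sketch: train one model per data segment independently with IGD, then average the parameter vectors. The squared loss, the two-way split, and all hyperparameters here are illustrative assumptions, chosen so the toy problem is convex and noiseless (a setting where averaging is benign; the class's skepticism concerns harder cases).

```python
import numpy as np

def train_segment(X, y, epochs=50, eta=0.01):
    """Run IGD (one gradient step per example) on squared loss
    over a single data segment."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w -= eta * (np.dot(w, xi) - yi) * xi
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w  # noiseless linear labels

# Train separately on two segments, then average the two models.
w1 = train_segment(X[:50], y[:50])
w2 = train_segment(X[50:], y[50:])
w_avg = (w1 + w2) / 2
```

On this easy problem both segment models converge to the same optimum, so the average does too; the open question raised in class is whether that still holds for harder losses and skewed segment distributions.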
