The Distributed Database Based on Kudu

Jan 28, 2023

The Distributed Database Based on Kudu Shunda Lin
Outline
• Motivation
• Introduction of Kudu
• Deployment and Configuration
• Query Test
• Conclusion
Motivation
Traditional System
• Applications must move data back and forth between the real-time and offline systems, which requires complex code.
• The systems are complex and each needs its own backups, security policies, and monitoring.
• Data moved from the real-time system to the offline system arrives with a delay before it is available for OLAP analysis.
• Changing or rewriting historical data is expensive once it has been archived.
Kudu: Fast Analytics on Fast Data
• Released by Cloudera in 2015
• Designed for OLAP workloads
• High performance for both data scanning and random access
• Simplifies complex hybrid architectures
Architectures and Design • Super-fast Columnar Storage
Architectures and Design • Distribution and Fault Tolerance
Deployment
• master: slave2 (192.168.0.134)
• tserver: slave1 (192.168.0.135), slave2 (192.168.0.134), slave3 (192.168.0.100)
Data Persistence
• MySQL -> HDFS -> Kudu
• Sqoop: a command-line application for transferring data between relational databases and Hadoop
• Spark: an open-source cluster-computing framework
MySQL to HDFS
sqoop import --connect jdbc:mysql://202.120.36.137:6033/mag-new-160205 --username=data --password=data --table AuthorFieldCount -m 1 --target-dir /user/hadoop/AuthorFieldCount --as-parquetfile
Data Persistence on Kudu
• spark-shell
• design table
• create table
• insert data
Query Test
• From the field-relation table, extract the relationships among the 1,000 fields most closely related to a given field.

select FOSID as Source,
       FOSReferencesCount.FOSReference as Target,
       Similarity/10000000 as Weight
from (select FOSReference from `FOSReferencesCount`
      where `FOSID` = '0271BC14'
      order by `Similarity` desc limit 1000) e1,
     (select FOSReference from `FOSReferencesCount`
      where `FOSID` = '0271BC14'
      order by `Similarity` desc limit 1000) e2,
     FOSReferencesCount
where e1.`FOSReference` = `FOSReferencesCount`.FOSID
  and e2.`FOSReference` = `FOSReferencesCount`.FOSReference;
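To make the shape of this query concrete, here is a minimal sketch that runs the same self-join on a toy table using Python's built-in sqlite3 module. It is not the MySQL or Kudu setup from the slides: the rows, the single-letter field IDs, the helper name related_edges, the table alias f, and the added ORDER BY (for a stable result) are all assumptions made for illustration. The divisor is written as 10000000.0 because SQLite would otherwise perform integer division.

```python
import sqlite3

# Invented toy rows standing in for FOSReferencesCount:
# (FOSID, FOSReference, Similarity)
ROWS = [
    ("A", "B", 90_000_000),
    ("A", "C", 80_000_000),
    ("A", "D", 10_000_000),
    ("B", "C", 50_000_000),
    ("C", "B", 40_000_000),
    ("B", "D", 30_000_000),
    ("D", "B", 20_000_000),
]

# Same structure as the slide's query: e1 and e2 are both the top-k fields
# related to the seed; the join keeps rows whose two endpoints are both in
# that top-k set.
QUERY = """
SELECT f.FOSID AS Source,
       f.FOSReference AS Target,
       f.Similarity / 10000000.0 AS Weight
FROM (SELECT FOSReference FROM FOSReferencesCount
      WHERE FOSID = :seed ORDER BY Similarity DESC LIMIT :k) e1,
     (SELECT FOSReference FROM FOSReferencesCount
      WHERE FOSID = :seed ORDER BY Similarity DESC LIMIT :k) e2,
     FOSReferencesCount f
WHERE e1.FOSReference = f.FOSID
  AND e2.FOSReference = f.FOSReference
ORDER BY Source, Target
"""


def related_edges(seed, k):
    """Return (Source, Target, Weight) edges among the top-k fields related to seed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FOSReferencesCount "
        "(FOSID TEXT, FOSReference TEXT, Similarity INTEGER)"
    )
    conn.executemany("INSERT INTO FOSReferencesCount VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY, {"seed": seed, "k": k}).fetchall()
    conn.close()
    return result


edges = related_edges("A", 2)
print(edges)
```

With seed "A" and k = 2 the top-related fields are B and C, so only the B-C and C-B rows survive the join; rows touching D are dropped because D is outside the top-k set.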
         Computer Science   Ethnic studies   Data Structure
FOSID    0271BC14           03D2C4FF         09ACCB7D
MySQL    82.4s              65.4s            55.7s
Kudu     8.23s              9.175s           7.821s

[Figure: bar chart "Query" comparing MySQL and Kudu for Case1, Case2, and Case3; y-axis 0 to 90]
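The timings reported above can be turned into speedup factors with a few lines of arithmetic. This is only a sketch over the three reported measurements; the dictionary layout and the helper name speedup are assumptions, and nothing here says how many runs each figure represents.

```python
# Query times in seconds as reported on the slide: (MySQL, Kudu).
timings = {
    "Computer Science (0271BC14)": (82.4, 8.23),
    "Ethnic studies (03D2C4FF)": (65.4, 9.175),
    "Data Structure (09ACCB7D)": (55.7, 7.821),
}


def speedup(mysql_seconds, kudu_seconds):
    """Ratio of MySQL time to Kudu time for the same query."""
    return mysql_seconds / kudu_seconds


speedups = {name: speedup(m, k) for name, (m, k) in timings.items()}
for name, factor in speedups.items():
    print(f"{name}: Kudu is about {factor:.1f}x faster")
```

On these three cases the ratio comes out at roughly 7x to 10x in favor of Kudu.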
Query Test
[Figure: chart comparing MySQL and Kudu across 13 test cases; x-axis 1 to 13, y-axis 0 to 180]
Q&A
