HBase on top of HDFS: Seminar Software Systems Engineering


  1. HBase on top of HDFS
     Seminar Software Systems Engineering: "Mobile, Security, Cloud Computing"
     Kevin Böckler, Institut für Telematik
     February 19, 2015

  2. Outline
     1. Introduction
     2. Distributed File Systems
     3. HBase
     4. Application

  3. Storage in Cloud Computing
     Requirements
     ◮ Millions of users
     ◮ Real-time access
     ◮ Reduced loss of data
     Use cases
     ◮ Cloud storage
     ◮ Collaborative platforms
     ◮ Social platforms
     ◮ Messengers

  4. Hadoop implementation
     HBase = Hadoop implementation of a database
     HDFS = Hadoop Distributed File System

  6. Outline
     1. Introduction
     2. Distributed File Systems
     3. HBase
     4. Application

  7. HDFS
     Figure: Participants in an HDFS cluster

  8. Properties of HDFS
     Scalability
     ◮ Multiple distributed nodes
     ◮ NameNode for metadata, DataNodes for the actual payload
     ◮ "Moving Computation is Cheaper than Moving Data"
     Transparency
     ◮ UNIX paths (/files/seminar/hbase.pdf): location transparency, location independence
     ◮ Hidden replication
     ◮ Hidden fail-over

  9. Fault Tolerance in HDFS
     Replication during the write process: pipelining → robustness
     Availability
     ◮ Heartbeat (NameNode ↔ DataNode)
     ◮ NameNode issues re-replications
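The heartbeat and re-replication idea on this slide can be sketched in plain Java. This is a toy model only, not the Hadoop API: the class `HeartbeatSketch`, the 30-second timeout, and the target of three replicas are assumptions made for illustration. A DataNode counts as dead once its last heartbeat is older than the timeout, and every block with fewer live replicas than the target is reported for re-replication.

```java
import java.util.Map;
import java.util.HashMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a NameNode that tracks DataNode heartbeats (not the Hadoop API). */
class HeartbeatSketch {
    // Hypothetical values, chosen for illustration only.
    static final long TIMEOUT_MS = 30_000;
    static final int TARGET_REPLICAS = 3;

    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Set<String>> blockLocations = new HashMap<>();

    /** Records that a DataNode reported in at the given time. */
    void heartbeat(String dataNode, long nowMs) {
        lastHeartbeat.put(dataNode, nowMs);
    }

    /** Records that a DataNode holds a replica of the given block. */
    void addReplica(String block, String dataNode) {
        blockLocations.computeIfAbsent(block, b -> new TreeSet<>()).add(dataNode);
    }

    /** A node is live if it has reported in within the timeout window. */
    boolean isLive(String dataNode, long nowMs) {
        Long last = lastHeartbeat.get(dataNode);
        return last != null && nowMs - last <= TIMEOUT_MS;
    }

    /** Maps each under-replicated block to the number of copies that must be re-created. */
    Map<String, Integer> underReplicated(long nowMs) {
        Map<String, Integer> missing = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : blockLocations.entrySet()) {
            int live = 0;
            for (String node : entry.getValue()) {
                if (isLive(node, nowMs)) {
                    live++;
                }
            }
            if (live < TARGET_REPLICAS) {
                missing.put(entry.getKey(), TARGET_REPLICAS - live);
            }
        }
        return missing;
    }
}
```

If three nodes hold a block and one of them stops sending heartbeats, `underReplicated` reports that block with one missing copy once the timeout has passed. In this model the NameNode would then pick a new target DataNode and schedule the copy.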

  10. File Access in HDFS
      File access
      Write-once-read-many (WORM): files are immutable
      Remote access
      1. Ask the NameNode for the filename → DataNode connection
      2. Open a connection to the DataNode
      3. Transfer the payload
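The three remote-access steps and the write-once rule can be sketched in plain Java. This is a toy model, not the Hadoop client API: `ReadPathSketch`, its nested `NameNode` and `DataNode` classes, and the simplification that one DataNode stores a whole file (real HDFS splits files into replicated blocks) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Toy model of the HDFS read path (not the Hadoop client API). */
class ReadPathSketch {

    /** Stores payload bytes; here keyed by path, whereas real HDFS stores blocks. */
    static class DataNode {
        final String host;
        private final Map<String, byte[]> payload = new HashMap<>();

        DataNode(String host) {
            this.host = host;
        }

        /** Step 3: transfer the payload to the client. */
        byte[] transfer(String path) {
            byte[] data = payload.get(path);
            if (data == null) {
                throw new NoSuchElementException("no data for " + path + " on " + host);
            }
            return data.clone();
        }
    }

    /** Stores metadata only: which DataNode holds which file. */
    static class NameNode {
        private final Map<String, DataNode> locations = new HashMap<>();

        /** Step 1: resolve a filename to the DataNode that holds it. */
        DataNode locate(String path) {
            DataNode node = locations.get(path);
            if (node == null) {
                throw new NoSuchElementException("unknown file " + path);
            }
            return node;
        }
    }

    /** Write-once: a path that already exists cannot be overwritten. */
    static void write(NameNode nameNode, DataNode target, String path, byte[] data) {
        if (nameNode.locations.containsKey(path)) {
            throw new IllegalStateException("write-once: " + path + " already exists");
        }
        target.payload.put(path, data.clone());
        nameNode.locations.put(path, target);
    }

    static byte[] read(NameNode nameNode, String path) {
        DataNode node = nameNode.locate(path); // step 1: ask the NameNode
        // Step 2 (opening a connection) is modeled by holding the reference.
        return node.transfer(path);            // step 3: transfer the payload
    }
}
```

The point of the split is that the NameNode never touches payload bytes: it only answers the lookup, and the transfer runs directly between client and DataNode.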

  11. Outline
      1. Introduction
      2. Distributed File Systems
      3. HBase
      4. Application

  12. Features of HBase
      Efficiency
      ◮ Bulk loading
      ◮ Sequential and random reads
      ◮ Distributed MapReduce tasks
      Scalability
      ◮ Column families
      ◮ Concurrency model: file locks
      Fault tolerance
      ◮ Inherited from HDFS
      ◮ Additionally, heartbeat (HBaseMaster ↔ HRegionServer)

  13. HBase: HRegionServer
      Three abstract components: HRegionServer ↔ HRegion ↔ Store
      ◮ accepts connections from the HClient
      ◮ receives table requests (GET, PUT, DELETE, ...) from the HClient
      ◮ manages HRegions

  14. HBase: HRegion
      Three abstract components: HRegionServer ↔ HRegion ↔ Store

      ID  a   b  c      d  e
      1   42  1  world  9  hello
      2   43  3  npe    9  hadoop
      3   19  3  java   9  ping
      4   22  7  easy   9  bye

      ◮ HRegion ⊆ Table
      ◮ receives requests from the HRegionServer
      ◮ manages Stores
      ◮ Write-ahead log (WAL) of column writes (→ eventually flushed to Store)
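The write-ahead log on this slide can be illustrated with a small plain-Java sketch. It is a toy model and not HBase code: `WalSketch`, its `Edit` record, and the use of an in-memory list standing in for a durable log file are assumptions made for illustration. Every write is appended to the log first and applied to the volatile in-memory state second, so the state can be rebuilt by replaying the log after a crash.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy write-ahead log for column writes (not the HBase implementation). */
class WalSketch {

    /** One logged column write. */
    static final class Edit {
        final String row;
        final String column;
        final String value;

        Edit(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Stands in for a durable log; in HBase the WAL is a file on HDFS.
    private final List<Edit> log;
    // Volatile state that would be lost in a crash: row -> (column -> value).
    private final Map<String, Map<String, String>> memory = new TreeMap<>();

    /** Opening a region replays every edit already in the log. */
    WalSketch(List<Edit> durableLog) {
        this.log = durableLog;
        for (Edit edit : durableLog) {
            apply(edit);
        }
    }

    void put(String row, String column, String value) {
        Edit edit = new Edit(row, column, value);
        log.add(edit); // 1. append to the log first
        apply(edit);   // 2. only then update the in-memory state
    }

    String get(String row, String column) {
        Map<String, String> columns = memory.get(row);
        return columns == null ? null : columns.get(column);
    }

    private void apply(Edit edit) {
        memory.computeIfAbsent(edit.row, r -> new TreeMap<>()).put(edit.column, edit.value);
    }
}
```

Constructing a second `WalSketch` over the same list simulates a restart: the new instance sees every value written by the first one, because replaying the log in order reproduces the same state.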

  15. HBase: Store
      Three abstract components: HRegionServer ↔ HRegion ↔ Store

      ID  a   b  c      d  e
      1   42  1  world  9  hello
      2   43  3  npe    9  hadoop
      3   19  3  java   9  ping
      4   22  7  easy   9  bye

      ◮ Store = ColumnFamily
      ◮ encapsulates one group of columns and rows
      ◮ holds its data in
        ◮ MemStore (working cache)
        ◮ StoreFiles (→ HFile → HDFS)
      ◮ compacts StoreFiles
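How a Store moves data from the MemStore into StoreFiles and later compacts them can be sketched in plain Java. This is a toy model, not HBase code: `StoreSketch` and its flush threshold of two entries are assumptions made for illustration (real HBase flushes by size in bytes), and deletes are not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model of a Store with a MemStore and immutable StoreFiles (not HBase code). */
class StoreSketch {
    // Hypothetical threshold, counted in entries for simplicity.
    static final int FLUSH_THRESHOLD = 2;

    private TreeMap<String, String> memStore = new TreeMap<>();
    // Immutable sorted files, oldest first.
    private final List<SortedMap<String, String>> storeFiles = new ArrayList<>();

    void put(String key, String value) {
        memStore.put(key, value);
        if (memStore.size() >= FLUSH_THRESHOLD) {
            flush();
        }
    }

    /** Freezes the current MemStore as a new StoreFile and starts an empty one. */
    void flush() {
        if (memStore.isEmpty()) {
            return;
        }
        storeFiles.add(Collections.unmodifiableSortedMap(memStore));
        memStore = new TreeMap<>();
    }

    /** Reads check the MemStore first, then StoreFiles from newest to oldest. */
    String get(String key) {
        String value = memStore.get(key);
        if (value != null) {
            return value;
        }
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            value = storeFiles.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Merges all StoreFiles into one; values from newer files overwrite older ones. */
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (SortedMap<String, String> file : storeFiles) {
            merged.putAll(file);
        }
        storeFiles.clear();
        if (!merged.isEmpty()) {
            storeFiles.add(Collections.unmodifiableSortedMap(merged));
        }
    }

    int storeFileCount() {
        return storeFiles.size();
    }
}
```

Because StoreFiles are never modified in place, the model fits the write-once file semantics of HDFS: an update becomes a newer entry in a newer file, and compaction is the only step that discards superseded values while reducing the number of files a read has to search.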

  16. Architecture of HBase

  17. Outline
      1. Introduction
      2. Distributed File Systems
      3. HBase
      4. Application

  18. API
      Java usage
      ◮ Configuration
          Configuration config = HBaseConfiguration.create();
          config.set("hbase.zookeeper.quorum", "127.0.0.1");
          config.set("hbase.zookeeper.property.clientPort", "2180");
      ◮ HTable
          HTable table = new HTable(config, "someTableName");
      ◮ GET, SCAN, PUT, DELETE
          Get get = new Get(Bytes.toBytes("someRowId"));
          Result result = table.get(get);
      ◮ Filter
          SingleColumnValueFilter filter = new SingleColumnValueFilter(
              someColumnFamily, someColumn, CompareOp.EQUAL,
              Bytes.toBytes("someColumnNameValue"));

  19. Demo
      1. Using the HBase Shell
      2. HDFS: filesystem and influence of compactions
      3. Using the Java implementation
      Figure: Process stack of the single-machine cluster

  20. Questions?
