HBase on top of HDFS Seminar Software Systems Engineering "Mobile, Security, Cloud Computing" Kevin Böckler Institut für Telematik February 19, 2015 Kevin Böckler February 19, 2015 1
Outline
1. Introduction
2. Distributed File Systems
3. HBase
4. Application
Storage in Cloud Computing
Requirements
◮ Millions of users
◮ Real-time access
◮ Minimal loss of data
Use cases
◮ Cloud Storage
◮ Collaborative Platforms
◮ Social Platforms
◮ Messengers
Hadoop implementation
HBase = Hadoop database (a distributed, column-oriented store modeled on Google's Bigtable)
HDFS = Hadoop Distributed File System
2. Distributed File Systems
HDFS
Figure: Participants in an HDFS cluster
Properties of HDFS
Scalability
◮ Data distributed across multiple nodes
◮ NameNode stores the metadata, DataNodes store the actual payload
◮ "Moving Computation is Cheaper than Moving Data"
Transparency
◮ UNIX paths (/files/seminar/hbase.pdf): location transparency and location independence
◮ Hidden replication
◮ Hidden fail-over
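The "Moving Computation is Cheaper than Moving Data" principle can be sketched as a locality-aware task scheduler. This is a hypothetical in-memory model, not the Hadoop API: `ToyNameNode` and `LocalityScheduler` are invented names, and the NameNode is reduced to two maps (UNIX-style path to blocks, block to the DataNodes holding a replica). The client only ever names the path; which DataNodes hold the bytes stays hidden behind the NameNode.

```java
import java.util.*;

// Hypothetical toy model (not the Hadoop API): the NameNode keeps only metadata,
// the DataNodes (here just host names) keep the actual block contents.
class ToyNameNode {
    // UNIX-style path -> ordered list of block IDs
    private final Map<String, List<String>> files = new HashMap<>();
    // block ID -> DataNodes holding a replica of that block
    private final Map<String, Set<String>> replicas = new HashMap<>();

    void addBlock(String path, String blockId, String... dataNodes) {
        files.computeIfAbsent(path, p -> new ArrayList<>()).add(blockId);
        replicas.put(blockId, new LinkedHashSet<>(Arrays.asList(dataNodes)));
    }

    List<String> blocksOf(String path) {
        return files.getOrDefault(path, Collections.emptyList());
    }

    Set<String> locationsOf(String blockId) {
        return replicas.getOrDefault(blockId, Collections.emptySet());
    }
}

class LocalityScheduler {
    // Assign one task per block of the file. Prefer a candidate node that already
    // stores the block, so the computation travels to the data instead of the
    // block travelling over the network. A node may receive several tasks here.
    static Map<String, String> schedule(ToyNameNode nn, String path, List<String> candidateNodes) {
        Map<String, String> assignment = new LinkedHashMap<>();
        for (String block : nn.blocksOf(path)) {
            String chosen = null;
            for (String node : candidateNodes) {
                if (nn.locationsOf(block).contains(node)) {
                    chosen = node;
                    break;
                }
            }
            // No candidate holds a local replica: fall back to a remote read.
            if (chosen == null && !candidateNodes.isEmpty()) {
                chosen = candidateNodes.get(0);
            }
            assignment.put(block, chosen);
        }
        return assignment;
    }
}
```

With `blk_1` replicated on `dn1..dn3` and `blk_2` on `dn2, dn4, dn5`, scheduling over the candidates `[dn4, dn1]` runs the `blk_1` task on `dn1` and the `blk_2` task on `dn4`; only blocks without a local candidate fall back to a remote read.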
Fault Tolerance in HDFS
Robustness
◮ Replication during the write process: pipelining
Availability
◮ Heartbeat (NameNode ↔ DataNode)
◮ NameNode issues re-replications
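The availability mechanism (heartbeats plus NameNode-issued re-replication) can be sketched as follows. This is a hypothetical model, not HDFS code: `HeartbeatMonitor` is an invented name, and the 30-second timeout in the usage example is an arbitrary illustrative value. The target of 3 replicas matches HDFS's default replication factor. A DataNode whose last heartbeat is older than the timeout counts as dead, and every block with fewer live replicas than the target is scheduled for re-replication.

```java
import java.util.*;

// Hypothetical toy model of heartbeat-based failure detection (not the Hadoop API).
class HeartbeatMonitor {
    private final long timeoutMillis;
    private final int targetReplication;
    // DataNode -> time of its last heartbeat
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // block ID -> DataNodes that store a replica
    private final Map<String, Set<String>> replicas = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis, int targetReplication) {
        this.timeoutMillis = timeoutMillis;
        this.targetReplication = targetReplication;
    }

    // Called whenever a DataNode reports in to the NameNode.
    void heartbeat(String dataNode, long nowMillis) {
        lastHeartbeat.put(dataNode, nowMillis);
    }

    void storeReplica(String blockId, String dataNode) {
        replicas.computeIfAbsent(blockId, b -> new TreeSet<>()).add(dataNode);
    }

    boolean isAlive(String dataNode, long nowMillis) {
        Long last = lastHeartbeat.get(dataNode);
        return last != null && nowMillis - last <= timeoutMillis;
    }

    // Blocks with too few live replicas, mapped to the number of copies to create.
    Map<String, Integer> reReplicationPlan(long nowMillis) {
        Map<String, Integer> plan = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : replicas.entrySet()) {
            int live = 0;
            for (String dn : e.getValue()) {
                if (isAlive(dn, nowMillis)) live++;
            }
            if (live < targetReplication) {
                plan.put(e.getKey(), targetReplication - live);
            }
        }
        return plan;
    }
}
```

If `dn1` and `dn2` keep reporting but `dn3` falls silent longer than the timeout, a block replicated on all three appears in the plan with one missing copy; the failure stays hidden from clients because two replicas remain readable.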
File Access in HDFS
File Access
◮ Write-once-read-many (WORM): files are immutable once written
Remote Access
1. Ask the NameNode for the file name → DataNode to connect to
2. Open a connection to that DataNode
3. Transfer the payload directly from the DataNode
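The three-step remote read and the write-once rule can be sketched in memory. `ToyHdfs` and `ToyDataNode` are hypothetical names, and each file is simplified to a single block on a single DataNode. The NameNode part only answers "which DataNode holds this file", and the payload then comes straight from that DataNode.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hypothetical DataNode: stores raw block contents.
class ToyDataNode {
    private final Map<String, byte[]> blocks = new HashMap<>();

    void store(String blockId, byte[] data) { blocks.put(blockId, data.clone()); }

    byte[] read(String blockId) { return blocks.get(blockId).clone(); }
}

// Hypothetical file system facade whose maps play the role of the NameNode metadata.
class ToyHdfs {
    private final Map<String, String> blockOf = new HashMap<>();      // path -> block ID
    private final Map<String, ToyDataNode> nodeOf = new HashMap<>();  // path -> DataNode

    // Write-once: a path can be created exactly once and never modified afterwards.
    void create(String path, byte[] data, ToyDataNode dn) {
        if (blockOf.containsKey(path)) {
            throw new IllegalStateException("file is immutable: " + path);
        }
        String blockId = "blk_" + blockOf.size();
        dn.store(blockId, data);
        blockOf.put(path, blockId);
        nodeOf.put(path, dn);
    }

    byte[] read(String path) {
        // 1. Ask the "NameNode" which block and DataNode hold the file
        String blockId = blockOf.get(path);
        if (blockId == null) throw new NoSuchElementException(path);
        // 2. "Connect" to that DataNode
        ToyDataNode dn = nodeOf.get(path);
        // 3. Transfer the payload directly from the DataNode
        return dn.read(blockId);
    }

    String readString(String path) {
        return new String(read(path), StandardCharsets.UTF_8);
    }
}
```

A second `create` on `/files/seminar/hbase.pdf` fails with `IllegalStateException`, which is the WORM property: readers never observe a file changing underneath them.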
3. HBase
Features of HBase
Efficiency
◮ Bulk loading
◮ Sequential and random reads
◮ Distributed MapReduce tasks
Scalability
◮ Column families
◮ Concurrency model: row-level locks
Fault Tolerance
◮ Inherited from HDFS
◮ Additional heartbeat (HBaseMaster ↔ HRegionServer)
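HBase makes every update to a single row atomic, serializing writers per row. The idea behind row-level locking can be sketched like this; `RowLockedTable` and `mutateRow` are hypothetical names, not HBase internals. Writers to different rows take different locks and never block each other, while concurrent read-modify-write updates on the same row are serialized and no update is lost.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

// Hypothetical sketch of row-level locking (not HBase's internal classes).
class RowLockedTable {
    // row key -> immutable snapshot of the row's columns
    private final Map<String, Map<String, String>> rows = new ConcurrentHashMap<>();
    // one lock per row key, created on first use
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Apply a read-modify-write update to one row while holding only that row's lock.
    // The update may change several columns; readers see either the old or the new snapshot.
    void mutateRow(String rowKey, UnaryOperator<Map<String, String>> update) {
        ReentrantLock lock = locks.computeIfAbsent(rowKey, k -> new ReentrantLock());
        lock.lock();
        try {
            Map<String, String> copy = new HashMap<>(rows.getOrDefault(rowKey, Collections.emptyMap()));
            rows.put(rowKey, Collections.unmodifiableMap(update.apply(copy)));
        } finally {
            lock.unlock();
        }
    }

    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, Collections.emptyMap());
    }
}
```

Two threads that each increment a counter column of the same row 1000 times end at exactly 2000, because each increment runs under that row's lock.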
HBase: HRegionServer
Three abstract components: HRegionServer ↔ HRegion ↔ Store
◮ Accepts connections from the HClient
◮ Receives table requests (GET, PUT, DELETE, ...) from the HClient
◮ Manages HRegions
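Each HRegion covers a contiguous, sorted range of row keys, so an HRegionServer can route an incoming request by looking up the region whose start key is the greatest one not exceeding the row key. The sketch below is a hypothetical model (`ToyRegionServer`, `regionFor` and the region names are invented), using a `TreeMap` for the sorted start keys. HBase itself compares row keys as byte arrays; plain strings keep the example short.

```java
import java.util.*;

// Hypothetical model (not HBase classes): regions are identified by their start key
// and cover [startKey, next region's startKey).
class ToyRegionServer {
    // start key -> region name; the first region starts at "" (the lowest possible key)
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The responsible region is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        if (e == null) throw new IllegalArgumentException("no region covers " + rowKey);
        return e.getValue();
    }

    // Dispatch a GET/PUT/DELETE request to the responsible region (here: describe the routing).
    String handle(String operation, String rowKey) {
        return operation + " " + rowKey + " -> " + regionFor(rowKey);
    }
}
```

With regions starting at `""` and `"m"`, `handle("GET", "kiwi")` returns `"GET kiwi -> region-A"`, and any row key from `"m"` upwards goes to the second region.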
HBase: HRegion
Three abstract components: HRegionServer ↔ HRegion ↔ Store
ID   a    b    c       d    e
1    42   1    world   9    hello
2    43   3    npe     9    hadoop
3    19   3    java    9    ping
4    22   7    easy    9    bye
◮ HRegion ⊆ Table (a contiguous range of rows)
◮ Receives requests from the HRegionServer
◮ Manages Stores
◮ Write-ahead log (WAL) records column writes before they are applied (the data is eventually flushed to the Store)
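The write-ahead-log idea can be sketched as "log first, then apply". In this hypothetical model (`ToyRegion` and `Edit` are invented names, the `List` stands in for the log file on HDFS, and the column family name `cf` is an assumption), every put is appended to the log before it touches the in-memory state. A region rebuilt from the same log after a crash recovers the same contents by replaying it in order.

```java
import java.util.*;

// Hypothetical sketch of an HRegion's write path with a write-ahead log (not HBase classes).
class ToyRegion {
    // One logged column write: row key, column ("family:qualifier"), value
    static final class Edit {
        final String row;
        final String column;
        final String value;

        Edit(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    private final List<Edit> wal;  // stands in for the durable log file
    private final TreeMap<String, String> memStore = new TreeMap<>();  // "row/column" -> value

    ToyRegion(List<Edit> wal) {
        this.wal = wal;
        // Recovery: replay every logged edit in its original order.
        for (Edit e : wal) apply(e);
    }

    void put(String row, String column, String value) {
        Edit e = new Edit(row, column, value);
        wal.add(e);  // 1. append to the log first ...
        apply(e);    // 2. ... then update the in-memory state
    }

    private void apply(Edit e) {
        memStore.put(e.row + "/" + e.column, e.value);
    }

    String get(String row, String column) {
        return memStore.get(row + "/" + column);
    }
}
```

Writing `world` and `npe` into column `cf:c` of rows 1 and 2 (values taken from the example table), then constructing a fresh `ToyRegion` from the same log, yields both values again even though the first region's memory is gone.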
HBase: Store
Three abstract components: HRegionServer ↔ HRegion ↔ Store
◮ Store = column family
◮ Encapsulates one group of columns for the rows of its HRegion
◮ Holds its data in
  ◮ MemStore (in-memory write cache)
  ◮ StoreFiles (→ HFile → HDFS)
◮ Compacts StoreFiles
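The MemStore/StoreFile cycle and compaction can be sketched with sorted maps. In this hypothetical model (`ToyStore` is an invented name, and flushing is triggered by an entry count rather than a memory size), writes land in a sorted MemStore, a full MemStore is flushed to a new immutable StoreFile, reads check the MemStore and then StoreFiles from newest to oldest, and compaction merges all StoreFiles into one. Unlike HBase, the sketch keeps only the newest value per key and ignores versions and delete markers.

```java
import java.util.*;

// Hypothetical sketch of one Store (column family), not HBase's classes.
class ToyStore {
    private final int flushThreshold;
    private TreeMap<String, String> memStore = new TreeMap<>();
    // StoreFiles, newest first; each one is an unmodifiable sorted map
    private final Deque<SortedMap<String, String>> storeFiles = new ArrayDeque<>();

    ToyStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        memStore.put(key, value);
        if (memStore.size() >= flushThreshold) flush();
    }

    // Turn the current MemStore into a new immutable StoreFile.
    void flush() {
        if (memStore.isEmpty()) return;
        storeFiles.addFirst(Collections.unmodifiableSortedMap(memStore));
        memStore = new TreeMap<>();
    }

    // Newest data wins: MemStore first, then StoreFiles from newest to oldest.
    String get(String key) {
        if (memStore.containsKey(key)) return memStore.get(key);
        for (SortedMap<String, String> file : storeFiles) {
            if (file.containsKey(key)) return file.get(key);
        }
        return null;
    }

    // Merge all StoreFiles into one, so reads touch fewer files.
    void compact() {
        if (storeFiles.size() < 2) return;
        TreeMap<String, String> merged = new TreeMap<>();
        // Iterate oldest -> newest so that newer values overwrite older ones.
        Iterator<SortedMap<String, String>> it = storeFiles.descendingIterator();
        while (it.hasNext()) merged.putAll(it.next());
        storeFiles.clear();
        storeFiles.addFirst(Collections.unmodifiableSortedMap(merged));
    }

    int storeFileCount() {
        return storeFiles.size();
    }
}
```

With a threshold of 2, four puts produce two StoreFiles; after `compact()` a single file remains and a key written twice still returns its newest value. This is the effect the demo observes in HDFS when compactions run.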
Architecture of HBase
4. Application
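API
Java Usage
◮ Imports
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.*;  // HTable, Get, Result
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;
◮ Configuration
    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum", "127.0.0.1");
    config.set("hbase.zookeeper.property.clientPort", "2180");  // ZooKeeper's default is 2181
◮ HTable
    // Pre-2.0 client API; HBase 2.x obtains a Table from a Connection instead
    HTable table = new HTable(config, "someTableName");
◮ GET, SCAN, PUT, DELETE
    Get get = new Get(Bytes.toBytes("someRowId"));
    Result result = table.get(get);
◮ Filter
    // Keeps rows whose someColumnFamily:someColumn equals the value; attach to a Scan via scan.setFilter(filter)
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("someColumnFamily"), Bytes.toBytes("someColumn"),
        CompareOp.EQUAL, Bytes.toBytes("someColumnNameValue"));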
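What a `SingleColumnValueFilter` with `CompareOp.EQUAL` does during a scan can be modelled in memory over the example table from the HRegion slide. This is a hypothetical sketch, not the HBase implementation: `ToyScan` and `scanWhereEquals` are invented names, values are compared as strings rather than as byte arrays, and the `filterIfMissing` flag mirrors the filter's `setFilterIfMissing` option (when it is false, which is the HBase default, rows lacking the column are emitted too). Whole rows are returned, not just the tested column.

```java
import java.util.*;

// Hypothetical in-memory model of a single-column-value scan filter (not HBase code).
class ToyScan {
    // The example table from the HRegion slide: row key -> column -> value.
    static SortedMap<String, Map<String, String>> exampleTable() {
        String[] cols = {"a", "b", "c", "d", "e"};
        String[][] data = {
            {"1", "42", "1", "world", "9", "hello"},
            {"2", "43", "3", "npe", "9", "hadoop"},
            {"3", "19", "3", "java", "9", "ping"},
            {"4", "22", "7", "easy", "9", "bye"}};
        SortedMap<String, Map<String, String>> table = new TreeMap<>();
        for (String[] r : data) {
            Map<String, String> row = new HashMap<>();
            for (int i = 0; i < cols.length; i++) row.put(cols[i], r[i + 1]);
            table.put(r[0], row);
        }
        return table;
    }

    // Return the keys of all rows (in row-key order) whose column equals the expected value.
    static List<String> scanWhereEquals(SortedMap<String, Map<String, String>> table,
                                        String column, String expected, boolean filterIfMissing) {
        List<String> rowKeys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(column);
            boolean keep = (value == null) ? !filterIfMissing : value.equals(expected);
            if (keep) rowKeys.add(row.getKey());
        }
        return rowKeys;
    }
}
```

Scanning the example table for `c = "java"` yields row `3`, and `b = "3"` yields rows `2` and `3`; a row without column `c` only shows up when `filterIfMissing` is false.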
Demo
1. Using the HBase shell
2. HDFS: file system and the influence of compactions
3. Using the Java implementation
Figure: Process stack of the single-machine cluster
Questions?