
【速搜问答】What is HBase?


HBase is a distributed, column-oriented, open-source database that provides Bigtable-like capabilities on top of Hadoop; it is a subproject of the Apache Hadoop project. Unlike a typical relational database, HBase is suited to storing unstructured data. The technology originates from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang. Just as Bigtable builds on the distributed data storage provided by the Google File System (GFS), HBase provides Bigtable-like capabilities on top of Hadoop.

HBase is a subproject of the Apache Hadoop project. It differs from ordinary relational databases in being suited to unstructured data storage, and in using a column-based rather than a row-based model.

Structure Introduction

HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. Using HBase, large-scale structured storage clusters can be built on inexpensive PC servers.

Unlike FUJITSU Cliq and other commercial big-data products, HBase is an open-source implementation of Google Bigtable. Just as Google Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS as its file storage system; just as Google runs MapReduce to process the massive data in Bigtable, HBase uses Hadoop MapReduce to process the massive data in HBase; and just as Google Bigtable uses Chubby as its coordination service, HBase uses ZooKeeper as the counterpart.

The figure above depicts the layers of the Hadoop ecosystem. HBase sits in the structured storage layer: Hadoop HDFS provides highly reliable low-level storage for HBase, Hadoop MapReduce provides high-performance computation, and ZooKeeper provides stable service and a failover mechanism.

In addition, Pig and Hive provide high-level language support for HBase, which makes statistical data processing on HBase very simple. Sqoop provides convenient RDBMS data import for HBase, making it easy to migrate data from traditional databases into HBase.

Model

This section mainly discusses the logical model and the physical model.

(1) Logical model

The name HBase comes from "Hadoop database".

The logical model is considered mainly from the user's perspective, i.e., how HBase is used.

(2) Physical model

The physical model is discussed mainly from the perspective of how HBase is implemented.

Access Interfaces

1. Native Java API: the most common and efficient access method, suitable for parallel batch processing of HBase table data with Hadoop MapReduce jobs (a minimal sketch follows this list)

2. HBase Shell: HBase's command-line tool and its simplest interface, suitable for HBase administration

3. Thrift Gateway: uses Thrift serialization and supports C++, PHP, Python, and other languages, suitable for heterogeneous systems that need online access to HBase table data

4. REST Gateway: supports REST-style HTTP API access to HBase, removing language restrictions

5. Pig: the Pig Latin dataflow language can be used to operate on data in HBase. Like Hive, Pig scripts are ultimately compiled into MapReduce jobs to process HBase table data, which suits data statistics

6. Hive: the current Hive release does not yet include HBase support, but the next release, Hive 0.7.0, will support HBase and allow it to be queried with a SQL-like language
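
As a taste of the Native Java API from item 1, here is a minimal sketch that reads one row. It assumes a reachable cluster and an hbase-site.xml on the classpath; the table name "webtable", row key, and column names are hypothetical (they mirror the example table shown later in this article):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GetExample {
        public static void main(String[] args) throws IOException {
            // Reads the cluster location (ZooKeeper quorum) from hbase-site.xml.
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("webtable"))) { // hypothetical table
                Get get = new Get(Bytes.toBytes("r1"));                        // hypothetical row key
                Result result = table.get(get);
                byte[] url = result.getValue(Bytes.toBytes("URI"), Bytes.toBytes("url"));
                // All cells are raw bytes; the caller performs type conversion itself.
                System.out.println(url == null ? "(no value)" : Bytes.toString(url));
            }
        }
    }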

HBase Data Model: Table & Column Family

Row Key | Timestamp | Column Family "URI" | Column Family "Parser"
--------|-----------|---------------------|-----------------------
r1      | t3        | url=http://         | title=
r1      | t2        | host=com            |
r1      | t1        |                     |
r2      | t5        | url=http://         | content=每天…
r2      | t4        | host=com            |

• Row Key: the primary key of a Table; records in a Table are sorted by Row Key in ascending order by default

• Timestamp: the timestamp of each data operation; it can be regarded as the version number of the data

• Column Family: a Table is composed horizontally of one or more Column Families, and a Column Family can contain any number of Columns. In other words, Column Families support dynamic extension: there is no need to define the number or type of Columns in advance. All Columns are stored in binary form, and users must perform type conversion themselves. (A schema sketch follows below.)
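
To make the schema concrete, here is a minimal sketch that creates a table shaped like the example above, using the HBase 2.x Admin API. The table name "webtable" is an assumption; note that only the Column Families are declared up front, never the Columns:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.TableDescriptor;
    import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

    public class CreateTableExample {
        public static void main(String[] args) throws IOException {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Admin admin = conn.getAdmin()) {
                // Declare only the Column Families; individual Columns are added
                // dynamically at write time, with no predefined count or type.
                TableDescriptor desc = TableDescriptorBuilder
                        .newBuilder(TableName.valueOf("webtable"))   // hypothetical name
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("URI"))
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("Parser"))
                        .build();
                admin.createTable(desc);
            }
        }
    }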

Table & Region

As the number of records grows, a Table becomes larger and gradually splits into multiple pieces, called Regions. A Region is denoted by the half-open interval [startkey, endkey); different Regions are assigned by the Master to the appropriate RegionServers for management. (A sketch of the interval semantics follows below.)
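
The half-open interval also shows up in the client API: a Scan bounded by withStartRow/withStopRow is inclusive of the start row and exclusive of the stop row, mirroring [startkey, endkey). A minimal sketch (HBase 2.x API; the table and row keys are the hypothetical ones from above):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanExample {
        public static void main(String[] args) throws IOException {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("webtable"))) {
                Scan scan = new Scan()
                        .withStartRow(Bytes.toBytes("r1"))   // inclusive, like startkey
                        .withStopRow(Bytes.toBytes("r2"));   // exclusive, like endkey
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result r : scanner) {
                        System.out.println(Bytes.toString(r.getRow())); // prints r1 only
                    }
                }
            }
        }
    }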

-ROOT- && .META. Table

There are two special Tables in HBase: -ROOT- and .META.

.META.: records the Region information of user tables; .META. itself can have multiple Regions

-ROOT-: records the Region information of the .META. table; -ROOT- has only one Region

• ZooKeeper records the location of the -ROOT- table

Before accessing user data, a Client must first access ZooKeeper, then the -ROOT- table, then the .META. table, and only then can it find the location of the user data. This requires several network round trips in between, but the client side caches the results.

MapReduce on HBase

MapReduce remains the most convenient and practical model for running batch computations on an HBase system, as shown in the figure below:

The relationship between an HBase Table and its Regions is much like that between an HDFS File and its Blocks. HBase provides the matching TableInputFormat and TableOutputFormat APIs, which make it easy to use an HBase Table as the Source and Sink of a Hadoop MapReduce job; MapReduce application developers rarely need to concern themselves with the internals of the HBase system. (A sketch follows.)
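
As an illustration of those APIs, here is a minimal row-counting sketch wired up with TableMapReduceUtil (shipped in HBase's MapReduce modules); the table name is the hypothetical "webtable" from earlier:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class RowCountJob {
        // TableInputFormat feeds each row to the mapper as (row key, Result).
        static class CountMapper extends TableMapper<ImmutableBytesWritable, Result> {
            enum Counters { ROWS }
            @Override
            protected void map(ImmutableBytesWritable rowKey, Result columns, Context context) {
                context.getCounter(Counters.ROWS).increment(1);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "hbase-row-count");
            job.setJarByClass(RowCountJob.class);
            Scan scan = new Scan();  // full-table scan; could be narrowed with start/stop rows
            TableMapReduceUtil.initTableMapperJob(
                    "webtable", scan, CountMapper.class,
                    ImmutableBytesWritable.class, Result.class, job);
            job.setOutputFormatClass(NullOutputFormat.class);  // counter only, no file output
            job.setNumReduceTasks(0);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }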

HBase System Architecture

Client

The HBase Client uses HBase's RPC mechanism to communicate with the HMaster and the HRegionServers: for administrative operations, the Client issues RPCs to the HMaster; for data read and write operations, it issues RPCs to the HRegionServers.

ZooKeeper

Besides storing the addresses of the -ROOT- table and the HMaster, the ZooKeeper quorum also tracks the HRegionServers: each one registers itself in ZooKeeper as an ephemeral node, so the HMaster can sense the health of every HRegionServer at any time. ZooKeeper also removes the HMaster as a single point of failure, as described below. (A standalone sketch of ephemeral registration follows.)
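
This is not HBase's internal code, but a plain ZooKeeper client sketch of how ephemeral registration behaves; the connect string and znode path are hypothetical:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class EphemeralExample {
        public static void main(String[] args) throws Exception {
            // Connect with a 30 s session timeout; the watcher is a no-op for brevity.
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 30_000, event -> { });
            // An EPHEMERAL znode is deleted automatically when this session ends,
            // which is how a watching master senses that a server has crashed.
            zk.create("/demo-rs-server1", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            Thread.sleep(10_000);   // while we sleep, the znode is visible to others
            zk.close();             // session ends; ZooKeeper removes the znode
        }
    }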

HMaster

The HMaster is not a single point of failure: multiple HMasters can be started in HBase, and ZooKeeper's Master Election mechanism guarantees that exactly one Master is running at any time. Functionally, the HMaster is mainly responsible for managing Tables and Regions:

1. Managing users' create, delete, alter, and query operations on Tables

2. Managing HRegionServer load balancing and adjusting the distribution of Regions

3. Assigning the new Regions after a Region Split

4. Migrating the Regions of a failed HRegionServer after it goes down

HRegionServer

The HRegionServer is mainly responsible for responding to user I/O requests and for reading and writing data in the HDFS file system; it is the most central module in HBase.

An HRegionServer internally manages a series of HRegion objects; each HRegion corresponds to one Region of a Table and is composed of multiple HStores. Each HStore holds the storage of one Column Family of the Table, so each Column Family is effectively a single storage unit. It is therefore most efficient to place columns with similar I/O characteristics in the same Column Family.

HStore storage is the core of HBase storage. It consists of two parts: the MemStore and the StoreFiles. The MemStore is a sorted memory buffer: data written by the user goes into the MemStore first, and when the MemStore fills up it is flushed to a StoreFile (implemented underneath as an HFile). When the number of StoreFiles grows past a threshold, a Compact operation is triggered that merges multiple StoreFiles into one, performing version merging and data deletion along the way. From this it can be seen that HBase only ever appends data; all updates and deletes are carried out during later compactions, so a user's write can return as soon as it reaches memory, guaranteeing high HBase I/O performance.

As StoreFiles are compacted, ever-larger StoreFiles form. Once a single StoreFile exceeds a size threshold, a Split is triggered: the current Region is split into two, the parent Region goes offline, and the two newly split child Regions are assigned by the HMaster to the appropriate HRegionServers, so that the load of the original Region is spread across two Regions. The figure below describes the Compaction and Split process.
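
The flush, compaction, and split thresholds above are all configurable. A hedged hbase-site.xml sketch with commonly cited knobs (the values are illustrative, close to common defaults, not recommendations):

    <!-- hbase-site.xml (illustrative values) -->
    <configuration>
      <!-- Flush a MemStore to a StoreFile once it reaches this size. -->
      <property>
        <name>hbase.hregion.memstore.flush.size</name>
        <value>134217728</value> <!-- 128 MB -->
      </property>
      <!-- Trigger a compaction when a Store holds at least this many StoreFiles. -->
      <property>
        <name>hbase.hstore.compactionThreshold</name>
        <value>3</value>
      </property>
      <!-- Split a Region once a StoreFile grows beyond this size. -->
      <property>
        <name>hbase.hregion.max.filesize</name>
        <value>10737418240</value> <!-- 10 GB -->
      </property>
    </configuration>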

Having understood the basic principle of HStore, one must also understand the role of HLog. HStore is fine as long as the system operates normally, but in a distributed environment system failures and downtime cannot be avoided: if an HRegionServer exits unexpectedly, the in-memory data in its MemStores is lost, and this is why HLog is introduced. Each HRegionServer has one HLog object, a class that implements a Write-Ahead Log: every time a user write goes into a MemStore, a copy is also written to the HLog file (the HLog file format is described later). The HLog file rolls periodically, and old files (whose data has already been persisted to StoreFiles) are deleted. When an HRegionServer terminates unexpectedly, the HMaster learns of it through ZooKeeper. The HMaster first processes the leftover HLog file, splitting its log data by Region and placing the pieces under the corresponding Regions' directories, then reassigns the affected Regions. While loading such a Region, the HRegionServer that receives it discovers a historical HLog that needs processing, so it replays the HLog data into the MemStore and then flushes to StoreFiles, completing data recovery.

Storage Format

All of HBase's data files are stored on the Hadoop HDFS file system. They mainly include the two file types mentioned above:

1. HFile: the storage format for KeyValue data in HBase. An HFile is a Hadoop binary-format file; in fact, a StoreFile is just a lightweight wrapper around an HFile, i.e., a StoreFile is an HFile underneath

2. HLog File: the storage format of HBase's WAL (Write-Ahead Log); physically it is a Hadoop Sequence File

HFile

The figure below shows the storage format of an HFile:

First, an HFile has variable length; only two of its sections have fixed length: the Trailer and the FileInfo. As shown in the figure, the Trailer holds pointers to the start of each of the other blocks. The FileInfo records some meta information about the file, such as AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR, and MAX_SEQ_ID_KEY. The Data Index and Meta Index blocks record the starting point of every Data block and Meta block.

The Data Block is the basic unit of HBase I/O; for efficiency, the HRegionServer keeps an LRU-based Block Cache. The size of each Data block can be specified by a parameter when a Table is created: large blocks favor sequential Scans, while small blocks favor random lookups. Apart from the Magic at its start, a Data block is simply a sequence of concatenated KeyValue pairs; the Magic content is just some random numbers whose purpose is to detect data corruption. The internal structure of each KeyValue pair is described in detail later. (A sketch of setting the block size follows.)
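
As a sketch of that per-table parameter, the HBase 2.x client API exposes the Data Block size (and the Block Cache toggle) on each Column Family; the table name is the hypothetical "webtable" again, and the 64 KB value is only illustrative:

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.client.TableDescriptor;
    import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BlockSizeExample {
        public static void main(String[] args) {
            TableDescriptor desc = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("webtable"))        // hypothetical name
                    .setColumnFamily(ColumnFamilyDescriptorBuilder
                            .newBuilder(Bytes.toBytes("URI"))
                            .setBlocksize(64 * 1024)        // smaller blocks favor random reads
                            .setBlockCacheEnabled(true)     // serve hot blocks from the LRU cache
                            .build())
                    .build();
            System.out.println(desc);  // pass to Admin.createTable(desc) to apply
        }
    }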


