[Susou Q&A] What is Btrfs?

Q&A | admin | 2020-07-20

Btrfs (usually pronounced "Butter FS") is a copy-on-write (COW) file system announced by Oracle in 2007. Its goal is to replace the Linux ext3 file system and overcome ext3's limitations, in particular the maximum size of a single file and of the whole file system, and to add file checksums. It also adds features that ext3/4 do not support, such as writable snapshots, recursive snapshots (snapshots of snapshots), built-in RAID support, subvolumes, and online resizing of the file system.

The first group of features concerns scalability. Btrfs's most important design goal is to meet the scalability demands that large machines place on a file system. Extents, B-trees, and dynamic inode creation let Btrfs keep performing well on large machines, so that its overall performance does not degrade as system capacity grows.

The second group concerns data integrity. Because a system can suffer unexpected hardware failures, Btrfs uses COW transactions to keep the file system consistent. Btrfs also checksums its data, so silent corruption is detected instead of going unnoticed, which traditional file systems cannot do.
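The two integrity mechanisms above can be illustrated with a minimal Python sketch, under simplifying assumptions: the names `Volume`, `Block`, `begin` and `commit` are hypothetical, a whole-dictionary copy stands in for Btrfs copying only the modified B-tree nodes, and `zlib.crc32` (plain CRC-32) stands in for the file system's checksum (Btrfs defaults to CRC-32C).

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    data: bytes
    csum: int  # checksum recorded when the block was written


def write_block(data: bytes) -> Block:
    # zlib.crc32 is plain CRC-32, used here only as a stand-in checksum
    return Block(data, zlib.crc32(data))


def read_block(block: Block) -> bytes:
    # Verify on every read so corruption is reported instead of returned silently
    if zlib.crc32(block.data) != block.csum:
        raise IOError("checksum mismatch: silent corruption detected")
    return block.data


class Volume:
    def __init__(self) -> None:
        # Plays the role of the superblock: points at the last consistent state
        self.committed: dict[str, Block] = {}

    def begin(self, updates: dict[str, bytes]) -> dict[str, Block]:
        # Copy-on-write: build a new state; the committed state is never modified
        new_state = dict(self.committed)
        for name, data in updates.items():
            new_state[name] = write_block(data)
        return new_state

    def commit(self, new_state: dict[str, Block]) -> None:
        # A single pointer switch makes the whole transaction visible at once
        self.committed = new_state


vol = Volume()
vol.commit(vol.begin({"a.txt": b"hello"}))
pending = vol.begin({"a.txt": b"world"})   # "crash" before commit...
print(read_block(vol.committed["a.txt"]))  # ...old data is intact: b'hello'
```

Because the committed state is replaced by one assignment, a crash before `commit()` leaves the previous consistent state untouched, and a block whose bytes no longer match its stored checksum raises an error on read.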

The third group relates to multi-device management. Btrfs supports creating snapshots and clones, and it can manage multiple physical devices directly, which makes traditional volume-management software unnecessary.
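Snapshots are cheap in a copy-on-write tree because a snapshot is just another reference to an existing root, and later writes copy only the nodes on the path they change. The sketch below assumes a plain, unbalanced binary search tree for brevity; Btrfs uses B-trees, and `Node`, `insert` and `lookup` are illustrative names, not Btrfs code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: str
    value: bytes
    left: Optional[Node] = None
    right: Optional[Node] = None


def insert(root: Optional[Node], key: str, value: bytes) -> Node:
    """Return a new root; nodes off the search path are shared, never modified."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)  # overwrite = copy the node


def lookup(root: Optional[Node], key: str) -> Optional[bytes]:
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


live: Optional[Node] = None
for name in ["m", "c", "t", "a", "e"]:
    live = insert(live, name, b"v1")

snapshot = live                  # taking a snapshot copies no data at all
live = insert(live, "e", b"v2")  # the write copies only the path m -> c -> e

print(lookup(snapshot, "e"), lookup(live, "e"))  # b'v1' b'v2'
print(live.right is snapshot.right)              # True: subtree "t" is shared
```

The snapshot keeps seeing the old value while the live tree sees the new one, and every node off the modified path is physically shared between them.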

Finally, there are features that are hard to categorize. These are relatively advanced techniques that noticeably improve the file system's time and space efficiency, including delayed allocation, storage optimization for small files, and directory indexing.

Key Features

B-Tree

All metadata in Btrfs is managed by B-trees. The main advantage of B-trees is that lookup, insertion, and deletion are all efficient. B-trees can fairly be called the core of Btrfs.

Simply praising B-trees as fast and efficient may not be convincing on its own, but a brief look at how ext2/3 manage metadata brings out the advantages of B-trees by contrast.

One obstacle to the scalability of ext2/3 is the way directories are organized. A directory is a special file, and in ext2/3 its contents are a linear table.

This structure is an intuitive design when the number of files is small, but as a directory grows, the time needed to find a file grows linearly. In 2003, the ext3 developers introduced directory indexing to solve this problem. The data structure used by the directory index is a B-tree. If a directory contains more than 2K files, the i_data field in its inode points to a special block that stores the directory index B-tree. B-tree lookups are faster than scanning a linear table, but maintaining two different data structures for the same kind of metadata is never very elegant. A file system also has many other kinds of metadata, and managing all of them with a single, unified B-tree design is both simple and elegant.
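To make the linear-versus-indexed difference concrete, here is a small Python comparison that counts name comparisons. A sorted list searched with binary search stands in for the directory-index B-tree (both give logarithmic lookups); the function names and the synthetic directory are hypothetical, and real ext3 directory indexes hash file names rather than sorting them.

```python
from typing import Optional


def linear_lookup(entries: list[tuple[str, int]], name: str) -> tuple[Optional[int], int]:
    """ext2/3-style lookup: scan the directory's linear table entry by entry."""
    comparisons = 0
    for entry_name, inode in entries:
        comparisons += 1
        if entry_name == name:
            return inode, comparisons
    return None, comparisons


def indexed_lookup(sorted_entries: list[tuple[str, int]], name: str) -> tuple[Optional[int], int]:
    """Index-style lookup: binary search over entries sorted by name (B-tree stand-in)."""
    lo, hi, comparisons = 0, len(sorted_entries), 0
    while lo < hi:
        comparisons += 1
        mid = (lo + hi) // 2
        entry_name, inode = sorted_entries[mid]
        if entry_name == name:
            return inode, comparisons
        if entry_name < name:
            lo = mid + 1
        else:
            hi = mid
    return None, comparisons


# A synthetic directory with 20,000 entries: (file name, inode number), already sorted
directory = [(f"file{i:05d}", 1000 + i) for i in range(20_000)]
print(linear_lookup(directory, "file19999"))   # (20999, 20000)
print(indexed_lookup(directory, "file19999"))  # (20999, 14)
```

Doubling the directory size doubles the linear scan but adds only about one step to the indexed lookup.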

All metadata inside Btrfs is managed with B-trees, which gives it good scalability. Different kinds of metadata are managed by different trees, and the superblock contains pointers leading to the roots of these B-trees.

• FS tree: manages file-related metadata such as inodes and directories.
• Chunk tree: manages devices; every disk device has an item in the chunk tree.
• Extent tree: manages disk-space allocation. Each time Btrfs allocates a piece of disk space, it inserts information about that space into the extent tree, so querying the extent tree yields information about free disk space.
• Tree of tree roots: stores the root nodes of many B-trees. For example, Btrfs creates a new FS tree every time the user takes a snapshot; to keep track of all these trees, Btrfs records every tree's root node in the tree of tree roots.
• Checksum tree: stores the checksums of data blocks.
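The relationship between the superblock, the tree of tree roots, and snapshots can be sketched as a simple mapping from tree ID to the address of that tree's root node. This is a hypothetical model, not Btrfs code: the numeric tree IDs follow Btrfs's on-disk numbering as we understand it and should be treated as illustrative, the root addresses are made up, and the chunk tree is left out because its root is recorded in the superblock itself.

```python
from dataclasses import dataclass, field

# Illustrative tree IDs; subvolumes and snapshots get IDs starting at 256
EXTENT_TREE, FS_TREE, CSUM_TREE = 2, 5, 7
FIRST_SUBVOL_ID = 256


@dataclass
class RootTree:
    """Sketch of the tree of tree roots: tree ID -> address of that tree's root node."""
    roots: dict[int, int] = field(default_factory=dict)
    next_id: int = FIRST_SUBVOL_ID

    def register(self, tree_id: int, root_addr: int) -> None:
        self.roots[tree_id] = root_addr

    def snapshot(self, source_id: int) -> int:
        # A snapshot is a new FS tree whose root is, initially, the source's root node
        new_id = self.next_id
        self.next_id += 1
        self.roots[new_id] = self.roots[source_id]
        return new_id


@dataclass
class Superblock:
    root_tree: RootTree  # entry point from which the other tree roots are found

    def root_of(self, tree_id: int) -> int:
        return self.root_tree.roots[tree_id]


# Hypothetical root-node addresses, made up for the example
sb = Superblock(RootTree())
for tree_id, addr in [(EXTENT_TREE, 0x10000), (FS_TREE, 0x30000), (CSUM_TREE, 0x40000)]:
    sb.root_tree.register(tree_id, addr)

snap_id = sb.root_tree.snapshot(FS_TREE)
print(snap_id, hex(sb.root_of(snap_id)))  # 256 0x30000
```

Creating a snapshot only adds one entry to the tree of tree roots; the new FS tree starts out sharing the original's root node, and copy-on-write separates them as either side is modified.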

Extent-Based File Storage

Many modern file systems manage disk space in extents rather than individual blocks. An extent is a run of contiguous blocks, defined by its starting block plus its length.
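A minimal Python sketch of the definition above: a file's block numbers, listed in file order, are collapsed into (start_block, length) extents. The function name and the example block numbers are hypothetical.

```python
def blocks_to_extents(blocks: list[int]) -> list[tuple[int, int]]:
    """Collapse a file's block numbers (in file order) into (start_block, length) extents."""
    extents: list[tuple[int, int]] = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # block continues the current run
        else:
            extents.append((block, 1))  # gap: start a new extent
    return extents


# A 266-block file stored in two contiguous runs
file_blocks = list(range(1000, 1256)) + list(range(5000, 5010))
print(len(file_blocks), blocks_to_extents(file_blocks))  # 266 [(1000, 256), (5000, 10)]
```

Here 266 per-block entries shrink to two extent records; the saving grows with how contiguously the file is laid out.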

Extents effectively reduce metadata overhead. To understand why, let's again look at ext2 as a counterexample.

ext2/3 divide the disk into blocks, which serve as the basic unit of allocation. To manage disk space, the file system must know which blocks are free, and ext uses a bitmap for this: each bit in the bitmap corresponds to one block on the disk, and the bit is set to 1 once that block has been allocated. This is a classic, clear design, but unfortunately the bitmap itself grows as the disk grows. That creates a scalability problem: as storage capacity increases, so does the space taken by the bitmap metadata. For a design to be scalable, metadata should not grow linearly with disk capacity.
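The linear growth is easy to quantify: the bitmap needs capacity / block_size bits. The Python sketch below assumes a 4 KiB block size (ext2/3 also support smaller blocks) and ignores the split into per-group bitmaps, which does not change the total.

```python
def bitmap_bytes(capacity_bytes: int, block_size: int = 4096) -> int:
    """Bytes needed by a one-bit-per-block allocation bitmap."""
    blocks = capacity_bytes // block_size
    return (blocks + 7) // 8  # 8 bits per byte, rounded up


MiB, TiB = 1 << 20, 1 << 40
for tib in (1, 16, 256):
    size = bitmap_bytes(tib * TiB)
    print(f"{tib:>3} TiB disk -> {size // MiB} MiB of bitmap")
# Prints:
#   1 TiB disk -> 32 MiB of bitmap
#  16 TiB disk -> 512 MiB of bitmap
# 256 TiB disk -> 8192 MiB of bitmap
```

By contrast, an extent-based record of free space on an empty disk is a single (start, length) pair regardless of capacity; its size depends on fragmentation, not on disk size.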

SSD Optimization Support

SSD is short for solid-state disk. Over the past few decades, components such as CPUs and RAM have developed in step with Moore's law, but the read/write speed of hard disk drives (HDDs) has seen no comparable leap. Disk I/O has remained the bottleneck of system performance.

SSDs use flash memory and contain no mechanical parts such as disk heads, so their read/write speeds are much higher. Flash memory also behaves differently from an HDD. First, flash must be erased before data can be written. Second, flash tolerates only a limited number of erase operations: with current technology, the same data unit can be erased at most about 100,000 times. To prolong the life of the flash, writes should therefore be spread evenly across the whole device.
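The idea of spreading erases across the device can be illustrated with a toy wear-leveling allocator that always writes to the erase block with the fewest erases so far. This is a hypothetical sketch (`WearLeveler` is an invented name); real SSD firmware is far more involved.

```python
import heapq


class WearLeveler:
    """Toy wear leveler: each write goes to the erase block with the fewest erases."""

    def __init__(self, num_blocks: int) -> None:
        # Min-heap of (erase_count, block_id); ties go to the lower block id
        self.heap = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self.heap)

    def write(self) -> int:
        erases, block = heapq.heappop(self.heap)
        # Flash must be erased before it is rewritten, so each write costs one erase
        heapq.heappush(self.heap, (erases + 1, block))
        return block

    def erase_counts(self) -> dict[int, int]:
        return {block: erases for erases, block in self.heap}


wl = WearLeveler(num_blocks=8)
for _ in range(80):
    wl.write()
print(set(wl.erase_counts().values()))  # {10}: every block erased 10 times
```

Rewriting one block in place would instead put all 80 erases on that block, wearing it out eight times faster.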

SSDs implement wear leveling and similar write-distribution techniques in their internal firmware, so the operating system no longer needs a special MTD driver or FTL layer. Although SSDs do a lot at the hardware level, what the hardware alone can achieve is limited. Optimizing the file system for the characteristics of SSDs can both extend SSD lifetime and improve read/write performance. Btrfs is one of the few file systems with optimizations specifically for SSDs; Btrfs users can enable them with the ssd mount option.

Btrfs's COW design fundamentally avoids repeatedly writing to the same physical unit. If the user enables the SSD optimization option, Btrfs also optimizes its low-level block allocation strategy by aggregating multiple disk-space allocation requests into a single contiguous 2 MB block. Large I/Os to contiguous addresses let the firmware built into the SSD optimize reads and writes more effectively, improving I/O performance.
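A minimal sketch of the aggregation idea, assuming an allocator that serves requests sequentially from one 2 MiB cluster at a time; `ClusterAllocator` and `merge_ios` are hypothetical names, and Btrfs's real allocator is considerably more involved.

```python
CLUSTER = 2 * 1024 * 1024  # 2 MiB, the cluster size mentioned in the text


class ClusterAllocator:
    """Hypothetical allocator: serve requests sequentially from one 2 MiB cluster."""

    def __init__(self, start: int = 0) -> None:
        self.next_cluster = start  # where the next fresh cluster begins
        self.cursor = start        # next free byte in the current cluster
        self.remaining = 0         # bytes left in the current cluster

    def alloc(self, size: int) -> int:
        if size > CLUSTER:
            raise ValueError("request larger than a cluster")
        if size > self.remaining:
            # Open a new contiguous cluster instead of filling scattered holes
            self.cursor = self.next_cluster
            self.next_cluster += CLUSTER
            self.remaining = CLUSTER
        addr = self.cursor
        self.cursor += size
        self.remaining -= size
        return addr


def merge_ios(allocs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge (address, size) writes that sit back to back into larger I/Os."""
    ios: list[tuple[int, int]] = []
    for addr, size in sorted(allocs):
        if ios and ios[-1][0] + ios[-1][1] == addr:
            ios[-1] = (ios[-1][0], ios[-1][1] + size)
        else:
            ios.append((addr, size))
    return ios


allocator = ClusterAllocator()
writes = [(allocator.alloc(4096), 4096) for _ in range(512)]
print(merge_ios(writes))  # [(0, 2097152)]: 512 small writes become one 2 MiB I/O
```

Because consecutive requests land at consecutive addresses, the block layer can issue them as one large sequential write, which is the pattern the text says SSD firmware handles best.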
