
【速搜问答】分布式文件系统是什么

问答 admin 2年前 (2020-08-18) 459次浏览 已收录 0个评论

汉英对照:
Chinese-English Translation:

分布式文件系统(DFS)是指文件系统管理的物理存储资源不一定直接连接在本地节点上,而是通过计算机网络与节点相连;或是若干不同的逻辑磁盘分区或卷标组合在一起而形成的完整的有层次的文件系统。

A distributed file system (DFS) is a file system whose physical storage resources are not necessarily attached directly to the local node but are connected to it over a computer network; alternatively, it is a complete hierarchical file system formed by combining several different logical disk partitions or volumes.

分布式文件系统(Distributed File System,DFS)是指文件系统管理的物理存储资源不一定直接连接在本地节点上,而是通过计算机网络与节点(可简单的理解为一台计算机)相连;或是若干不同的逻辑磁盘分区或卷标组合在一起而形成的完整的有层次的文件系统。

A distributed file system (DFS) is a file system whose physical storage resources are not necessarily attached directly to the local node but are connected to the node (which can simply be thought of as one computer) over a computer network; alternatively, it is a complete hierarchical file system formed by combining several different logical disk partitions or volumes.

DFS 为分布在网络上任意位置的资源提供一个逻辑上的树形文件系统结构,从而使用户访问分布在网络上的共享文件更加简便。单独的 DFS 共享文件夹的作用是相对于通过网络上的其他共享文件夹的访问点。

DFS provides a logical, tree-structured file system view of resources located anywhere on the network, which makes it easier for users to reach shared files scattered across the network. An individual DFS shared folder acts as an access point to other shared folders on the network.
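The idea of one logical tree that points at shares on different machines can be sketched as a lookup table with longest-prefix matching. This is only an illustration: the `NAMESPACE` table, the server names, and the `resolve` function are hypothetical and do not correspond to any particular DFS product.

```python
# Hypothetical namespace: logical folder -> shared folder on some server.
NAMESPACE = {
    "/dfs/projects": "//server-a/projects",
    "/dfs/projects/archive": "//server-b/old-projects",
    "/dfs/home": "//server-c/home",
}


def resolve(logical_path: str) -> str:
    """Map a path in the logical tree to the network share that holds it."""
    best = None
    for prefix in NAMESPACE:
        # Match whole path components only, and prefer the longest prefix.
        if logical_path == prefix or logical_path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no shared folder is mapped for {logical_path}")
    return NAMESPACE[best] + logical_path[len(best):]


print(resolve("/dfs/projects/plan.txt"))
# //server-a/projects/plan.txt
print(resolve("/dfs/projects/archive/2019/report.txt"))
# //server-b/old-projects/2019/report.txt
```

The user only ever sees paths under `/dfs`; which server actually stores a file is decided by the table.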

简介

Introduction

计算机通过文件系统管理、存储数据,而信息爆炸时代中人们可以获取的数据成指数倍的增长,单纯通过增加硬盘个数来扩展计算机文件系统的存储容量的方式,在容量大小、容量增长速度、数据备份、数据安全等方面的表现都差强人意。分布式文件系统可以有效解决数据的存储和管理难题:将固定于某个地点的某个文件系统,扩展到任意多个地点/多个文件系统,众多的节点组成一个文件系统网络。每个节点可以分布在不同的地点,通过网络进行节点间的通信和数据传输。

Computers manage and store data through file systems. In the age of information explosion, the amount of data available to people grows exponentially, and expanding a file system's capacity simply by adding more hard disks gives unsatisfactory results in terms of capacity, speed of capacity growth, data backup, and data security. A distributed file system can effectively solve these storage and management problems: a file system fixed at one location is extended to any number of locations and file systems, and the many nodes form a file system network. Each node can be located in a different place, and the nodes communicate and transfer data over the network.

人们在使用分布式文件系统时,无需关心数据是存储在哪个节点上、或者是从哪个节点从获取的,只需要像使用本地文件系统一样管理和存储文件系统中的数据。分布式文件系统是建立在客户机/服务器技术基础之上的,一个或多个文件服务器与客户机文件系统协同操作,这样客户机就能够访问由服务器管理的文件。分布式文件系统的发展大体上经历子三个阶段:第一阶段是网络文件系统,第二阶段是共享 SAN 文件系统,第三阶段是面向对象的并行文件系统。

People who use a distributed file system do not need to care which node the data is stored on or which node it is retrieved from; they simply manage and store data as they would in a local file system. A distributed file system is built on client/server technology: one or more file servers cooperate with the client file system so that the client can access the files managed by the servers. The development of distributed file systems has gone through roughly three stages: first the network file system, then the shared SAN file system, and finally the object-oriented parallel file system.

分布式文件系统把大量数据分散到不同的节点上存储,大大减小了数据丢失的风险。分布式文件系统具有冗余性,部分节点的故障并不影响整体的正常运行,而且即使出现故障的计算机存储的数据已经损坏,也可以由其它节点将损坏的数据恢复出来。因此,安全性是分布式文件系统最主要的特征。分布式文件系统通过网络将大量零散的计算机连接在一起,形成一个巨大的计算机集群,使各主机均可以充分发挥其价值。此外,集群之外的计算机只需要经过简单的配置就可以加入到分布式文件系统中,具有极强的可扩展能力。

A distributed file system spreads large amounts of data across different nodes, which greatly reduces the risk of data loss. It is redundant: the failure of some nodes does not affect the normal operation of the whole system, and even if the data stored on a failed computer has been damaged, other nodes can restore it. Data safety is therefore the most important characteristic of a distributed file system. A distributed file system connects a large number of scattered computers over the network into one huge cluster, so that every host can be put to full use. In addition, a computer outside the cluster can join the distributed file system after only simple configuration, which makes the system highly scalable.

系统分类

System classification

网络文件系统

Network file system

(NFS) 最早由 Sun 微系统公司作为 TCP/IP 网上的文件共享系统开发。Sun 公司估计大约有超过 310 万个系统在运行 NFS,大到大型计算机、小至 PC 机,其中至少有 80%的系统是非 Sun 平台。

The Network File System (NFS) was first developed by Sun Microsystems as a file-sharing system for TCP/IP networks. Sun estimated that more than 3.1 million systems, from mainframes down to PCs, were running NFS, and that at least 80% of them were non-Sun platforms.

Andrew 系统

Andrew system

AFS 是一种分布式的文件系统用来共享与获得在计算机网络中存放的文件。AFS 使得用户获得网络文件就像本地机器般方便。AFS 文件系统被称为“分布式”是因为文件可以分散地存放在很多不同的机器上,但这些文件对于用户而言是可及的,用户可以通过一定的方式得到这些文件。

AFS is a distributed file system used to share and retrieve files stored on a computer network. AFS makes accessing network files as convenient as accessing files on the local machine. It is called "distributed" because files can be stored across many different machines while remaining accessible to users, who can obtain them through a defined mechanism.

KASS 系统

KASS system

KASS File System(简称 KFS)是开始软件自主研发基于 JAVA 的纯分布式文件系统,功能类似于 DFS、GFS、Hadoop,通过 HTTP WEB 为企业的各种信息系统提供底层文件存储及访问服务,搭建企业私有云存储服务平台。

KASS File System (KFS for short) is a purely distributed, Java-based file system developed in-house by the vendor 开始软件. Its functionality is similar to DFS, GFS, and Hadoop. It provides underlying file storage and access services over HTTP for an enterprise's information systems and is used to build private enterprise cloud storage platforms.

DFS 系统

DFS system

DFS 是 AFS 的一个版本,作为开放软件基金会(OSF)的分布式计算环境 DCE 中的文件系统部分。

DFS is a version of AFS that serves as the file system component of the Open Software Foundation (OSF) Distributed Computing Environment (DCE).

如果文件的访问仅限于一个用户,那么分布式文件系统就很容易实现。可惜的是,在许多网络环境中这种限制是不现实的,必须采取并发控制来实现文件的多用户访问,表现为如下几个形式:

If access to a file were limited to a single user, a distributed file system would be easy to implement. Unfortunately, that restriction is unrealistic in many network environments, so concurrency control is needed to support multi-user access to files. It takes the following forms:

只读共享 任何客户机只能访问文件,而不能修改它,这实现起来很简单。

Read-only sharing: any client can read the file but cannot modify it. This is simple to implement.

受控写操作 采用这种方法,可有多个用户打开一个文件,但只有一个用户进行写修改。而该用户所作的修改并不一定出现在其它已打开此文件的用户的屏幕上。

Controlled writes: with this approach several users can open a file, but only one of them can write to it. The changes made by that user do not necessarily appear on the screens of other users who already have the file open.

并发写操作 这种方法允许多个用户同时读写一个文件。但这需要操作系统作大量的监控工作以防止文件重写,并保证用户能够看到最新信息。这种方法即使实现得很好,许多环境中的处理要求和网络通信量也可能使它变得不可接受。

Concurrent writes: this approach lets multiple users read and write a file at the same time. It requires a great deal of monitoring by the operating system to prevent writes from overwriting one another and to make sure users see the latest information. Even when implemented well, its processing requirements and network traffic can make it unacceptable in many environments.
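The controlled-write form can be illustrated with a small bookkeeping class. This is a minimal sketch under stated assumptions, not how any particular file server does it: the class name `ControlledWriteFile`, its methods, and the client names are all hypothetical.

```python
from typing import Optional


class ControlledWriteFile:
    """Many clients may open the file; at most one may hold write access."""

    def __init__(self) -> None:
        self.readers: set = set()            # every client that has the file open
        self.writer: Optional[str] = None    # the single client allowed to write

    def open(self, client: str, write: bool = False) -> bool:
        if write:
            if self.writer not in (None, client):
                return False                 # another client is already the writer
            self.writer = client
        self.readers.add(client)
        return True

    def close(self, client: str) -> None:
        self.readers.discard(client)
        if self.writer == client:
            self.writer = None               # write access becomes available again


f = ControlledWriteFile()
print(f.open("alice", write=True))   # True: alice becomes the writer
print(f.open("bob", write=True))     # False: only one writer at a time
print(f.open("bob"))                 # True: reading is still allowed
f.close("alice")
print(f.open("bob", write=True))     # True: the writer slot is free again
```

Read-only sharing corresponds to never granting the writer slot; concurrent writes would need far more than this, such as conflict detection and change notification.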

网络文件系统

Network file system

网络文件系统(Network File System,NFS)是个分布式的客户机/服务器文件系统。NFS 的实质在于用户间计算机的共享。用户可以联结到共享计算机并像访问本地硬盘一样访问共享计算机上的文件。管理员可以建立远程系统上文件的访问,以至于用户感觉不到他们是在访问远程文件。NFS 是个到处可用和广泛实现的开放式系统。

The Network File System (NFS) is a distributed client/server file system. The essence of NFS is sharing computers among users. Users can connect to a shared computer and access its files as if they were on a local hard disk. Administrators can set up access to files on remote systems in such a way that users do not notice they are working with remote files. NFS is an open system that is widely available and widely implemented.

NFS 设计目标

NFS design objectives

允许用户象访问本地文件一样访问其他系统上的文件。提供对无盘工作站的支持以降低网络开销。

Allow users to access files on other systems just as they access local files. Support diskless workstations to reduce network overhead.

简化应用程序对远程文件的访问使得不需要因访问这些文件而调用特殊的过程。

Simplify application access to remote files so that no special procedures have to be called to reach them.

使用一次一个服务请求以使系统能从已崩溃的服务器或工作站上恢复。

Handle one service request at a time so that the system can recover from a crashed server or workstation.

采用安全措施保护文件免遭偷窃与破坏。

Use security measures to protect files from theft and damage.

使 NFS 协议可移植和简单,以便它们能在许多不同计算机上实现,包括低档的 PC 机。

Keep the NFS protocol portable and simple so that it can be implemented on many different computers, including low-end PCs.

大型计算机、小型计算机和文件服务器运行 NFS 时,都为多个用户提供了一个文件存储区。工作站只需要运行 TCP/IP 协议来访问这些系统和位于 NFS 存储区内的文件。工作站上的 NFS 通常由 TCP/IP 软件支持。对 DOS 用户,一个远程 NFS 文件存储区看起来是另一个磁盘驱动器盘符。对 Macintosh 用户,远程 NFS 文件存储区就是一个图标。

Mainframes, minicomputers, and file servers that run NFS each provide a file storage area for multiple users. A workstation only needs to run TCP/IP to access these systems and the files in the NFS storage area; NFS support on workstations usually comes with the TCP/IP software. To a DOS user, a remote NFS storage area looks like another drive letter. To a Macintosh user, it is an icon.

分布式文件系统 KFS

Distributed file system KFS

一个 KFS 集群包括单个元数据服务器节点和多个 Chunk 服务器节点,并由多个客户端来访问。其中元数据服务器主要用于维护元数据并负责控制垃圾回收、负载均衡等系统活动,Chunk 服务器负责保存数据以及接收处理数据 I/O 请求。这两类节点均运行 Linux 操作系统,需要分别安装、运行 KFS 提供的元数据服务器和 Chunk 服务器软件。客户端需要安装专门的 KFS 客户端库,由应用程序链接使用来访问 KFS 文件系统。KFS 的所有节点都均采用 X86 架构硬件,并通过以太网方式连接在一起,节点之间采用 TCP 协议通讯,使得整个系统具有较高的性价比。

A KFS cluster consists of a single metadata server node and multiple chunk server nodes, and is accessed by multiple clients. The metadata server maintains the metadata and controls system activities such as garbage collection and load balancing. Chunk servers store the data and receive and process data I/O requests. Both kinds of node run Linux and need the metadata server or chunk server software provided by KFS installed and running. Clients need a dedicated KFS client library, which applications link against to access the KFS file system. All KFS nodes use x86 hardware, are connected by Ethernet, and communicate over TCP, which makes the whole system cost-effective.

Chunk 服务器

Chunk server

在 KFS 中,一个文件被分割成多个 Chunk,每个 Chunk 大小固定为 64MB,所以可以通过简单的模运算计算出某文件偏移量在该文件第几个 Chunk 的多少偏移量上。每个 Chunk 由一个全局唯一的 Chunk 号来标识。Chunk 服务器主要的功能就是保存 Chunk,并对外提供创建、删除、读写 Chunk 的访问接口。一个 Chunk 默认被复制成 3 份,保存在 3 个不同的 Chunk 服务器中,客户端可以为每个文件指定不同的副本个数。三副本就保证了在两个 Chunk 服务器故障的情况下,仍能从第三个 Chunk 服务器上的副本读出数据,提高了系统的可靠性。在 Chunk 数据写入时,若某个 Chunk 服务器突然故障,会导致的相应副本更新失败,进而影响 Chunk 各副本数据的一致性。

In KFS, a file is split into multiple chunks, each with a fixed size of 64 MB, so simple integer division and a modulo operation on a file offset give the index of the chunk that contains it and the offset within that chunk. Each chunk is identified by a globally unique chunk number. The main job of a chunk server is to store chunks and to expose an interface for creating, deleting, reading, and writing them. By default a chunk is replicated three times, with the replicas stored on three different chunk servers; clients can specify a different replica count for each file. Three replicas guarantee that data can still be read from the third chunk server when two chunk servers have failed, which improves the reliability of the system. If a chunk server fails suddenly while chunk data is being written, the update of its replica fails, which leaves the replicas of that chunk inconsistent.
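With a fixed 64 MB chunk size, locating a byte takes one division and one remainder. The following is a minimal sketch; the function name `locate` is hypothetical and is not part of the KFS client library.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the fixed chunk size described above


def locate(file_offset: int) -> tuple:
    """Return (chunk_index, offset_within_chunk) for a byte offset in a file."""
    if file_offset < 0:
        raise ValueError("file offset must be non-negative")
    # Integer division selects the chunk; the remainder is the position inside it.
    return divmod(file_offset, CHUNK_SIZE)


print(locate(0))                   # (0, 0)
print(locate(CHUNK_SIZE - 1))      # (0, 67108863)  last byte of the first chunk
print(locate(200 * 1024 * 1024))   # (3, 8388608)   200 MB is 8 MB into chunk 3
```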

为了解决这个问题,KFS 为每个 Chunk 副本分配一个版本号,副本每被更新一次则版本号上升,这样就可以通过比较版本号来发现过期的副本。Chunk 服务器中,单个 Chunk 由一个文件来表示,这些 Chunk 文件被保存在本地的文件系统中,文件系统可以是 XFS、Ext3/4 等。每个 Chunk 文件除了保存数据外,其头部还保存了 16KB 大小的校验和信息:写数据时,为每个 64K 数据块计算一个 32 位校验和(Adler-32 算法),保存至 Chunk 文件头部;读数据时,首先验证读出数据的校验和,这就保证了本地磁盘保存数据时可能发生的数据损坏可以被检查出来。

To solve this problem, KFS assigns a version number to each chunk replica and increments it every time the replica is updated, so stale replicas can be found by comparing version numbers. On a chunk server each chunk is represented by one file, stored in the local file system, which can be XFS, ext3/ext4, and so on. Besides the data, each chunk file has a 16 KB header that holds checksum information: when data is written, a 32-bit checksum (Adler-32) is computed for every 64 KB data block and saved in the chunk file header; when data is read, the checksum of the data read is verified first. This ensures that corruption occurring while data sits on the local disk can be detected.
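The per-block checksum scheme can be sketched with Python's standard `zlib.adler32`. The function names and the in-memory list of checksums are assumptions made for illustration; the actual on-disk layout of the chunk header is not described in the text.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # one checksum per 64 KB data block


def block_checksums(data: bytes) -> list:
    """Compute a 32-bit Adler-32 checksum for each 64 KB block of data."""
    return [zlib.adler32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def corrupt_blocks(data: bytes, stored: list) -> list:
    """Return the indices of blocks whose checksum no longer matches."""
    current = block_checksums(data)
    if len(current) != len(stored):
        raise ValueError("block count does not match the stored checksums")
    return [i for i, (now, then) in enumerate(zip(current, stored)) if now != then]


original = bytes(200_000)              # 200,000 zero bytes, i.e. 4 blocks
stored = block_checksums(original)     # what would be kept in the chunk header
damaged = bytearray(original)
damaged[70_000] ^= 0xFF                # flip one byte inside block 1
print(corrupt_blocks(bytes(damaged), stored))   # [1]
```

Checking block by block means a read only has to verify the blocks it touches, and a mismatch pinpoints which 64 KB region has to be fetched from another replica.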

Chunk 文件在本地文件系统的文件名命名规则为:文件号.Chunk 号.版本号。Chunk 服务器启动时通过扫描存放 Chunk 文件的目录,就可获知自己拥有哪些 Chunk,然后把该信息提交给元数据服务器。元数据服务器在验证 Chunk 信息后,会告知 Chunk 服务器哪些 Chunk 已经过期(版本号过期或属于已被删除的文件),Chunk 服务器便可以删除这些 Chunk。

Chunk files in the local file system are named according to the rule: file number.chunk number.version number. When a chunk server starts, it scans the directory that holds the chunk files to find out which chunks it owns and reports this to the metadata server. After verifying the chunk information, the metadata server tells the chunk server which chunks are stale (their version number is out of date or they belong to a deleted file), and the chunk server can then delete them.
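The startup check can be sketched as follows, assuming the three name fields are decimal integers. The `current_versions` dictionary is a hypothetical stand-in for the metadata server's answer, and the file names are passed in as a list here; a real server would obtain them by scanning its chunk directory.

```python
from typing import NamedTuple


class ChunkName(NamedTuple):
    file_id: int
    chunk_id: int
    version: int


def parse_chunk_name(name: str) -> ChunkName:
    """Split 'file number.chunk number.version number' into its three fields."""
    file_id, chunk_id, version = name.split(".")
    return ChunkName(int(file_id), int(chunk_id), int(version))


def stale_chunks(names: list, current_versions: dict) -> list:
    """Return the chunk files that the chunk server may delete.

    current_versions maps chunk number -> latest version known to the
    metadata server (hypothetical representation).
    """
    stale = []
    for name in names:
        chunk = parse_chunk_name(name)
        latest = current_versions.get(chunk.chunk_id)
        # Unknown chunk: its file was deleted. Lower version: a missed update.
        if latest is None or chunk.version < latest:
            stale.append(name)
    return stale


found = ["7.1001.3", "7.1002.1", "9.2001.5"]
print(stale_chunks(found, {1001: 3, 1002: 2}))   # ['7.1002.1', '9.2001.5']
```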

在 KFS 中,一个文件系统的所有元数据由单个元数据服务器统一管理,这一做法虽然影响了系统的可扩展性,但极大简化了系统的设计。元数据包括了:

In KFS, all metadata of a file system is managed by a single metadata server. This limits the scalability of the system but greatly simplifies its design. The metadata includes:

(1) 目录项信息:KFS 采用传统的目录结构命名空间,目录树中的所有节点(文件和目录),均由一个全局唯一的文件号来标识,根目录的文件号固定为 2,目录项信息指的是目录树中各目录所包含的各目录项(可以是子目录或文件)的名称及文件 ID;

(1) Directory entry information: KFS uses a traditional directory-structured namespace. Every node in the directory tree (file or directory) is identified by a globally unique file number, and the file number of the root directory is fixed at 2. Directory entry information is the name and file ID of each entry (subdirectory or file) contained in each directory of the tree;

(2) 属性信息:各目录、文件的创建、修改时间,及文件的副本数、大小;

(2) Attribute information: the creation and modification time of each directory and file, as well as the number and size of file copies;

(3) Chunk 信息:一个文件依次由哪些 Chunk 组成的;

(3) Chunk information: which chunks a file consists of in turn;

(4) 位置信息:Chunk 的各个副本的被保存在哪个 Chunk 服务器上;

(4) Location information: which chunk server each replica of a chunk is stored on;

(5) 租约信息:KFS 采用租约来维持多个客户端情况下数据的一致性,这些租约信息由元数据服务器统一管理。

(5) Lease information: KFS uses leases to keep data consistent when multiple clients are involved, and the lease information is managed centrally by the metadata server.
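The five kinds of metadata listed above can be pictured as a few in-memory tables. This is only an illustrative sketch: the class and field names are hypothetical, and keying leases by chunk number is an assumption, since the text does not say what a lease is attached to.

```python
from dataclasses import dataclass, field

ROOT_FILE_ID = 2  # the root directory's file number is fixed at 2


@dataclass
class Node:
    file_id: int
    is_dir: bool
    created: float = 0.0                          # (2) attribute information
    modified: float = 0.0
    replicas: int = 3
    size: int = 0
    entries: dict = field(default_factory=dict)   # (1) entry name -> file ID
    chunks: list = field(default_factory=list)    # (3) ordered chunk numbers


@dataclass
class Metadata:
    nodes: dict = field(default_factory=dict)       # file ID -> Node
    locations: dict = field(default_factory=dict)   # (4) chunk number -> servers
    leases: dict = field(default_factory=dict)      # (5) chunk number -> lease holder

    def resolve(self, path: str) -> Node:
        """Walk the directory tree from the root to the node at path."""
        node = self.nodes[ROOT_FILE_ID]
        for part in filter(None, path.split("/")):
            node = self.nodes[node.entries[part]]
        return node


meta = Metadata()
meta.nodes[ROOT_FILE_ID] = Node(ROOT_FILE_ID, is_dir=True, entries={"logs": 3})
meta.nodes[3] = Node(3, is_dir=True, entries={"a.txt": 4})
meta.nodes[4] = Node(4, is_dir=False, size=100, chunks=[1001])
meta.locations[1001] = {"cs1", "cs2", "cs3"}

target = meta.resolve("/logs/a.txt")
print(target.file_id, target.chunks)           # 4 [1001]
print(sorted(meta.locations[target.chunks[0]]))  # ['cs1', 'cs2', 'cs3']
```

Keeping all of these tables on one server is what makes lookups and consistency simple, at the cost of the scalability limit mentioned above.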

客户端

Client

KFS 的客户端库向应用程序开发者提供了 create、read、write、mkdir、rmdir 等类似 POSIX 语义的编程接口,支持的编程语言有 C++、Python、Java。客户端库通过与元数据服务器及 Chunk 服务器交互,完成对文件系统中文件的修改、访问等操作。此外,客户端还支持 FUSE(File system in Userspace),使得传统文件系统

The KFS client library gives application developers a programming interface with POSIX-like semantics, including create, read, write, mkdir, and rmdir. The supported programming languages are C++, Python, and Java. The client library modifies and accesses files in the file system by interacting with the metadata server and the chunk servers. In addition, the client supports FUSE (File system in Userspace), which makes the traditional file system


速搜资源网 , 版权所有丨如未注明 , 均为原创丨转载请注明原文链接:【速搜问答】分布式文件系统是什么