
[速搜 Q&A] What is Hive?

Q&A, posted by admin, 2 months ago (08-13)


Hive is a data warehouse tool built on Hadoop, used to extract, transform, and load (ETL) data. It provides a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive maps structured data files to database tables, provides SQL query capability, and translates SQL statements into MapReduce jobs for execution.
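To make the idea of "a SQL statement run as a MapReduce job" concrete, here is a minimal Python sketch of how a grouped count could be carried out in map, shuffle, and reduce steps over a delimited file. It is only an illustration of the model, not how Hive itself is implemented; the column names, the sample lines, and the `logs` table in the comment are all hypothetical.

```python
from collections import defaultdict

# Hypothetical table definition: column names applied to a tab-delimited file.
COLUMNS = ["user_id", "page", "bytes"]

# Stand-in for a structured data file stored in Hadoop.
RAW_LINES = [
    "u1\t/home\t120",
    "u2\t/home\t300",
    "u1\t/cart\t80",
]


def read_table(lines, columns):
    """Apply the table schema to each delimited line (one dict per row)."""
    for line in lines:
        yield dict(zip(columns, line.split("\t")))


def map_phase(rows):
    """Map: emit (page, 1) for every row, as a grouped COUNT(*) would need."""
    for row in rows:
        yield row["page"], 1


def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_page(lines):
    """Rough equivalent of: SELECT page, COUNT(*) FROM logs GROUP BY page."""
    return reduce_phase(shuffle(map_phase(read_table(lines, COLUMNS))))


print(count_by_page(RAW_LINES))
```

On a real cluster the map and reduce steps run in parallel on many machines; the sketch runs them in one process only to show the data flow.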

Hive's main advantage is its low learning cost: SQL-like statements are enough to run MapReduce aggregations quickly, without developing a dedicated MapReduce application. This makes Hive well suited to statistical analysis of a data warehouse.

Introduction

Hive is a data warehouse analysis system built on Hadoop. It offers a rich set of SQL query features for analyzing data stored in the Hadoop Distributed File System (HDFS): it maps structured data files to database tables and provides SQL query capability, and it converts SQL statements into MapReduce jobs. Hive's SQL dialect is called Hive SQL (also written HiveQL), and it lets users who are unfamiliar with MapReduce query, summarize, and analyze data using SQL. MapReduce developers, in turn, can plug their own mappers and reducers into Hive for more complex analysis.
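One way custom mappers and reducers are plugged in is Hive's TRANSFORM clause, which streams rows to an external script as tab-separated lines and reads the script's output back as rows. The Python sketch below shows what such a script might look like. The two-column layout (`user_id`, `page`) and the upper-casing logic are assumptions made purely for illustration.

```python
import io


def transform(in_stream, out_stream):
    """Read tab-separated (user_id, page) rows and write transformed rows."""
    for line in in_stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        user_id, page = line.split("\t")
        out_stream.write(f"{user_id}\t{page.upper()}\n")


# Demo with in-memory streams; a real script would pass sys.stdin and sys.stdout.
demo_out = io.StringIO()
transform(io.StringIO("u1\t/home\nu2\t/cart\n"), demo_out)
print(demo_out.getvalue(), end="")
```

Keeping the logic in a function that takes streams makes the script easy to test locally before it is attached to a query.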

Hive SQL differs slightly from the SQL of relational databases, but it supports most statement types, including DDL and DML, as well as common aggregate functions, joins, and conditional queries. Hive also provides a set of tools for extracting, transforming, and loading data, which are used to store, query, and analyze large datasets in Hadoop. It supports UDFs (user-defined functions), UDAFs (user-defined aggregate functions), and UDTFs (user-defined table-generating functions), and it allows custom map and reduce functions, which gives data processing good scalability and extensibility.
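The three kinds of user-defined function differ mainly in their input and output shape. The following Python sketch illustrates those shapes only; real Hive UDFs are normally written against Hive's Java API, and every function, class, and method name here is hypothetical.

```python
def udf_domain(email):
    """UDF shape: one input value -> one output value."""
    return email.split("@", 1)[1].lower()


class UdafAverage:
    """UDAF shape: many input rows -> one aggregated value."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def iterate(self, value):
        # Fold one more input row into the running aggregate.
        self.total += value
        self.count += 1

    def merge(self, other):
        # Combine a partial aggregate computed elsewhere (e.g. another worker).
        self.total += other.total
        self.count += other.count

    def terminate(self):
        # Produce the final value; None when no rows were seen.
        return self.total / self.count if self.count else None


def udtf_explode_tags(row_id, tags):
    """UDTF shape: one input row -> zero or more output rows."""
    for tag in tags.split(","):
        if tag:
            yield row_id, tag


left, right = UdafAverage(), UdafAverage()
left.iterate(1)
left.iterate(3)
right.iterate(5)
left.merge(right)
print(udf_domain("A@Example.COM"), left.terminate(), list(udtf_explode_tags(7, "x,y")))
```

The `merge` step is what lets an aggregate be computed in parts on different machines and then combined, which is why aggregate functions need more structure than plain scalar functions.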

Hive is not suited to online transaction processing and does not provide real-time queries. It works best for batch jobs over large volumes of immutable data. Its characteristics include scalability (machines can be added to the Hadoop cluster dynamically), extensibility, fault tolerance, and loose coupling with input formats.

Applicable scenarios

Hive is built on Hadoop, which is designed for static batch processing. Hadoop jobs usually have high latency and incur considerable overhead at submission and scheduling time. As a result, Hive cannot deliver low-latency queries on large datasets; for example, a query over a dataset of a few hundred MB typically takes minutes.

Hive is therefore a poor fit for applications that need real-time responses, such as online transaction processing (OLTP). Query execution strictly follows the Hadoop MapReduce job execution model: Hive's interpreter converts the user's HiveQL statements into MapReduce jobs and submits them to the Hadoop cluster, Hadoop monitors the jobs as they run, and the results are returned to the user. Hive was not designed for OLTP, and it provides neither real-time queries nor row-level updates. Its best use is batch processing of large datasets, such as web log analysis.

Design features

Hive is a data warehouse tool that wraps Hadoop underneath and uses an SQL-like language, Hive SQL, for queries. All Hive data is stored in Hadoop-compatible file systems such as HDFS or Amazon S3. Hive does not modify data while loading it; it only moves the data into the directory that Hive designates in HDFS. Consequently, Hive does not support rewriting or appending to data: all data is fixed at load time. Hive's design features are as follows.
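The load behavior described above (move the file as-is, interpret it only when reading) can be sketched in a few lines of Python. This uses the local file system in place of HDFS, and the directory layout, file name, and column names are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def load_into_table(source_file, table_dir):
    """'Load' a file by moving it, unmodified, into the table's directory."""
    table_dir.mkdir(parents=True, exist_ok=True)
    target = table_dir / source_file.name
    shutil.move(str(source_file), str(target))
    return target


def scan_table(table_dir, columns, delimiter="\t"):
    """Apply the schema only at read time: split each line into named columns."""
    for data_file in sorted(table_dir.iterdir()):
        with data_file.open(encoding="utf-8") as handle:
            for line in handle:
                yield dict(zip(columns, line.rstrip("\n").split(delimiter)))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        staging = root / "staging" / "logs.tsv"
        staging.parent.mkdir()
        staging.write_text("u1\t/home\nu2\t/cart\n", encoding="utf-8")
        original_bytes = staging.read_bytes()

        warehouse = root / "warehouse" / "logs"  # hypothetical table directory
        loaded = load_into_table(staging, warehouse)

        unchanged = loaded.read_bytes() == original_bytes  # content untouched
        moved = not staging.exists()  # the source file is gone after the move
        rows = list(scan_table(warehouse, ["user_id", "page"]))
        return unchanged, moved, rows


print(demo())
```

Because nothing is parsed or validated at load time, loading is cheap, but malformed lines only show up later, when a query reads them.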

Supports creating indexes to optimize queries.

Supports different storage types, such as plain text files and data in HBase.

Stores metadata in a relational database, which greatly reduces the time spent on semantic checks during query processing.

Can work directly on data already stored in the Hadoop file system.

Ships with many built-in functions for working with dates and times, strings, and other data mining tasks, and lets users add their own UDFs for operations the built-in functions do not cover.

Uses an SQL-like query style: SQL queries are converted into MapReduce jobs that run on the Hadoop cluster.
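As an illustration of the metadata point in the list above, the Python sketch below keeps table definitions in a small relational store and uses them to check a query's column references without touching any data files. SQLite stands in for the relational database, and the one-table schema is a toy invented for this example, not Hive's actual metastore layout.

```python
import sqlite3


def create_metastore():
    """Create an in-memory relational store for table metadata (toy schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE columns_meta ("
        "table_name TEXT, column_name TEXT, column_type TEXT)"
    )
    return conn


def register_table(conn, table_name, columns):
    """Record a table's column names and types."""
    conn.executemany(
        "INSERT INTO columns_meta VALUES (?, ?, ?)",
        [(table_name, name, col_type) for name, col_type in columns],
    )


def check_columns(conn, table_name, referenced):
    """Return the referenced columns that the table does not define."""
    known = {
        row[0]
        for row in conn.execute(
            "SELECT column_name FROM columns_meta WHERE table_name = ?",
            (table_name,),
        )
    }
    return [name for name in referenced if name not in known]


metastore = create_metastore()
register_table(metastore, "logs", [("user_id", "STRING"), ("page", "STRING")])
print(check_columns(metastore, "logs", ["page", "bytes"]))
```

The check is a quick indexed lookup in the metadata store, so an invalid query can be rejected before any job is submitted to the cluster.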

