
Basic Operations on a Hadoop Cluster (Part 4: Basic Hive Operations)

Published: 2023/11/27 · Programming Q&A · 豆豆

This article, collected and compiled by 生活随笔, covers Basic Operations on a Hadoop Cluster (Part 4: Basic Hive Operations) and is shared here for reference.

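Experiment Objectives and Requirements

Objective:

(1) Master the use of the Hive data warehouse tool.

Requirements:

Master the use of the Hive data warehouse;
Be able to perform normal operations on databases, tables, and data;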

Experiment Environment

Five independent PC-style virtual machines;
Working network connectivity between the hosts;
At least 2 GB of memory and a 50 GB disk on each host;
CentOS 7 64-bit installed on all hosts;
Static IP addresses, hostnames, and hostname-to-address mappings configured on all hosts;
Hadoop platform already set up;
MySQL database platform already set up;
HBase already installed;
Hive data warehouse already installed;

Software version:

Hive 2.1.1 is used; the package name is apache-hive-2.1.1-bin.tar.gz;

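Exercise Content

Step 1: Installing and Configuring the Hive Tool

1. Starting the cluster;

★ All operations in this step are performed as admin, the user dedicated to the cluster.

★ Before starting the HBase cluster, first make sure the ZooKeeper cluster is running (all 5 machines in this lab). ZooKeeper has to be started manually on each node. If running the start command from the home directory reports an error, change into the zookeeper/bin directory and run it there.

★ Before starting the HBase cluster, also make sure the Hadoop cluster is running (all 5 machines in this lab). Hadoop only needs its start command run on the master node.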

a) On every host in the cluster, run "zkServer.sh status" to check the current state of that node's ZooKeeper service. If exactly one node in the cluster is the "leader" and all the others are "follower" nodes, the cluster is working normally. If ZooKeeper is not running, run "zkServer.sh start" on every host in the cluster to start the ZooKeeper service;
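The health rule in step a) (exactly one leader, every other node a follower) can be written down as a small check. This is only a sketch: it assumes each host's `zkServer.sh status` output has already been collected as text and that a running node prints a line such as `Mode: leader` or `Mode: follower`; the helper names and host names are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: a running node's status output contains a line like "Mode: follower".
MODE_LINE = re.compile(r"^Mode:\s*(\w+)", re.MULTILINE)


def node_mode(status_output: str) -> str | None:
    """Return 'leader', 'follower', etc., or None if no Mode line was found."""
    match = MODE_LINE.search(status_output)
    return match.group(1) if match else None


def ensemble_healthy(outputs: dict[str, str]) -> bool:
    """True when exactly one host is the leader and all the others are followers."""
    modes = [node_mode(text) for text in outputs.values()]
    return modes.count("leader") == 1 and all(
        mode in ("leader", "follower") for mode in modes
    )


if __name__ == "__main__":
    # Hypothetical host names and captured outputs, for illustration only.
    sample = {
        "master": "Mode: follower",
        "slave1": "Mode: leader",
        "slave2": "Mode: follower",
    }
    print(ensemble_healthy(sample))
```

A node that is down produces no `Mode:` line, so `node_mode` returns None for it and the ensemble is reported as unhealthy.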


b) On the master node, check the Java processes: if the two processes "NameNode" and "ResourceManager" are present, the master node of the Hadoop cluster has started successfully. On each data node, if the two processes "DataNode" and "NodeManager" are present, that data node has started successfully. If these processes are not present, run the start command on the master node to start the Hadoop cluster.
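The process checks in step b) amount to comparing the output of `jps` (one `<pid> <MainClass>` line per Java process) against the daemons each kind of node should run. A minimal sketch, assuming the `jps` output has already been captured as text; the role labels "master" and "worker" and the function names are chosen here for illustration.

```python
from __future__ import annotations

# Daemons this lab expects on each kind of node.
MASTER_DAEMONS = {"NameNode", "ResourceManager"}
WORKER_DAEMONS = {"DataNode", "NodeManager"}


def running_daemons(jps_output: str) -> set[str]:
    """Collect main-class names from jps lines shaped like '<pid> <MainClass>'."""
    names: set[str] = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str, role: str) -> set[str]:
    """Return the expected daemons that do not appear in the jps output."""
    expected = MASTER_DAEMONS if role == "master" else WORKER_DAEMONS
    return expected - running_daemons(jps_output)


if __name__ == "__main__":
    # Hypothetical captured output from a data node.
    worker_output = "3001 DataNode\n3100 Jps"
    print(missing_daemons(worker_output, "worker"))   # {'NodeManager'}
```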

Master node and standby master node:

Communication nodes:

c) Once the Hadoop cluster is confirmed to be running, start the HBase cluster on the master node with its start command, then run "jps" on all hosts in the cluster;

2. On the master node, start Hive with the command "hive"; once it has started successfully you are in the Hive console.

3. In the console, run "show databases;" to list the current databases.

Exercise:

1. Start Hive; common Hive commands

Commands:

$hive    #start Hive; once it starts successfully you are in the Hive console
>show databases;    #list the current databases
>create database test1;    #create a database
>show databases;
>use test1;    #switch to the database
>create table testable(id int,name string,age int,tel string) row format delimited fields terminated by ',' stored as textfile;    #create a table
>show tables;
>drop table testable;    #drop the table
>drop database test1;    #drop the database

2. Hive data model: internal (managed) tables

- Conceptually similar to a Table in a relational database.
- Each Table in Hive has a corresponding directory in which its data is stored.
- All Table data (except for External Tables) is stored in that directory.
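To predict which directory a managed table ends up in, it helps to know the default layout: tables in the `default` database sit directly under the warehouse directory, while tables in any other database sit under `<database>.db/`. The sketch below assumes the warehouse directory is `/user/hive/warehouse` (the usual default of `hive.metastore.warehouse.dir`, and the path listed in the exercise below); a `LOCATION` clause, as used for `t2`, overrides it. The function name is hypothetical.

```python
from __future__ import annotations

import posixpath

# Assumed value of hive.metastore.warehouse.dir on this cluster.
WAREHOUSE_DIR = "/user/hive/warehouse"


def managed_table_dir(database: str, table: str, location: str | None = None) -> str:
    """Return the HDFS directory that holds a table's data files."""
    if location is not None:                    # CREATE TABLE ... LOCATION '<path>'
        return location
    db, tbl = database.lower(), table.lower()   # Hive stores names in lower case
    if db == "default":
        return posixpath.join(WAREHOUSE_DIR, tbl)
    return posixpath.join(WAREHOUSE_DIR, f"{db}.db", tbl)


if __name__ == "__main__":
    print(managed_table_dir("test2", "t1"))     # /user/hive/warehouse/test2.db/t1
    print(managed_table_dir("test2", "t2", location="/mytable/hive/t2"))
```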

Exercise:

Commands:

$hive
>create database test2;
>use test2;
>create database test3;
>create table t1(tid int, tname string, age int);
>create table t2(tid int, tname string, age int) location '/mytable/hive/t2';
>create table t3(tid int, tname string, age int) row format delimited fields terminated by ',';
>create table t4 as select * from t1;
$ hadoop fs -ls /user/hive/warehouse/
$ hadoop fs -ls /user/hive/warehouse/test2.db
$ hadoop fs -ls /user/hive/warehouse/test2.db/t1
$ hadoop fs -ls /mytable/hive/
>desc t1;
>alter table t1 add columns(english int);
>desc t1;
>drop table t1;
$hdfs dfs -ls /user/hive/warehouse/test2.db

3. Hive data model: partitioned tables

Commands:

$hive
>create database test4;
>use test4;

a) Prepare the data table;

>create table sampledata (sid int, sname string, gender string, language int, math int, english int) row format delimited fields terminated by ',' stored as textfile;

b) Prepare the data file;

In the admin user's home directory, create sampledata.txt with the following content:

1,Tom,M,60,80,96
2,Mary,F,11,22,33
3,Jerry,M,90,11,23
4,Rose,M,78,77,76
5,Mike,F,99,98,98
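With `row format delimited fields terminated by ','`, each line of the file is split on commas and the pieces are assigned to the columns in order. Under Hive's default text format, malformed rows are not rejected: a field that cannot be converted to the column type (for example the letters `ll` typed where `11` was meant), and any missing trailing field, is read as NULL. The sketch below approximates that behaviour for the `sampledata` schema; it is an illustration, not Hive's actual SerDe code, and `parse_row` is a hypothetical name.

```python
from __future__ import annotations

import re

# Column names and types of the sampledata table created in step a).
SCHEMA = [("sid", int), ("sname", str), ("gender", str),
          ("language", int), ("math", int), ("english", int)]

INT_PATTERN = re.compile(r"-?\d+")


def parse_row(line: str, delimiter: str = ",") -> dict[str, object]:
    """Split one text line into typed columns; bad or missing values become None (NULL)."""
    fields = line.rstrip("\n").split(delimiter)
    row: dict[str, object] = {}
    for i, (name, col_type) in enumerate(SCHEMA):
        raw = fields[i] if i < len(fields) else None
        if raw is None:
            row[name] = None                      # missing trailing field
        elif col_type is int:
            row[name] = int(raw) if INT_PATTERN.fullmatch(raw) else None
        else:
            row[name] = raw
    return row                                    # extra fields are ignored


if __name__ == "__main__":
    print(parse_row("1,Tom,M,60,80,96"))
    print(parse_row("2,Mary,F,ll,22,33"))         # 'll' is not an int -> language is None
```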


c) Load the text data into the table;

>load data local inpath '/home/admin/sampledata.txt' into table sampledata;
>select * from sampledata;

- A partition corresponds to a dense index on the partition column in a relational database.
- In Hive, each partition of a table corresponds to a directory under the table's directory, and all of that partition's data is stored in that directory.
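The directory-per-partition rule can be pictured with a short sketch that groups rows by the path Hive would write them to, `<table directory>/<column>=<value>`. It assumes the table lives at `/user/hive/warehouse/test4.db/partition_table` (the default location for this exercise) and ignores details such as Hive's escaping of special characters in partition values; `partition_layout` is a hypothetical name.

```python
from __future__ import annotations

import posixpath
from collections import defaultdict


def partition_layout(table_dir: str, rows: list[dict], column: str) -> dict[str, list[dict]]:
    """Map each partition directory to the rows stored in it."""
    layout: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        path = posixpath.join(table_dir, f"{column}={row[column]}")
        # The partition value is encoded in the directory name,
        # so it is not repeated inside the data files.
        layout[path].append({k: v for k, v in row.items() if k != column})
    return dict(layout)


if __name__ == "__main__":
    rows = [
        {"sid": 1, "sname": "Tom", "gender": "M"},
        {"sid": 2, "sname": "Mary", "gender": "F"},
        {"sid": 3, "sname": "Jerry", "gender": "M"},
        {"sid": 4, "sname": "Rose", "gender": "M"},
        {"sid": 5, "sname": "Mike", "gender": "F"},
    ]
    table_dir = "/user/hive/warehouse/test4.db/partition_table"
    for path, stored in partition_layout(table_dir, rows, "gender").items():
        print(path, [r["sname"] for r in stored])
```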


d) Create the partitioned table;

Commands:
>create table partition_table(sid int, sname string) partitioned by (gender string) row format delimited fields terminated by ',';
>select * from partition_table;

e) Insert data into the partitioned table;

Commands:
>insert into table partition_table partition(gender='M') select sid, sname from sampledata where gender='M';
>insert into table partition_table partition(gender='F') select sid, sname from sampledata where gender='F';
>select * from partition_table;
>show partitions partition_table;    #show the table's partition information

Note: a select query scans the entire table, which can take a lot of time. Because in many cases only part of a table's data is of interest, Hive lets you declare partitions when the table is created; a query that filters on the partition column then reads only the matching partition directories.
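The time saving described in the note comes from partition pruning: when a query filters on the partition column, only the matching partition directories are read. A toy sketch of that selection step, assuming partition directories named `<column>=<value>`; the function name and the file names in the example are illustrative.

```python
from __future__ import annotations


def prune(partition_files: dict[str, list[str]], column: str, value: str) -> list[str]:
    """Return only the files a `WHERE column = 'value'` query has to read."""
    wanted = f"{column}={value}"
    selected = []
    for directory, files in partition_files.items():
        # The last path component names the partition, e.g. 'gender=M'.
        if directory.rstrip("/").rsplit("/", 1)[-1] == wanted:
            selected.extend(f"{directory}/{name}" for name in files)
    return selected


if __name__ == "__main__":
    table = "/user/hive/warehouse/test4.db/partition_table"
    files = {
        f"{table}/gender=M": ["000000_0"],   # illustrative file names
        f"{table}/gender=F": ["000000_0"],
    }
    print(prune(files, "gender", "M"))      # only the gender=M file
```

So a query such as `select * from partition_table where gender='M'` touches one directory instead of the whole table.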

Open http://192.168.10.111:8088/cluster/apps to check the execution status of the jobs;

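4. Hive data model: external tables

External Table

- Points to data that already exists in HDFS; partitions can be created on it.
- Its metadata is organized in the same way as that of an internal table, but the actual data is stored quite differently.
- An external table involves only a single step: the table is created and the data is loaded at the same time. The data is not moved into the warehouse directory; only a link to the external data is established. Dropping an external table deletes only that link (the table metadata), not the data files.

Prepare several txt data files with the same structure and put them in the HDFS directory /input.
In Hive, create an external table external_student with the same structure, with its location set to the HDFS directory /input. external_student then automatically picks up the files under /input.
Query the external table.
Delete some of the files under /input.
Query the external table again. The data from the deleted files is gone.
Put the deleted files back into /input.
Query the external table again. The data from the restored files reappears.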
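The experiment above can be replayed locally to see why it behaves this way: a query on an external table simply reads whatever files are in the table's location at query time. The sketch below uses a temporary local directory as a stand-in for HDFS `/input` and a plain Python function as a stand-in for `select *`; it models only this file-listing behaviour, not Hive itself, and the function names are hypothetical.

```python
from __future__ import annotations

import os
import shutil
import tempfile


def query_external(location: str) -> list[str]:
    """Read every non-empty line of every file currently under `location`."""
    rows = []
    for name in sorted(os.listdir(location)):
        path = os.path.join(location, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                rows.extend(line.rstrip("\n") for line in f if line.strip())
    return rows


def demo() -> tuple[int, int, int]:
    """Row counts: all files present, after removing student1.txt, after restoring it."""
    workdir = tempfile.mkdtemp()
    try:
        location = os.path.join(workdir, "input")   # stand-in for HDFS /input
        os.mkdir(location)
        files = {
            "student1.txt": "1,Tom,M,60,80,96\n2,Mary,F,11,22,33\n",
            "student2.txt": "3,Jerry,M,90,11,23\n",
            "student3.txt": "4,Rose,M,78,77,76\n5,Mike,F,99,98,98\n",
        }
        for name, text in files.items():
            with open(os.path.join(location, name), "w", encoding="utf-8") as f:
                f.write(text)
        before = len(query_external(location))
        os.remove(os.path.join(location, "student1.txt"))      # like hdfs dfs -rm
        after_rm = len(query_external(location))
        with open(os.path.join(location, "student1.txt"), "w", encoding="utf-8") as f:
            f.write(files["student1.txt"])                     # like hdfs dfs -put
        after_put = len(query_external(location))
        return before, after_rm, after_put
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    print(demo())   # (5, 3, 5)
```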

(1) Prepare the data:

In the admin home directory, create student1.txt with the following content:

1,Tom,M,60,80,96
2,Mary,F,11,22,33

student2.txt:

3,Jerry,M,90,11,23

student3.txt:

4,Rose,M,78,77,76
5,Mike,F,99,98,98

$ hdfs dfs -ls /
$ hdfs dfs -mkdir /input

Put the files into the HDFS file system.

Syntax:

$hdfs dfs -put localFileName hdfsFileDir

$hdfs dfs -put student1.txt /input
$hdfs dfs -put student2.txt /input
$hdfs dfs -put student3.txt /input

$hive
>create database test5;
>use test5;

(2) Create the external table

>create external table external_student(sid int, sname string, gender string, language int, math int, english int) row format delimited fields terminated by ',' location '/input';

(3) Query the external table

>select * from external_student;

(4) Delete student1.txt from HDFS

$ hdfs dfs -rm /input/student1.txt

(5) Query the external table

>select * from external_student;

(6) Put student1.txt back into the HDFS /input directory

$ hdfs dfs -put student1.txt /input

(7) Query the external table

>select * from external_student;

5. Hive data model: bucketed tables

Commands:

$hive
>create database test6;
>use test6;
>create table users (sid int, sname string, age int) row format delimited fields terminated by ',';

Prepare the text data:

In the admin user's home directory, create users.txt with the following content:

1,Bear,18
2,Cherry,23
3,Lucky,33
4,Dino,26
5,Janel,28

Commands:

hive> load data local inpath '/home/admin/users.txt' into table users;
hive> create table bucket_table(sid int, sname string, age int) clustered by (sname) into 5 buckets row format delimited fields terminated by ',';
hive> insert overwrite table bucket_table select * from users;
hive> select * from bucket_table;
$ hadoop fs -ls /user/hive/warehouse/test6.db/bucket_table/
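Hive decides which bucket a row goes to by hashing the `clustered by` column and taking the remainder modulo the number of buckets, which is why the listing of `bucket_table` usually shows one file per bucket. The sketch below assumes the legacy hash used by Hive 2.x for string columns (Java-style `h = 31*h + b` over the UTF-8 bytes, with the sign bit cleared before the modulo); Hive 3 tables default to a different, Murmur3-based scheme, so treat the exact bucket numbers as illustrative. The function names are hypothetical.

```python
from __future__ import annotations


def hive_string_hash(value: str) -> int:
    """Java-style 31*h + b over the UTF-8 bytes, with 32-bit signed overflow."""
    h = 0
    for b in value.encode("utf-8"):
        signed_byte = b - 256 if b > 127 else b   # Java bytes are signed
        h = (h * 31 + signed_byte) & 0xFFFFFFFF  # keep 32 bits, like a Java int
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(value: str, num_buckets: int) -> int:
    """Clear the sign bit (hash & Integer.MAX_VALUE), then take the remainder."""
    return (hive_string_hash(value) & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for name in ["Bear", "Cherry", "Lucky", "Dino", "Janel"]:
        print(name, "-> bucket", bucket_for(name, 5))
```

Because the bucket depends only on the hash of `sname`, rows with the same `sname` always land in the same bucket file, which is what makes bucket sampling and bucketed joins possible.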

6. Hive data model: views

Syntax:

Create a view
CREATE VIEW viewName AS SELECT data FROM table WHERE condition;

Show the view's structure
DESC viewName;

Query the view
SELECT * FROM viewName;

Drop the view
DROP VIEW [IF EXISTS] viewName;

Commands:

$hive
>create database test7;
>use test7;

a) Create a test table:

hive> create table test01(id int, name string) row format delimited fields terminated by ',';
hive> desc test01;
$ vi data1.txt
1,tom
2,jack
hive> load data local inpath '/home/admin/data1.txt' overwrite into table test01;
hive> select * from test01;

b) Before creating a View, use the explain command to see how Hive interprets and executes the View-creation statement;

hive> explain create view test_view(id, name_length) as select id, length(name) from test01;

c) Actually create the View

hive> create view test_view(id, name_length) as select id, length(name) from test01;

d) Before running a query on the View, use explain to see the execution plan it is translated into;

hive> explain select sum(name_length) from test_view;
hive> explain extended select sum(name_length) from test_view;

e) Finally, run a query on the View; Stage-1 of the plan shows a MapReduce job over the underlying table test01;

hive> select sum(name_length) from test_view;
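A view stores only its defining query. Every query on the view re-runs that query against the base table, which is why the plan in step e) shows a MapReduce stage over the base table. The same behaviour can be tried without a cluster using Python's built-in `sqlite3` module as a stand-in (SQLite 3.9 or later is assumed for the column list in `CREATE VIEW`). This illustrates view semantics only and says nothing about how Hive compiles the query.

```python
from __future__ import annotations

import sqlite3


def view_demo() -> tuple[int, int]:
    """Return sum(name_length) from the view before and after adding a base row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test01 (id INTEGER, name TEXT)")
        conn.executemany("INSERT INTO test01 VALUES (?, ?)", [(1, "tom"), (2, "jack")])
        # The view saves the SELECT text; no rows are copied into it.
        conn.execute(
            "CREATE VIEW test_view (id, name_length) AS "
            "SELECT id, length(name) FROM test01"
        )
        query = "SELECT sum(name_length) FROM test_view"
        before = conn.execute(query).fetchone()[0]
        # A new base-table row is visible through the view immediately.
        conn.execute("INSERT INTO test01 VALUES (3, 'rose')")
        after = conn.execute(query).fetchone()[0]
        return before, after
    finally:
        conn.close()


if __name__ == "__main__":
    print(view_demo())   # (7, 11): 3 + 4, then 3 + 4 + 4
```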

Problems Encountered and Solutions

Error 1: starting hive reports: ls: cannot access /home/hadoop/spark-2.2.0-bin-hadoop2.6/lib/spark-assembly-*.jar: No such file or directory

[sxc@master ~]$ hive
ls: cannot access /software/spark/spark-2.2.0-bin-hadoop2.7/lib/spark-assembly-*.jar: No such file or directory
17/11/27 13:12:56 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
Logging initialized using configuration in jar:file:/software/hive/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties

Cause: starting with Spark 2, the single large JAR that used to sit in the lib directory has been split into many small JARs, so the old spark-assembly-*.jar no longer exists and hive cannot find it.

Solution:

Open the bin directory under the Hive installation directory and find the hive script.

Locate the following section:

# add Spark assembly jar to the classpath
if [[ -n "$SPARK_HOME" ]]
then
  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi


Fix: in that section, change lib/spark-assembly-*.jar to jars/*.jar, so that it reads:

# add Spark assembly jar to the classpath
if [[ -n "$SPARK_HOME" ]]
then
  sparkAssemblyPath=`ls ${SPARK_HOME}/jars/*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi

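Further Reading

Differences between Hive and HBase

1. What is each of them?

Apache Hive is a data warehouse built on top of the Hadoop infrastructure. With Hive you can use the HQL language to query data stored in HDFS. HQL is an SQL-like language that is ultimately translated into MapReduce jobs. Although Hive provides SQL query functionality, it cannot run interactive queries, because it can only execute batch jobs on Hadoop.

Apache HBase is a key/value system that runs on top of HDFS. Unlike Hive, HBase operates on its database in real time rather than by running MapReduce jobs. HBase is partitioned into tables, and tables are further divided into column families. Column families must be defined in the schema; a column family groups together a set of columns of one kind (the columns themselves do not need a schema definition). For example, a "message" column family might contain the columns "to", "from", "date", "subject", and "body". Each key/value pair in HBase is called a cell, and each key consists of the row key, column family, column, and timestamp. In HBase, a row is a collection of key/value mappings uniquely identified by its row key. HBase builds on Hadoop's infrastructure and scales horizontally on commodity hardware.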

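2. Characteristics of each

Hive lets people who know SQL run MapReduce jobs. Because it is JDBC-compatible, it can also be integrated with existing SQL tools. Hive queries can take a long time to run, because by default they scan all of the data in a table. This drawback can be mitigated with Hive's partitioning mechanism, which limits how much data one query scans. Partitioning lets filter queries run over datasets stored in separate folders, so a query only scans the data in the specified folders (partitions). This can be used, for example, to process only the files within a certain time range, provided the file names include a timestamp.

HBase works by storing key/value pairs. It supports four main operations: adding or updating rows, scanning a range of cells, getting a specific row, and deleting specific rows, columns, or column versions. Version information is used to retrieve historical data (the history of a row can be deleted, and the space is then reclaimed by HBase compactions). Although HBase has tables, a schema is required only for tables and column families; columns do not need one. HBase tables also support increment (counter) operations.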
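To make the cell model and the four operations concrete, here is a toy in-memory model: cells are addressed by row key, `family:qualifier` column and timestamp, only column families are declared up front, reads return the newest version, and a scan's stop row is exclusive. It is a hypothetical illustration of the data model, not the HBase client API.

```python
from __future__ import annotations


class ToyHBaseTable:
    """Cells addressed by (row key, 'family:qualifier', timestamp)."""

    def __init__(self, families: set[str]) -> None:
        self.families = families   # only column families are part of the schema
        self.rows: dict[str, dict[str, dict[int, bytes]]] = {}

    def put(self, row: str, column: str, value: bytes, ts: int) -> None:
        """Add or update a cell; the same (row, column, ts) is overwritten."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get(self, row: str) -> dict[str, bytes]:
        """Return the newest version of every column in one row."""
        columns = self.rows.get(row, {})
        return {col: versions[max(versions)]
                for col, versions in columns.items() if versions}

    def scan(self, start: str, stop: str) -> list[tuple[str, dict[str, bytes]]]:
        """Rows with start <= row key < stop, in sorted row-key order."""
        return [(key, self.get(key)) for key in sorted(self.rows) if start <= key < stop]

    def delete(self, row: str, column: str | None = None, ts: int | None = None) -> None:
        """Delete a whole row, one column, or one version of a column."""
        if column is None:
            self.rows.pop(row, None)
        elif ts is None:
            self.rows.get(row, {}).pop(column, None)
        else:
            self.rows.get(row, {}).get(column, {}).pop(ts, None)


def demo() -> tuple[dict[str, bytes], dict[str, bytes], list[str]]:
    table = ToyHBaseTable({"message"})
    table.put("user1", "message:subject", b"hi", ts=1)
    table.put("user1", "message:subject", b"hello", ts=2)
    table.put("user2", "message:from", b"tom", ts=1)
    latest = table.get("user1")                          # newest version wins
    table.delete("user1", "message:subject", ts=2)       # drop only version 2
    older = table.get("user1")                           # version 1 is visible again
    keys = [key for key, _ in table.scan("user1", "user2")]   # stop row excluded
    return latest, older, keys


if __name__ == "__main__":
    print(demo())
```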

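3. Limitations

Hive has traditionally not supported update operations (Hive 0.14 and later add UPDATE and DELETE, but only for transactional tables). In addition, because Hive runs batch operations on Hadoop, it takes a long time, typically minutes to hours, to return query results. Hive requires a predefined schema that maps files and directories to columns, and ordinary (non-transactional) Hive tables are not ACID-compliant.

HBase queries are written in a specific language that has to be learned. SQL-like functionality is available through Apache Phoenix, but at the cost of having to provide a schema. HBase is also not fully ACID-compliant, although it supports some ACID properties. Last but not least, running HBase requires ZooKeeper, a distributed coordination service that provides configuration services, metadata maintenance, and namespace services.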

4. Use cases

Hive is suited to analytical queries over data collected across a period of time, for example computing trends or analyzing website logs. Hive should not be used for real-time queries, because it takes a long time to return results.

HBase is well suited to real-time queries over big data. Facebook used HBase for messaging and real-time analytics. It can also be used to count Facebook connections.

5. Summary

Hive and HBase are two different Hadoop-based technologies: Hive is an SQL-like engine that runs MapReduce jobs, while HBase is a NoSQL key/value database on top of Hadoop. The two can of course be used together. Just as you might use Google for search and Facebook for social networking, Hive can be used for analytical queries and HBase for real-time queries; data can also be written from Hive to HBase, and even back from HBase to Hive.


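Hive Environment Setup: Configuring Browser Access

1. Download hive-2.1.1-src.tar.gz,

then go into the ${HIVE_SRC_HOME}/hwi/web directory and run the packaging command:

#jar -cvf hive-hwi-1.2.2.war *

2. Copy the resulting hive-hwi-1.2.2.war file into the lib directory under Hive;

3. Edit Hive's configuration file hive-site.xml;

4. Start Hive's web interface;

Command: $hive --service hwi

5. Manage Hive through the web interface;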
