(2) Hive Data Types, Data Definition, Data Manipulation, and Queries
1. Data Types
1.1 Basic data types

Hive data type | Size | Example
TINYINT   | 1-byte signed integer | 20
SMALLINT  | 2-byte signed integer | 20
INT       | 4-byte signed integer | 20
BIGINT    | 8-byte signed integer | 20
BOOLEAN   | boolean, true or false | TRUE, FALSE
FLOAT     | single-precision floating point | 3.14159
DOUBLE    | double-precision floating point | 3.14159
STRING    | character sequence; a character set may be specified; single or double quotes may be used | 'now is the time', "for all good men"
TIMESTAMP | timestamp |
BINARY    | byte array |

Hive's STRING type corresponds to a database VARCHAR: it is a variable-length string, except that you cannot declare how many characters it can hold at most; in theory it can store up to 2 GB of characters.
1.2 Complex data types (collection types)

STRUCT | Like a struct in C, elements are accessed with dot notation. For example, if a column has type STRUCT{first STRING, last STRING}, the first element is referenced as field.first. | struct()
MAP | A MAP is a collection of key-value tuples, accessed with array notation. For example, if a column has type MAP holding the pairs 'first'->'John' and 'last'->'Doe', the last element is retrieved as field['last']. | map()
ARRAY | An array is a collection of variables of the same type and name. These variables are the array's elements, and each element has an index starting from zero. For example, if the array value is ['John', 'Doe'], the second element is referenced as array_name[1]. | array()

ARRAY and MAP are analogous to Java's Array and Map, while STRUCT is analogous to a C struct in that it wraps a named set of fields. Complex types can be nested to arbitrary depth.
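As a sketch of how the three collection types look in practice (the table name, columns, and delimiters below are hypothetical, not from the walkthrough):

```sql
-- Hypothetical table combining the three collection types.
CREATE TABLE person_info (
  name     STRING,
  friends  ARRAY<STRING>,                        -- e.g. ['bill', 'ted']
  children MAP<STRING, INT>,                     -- e.g. 'tom' -> 8
  address  STRUCT<street: STRING, city: STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '_'
  MAP KEYS TERMINATED BY ':';

-- Bracket notation for ARRAY and MAP, dot notation for STRUCT:
SELECT friends[1], children['tom'], address.city FROM person_info;
```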
1.3 Type conversion
The implicit conversion rules are as follows:
1) Any integer type can be implicitly converted to a wider type; for example, TINYINT can be converted to INT, and INT to BIGINT.
2) All integer types, FLOAT, and STRING can be implicitly converted to DOUBLE.
3) TINYINT, SMALLINT, and INT can all be converted to FLOAT.
4) BOOLEAN cannot be converted to any other type.
Example:
0: jdbc:hive2://hadoop10:10000> select '365'+2;
INFO : Compiling command(queryId=v2admin_20210109163104_2d9d7430-013d-45f2-9b55-e46e1105366b): select '365'+2
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:double, comment:null)], properties:null)
INFO : Completed compiling command(queryId=v2admin_20210109163104_2d9d7430-013d-45f2-9b55-e46e1105366b); Time taken: 2.064 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=v2admin_20210109163104_2d9d7430-013d-45f2-9b55-e46e1105366b): select '365'+2
INFO : Completed executing command(queryId=v2admin_20210109163104_2d9d7430-013d-45f2-9b55-e46e1105366b); Time taken: 0.002 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+--------+
| _c0 |
+--------+
| 367.0 |
+--------+
Use the CAST operator to convert data types explicitly.
For example, CAST('365' AS INT) converts the string '365' to the integer 365; if the cast fails, as with CAST('X' AS INT), the expression returns NULL.
Example:
0: jdbc:hive2://hadoop10:10000> select cast('365' as int)+2, cast('365a' as int)+2;
INFO : Compiling command(queryId=v2admin_20210109163353_5fedabc9-f213-4ca2-89e4-0a4e27598fab): select cast('365' as int)+2, cast('365a' as int)+2
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:int, comment:null), FieldSchema(name:_c1, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=v2admin_20210109163353_5fedabc9-f213-4ca2-89e4-0a4e27598fab); Time taken: 0.225 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=v2admin_20210109163353_5fedabc9-f213-4ca2-89e4-0a4e27598fab): select cast('365' as int)+2, cast('365a' as int)+2
INFO : Completed executing command(queryId=v2admin_20210109163353_5fedabc9-f213-4ca2-89e4-0a4e27598fab); Time taken: 0.001 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+------+-------+
| _c0 | _c1 |
+------+-------+
| 367 | NULL |
+------+-------+
2. Data Definition
2.1 Creating a database
1) Create a database mydb:
0: jdbc:hive2://hadoop10:10000> create database if not exists mydb;
2) Create a database at a specified HDFS location:
create database demo1_db location '/db/demo1_db';
2.2 Listing databases: show databases;
0: jdbc:hive2://hadoop10:10000> show databases;
INFO : Compiling command(queryId=v2admin_20210109164417_b436551a-5605-4367-9600-904e693b37ec): show databases
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=v2admin_20210109164417_b436551a-5605-4367-9600-904e693b37ec); Time taken: 0.014 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=v2admin_20210109164417_b436551a-5605-4367-9600-904e693b37ec): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=v2admin_20210109164417_b436551a-5605-4367-9600-904e693b37ec); Time taken: 0.012 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name |
+----------------+
| default |
| demo_db2 |
| mydb |
+----------------+
3 rows selected (0.056 seconds)
2.3 Showing database details: desc database mydb;
0: jdbc:hive2://hadoop10:10000> desc database db;
Error: Error while compiling statement: FAILED: SemanticException [Error 10072]: Database does not exist: db (state=42000,code=10072)
0: jdbc:hive2://hadoop10:10000> desc database mydb;
INFO : Compiling command(queryId=v2admin_20210109164555_ab020ad8-6e97-4a62-b7cd-0a71045f9cd9): desc database mydb
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:db_name, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer), FieldSchema(name:location, type:string, comment:from deserializer), FieldSchema(name:owner_name, type:string, comment:from deserializer), FieldSchema(name:owner_type, type:string, comment:from deserializer), FieldSchema(name:parameters, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=v2admin_20210109164555_ab020ad8-6e97-4a62-b7cd-0a71045f9cd9); Time taken: 0.038 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=v2admin_20210109164555_ab020ad8-6e97-4a62-b7cd-0a71045f9cd9): desc database mydb
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=v2admin_20210109164555_ab020ad8-6e97-4a62-b7cd-0a71045f9cd9); Time taken: 0.005 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+----------+--------------+---------------------------------------------------+-------------+-------------+-------------+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+--------------+---------------------------------------------------+-------------+-------------+-------------+
| mydb | My first db | hdfs://hadoop10:9820/user/hive/warehouse/mydb.db | v2admin | USER | |
+----------+--------------+---------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.069 seconds)
2.4 Switching the current database: use mydb;
Ha, by this point you must have noticed: this stuff is basically no different from the SQL we already use, so now you can feel some of the benefits of Hive.
2.5 Dropping a database
Hmm, for this one I suggest we skip it: look it up whenever you actually need it, to avoid dropping a database by accident.
2.6 Creating tables
This differs somewhat from MySQL. The syntax is:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]
1) CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses it.
2) The EXTERNAL keyword lets the user create an external table, and a path to the actual data (LOCATION) can be specified at creation time. When a table is dropped, a managed (internal) table's metadata and data are deleted together, whereas for an external table only the metadata is deleted, not the data.
3) COMMENT: adds comments to the table and its columns.
4) PARTITIONED BY creates a partitioned table.
5) CLUSTERED BY creates a bucketed table.
6) SORTED BY additionally sorts by one or more columns within each bucket.
7) ROW FORMAT
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
When creating a table, the user can define a custom SerDe or use a built-in one. If neither ROW FORMAT nor ROW FORMAT DELIMITED is specified, a built-in SerDe is used. When creating a table the user must also specify its columns, and along with the columns a custom SerDe may be specified; Hive uses the SerDe to determine the concrete column data of the table.
SerDe is short for Serialize/Deserialize; Hive uses a SerDe to serialize and deserialize row objects.
8) STORED AS specifies the storage file format.
Commonly used formats: SEQUENCEFILE (binary sequence file), TEXTFILE (plain text), and RCFILE (columnar storage).
If the file data is plain text, use STORED AS TEXTFILE; if the data needs to be compressed, use STORED AS SEQUENCEFILE.
9) LOCATION: specifies where the table is stored on HDFS.
10) AS: followed by a query; the table is created from the query result.
11) LIKE lets the user copy an existing table's structure without copying its data.
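A sketch pulling several of these clauses together (the table names, columns, and partition key below are hypothetical, not part of the walkthrough):

```sql
-- Hypothetical external, partitioned table exercising the clauses above.
CREATE EXTERNAL TABLE IF NOT EXISTS web_log (
  ip  STRING COMMENT 'client address',
  url STRING COMMENT 'requested URL'
)
COMMENT 'raw access logs'
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/logs/web_log';

-- LIKE copies only the structure, not the data:
CREATE TABLE web_log_copy LIKE web_log;

-- AS creates a table from a query result:
CREATE TABLE top_urls AS SELECT url, count(*) AS hits FROM web_log GROUP BY url;
```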
2.6.1 Creating a managed table
create table if not exists stu1(
id int,
name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/demo01_db.db';
Check with show tables:
0: jdbc:hive2://hadoop10:10000> show tables;
INFO : Compiling command(queryId=v2admin_20210109170622_440b4273-c978-4fbb-8ca4-e727cc3ecbb1): show tables
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=v2admin_20210109170622_440b4273-c978-4fbb-8ca4-e727cc3ecbb1); Time taken: 0.031 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=v2admin_20210109170622_440b4273-c978-4fbb-8ca4-e727cc3ecbb1): show tables
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=v2admin_20210109170622_440b4273-c978-4fbb-8ca4-e727cc3ecbb1); Time taken: 0.025 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+-----------+
| tab_name |
+-----------+
| stu1 |
+-----------+
Look at the table details:
0: jdbc:hive2://hadoop10:10000> desc stu1;
INFO : Compiling command(queryId=v2admin_20210109170725_25f84e5c-cb26-4df7-a29f-a64c5087f76e): desc stu1
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=v2admin_20210109170725_25f84e5c-cb26-4df7-a29f-a64c5087f76e); Time taken: 0.071 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=v2admin_20210109170725_25f84e5c-cb26-4df7-a29f-a64c5087f76e): desc stu1
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=v2admin_20210109170725_25f84e5c-cb26-4df7-a29f-a64c5087f76e); Time taken: 0.017 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| id | int | |
| name | string | |
+-----------+------------+----------+
2.6.2 External tables
A so-called external table is one that Hive does not fully control: dropping an external table does not delete its data, but it does delete the metadata for that table.
So when should you use an external table, and when a managed table?
Take website logs, for instance: for the raw data we use external tables and do data analysis on them, while intermediate and result tables use managed tables, i.e., internal tables.
Example:
1) Prepare the data:
vim stu.txt
1001 lisi
1002 zhangsan
1003 wangwu
1004 zhalou
2) Upload it to HDFS:
[v2admin@hadoop10 demo]$ hadoop fs -put stu.txt /demofile
2021-01-09 17:26:08,616 INFO [main] Configuration.deprecation (Configuration.java:logDeprecation(1395)) - No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
2021-01-09 17:26:09,162 INFO [Thread-7] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(239)) - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
3) Create the external table:
create external table if not exists stu2(
id int,
name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/demofile';
4) Look at the table contents:
0: jdbc:hive2://hadoop10:10000> select * from stu2;
INFO : Compiling command(queryId=v2admin_20210109172702_4db60baa-51ee-442d-b486-b5b22d981d83): select * from stu2
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:stu2.id, type:int, comment:null), FieldSchema(name:stu2.name, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=v2admin_20210109172702_4db60baa-51ee-442d-b486-b5b22d981d83); Time taken: 0.193 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=v2admin_20210109172702_4db60baa-51ee-442d-b486-b5b22d981d83): select * from stu2
INFO : Completed executing command(queryId=v2admin_20210109172702_4db60baa-51ee-442d-b486-b5b22d981d83); Time taken: 0.0 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+----------+------------+
| stu2.id | stu2.name |
+----------+------------+
| 1001 | lisi |
| 1002 | zhangsan |
| 1003 | wangwu |
| 1004 | zhalou |
+----------+------------+
5) Drop the external table:
0: jdbc:hive2://hadoop10:10000> drop table stu2;
INFO : Compiling command(queryId=v2admin_20210109172804_97f8538f-f5d6-4f94-a169-f0d170f57da5): drop table stu2
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=v2admin_20210109172804_97f8538f-f5d6-4f94-a169-f0d170f57da5); Time taken: 0.037 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=v2admin_20210109172804_97f8538f-f5d6-4f94-a169-f0d170f57da5): drop table stu2
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=v2admin_20210109172804_97f8538f-f5d6-4f94-a169-f0d170f57da5); Time taken: 0.245 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (0.327 seconds)
6) Check the original data on HDFS:
[v2admin@hadoop10 demo]$ hadoop fs -cat /demofile/stu.txt
2021-01-09 17:29:11,911 INFO [main] Configuration.deprecation (Configuration.java:logDeprecation(1395)) - No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
2021-01-09 17:29:12,445 INFO [main] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(239)) - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
1001 lisi
1002 zhangsan
1003 wangwu
1004 zhalou
The data is still there, but the corresponding metadata has been deleted.
2.8 Altering tables
2.8.1 Renaming a table: ALTER TABLE table_name RENAME TO new_table_name
2.8.2 Adding, changing, and replacing columns
Change-column syntax: ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
Add- and replace-columns syntax: ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
Example
Query the table structure:
0: jdbc:hive2://hadoop10:10000> desc stu1;
INFO : Compiling command(queryId=v2admin_20210109173448_890d225c-c89e-4f41-aae3-a75dcd6845c5): desc stu1
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=v2admin_20210109173448_890d225c-c89e-4f41-aae3-a75dcd6845c5); Time taken: 0.036 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=v2admin_20210109173448_890d225c-c89e-4f41-aae3-a75dcd6845c5): desc stu1
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=v2admin_20210109173448_890d225c-c89e-4f41-aae3-a75dcd6845c5); Time taken: 0.016 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| id | int | |
| name | string | |
+-----------+------------+----------+
Add a column:
0: jdbc:hive2://hadoop10:10000> alter table stu1 add columns(age int);
Query the table structure:
0: jdbc:hive2://hadoop10:10000> desc stu1;
INFO : Compiling command(queryId=v2admin_20210109173722_c490c030-015b-447f-a38b-a14197c70ef4): desc stu1
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=v2admin_20210109173722_c490c030-015b-447f-a38b-a14197c70ef4); Time taken: 0.04 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=v2admin_20210109173722_c490c030-015b-447f-a38b-a14197c70ef4): desc stu1
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=v2admin_20210109173722_c490c030-015b-447f-a38b-a14197c70ef4); Time taken: 0.014 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| id | int | |
| name | string | |
| age | int | |
+-----------+------------+----------+
Change a column:
0: jdbc:hive2://hadoop10:10000> alter table stu1 change column age desc string;
Query the table structure:
0: jdbc:hive2://hadoop10:10000> desc stu1;
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| id | int | |
| name | string | |
| desc | string | |
+-----------+------------+----------+
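The REPLACE variant of the syntax above gets no example in the walkthrough; a sketch of what it does (illustrative only, since running it here would discard the column just renamed):

```sql
-- REPLACE COLUMNS substitutes the entire column list in the table metadata;
-- the underlying data files are not rewritten.
ALTER TABLE stu1 REPLACE COLUMNS (id INT, name STRING);
```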
3. Data Manipulation
Hive data manipulation comes down to two things: importing data and exporting data.
3.1 Importing data
3.1.1 Loading data directly into a table -- load
1. Syntax:
hive> load data [local] inpath 'path to data' [overwrite] into table table_name [partition (partcol1=val1,…)];
load data: load data
local: load from the local file system into the Hive table; without it, load from HDFS
inpath: the path of the data to load
overwrite: overwrite the data already in the table; without this keyword, append
into table: which table to load into
table_name: the specific table name
partition: load into the specified partition
2. Example
1) Create a table stu1:
0: jdbc:hive2://hadoop10:10000> create table stu1(
. . . . . . . . . . . . . . . > id int,
. . . . . . . . . . . . . . . > name string)
. . . . . . . . . . . . . . . > row format delimited fields terminated by '\t';
2) Prepare the data:
[v2admin@hadoop10 demo]$ cat stu.txt
1001 lisi
1002 zhangsan
1003 wangwu
1004 zhalou
3) Upload a copy to HDFS:
[v2admin@hadoop10 demo]$ hadoop fs -put stu.txt /student
4) Load from the local file system into Hive's stu1 table:
0: jdbc:hive2://hadoop10:10000> load data local inpath '/home/v2admin/demo/stu.txt' into table demo01_db.stu1;
Let's see whether the table has any content:
0: jdbc:hive2://hadoop10:10000> select * from stu1;
+----------+------------+
| stu1.id | stu1.name |
+----------+------------+
| 1001 | lisi |
| 1002 | zhangsan |
| 1003 | wangwu |
| 1004 | zhalou |
+----------+------------+
5) Load a file from HDFS into Hive:
0: jdbc:hive2://hadoop10:10000> load data inpath '/student/stu.txt' into table demo01_db.stu1;
Look at the table contents:
+----------+------------+
| stu1.id | stu1.name |
+----------+------------+
| 1001 | lisi |
| 1002 | zhangsan |
| 1003 | wangwu |
| 1004 | zhalou |
| 1001 | lisi |
| 1002 | zhangsan |
| 1003 | wangwu |
| 1004 | zhalou |
+----------+------------+
3.1.2 Inserting data with insert
insert into table stu1 values(1005,'aaa'),(1006,'bbb'),(1007,'ccc');
insert into appends data; insert overwrite instead overwrites what is already in the table.
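insert can also take its rows from a query instead of a VALUES list; a sketch (stu_backup is a hypothetical target table, not one created above):

```sql
-- Append the result of a query into another table (hypothetical target).
INSERT INTO TABLE stu_backup SELECT id, name FROM stu1;

-- Or replace the target's contents entirely:
INSERT OVERWRITE TABLE stu_backup SELECT id, name FROM stu1 WHERE id > 1002;
```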
3.1.3 Specifying the data path with LOCATION: when creating a table, data can be imported directly:
0: jdbc:hive2://hadoop10:10000> create external table stu2(
. . . . . . . . . . . . . . . > id int,
. . . . . . . . . . . . . . . > name string)
. . . . . . . . . . . . . . . > row format delimited fields terminated by '\t'
. . . . . . . . . . . . . . . > location '/student';
Let's look at the table:
0: jdbc:hive2://hadoop10:10000> select * from stu2;
+----------+------------+
| stu2.id | stu2.name |
+----------+------------+
| 1001 | lisi |
| 1002 | zhangsan |
| 1003 | wangwu |
| 1004 | zhalou |
+----------+------------+
These are just table-creation operations; use them flexibly.
3.1.4 import
Since we are talking about importing, there is of course an import operation. It requires an export first and then an import; see the data-export section below.
3.2 Exporting data
3.2.1 Exporting to HDFS with export:
0: jdbc:hive2://hadoop10:10000> export table demo01_db.stu2 to '/user/hive/warehouse/export/stu2';
Importing is then import:
0: jdbc:hive2://hadoop10:10000> import table stu3 from
'/user/hive/warehouse/export/stu2';
The two are used together, mainly for migrating Hive tables between two Hadoop clusters.
3.2.2 Exporting with insert
Insert can export data too? It really can; I remember being baffled too the first time I saw it. Just remember it.
Example: export a query result to the local file system:
0: jdbc:hive2://hadoop10:10000> insert overwrite local directory '/home/v2admin/demo/tmp'
. . . . . . . . . . . . . . . > select * from stu1;
Without local it exports to HDFS instead. Let's look at the exported file:
1001^Alisi
1002^Azhangsan
1003^Awangwu
1004^Azhalou
1001^Alisi
1002^Azhangsan
1003^Awangwu
1004^Azhalou
1005^Aaaa
1006^Abbb
1007^Accc
Not what we had in mind. The export can be formatted, though, like this:
insert overwrite local directory '/home/v2admin/demo/tmp'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from stu1;
Look at the file again:
1001 lisi
1002 zhangsan
1003 wangwu
1004 zhalou
1001 lisi
1002 zhangsan
1003 wangwu
1004 zhalou
1005 aaa
1006 bbb
1007 ccc
Now it is the format we wanted.
3.2.3 Exporting via the Hive shell:
[v2admin@hadoop10 tmp]$ hive -e 'select * from demo01_db.stu2;' > stu2.txt;
There are many import and export methods; just use whichever fits.
4. Queries
Essentially the same as the SQL we know, with few differences. For example:
1) Full-table query: select * from stu1;
2) Selecting specific columns: select id,name from stu1;
3) Column aliases: select id as stu_num, name as stu_name from stu1;
4) where clause: select * from stu1 where id=1001;
5) limit clause: select * from stu1 limit 3;
Grouped queries with group by, the use of having, and multi-table join queries all differ little from standard SQL.
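As a sketch of those last points (the emp and dept tables and their columns below are hypothetical, not tables created above):

```sql
-- group by + having (hypothetical tables): departments with more than 5 employees
SELECT deptno, count(*) AS cnt
FROM emp
GROUP BY deptno
HAVING count(*) > 5;

-- join: employee names paired with their department names
SELECT e.ename, d.dname
FROM emp e
JOIN dept d ON e.deptno = d.deptno;
```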
So Hive is very easy to pick up, which greatly reduces our learning cost.