Comparing Hive storage formats on HDFS: ORC and PARQUET
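Hive supports several storage formats; this post compares three of them: TEXT, ORC, and PARQUET. TEXT is the default, while ORC and PARQUET are columnar formats. They differ in disk usage and query performance, so I ran a dedicated test and recorded the results here.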
1: Differences in the CREATE TABLE statements
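-- Text table: no stored as clause, so Hive uses its default format
-- (TEXTFILE unless hive.default.fileformat is changed)
create table if not exists text(
  a bigint)
partitioned by (dt string)
row format delimited fields terminated by '\001'
location '/hdfs/text/';

-- ORC table (the delimiter clause is carried over from the text DDL;
-- ORC files are binary and do not use a field delimiter)
create table if not exists orc(
  a bigint)
partitioned by (dt string)
row format delimited fields terminated by '\001'
stored as orc
location '/hdfs/orc/';

-- Parquet table (same remark about the delimiter clause)
create table if not exists parquet(
  a bigint)
partitioned by (dt string)
row format delimited fields terminated by '\001'
stored as parquet
location '/hdfs/parquet/';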
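The only real difference is the format named after stored as; the text table omits the clause and falls back to the default.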
2: HDFS storage comparison
parquet    orc     text
709M       275M    1G
687M       249M    1G
647M       265M    1G
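To put the sizes above in relative terms, here is a minimal sketch that averages the three measurements per format and expresses them as a fraction of the text size. It assumes "1G" means 1024 MB (the source rounds it, so the ratios are approximate); the names `TEXT_MB`, `sizes_mb`, `mean`, and `relative_size` are made up for this illustration.

```python
# Sizes in MB copied from the storage table above.
# Assumption: "1G" for the text table means 1024 MB.
TEXT_MB = 1024
sizes_mb = {
    "parquet": [709, 687, 647],
    "orc": [275, 249, 265],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def relative_size(fmt):
    """Average size of `fmt` as a fraction of the text size."""
    return mean(sizes_mb[fmt]) / TEXT_MB


for fmt in sizes_mb:
    print(f"{fmt}: {mean(sizes_mb[fmt]):.0f} MB on average, "
          f"{relative_size(fmt):.0%} of text")
```

On these numbers Parquet takes roughly two thirds of the text size and ORC roughly a quarter.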
3: Query time comparison
parquet    orc       text
36.451     26.133    42.574
38.425     29.353    41.673
36.647     27.825    43.938
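The same kind of summary can be sketched for the query times above. The source does not state a unit; the code assumes seconds. The names `times`, `mean`, and `speedup_over_text` are made up for this illustration.

```python
# Timings copied from the query time table above.
# Assumption: the unit is seconds (the source does not say).
times = {
    "parquet": [36.451, 38.425, 36.647],
    "orc": [26.133, 29.353, 27.825],
    "text": [42.574, 41.673, 43.938],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def speedup_over_text(fmt):
    """How many times faster `fmt` is than text, on average."""
    return mean(times["text"]) / mean(times[fmt])


for fmt, runs in times.items():
    print(f"{fmt}: mean {mean(runs):.2f}, "
          f"{speedup_over_text(fmt):.2f}x vs text")
```

With only three runs per format this is a rough indication, not a benchmark: ORC averages about 27.8 against 42.7 for text (roughly 1.5 times faster), while Parquet is about 15% faster than text.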
4: How the files are generated
import org.apache.spark.sql.{SaveMode, SparkSession}

val sparkSession = SparkSession.builder().master("local").appName("pushFunnelV3").getOrCreate()
// Two sample JSON records (Spark's JSON reader accepts single-quoted strings by default)
val nameRDD = sparkSession.sparkContext.parallelize(Seq("{'name':'zhangsan','age':'18'}", "{'name':'lisi','age':'19'}"))
// Parse once, then write the same DataFrame in each format
val df = sparkSession.read.json(nameRDD)
df.write.mode(SaveMode.Overwrite).csv("/data/aa")      // text (CSV)
df.write.mode(SaveMode.Overwrite).orc("/data/bb")      // ORC
df.write.mode(SaveMode.Overwrite).parquet("/data/cc")  // Parquet