Flume in Practice (2): Streaming a Local File to HDFS in Real Time

Published: 2024/2/28

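1) Requirement: monitor a local file in real time and upload its contents to HDFS.

2) Requirement analysis: [the original diagram shows the Hive startup log; this walkthrough uses a more ordinary local file instead]

3) Implementation steps:

1. For Flume to write data to HDFS, the Hadoop-related jar files must be available in Flume's lib directory.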

cd /opt/hadoop-2.6.5
find ./ -name "commons-configuration-1.6.jar"
cp ./share/hadoop/common/lib/commons-configuration-1.6.jar ../flume/lib/
find ./ -name "hadoop-auth-2.6.5.jar"
cp ./share/hadoop/tools/lib/hadoop-auth-2.6.5.jar ../flume/lib/
find ./ -name "hadoop-common-2.6.5.jar"
cp ./share/hadoop/common/hadoop-common-2.6.5.jar ../flume/lib/
find ./ -name "hadoop-hdfs-2.6.5.jar"
cp ./share/hadoop/hdfs/hadoop-hdfs-2.6.5.jar ../flume/lib/
find ./ -name "commons-io-2.4.jar"
cp ./share/hadoop/common/lib/commons-io-2.4.jar ../flume/lib/
find ./ -name "htrace-core-*.jar"
cp ./share/hadoop/common/lib/htrace-core-3.0.4.jar ../flume/lib/
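The find-then-copy steps above can be condensed into a loop. The following is only a sketch: copy_hadoop_jars is a hypothetical helper name, the jar file names are the ones listed in this article for Hadoop 2.6.5, and the paths in the usage comment assume the /opt layout used here. Adjust the names if your Hadoop version differs.

```shell
#!/usr/bin/env bash
# Sketch: copy the jars named in this article from a Hadoop install into Flume's lib/.
# copy_hadoop_jars is a hypothetical helper, not part of Flume or Hadoop.
copy_hadoop_jars() {
  local hadoop_home="$1" flume_lib="$2" name jar missing=0
  for name in \
      commons-configuration-1.6.jar \
      hadoop-auth-2.6.5.jar \
      hadoop-common-2.6.5.jar \
      hadoop-hdfs-2.6.5.jar \
      commons-io-2.4.jar \
      htrace-core-3.0.4.jar; do
    # A jar may exist in several places under the install; take the first match.
    jar=$(find "$hadoop_home" -name "$name" 2>/dev/null | head -n 1)
    if [ -n "$jar" ]; then
      cp "$jar" "$flume_lib/"
      echo "copied $name"
    else
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Return 1 if any jar could not be found, 0 otherwise.
  return "$missing"
}

# Usage (assumed layout): copy_hadoop_jars /opt/hadoop-2.6.5 /opt/flume/lib
```

A non-zero return value means at least one jar was not found, which is worth resolving before starting the agent.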

2. Create the flume-file-hdfs.conf file

As in the previous part, work in /opt/flume/job/.

touch flume-file-hdfs.conf

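Note: to read a file on a Linux system, the source has to run a command that follows normal Linux command rules. Because the log file (the Hive log in the original example, xx.log here) lives on the Linux file system, the source type is exec, short for execute: the source runs a Linux command to read the file.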

vim flume-file-hdfs.conf

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/logs/xx.log
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://mycluster/flume/%Y%m%d/%H
# Prefix for uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# Whether to round the timestamp down, so directories roll by time
a2.sinks.k2.hdfs.round = true
# How many time units pass before a new directory is created
a2.sinks.k2.hdfs.roundValue = 1
# Time unit used for rounding
a2.sinks.k2.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
# (lowered from 1000 so it does not exceed the channel's transactionCapacity below)
a2.sinks.k2.hdfs.batchSize = 100
# File type; compressed types are also supported
a2.sinks.k2.hdfs.fileType = DataStream
# How often to roll to a new file, in seconds
a2.sinks.k2.hdfs.rollInterval = 600
# Roll to a new file at this size in bytes (just under 128 MB)
a2.sinks.k2.hdfs.rollSize = 134217700
# 0 means file rolling does not depend on the number of events
a2.sinks.k2.hdfs.rollCount = 0
# Minimum number of block replicas
a2.sinks.k2.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

Parameter explanations (the original figure is not preserved):
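Two of the sink settings are easier to read with a quick check. The sketch below is not from the original article. hdfs_dir_for is a hypothetical helper that previews which directory the %Y%m%d/%H escapes in hdfs.path would resolve to at the current local time (the sink uses the local clock because useLocalTimeStamp is true). The arithmetic shows how far rollSize sits below a 128 MB boundary.

```shell
#!/usr/bin/env bash
# Sketch: preview the directory that hdfs.path resolves to for the current hour.
# hdfs_dir_for is a hypothetical helper; the base path is the one from the config.
hdfs_dir_for() {
  # date uses the same strftime-style escapes as the path: year, month, day / hour.
  echo "$1/$(date +%Y%m%d/%H)"
}
hdfs_dir_for hdfs://mycluster/flume

# rollSize compared with 128 MB expressed in bytes.
block_bytes=$((128 * 1024 * 1024))       # 134217728
headroom=$((block_bytes - 134217700))    # bytes between rollSize and 128 MB
echo "128 MB = $block_bytes bytes, rollSize is $headroom bytes below it"
```

With roundValue = 1 and roundUnit = hour, a new directory should appear once per hour, so the previewed path changes at each hour boundary.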

3. Start the agent with this configuration (run from /opt/flume, since the conf/ and job/ paths below are relative)

flume-ng agent --conf conf/ --name a2 --conf-file job/flume-file-hdfs.conf
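An easy mistake at this step is passing a --name that does not match the prefix used inside the configuration file; in that case the agent typically starts without loading any source, channel, or sink. As a minimal pre-flight sketch (check_agent_name is a hypothetical helper, not a Flume command), you could confirm that the file defines all three component lists for the agent name you intend to pass:

```shell
#!/usr/bin/env bash
# Sketch: verify that a Flume config file declares sources, channels and sinks
# for the given agent name. check_agent_name is a hypothetical helper.
check_agent_name() {
  local conf="$1" name="$2" key
  for key in sources channels sinks; do
    # Look for a line such as "a2.sources = r2".
    if ! grep -Eq "^[[:space:]]*$name\.$key[[:space:]]*=" "$conf"; then
      echo "no '$name.$key' entry in $conf" >&2
      return 1
    fi
  done
  return 0
}

# Usage (paths from this article): check_agent_name job/flume-file-hdfs.conf a2
```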

4. Append some content to xx.log
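Any way of appending lines to the file works, because the exec source simply follows it with tail -F. A small sketch, assuming the /opt/logs/xx.log path from the configuration (append_test_lines is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Sketch: append a few timestamped test lines to the file the exec source tails.
# append_test_lines is a hypothetical helper; pass the log path and a line count.
append_test_lines() {
  local log_file="$1" count="${2:-3}" i=1
  mkdir -p "$(dirname "$log_file")"
  while [ "$i" -le "$count" ]; do
    # ">>" appends, so tail -F sees each new line as it is written.
    echo "$(date '+%Y-%m-%d %H:%M:%S') test line $i" >> "$log_file"
    i=$((i + 1))
  done
}

# Usage: append_test_lines /opt/logs/xx.log 5
```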

5. Check the files on HDFS
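One way to check is the hdfs dfs command-line client. The sketch below assumes the hdfs command is on PATH and that the sink writes under /flume, as in the configuration; list_flume_output is a hypothetical wrapper that fails with a clear message when the client is missing.

```shell
#!/usr/bin/env bash
# Sketch: recursively list what the sink has written under a base directory.
# list_flume_output is a hypothetical wrapper around "hdfs dfs -ls -R".
list_flume_output() {
  local base="${1:-/flume}"
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs command not found on PATH" >&2
    return 1
  fi
  hdfs dfs -ls -R "$base"
}

# Usage: list_flume_output /flume
```

Files that the sink is still writing normally carry a .tmp suffix and lose it once they roll, so a file with the logs- prefix ending in .tmp usually means data is arriving.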

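[A friendly extra note]

This setup also applies to Hive: the data behind a Hive external table lives on HDFS, so you can see where this is going. I'll leave it at that!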
