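Flume in Practice (4): Single Data Source, Multiple Outlets (Selector)
A single source feeding multiple channels and sinks, as shown in the figure.
1) Requirements: Flume-1 monitors a file for changes and passes the changed content to Flume-2, which stores it in HDFS. At the same time, Flume-1 passes the changed content to Flume-3, which writes it to the local file system.
2) Requirements analysis: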
0. Preparation
Create the group1 folder under /opt/flume/job:
[root@henu1 job]# mkdir group1
Create the flume3 folder under /opt/data/:
[root@henu1 opt]# mkdir data
[root@henu1 opt]# cd data/
[root@henu1 data]# mkdir flume3
1. Create flume-file-flume.conf
Configure one source that receives the log file, plus two channels and two sinks that feed flume-flume-hdfs and flume-flume-dir respectively.
Create the configuration file and open it:
touch flume-file-flume.conf
vi flume-file-flume.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Replicate the data flow to every channel
a1.sources.r1.selector.type = replicating

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/hive/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = henu2
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = henu2
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Note: Avro is a language-neutral data serialization and RPC framework created by Doug Cutting, the creator of Hadoop.
Note: RPC (Remote Procedure Call) is a protocol for requesting a service from a program on a remote computer over the network, without needing to understand the underlying network technology.
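With two channels and two sinks in one file, it is easy to declare a channel and forget to attach a sink to it; such a channel would simply fill up to its capacity. The following is a minimal sketch of a wiring check, not something Flume ships: `check_channels` is a hypothetical helper name, and it only looks at the `<agent>.channels` and `<agent>.sinks.<name>.channel` lines of a properties file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): report whether every channel
# declared for an agent is read by at least one sink in the same file.
# Usage: check_channels <agent-name> <conf-file>
check_channels() {
    agent=$1
    conf=$2
    status=0
    # Extract the space-separated channel list, e.g. "a1.channels = c1 c2".
    channels=$(sed -n "s/^[[:space:]]*${agent}\.channels[[:space:]]*=[[:space:]]*//p" "$conf")
    for ch in $channels; do
        # A sink binds to exactly one channel: "<agent>.sinks.<name>.channel = <ch>".
        if grep -Eq "^[[:space:]]*${agent}\.sinks\.[^.]+\.channel[[:space:]]*=[[:space:]]*${ch}[[:space:]]*\$" "$conf"; then
            echo "$ch: ok"
        else
            echo "$ch: no sink reads from this channel"
            status=1
        fi
    done
    return $status
}
```

Assuming the file was saved under job/group1 as in the steps above, it could be called as `check_channels a1 job/group1/flume-file-flume.conf`; a non-zero exit status means at least one channel has no sink.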
2. Create flume-flume-hdfs.conf
Configure a source that receives the output of the upstream Flume, and a sink that writes to HDFS.
Create the configuration file and open it:
touch flume-flume-hdfs.conf
vi flume-flume-hdfs.conf

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.bind = henu2
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://mycluster/flume2/%Y%m%d/%H
# Prefix for uploaded files
a2.sinks.k1.hdfs.filePrefix = flume2-
# Whether to roll folders by time
a2.sinks.k1.hdfs.round = true
# How many time units before a new folder is created
a2.sinks.k1.hdfs.roundValue = 1
# Time unit used for rounding
a2.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a2.sinks.k1.hdfs.batchSize = 100
# File type (compressed types are also supported)
a2.sinks.k1.hdfs.fileType = DataStream
# How often, in seconds, a new file is created
a2.sinks.k1.hdfs.rollInterval = 600
# Roll each file at roughly 128 MB
a2.sinks.k1.hdfs.rollSize = 134217700
# File rolling does not depend on the number of events
a2.sinks.k1.hdfs.rollCount = 0
# Minimum number of block replicas
a2.sinks.k1.hdfs.minBlockReplicas = 1

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

3. Create flume-flume-dir.conf
Configure a source that receives the output of the upstream Flume, and a sink that writes to a local directory.
Create the configuration file and open it:
touch flume-flume-dir.conf
vi flume-flume-dir.conf

a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = henu2
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /opt/data/flume3

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2

Tip: the local output directory must already exist. If it does not, Flume will not create it.
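Because the sink does not create its output directory, a typo in either the `mkdir` step or the `sink.directory` value is enough to break the agent. One way to guard against that, sketched here under the assumption that the properties file holds a single `sink.directory` line, is to read the path from the file and create it before starting the agent. `ensure_sink_dir` is a hypothetical helper name, not a Flume command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): create the directory named by
# "<agent>.sinks.<name>.sink.directory" in a properties file if it is missing.
# Usage: ensure_sink_dir <conf-file>
ensure_sink_dir() {
    conf=$1
    # Take the value after "=" on the first uncommented sink.directory line
    # and trim surrounding blanks.
    dir=$(awk -F= '/^[^#]*\.sink\.directory[ \t]*=/ {
        sub(/^[ \t]+/, "", $2); sub(/[ \t]+$/, "", $2); print $2; exit
    }' "$conf")
    if [ -z "$dir" ]; then
        echo "no sink.directory found in $conf" >&2
        return 1
    fi
    # mkdir -p is a no-op when the directory already exists.
    mkdir -p "$dir" && echo "sink directory ready: $dir"
}
```

For this case it would be run as `ensure_sink_dir job/group1/flume-flume-dir.conf` from the Flume home directory, so the directory that exists is always the one the configuration actually names.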
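4. Run the configuration files
Start the agents for the corresponding configuration files in this order: flume-flume-dir, flume-flume-hdfs, flume-file-flume. Starting the two downstream agents first means their Avro sources are already listening when the Avro sinks of a1 connect.
From /opt/flume/, run each command in its own terminal, since each agent stays in the foreground:
flume-ng agent --conf conf/ --name a3 --conf-file job/group1/flume-flume-dir.conf
flume-ng agent --conf conf/ --name a2 --conf-file job/group1/flume-flume-hdfs.conf
flume-ng agent --conf conf/ --name a1 --conf-file job/group1/flume-file-flume.conf
5. Start Hive
6. Check the local directory and the HDFS directory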
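The check in step 6 can be sketched as two small shell helpers. Both names are hypothetical. The first counts only non-empty files, on the assumption that the file_roll sink keeps starting new files on a timer and so leaves empty ones behind while the log is idle. The second prints the HDFS directory for the current hour, assuming the path pattern /flume2/%Y%m%d/%H and the local-timestamp setting from flume-flume-hdfs.conf.

```shell
#!/bin/sh
# Hypothetical helper: count the non-empty files under a local sink directory.
# Usage: count_rolled_files <directory>
count_rolled_files() {
    # -size +0 skips zero-length files; tr strips the padding some wc builds add.
    find "$1" -type f -size +0 | wc -l | tr -d ' '
}

# Hypothetical helper: print the HDFS directory for the current hour, assuming
# the sink path pattern /flume2/%Y%m%d/%H and local timestamps.
current_hdfs_dir() {
    date +/flume2/%Y%m%d/%H
}
```

Pass `count_rolled_files` whichever directory `sink.directory` points to in flume-flume-dir.conf. On a machine with the Hadoop client configured for the cluster, `hdfs dfs -ls "$(current_hdfs_dir)"` would then list the files written during the current hour.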