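Flume Examples

Source of the examples:

《Hadoop+Spark大数据技术》 by 刘彬斌 (Liu Binbin), Tsinghua University Press

Example 1: Testing data sent from a client in real time

On Slave001, create netcat.conf: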

cd ~
vi netcat.conf

Add the following content:

# Name the components on this agent
agent.sources = seqGenSrc
agent.sinks = loggerSink
agent.channels = memoryChannel

# Describe/configure the source
agent.sources.seqGenSrc.type = netcat
agent.sources.seqGenSrc.bind = Slave001
agent.sources.seqGenSrc.port = 44444

# Describe the sink
agent.sinks.loggerSink.type = logger

# Use a channel which buffers events in memory
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 100

# Bind the source and sink to the channel
agent.sources.seqGenSrc.channels = memoryChannel
agent.sinks.loggerSink.channel = memoryChannel

Press Esc, then type :wq and press Enter to save and exit.
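A mistyped channel name in the binding lines is an easy mistake to make in this file. The sketch below is one way to check the wiring before starting the agent. The function check_flume_channels is a hypothetical helper, not part of Flume, and it only assumes the plain key = value layout shown above.

```shell
# check_flume_channels FILE AGENT
# Hypothetical helper (not part of Flume): lists every channel that a source or
# sink refers to but that is missing from "<AGENT>.channels".
# Returns 0 when the wiring is consistent, 1 otherwise.
check_flume_channels() {
  awk -v agent="$2" '
    /^[ \t]*#/ { next }                        # skip comment lines
    {
      eq = index($0, "=")
      if (eq == 0) next                        # not a key = value line
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)         # trim both sides
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == agent ".channels") {
        n = split(val, names, /[ \t]+/)
        for (i = 1; i <= n; i++) declared[names[i]] = 1
      } else if (key ~ /\.sources\.[^.]+\.channels$/ || key ~ /\.sinks\.[^.]+\.channel$/) {
        if (index(key, agent ".") != 1) next   # belongs to another agent
        n = split(val, names, /[ \t]+/)
        for (i = 1; i <= n; i++) used[names[i]] = key
      }
    }
    END {
      status = 0
      for (c in used) {
        if (!(c in declared)) {
          print "undeclared channel \"" c "\" in " used[c]
          status = 1
        }
      }
      exit status
    }
  ' "$1"
}
```

For the file above, check_flume_channels ~/netcat.conf agent should print nothing and return 0, because both binding lines refer to memoryChannel, which is declared in agent.channels.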

Start the agent with netcat.conf:

flume-ng agent -n agent -c conf -f /home/hadoop/netcat.conf

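flume-ng agent: the command that starts a Flume agent
-n agent: the agent name; it must match the agent name used as the property prefix in netcat.conf
-c conf: the Flume configuration directory (where flume-env.sh and log4j.properties are looked up); here it is a relative path named conf
-f /home/hadoop/netcat.conf: the path to the agent configuration file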
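The logger sink writes events through Flume's logging system, so whether they show up in the terminal depends on the logging configuration. The sketch below spells out the same command with long option names and adds -Dflume.root.logger=INFO,console, which the Flume user guide uses to send log output to the console. The wrapper name start_flume_agent and the FLUME_HOME variable are assumptions for illustration, and the logger option is assumed to apply to the log4j 1.x based Flume 1.7.0 release used in this post.

```shell
# start_flume_agent AGENT_NAME CONF_DIR CONF_FILE
# Hypothetical wrapper around flume-ng; the function name is illustrative.
# -Dflume.root.logger=INFO,console sends log output (including the events
# printed by the logger sink) to the terminal instead of a log file.
start_flume_agent() {
  flume-ng agent \
    --name "$1" \
    --conf "$2" \
    --conf-file "$3" \
    -Dflume.root.logger=INFO,console
}

# Example call, assuming Flume is installed under $FLUME_HOME:
# start_flume_agent agent "$FLUME_HOME/conf" /home/hadoop/netcat.conf
```

The long options --name, --conf and --conf-file are the spelled-out forms of -n, -c and -f.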
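Once netcat.conf has started successfully, the output looks like the figure below:
[Figure: netcat started successfully]
On Slave002, use Telnet to send data
// First install Telnet, then use it to connect to port 44444 on the Slave001 node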

sudo yum install -y telnet-*
telnet Slave001 44444

Example 1 is now complete.
Every line typed on the Slave002 node appears on the Slave001 node's command line once Enter is pressed.

Example 2: Monitoring a local directory and writing to HDFS

On Slave001, create monitor.conf:

# Declare the agent's components
agent.sources = source1
agent.sinks = sink1
agent.channels = channel1

# Define the data source
agent.sources.source1.type = spooldir
agent.sources.source1.spoolDir=/home/hadoop/tmpfile/logdfs
agent.sources.source1.channels = channel1
agent.sources.source1.fileHeader=false

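# Interceptor: add a timestamp header to every event (needed by %Y-%m-%d in the HDFS sink)
agent.sources.source1.interceptors=i1
agent.sources.source1.interceptors.i1.type=timestamp
# Delete each file from the spool directory once it has been fully ingested
agent.sources.source1.deletePolicy=immediate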

# Define where events are buffered; a channel can be backed by memory, disk, a database, etc.
agent.channels.channel1.type = file
agent.channels.channel1.checkpointDir = /home/hadoop/tmpfile/logdfstmp/point
agent.channels.channel1.dataDirs = /home/hadoop/tmpfile/logdfstmp

# Define the sink that writes the data to HDFS
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path=/input
agent.sinks.sink1.hdfs.fileType=DataStream
agent.sinks.sink1.hdfs.writeFormat=Text
agent.sinks.sink1.hdfs.rollInterval=1
agent.sinks.sink1.hdfs.filePrefix=%Y-%m-%d
agent.sinks.sink1.channel = channel1

So that the agent can start, create the directory the configuration expects, ~/tmpfile/logdfs:

mkdir -p ~/tmpfile/logdfs

Start the agent with monitor.conf:

flume-ng agent -n agent -c conf -f /home/hadoop/software/apache-flume-1.7.0-bin/example/monitor.conf

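Example 2 is now complete.
With monitor.conf running, create or paste any file in the ~/tmpfile/logdfs directory. Then, from any Slave node, run the following command to check whether the file has been synced to HDFS: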

hadoop fs -cat /input/*

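The result of this example is shown below
(a text file is copied with scp from Slave002 into the ~/tmpfile/logdfs directory on Slave001):
[Figure: scp]
[Figure: monitor.conf executed successfully]
Whenever the ~/tmpfile/logdfs directory on Slave001 changes, the new file is uploaded to HDFS right away.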
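One caveat when dropping files into the monitored directory: the Flume user guide states that a file must not be modified after it is placed in a spooling directory. Copying directly into the directory writes the file incrementally, so a safer pattern is to stage the file elsewhere on the same filesystem and then rename it into place. The sketch below is a minimal version of that idea. The function name spool_put and the ".staging" sibling directory are assumptions made for illustration; they do not come from the original example.

```shell
# spool_put SRC SPOOL_DIR
# Hypothetical helper: copies SRC into a staging directory next to SPOOL_DIR,
# then renames it into SPOOL_DIR, so the spooldir source only ever sees a
# complete file. The staging directory must be on the same filesystem for the
# rename to be atomic.
spool_put() {
  src=$1
  spool_dir=${2%/}                       # drop a trailing slash, if any
  staging_dir="${spool_dir}.staging"     # assumed name, not a Flume convention
  name=$(basename "$src")

  mkdir -p "$spool_dir" "$staging_dir" || return 1

  # Do not overwrite a file that is still waiting in the spool directory.
  if [ -e "$spool_dir/$name" ]; then
    echo "refusing to overwrite $spool_dir/$name" >&2
    return 1
  fi

  cp "$src" "$staging_dir/$name" || return 1
  mv "$staging_dir/$name" "$spool_dir/$name"
}

# Example call on Slave001 (the file name is illustrative):
# spool_put ./test.txt ~/tmpfile/logdfs
```

When the file comes from Slave002, the same idea applies: scp it into the staging directory on Slave001 first, then mv it into ~/tmpfile/logdfs.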

Source: CSDN

Author: QC.Mak

Link: https://blog.csdn.net/weixin_40669562/article/details/103656545
