flume

Flume start/stop script for the log-consumer agent

删除回忆录丶 submitted on 2019-12-06 14:32:40
1) Create the script f2.sh under /home/hadoop/shell:

[hadoop@elk01 shell]$ vim f2.sh

Put the following into the script:

#!/bin/bash
case $1 in
"start"){
        for i in elk-03
        do
                echo " -------- starting consumer flume on $i -------"
                ssh $i "nohup /bd/flume-1.7/bin/flume-ng agent --conf-file /bd/flume-1.7/conf/kafka-flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,LOGFILE >/bd/flume-1.7/log.txt 2>&1 &"
        done
};;
"stop"){
        for i in elk-03
        do
                echo " -------- stopping consumer flume on $i -------"
                ssh $i "ps -ef | grep kafka-flume-hdfs | grep -v grep | awk '{print \$2}' | xargs kill"
        done
};;
esac

2) Add execute permission to the script:

[hadoop@elk01 shell]$ chmod u+x f2.sh

3) f2 cluster start script:

[hadoop@elk01
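The excerpt is cut off before step 3; a minimal invocation sketch, assuming the script created above is run from the same shell directory:

[hadoop@elk01 shell]$ ./f2.sh start   # launches the kafka-flume-hdfs agent on elk-03 over ssh
[hadoop@elk01 shell]$ ./f2.sh stop    # finds the agent process on elk-03 and kills it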

Flume deployment

我只是一个虾纸丫 submitted on 2019-12-06 14:04:29
References: notes: https://www.cnblogs.com/yinzhengjie/p/11183988.html; official site: http://flume.apache.org/documentation.html; user guide: https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeUserGuide.rst; source: https://www.cnblogs.com/hongfeng2019/p/11988507.html

java.io.IOException: Cannot obtain block length for LocatedBlock

岁酱吖の submitted on 2019-12-06 13:35:09
I am using HDP 2.1 for the cluster. I've encountered the exception below, and the MapReduce jobs fail because of it. We regularly create tables using data from Flume (version 1.4), and I checked the data files the mapper tried to read, but I couldn't find anything wrong with them.

2014-11-28 00:08:28,696 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2014-11-28 00:08:28,947 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at
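This error usually indicates files that Flume wrote but never closed, leaving them in an under-construction state in HDFS. A diagnostic sketch, not taken from the original post (the paths are placeholders; hdfs debug is only available from Hadoop 2.7 onward):

# list files under the Flume output directory that are still open for write
hdfs fsck /path/to/flume/output -files -openforwrite

# force HDFS to close a stuck file; on older releases, stopping the Flume
# agent cleanly or re-copying the affected file achieves the same effect
hdfs debug recoverLease -path /path/to/flume/output/stuck-file -retries 3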

Flume configuration to upload files with the same name

左心房为你撑大大i submitted on 2019-12-06 10:06:18
I have 10 files with data of varying length. I would like to store the corresponding data in the same file and with the same filename, but Flume is splitting up the data and saving it as FlumeData.<timestamp> files. I am using the configuration below:

a1.sources = r1
a1.sinks = k2
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
a1.channels.c1.dataDirs = /mnt/flume/data
a1.channels.c1.trackerDir = /mnt/flume/track
a1.channels.c1.transactionCapacity = 10000000
a1.channels.c1.capacity = 500000000
a1.channels.c1.maxFileSize = 10000000
a1.channels.c1
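One approach (a sketch, not from the original question; the spool directory, HDFS path, and source/sink definitions are assumptions) is to have a spooling-directory source record each file's basename in an event header and let the HDFS sink reuse that header in its file prefix. Note that the HDFS sink still appends a counter to the prefix, so the output name is not byte-for-byte identical to the input name:

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /mnt/flume/spool
a1.sources.r1.basenameHeader = true
a1.sources.r1.basenameHeaderKey = basename
a1.sources.r1.channels = c1

a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.path = hdfs://namenode/flume/upload
a1.sinks.k2.hdfs.filePrefix = %{basename}
a1.sinks.k2.hdfs.fileType = DataStream
# disable size/count/time based rolling so each input file stays together,
# and close idle files after 60 seconds
a1.sinks.k2.hdfs.rollInterval = 0
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.hdfs.rollSize = 0
a1.sinks.k2.hdfs.idleTimeout = 60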

ELK log suite: installation and use

佐手、 submitted on 2019-12-05 23:25:58
1. Introduction to ELK

ELK is not a single piece of software but a log collection and processing suite made up of three open-source projects: Elasticsearch + Logstash + Kibana. Logstash handles log collection, Elasticsearch handles log search and aggregation, and Kibana is the presentation layer for ES: with a polished front end and a few clicks of simple configuration you get search, aggregation, and attractive reports.

Our current logging setup:
- Flume handles collection: services write logs to files, and Flume picks up the log files
- Flume aggregates everything into a Kafka data channel for other services to consume
- Log search: logs are read from Kafka and written to SolrCloud, which provides search
- Log statistics: logs from Kafka are written to HDFS and analysed with Spark and Hive
- Log display: a home-grown Java web app reads the database and renders statistical reports

Problems with the current setup:
- It requires up-front development before it can be used, so the coding effort is large
- It is inflexible; new requirements mean code changes
- Statistics are computed offline, so timeliness is poor
- There is no aggregation over search results
- The overall design is fairly complex, spans several technologies, and is costly to learn and maintain
- ... every new requirement is painful, yet once built the reports rarely change

After evaluating ELK, we found it solves most of these problems readily.

2. Installing ELK

JDK 1.8 must be installed first (install it yourself). Installation tutorial: https://www.cnblogs.com/jxd283465/p

Reading Flume spoolDir in parallel

爱⌒轻易说出口 submitted on 2019-12-05 20:17:43
Since I'm not allowed to set up Flume on the production servers, I have to download the logs, put them in a Flume spoolDir, and have a sink consume from the channel and write to Cassandra. Everything works fine. However, I have a lot of log files in the spoolDir, and since the current setup only processes one file at a time, it's taking a while. I want to be able to process many files concurrently. One way I thought of is to keep the spoolDir approach but distribute the files into 5-10 different directories and define multiple sources/channels/sinks, but this is a bit clumsy. Is there a better way to
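The workaround the asker describes could look roughly like this (a sketch; the directory and component names are made up): several spooling-directory sources over separate directories, each with its own reader thread, all feeding one channel so a single sink still writes to Cassandra:

a1.sources = src1 src2 src3
a1.channels = c1
a1.sinks = k1

a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /data/spool/part1
a1.sources.src1.channels = c1

a1.sources.src2.type = spooldir
a1.sources.src2.spoolDir = /data/spool/part2
a1.sources.src2.channels = c1

a1.sources.src3.type = spooldir
a1.sources.src3.spoolDir = /data/spool/part3
a1.sources.src3.channels = c1

# the Cassandra sink is whatever custom/third-party sink is already in use
a1.sinks.k1.channel = c1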

Can Apache Sqoop and Flume be used interchangeably?

梦想的初衷 submitted on 2019-12-05 18:44:36
I am new to Big Data. From some of the answers to "What's the difference between Flume and Sqoop?", I understand that both Flume and Sqoop can pull data from a source and push it to Hadoop. Can anyone specify exactly where Flume is used and where Sqoop is? Can both be used for the same tasks? Flume and Sqoop are designed for different kinds of data sources. Sqoop works with any RDBMS that supports JDBC connectivity. Flume, on the other hand, works well with streaming data sources such as log data that is generated continuously in your environment. Specifically, Sqoop could be used
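To make the distinction concrete (an illustration, not from the original answer; the connection string, table, and log path are invented): Sqoop runs a batch import of a relational table over JDBC, while Flume continuously follows a stream such as a growing log file:

# Sqoop: one-off (or scheduled) batch import of an RDBMS table into HDFS
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username reporter -P \
  --table orders \
  --target-dir /data/sales/orders

# Flume: a source definition that continuously tails a log file
agent.sources.logsrc.type = exec
agent.sources.logsrc.command = tail -F /var/log/app/app.log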

Invalid hostname error when connecting to s3 sink with a secret key containing a forward slash

牧云@^-^@ submitted on 2019-12-05 14:21:34
I have a forward slash in my AWS secret key. When I try to connect to the S3 sink I get:

Caused by: java.lang.IllegalArgumentException: Invalid hostname in URI s3://xxxx:xxxx@jelogs/je.1359961366545 at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:41)

When I encode the forward slash as %2F, I get "The request signature we calculated does not match the signature you provided. Check your key and signing method." How should I encode my secret key?

samthebest's solution works; you just have to add quotes ("") around the keys. Here is how to use it: hadoop distcp -Dfs.s3a.awsAccessKeyId="yourkey" -Dfs
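A sketch of the kind of command the answer is building toward (the bucket and paths are placeholders; the property names follow the fragment above, while newer Hadoop releases use fs.s3a.access.key / fs.s3a.secret.key instead; regenerating a secret key that contains no slash avoids the issue entirely):

hadoop distcp \
  -Dfs.s3a.awsAccessKeyId="YOUR_ACCESS_KEY" \
  -Dfs.s3a.awsSecretAccessKey="YOUR/SECRET/KEY" \
  hdfs:///data/source s3a://your-bucket/target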

Deploying a Hadoop cluster on Alibaba Cloud ECS (1): building a fully distributed Hadoop cluster

怎甘沉沦 submitted on 2019-12-05 08:43:18
This part builds on "Deploying a Hadoop cluster on Alibaba Cloud ECS (1): building a fully distributed Hadoop cluster".

1 Environment

One Alibaba Cloud ECS server: master
Operating system: CentOS 7.3
Hadoop: hadoop-2.7.3.tar.gz
Java: jdk-8u77-linux-x64.tar.gz
Flume: apache-flume-1.8.0-bin.tar.gz

2 Downloading Flume

Download apache-flume-1.8.0-bin.tar.gz and unpack it somewhere suitable; here it is unpacked under /usr/local. Rename the extracted directory to flume:

cd /usr/local
mv apache-flume-1.8.0-bin/ flume/

3 Adding the Flume environment variables

Add the following to /etc/profile:

export FLUME_HOME=/usr/local/flume
export PATH=$PATH:$FLUME_HOME/bin

Reload the environment: source /etc/profile

4 Configuring flume-env.sh

cd /usr/local/flume/conf
cp ./flume-env.sh.template ./flume-env.sh
vim ./flume-env.sh

Add the following content:
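The excerpt ends before showing what gets added; the usual addition is a JAVA_HOME export. A sketch, assuming the JDK listed in the environment table above was unpacked to /usr/local/jdk1.8.0_77 (the actual path depends on where your JDK is installed):

# flume-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_77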

Getting 'checking flume.conf for changes' in a loop

可紊 submitted on 2019-12-05 07:15:46
Question: I am using Apache Flume 1.4.0 to collect log files (auth.log) and store them in HDFS (Hadoop 2.6.0). The command used is:

bin/flume-ng agent --conf ./conf/ -f flume.conf -Dflume.root.logger=DEBUG,console -n agent

The flume.conf file contains the following:

agent.channels.memory-channel.type = memory
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /var/log/auth.log
agent.sources.tail-source.channels = memory-channel
agent.sinks.log-sink.channel = memory-channel
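The periodic "checking flume.conf for changes" message is normal polling; seeing only that message usually means no components were started for the agent name passed with -n. The fragment above is missing the source/channel/sink declarations and the sink type, which is a frequent cause. For comparison, a complete minimal configuration sketch (the HDFS path and capacity value are assumptions):

agent.sources = tail-source
agent.channels = memory-channel
agent.sinks = log-sink

agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity = 10000

agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /var/log/auth.log
agent.sources.tail-source.channels = memory-channel

agent.sinks.log-sink.type = hdfs
agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.hdfs.path = hdfs://namenode:8020/flume/auth
agent.sinks.log-sink.hdfs.fileType = DataStream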