Question
We are using HDP 2.4 and have many MapReduce jobs written in various ways (Java MR / Hive / etc.). The logs are collected in the Hadoop file system under the application ID. I want to gather all the logs of an application and append them into a single file (on HDFS or in the OS file system of one machine) so that I can analyze my application logs in one location without hassle. Please also advise the best way to achieve this in HDP 2.4 (stack version info: HDFS 2.7.1.2.4 / YARN 2.7.1.2.4 / MapReduce2 2.7.1.2.4 / Log Search 0.5.0 / Flume 1.5.2.2.4).
Answer 1:
Flume cannot collect the logs after they are already on HDFS.
To do this, you would need a Flume agent running on every NodeManager, pointed at the configured yarn.log.dir, and you would somehow have to parse the application/container/attempt/file information out of the local OS file path.
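A minimal sketch of what such a per-NodeManager agent could look like, assuming the local container-log directory is /hadoop/yarn/log (a hypothetical path; use whatever your yarn.nodemanager.log-dirs actually points to) and that writing everything under /flume/yarn-logs on HDFS is acceptable:

    # flume-agent.conf -- hypothetical per-NodeManager agent (a sketch, not a drop-in config)
    a1.sources  = yarnlogs
    a1.channels = ch1
    a1.sinks    = hdfssink

    # Spooling-directory source: it only picks up files once they are complete,
    # so it misses logs of containers that are still running; an exec source
    # running `tail -F` is the usual workaround for files being written.
    a1.sources.yarnlogs.type       = spooldir
    # hypothetical local log directory
    a1.sources.yarnlogs.spoolDir   = /hadoop/yarn/log
    # keep the origin path in a header so app/container info can be parsed downstream
    a1.sources.yarnlogs.fileHeader = true

    a1.channels.ch1.type     = memory
    a1.channels.ch1.capacity = 10000

    # Collect everything into one HDFS directory per day; large roll settings
    # keep the number of output files small rather than one file per batch.
    a1.sinks.hdfssink.type                   = hdfs
    a1.sinks.hdfssink.hdfs.path              = hdfs:///flume/yarn-logs/%Y-%m-%d
    a1.sinks.hdfssink.hdfs.useLocalTimeStamp = true
    a1.sinks.hdfssink.hdfs.fileType          = DataStream
    a1.sinks.hdfssink.hdfs.rollInterval      = 3600
    a1.sinks.hdfssink.hdfs.rollSize          = 134217728
    a1.sinks.hdfssink.hdfs.rollCount         = 0

    a1.sources.yarnlogs.channels = ch1
    a1.sinks.hdfssink.channel    = ch1

Even with this, you end up with a directory of rolled files rather than literally one file, and the per-application grouping still has to come from the file-path header.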
I'm not sure how well collecting into a "single file" would work, as each container generates at least 5 files with different kinds of information, but YARN log aggregation already does this. It's just not stored in a readable file format on HDFS unless you are using Splunk/Hunk, as far as I know.
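If log aggregation is enabled (yarn.log-aggregation-enable=true in yarn-site.xml), the simplest way to get everything for one application into a single readable file is the yarn logs CLI; the application ID below is just a placeholder:

    # Dump all aggregated container logs for one application into a single local file
    yarn logs -applicationId application_1523456789012_0001 > application_1523456789012_0001.log

    # Push that file back to HDFS if it needs to live there
    hdfs dfs -put application_1523456789012_0001.log /tmp/app-logs/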
Alternative solutions include indexing these files into actual search services such as Solr or Elasticsearch, which I would recommend over HDFS for storing and searching logs.
Source: https://stackoverflow.com/questions/49789992/hdp-2-4-how-to-collect-hadoop-mapreduce-log-using-flume-in-one-file-and-what-is