HDP 2.4: How to collect Hadoop MapReduce logs into one file using Flume, and what is the best practice

淺唱寂寞╮ submitted on 2019-12-13 03:44:22

Question


We are using HDP 2.4 and have many MapReduce jobs written in various ways (Java MR, Hive, etc.). The logs are collected in the Hadoop file system under the application ID. I want to collect all the logs of an application and append them into a single file (on HDFS, or in the OS filesystem of one machine) so that I can analyze my application logs in one location without hassle. Please also advise the best way to achieve this in HDP 2.4 (stack version info: HDFS 2.7.1.2.4 / YARN 2.7.1.2.4 / MapReduce2 2.7.1.2.4 / Log Search 0.5.0 / Flume 1.5.2.2.4).


Answer 1


Flume cannot collect the logs after they are already on HDFS.

In order to do this, you would need a Flume agent running on every NodeManager, pointed at the configured container log directory (yarn.nodemanager.log-dirs), and some way to parse the application/container/attempt/file information out of the local OS file path.
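A minimal sketch of such an agent, assuming Flume 1.5.2 (which predates the TAILDIR source, so an exec source with tail -F is one option; the container log path and HDFS target below are placeholders you would have to generate per container, for example with a wrapper script):

    # One agent per NodeManager; the tailed path is hypothetical and would
    # need to be produced per running container by an external mechanism.
    nm-agent.sources = tail-src
    nm-agent.channels = mem-ch
    nm-agent.sinks = hdfs-sink

    # exec source: Flume 1.5.2 has no TAILDIR source, so tail -F is used.
    nm-agent.sources.tail-src.type = exec
    nm-agent.sources.tail-src.command = tail -F /hadoop/yarn/log/application_1523456789012_0042/container_1523456789012_0042_01_000001/syslog
    nm-agent.sources.tail-src.channels = mem-ch

    nm-agent.channels.mem-ch.type = memory
    nm-agent.channels.mem-ch.capacity = 10000

    # HDFS sink rolled by time, so events land in a few large files
    # per day rather than one file per container.
    nm-agent.sinks.hdfs-sink.type = hdfs
    nm-agent.sinks.hdfs-sink.channel = mem-ch
    nm-agent.sinks.hdfs-sink.hdfs.path = hdfs:///tmp/yarn-app-logs/%Y-%m-%d
    nm-agent.sinks.hdfs-sink.hdfs.fileType = DataStream
    nm-agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
    nm-agent.sinks.hdfs-sink.hdfs.rollInterval = 300
    nm-agent.sinks.hdfs-sink.hdfs.rollSize = 0
    nm-agent.sinks.hdfs-sink.hdfs.rollCount = 0

The agent would then be started on each NodeManager with something like: flume-ng agent --conf conf --conf-file nm-agent.properties --name nm-agent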

I'm not sure how well collecting into a "single file" would work, since each container generates at least five files of different information. But YARN log aggregation already does essentially this; the result is just not stored in a directly readable file format on HDFS unless you are using something like Splunk/Hunk, as far as I know.
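For example, with log aggregation enabled, the aggregated logs of a finished application can already be dumped into one local file with the YARN CLI (the application ID below is a placeholder):

    yarn logs -applicationId application_1523456789012_0042 > application_0042.log

This concatenates every container's log files (stdout, stderr, syslog, and so on) into a single readable stream, which may be all you need.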

Alternative solutions include indexing these files into an actual search service such as Solr or Elasticsearch, which I would recommend over HDFS for storing and searching logs.
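As a sketch of that alternative, the HDFS sink above could be swapped for the Elasticsearch sink that ships with Flume 1.5.2 (the host names and index name here are assumptions for illustration):

    # Elasticsearch sink; 9300 is the default ES transport port
    # used by this sink's client.
    nm-agent.sinks = es-sink
    nm-agent.sinks.es-sink.type = elasticsearch
    nm-agent.sinks.es-sink.channel = mem-ch
    nm-agent.sinks.es-sink.hostNames = es-node1:9300,es-node2:9300
    nm-agent.sinks.es-sink.indexName = yarn_app_logs
    nm-agent.sinks.es-sink.indexType = logs
    nm-agent.sinks.es-sink.clusterName = elasticsearch

Each event then becomes a searchable document, so application logs can be filtered by host, time, or container fields instead of scanning flat files.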



Source: https://stackoverflow.com/questions/49789992/hdp-2-4-how-to-collect-hadoop-mapreduce-log-using-flume-in-one-file-and-what-is
