Question
We are using HDP 2.4 and have many MapReduce jobs written in various ways (Java MR / Hive / etc.). The logs are collected in the Hadoop file system under the application ID. I want to gather all the logs of an application and append them into a single file (on HDFS or in the OS file system of one machine) so that I can analyze my application logs in one location without hassle. Please also advise the best way to achieve this in HDP 2.4 (stack version info: HDFS 2.7.1.2.4 / YARN 2.7.1.2.4 / MapReduce2 2.7.1.2.4 / Log Search 0.5.0 / Flume 1.5.2.2.4).
Answer 1:
Flume cannot collect the logs after they are already on HDFS.
To do this, you would need a Flume agent running on every NodeManager, pointed at the configured yarn.log.dir, and you would somehow have to parse the application/container/attempt/file information out of the local OS file path.
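A minimal sketch of what such a per-NodeManager agent could look like, assuming the local container-log directory is /hadoop/yarn/log (a hypothetical path; use whatever your yarn.nodemanager.log-dirs actually points to) and that writing everything under /flume/yarn-logs on HDFS is acceptable:

    # flume-agent.conf -- hypothetical per-NodeManager agent (a sketch, not a drop-in config)
    a1.sources  = yarnlogs
    a1.channels = ch1
    a1.sinks    = hdfssink

    # Spooling-directory source: it only picks up files once they are complete,
    # so it misses logs of containers that are still running; an exec source
    # running `tail -F` is the usual workaround for files being written.
    a1.sources.yarnlogs.type       = spooldir
    # hypothetical local log directory
    a1.sources.yarnlogs.spoolDir   = /hadoop/yarn/log
    # keep the origin path in a header so app/container info can be parsed downstream
    a1.sources.yarnlogs.fileHeader = true

    a1.channels.ch1.type     = memory
    a1.channels.ch1.capacity = 10000

    # Collect everything into one HDFS directory per day; large roll settings
    # keep the number of output files small rather than one file per batch.
    a1.sinks.hdfssink.type                   = hdfs
    a1.sinks.hdfssink.hdfs.path              = hdfs:///flume/yarn-logs/%Y-%m-%d
    a1.sinks.hdfssink.hdfs.useLocalTimeStamp = true
    a1.sinks.hdfssink.hdfs.fileType          = DataStream
    a1.sinks.hdfssink.hdfs.rollInterval      = 3600
    a1.sinks.hdfssink.hdfs.rollSize          = 134217728
    a1.sinks.hdfssink.hdfs.rollCount         = 0

    a1.sources.yarnlogs.channels = ch1
    a1.sinks.hdfssink.channel    = ch1

Even with this, you end up with a directory of rolled files rather than literally one file, and the per-application grouping still has to come from the file-path header.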
I'm not sure how well collecting into a "single file" would work, as each container generates at least 5 files with different kinds of information, but YARN log aggregation already does this. It's just not stored in a readable file format on HDFS unless you are using Splunk/Hunk, as far as I know.
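If log aggregation is enabled (yarn.log-aggregation-enable=true in yarn-site.xml), the simplest way to get everything for one application into a single readable file is the yarn logs CLI; the application ID below is just a placeholder:

    # Dump all aggregated container logs for one application into a single local file
    yarn logs -applicationId application_1523456789012_0001 > application_1523456789012_0001.log

    # Push that file back to HDFS if it needs to live there
    hdfs dfs -put application_1523456789012_0001.log /tmp/app-logs/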
Alternative solutions include indexing these files into actual search services such as Solr or Elasticsearch, which I would recommend over HDFS for storing and searching logs.
Source: https://stackoverflow.com/questions/49789992/hdp-2-4-how-to-collect-hadoop-mapreduce-log-using-flume-in-one-file-and-what-is