Question
We read resource usage for various users and applications from the Hadoop Resource Manager using the official REST API. Our problem is that the application history is not retained long enough, so the API returns -1 for used cores, memory, and containers.
We'd like to extend how long YARN stores this data, but we don't know where to set the value.
Answer 1:
You should check your mapred-site.xml and look at mapreduce.jobhistory.max-age-ms. As stated in:
https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
Job history files older than this many milliseconds will be deleted when the history cleaner runs. Defaults to 604800000 (1 week).
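As a rough illustration of the setting above, here is a minimal Python sketch that converts a retention period in days to milliseconds and prints the property element you would place inside the configuration block of mapred-site.xml. The helper names retention_ms and property_xml and the 30-day figure are assumptions chosen for the example, not part of Hadoop.

```python
from datetime import timedelta


def retention_ms(days):
    """Convert a retention period in days to milliseconds."""
    return int(timedelta(days=days).total_seconds() * 1000)


def property_xml(name, value):
    """Render one Hadoop-style <property> element as text."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


# Hypothetical choice: keep job history for 30 days instead of the default 7.
print(property_xml("mapreduce.jobhistory.max-age-ms", retention_ms(30)))
```

Seven days gives 604800000, which matches the documented default, so the conversion is easy to sanity-check before editing the file.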
If you want to read resource usage, consider using the Job History Server's Job API and Job Counters API. The RM REST APIs show instantaneous usage, not cumulative usage.
https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html#Job_API
https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html#Job_Counters_API
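A minimal Python sketch of querying those two History Server endpoints follows. It assumes the server listens on localhost at port 19888 (the usual default web port, so adjust it for your cluster) and that the counters response follows the jobCounters / counterGroup / counter layout shown in the linked documentation. The function names are made up for this example.

```python
import json
from urllib.request import Request, urlopen

# Assumption: Job History Server web address; replace with your own host and port.
HISTORY_SERVER = "http://localhost:19888"


def job_url(job_id, base=HISTORY_SERVER):
    """URL of the Job API resource for one MapReduce job."""
    return f"{base}/ws/v1/history/mapreduce/jobs/{job_id}"


def counters_url(job_id, base=HISTORY_SERVER):
    """URL of the Job Counters API resource for one MapReduce job."""
    return job_url(job_id, base) + "/counters"


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def total_counters(payload):
    """Flatten a Job Counters response into {counter name: total value}."""
    totals = {}
    for group in payload.get("jobCounters", {}).get("counterGroup", []):
        for counter in group.get("counter", []):
            totals[counter["name"]] = counter["totalCounterValue"]
    return totals
```

With a real job ID you would call total_counters(fetch_json(counters_url(job_id))) and read the cumulative values from the result. Depending on the Hadoop version, counters such as VCORES_MILLIS_MAPS and MB_MILLIS_MAPS may be present, so check the actual response before relying on specific names.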
Source: https://stackoverflow.com/questions/44781337/how-long-does-the-hadoop-resource-manager-store-the-application-information