Log files in massively distributed systems

Submitted by 旧巷老猫 on 2019-12-12 10:33:39

Question


I do a lot of work in the grid and HPC space, and one of the biggest challenges we have with a system distributed across hundreds (or in some cases thousands) of servers is analysing the log files.

Currently, log files are written locally to the disk on each blade, but we could also consider publishing logging information using, for example, a UDP Appender and collecting it centrally.

Given that the objective is to be able to identify problems in as close to real time as possible, what should we do?


Answer 1:


First, synchronize all clocks in the system using NTP.

Second, if you are collecting the logs in a single location (for example with the UDP appender you mention), make sure each log entry carries enough information to actually help. I would include at least the server that generated the entry, the time it happened, and the message. If there is any sort of transaction ID or job ID concept, include that as well.
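To make the list of fields above concrete, here is a minimal sketch in Python (standard library only) of a log entry that carries the host, a UTC timestamp, the message, and an optional job ID. The function name build_log_record, the JSON layout, and the sample values are assumptions chosen for illustration, not anything the original poster's system prescribes.

```python
import json
import socket
from datetime import datetime, timezone


def build_log_record(message, job_id=None, host=None, now=None):
    """Return one log event as a single JSON line.

    host and now can be injected for testing; by default they are
    taken from the local machine.
    """
    record = {
        "host": host or socket.gethostname(),
        # UTC timestamps are only comparable across hosts if clocks are synced (NTP).
        "time": (now or datetime.now(timezone.utc)).isoformat(),
        "message": message,
    }
    if job_id is not None:
        record["job_id"] = job_id
    # sort_keys gives a stable field order, which helps when grepping or diffing.
    return json.dumps(record, sort_keys=True)


# Hypothetical sample values, fixed so the result is reproducible.
line = build_log_record(
    "task failed: timeout",
    job_id="job-42",
    host="blade-017",
    now=datetime(2019, 12, 12, 10, 33, 39, tzinfo=timezone.utc),
)
print(line)
```

One self-describing line per event means the collector can merge streams from many blades and still sort by time and filter by host or job.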

Since you mentioned a UDP Appender, I am guessing you are using log4j (or one of its siblings). Log4j has an MDC (Mapped Diagnostic Context) class that lets you attach extra information to the current processing thread so that it travels with every log event that thread emits. It can help collect some of the extra information and pass it along.
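The MDC idea can be illustrated without log4j. The sketch below is an analogue built on Python's standard logging and contextvars modules, not log4j's actual API: a context variable holds the current job ID and a logging filter copies it onto every record. The names job_id_var, JobContextFilter, make_logger, run_job, and the logger name "grid.worker" are all hypothetical.

```python
import contextvars
import io
import logging

# Hypothetical context slot; it plays the role that MDC plays in log4j.
job_id_var = contextvars.ContextVar("job_id", default="-")


class JobContextFilter(logging.Filter):
    """Copy the current job id onto every record that passes through."""

    def filter(self, record):
        record.job_id = job_id_var.get()
        return True  # never drop the record, only enrich it


def make_logger(stream):
    logger = logging.getLogger("grid.worker")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s job=%(job_id)s %(message)s")
    )
    handler.addFilter(JobContextFilter())
    logger.addHandler(handler)
    return logger


def run_job(logger, job_id):
    # Set the context once; code deeper in the call stack logs without
    # having to know or pass the job id.
    token = job_id_var.set(job_id)
    try:
        logger.info("starting")
    finally:
        job_id_var.reset(token)


buffer = io.StringIO()
log = make_logger(buffer)
run_job(log, "job-42")
log.info("idle")  # outside any job, the default "-" is used
print(buffer.getvalue(), end="")
```

The design point is the same as with MDC: the identifier is set once at the start of a unit of work and then appears on every log line produced while that work runs.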




Answer 2:


Are you using Apache? If so, you could have a look at mod_log_spread, though your infrastructure may be too big for that to stay maintainable. The other option is to look at broadcasting or multicasting your log messages and having dedicated logging servers subscribe to those feeds and collate them.
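As a rough sketch of the multicast option, the Python code below (standard library only) shows how a sender could publish log lines to a multicast group and how a logging server could subscribe to it. The group address 239.1.2.3, port 5514, the function names, and the "host message" payload format are all assumptions for illustration. UDP offers no delivery guarantee, so a real deployment would have to tolerate dropped messages.

```python
import socket

GROUP = "239.1.2.3"  # hypothetical group in the administratively scoped range
PORT = 5514          # hypothetical port


def encode(host, message):
    """Serialize one log line as 'host message' in UTF-8."""
    return f"{host} {message}".encode("utf-8")


def membership_request(group):
    """Build the ip_mreq structure: group address + any local interface."""
    return socket.inet_aton(group) + socket.inet_aton("0.0.0.0")


def open_sender(ttl=1):
    """Create a UDP socket whose multicast packets stay within ttl hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def open_receiver(group=GROUP, port=PORT):
    """Create a UDP socket that has joined the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group)
    )
    return sock


def publish(sock, host, message, group=GROUP, port=PORT):
    """Send one log line to every subscribed logging server."""
    sock.sendto(encode(host, message), (group, port))


def collect_one(sock, bufsize=65535):
    """Block until one log line arrives; return (sender_ip, text)."""
    data, (sender_ip, _port) = sock.recvfrom(bufsize)
    return sender_ip, data.decode("utf-8")
```

A blade would call open_sender() once and then publish(...) per event, while each logging server would call open_receiver() and loop on collect_one(...). Several servers can join the same group, which gives redundancy without the senders needing to know who is listening.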



Source: https://stackoverflow.com/questions/35292/log-files-in-massively-distributed-systems
