Best way to send apache-spark logging to redis/logstash on an Amazon EMR cluster [closed]


Question


I submit jobs with spark-submit on an Amazon EMR cluster, and I'd like all Spark logging to be sent to redis/logstash. What is the proper way to configure Spark under EMR to do this?

  • Keep log4j: add a bootstrap action that modifies /home/hadoop/spark/conf/log4j.properties to add an appender? However, that file already contains a lot of configuration and is a symlink to a Hadoop conf file. I don't want to fiddle with it too much, since it already defines some rootLoggers. Which appender would work best: ryantenney/log4j-redis-appender combined with logstash/log4j-jsonevent-layout, or pavlobaron/log4j2redis? (See the log4j sketch after this list.)

  • Migrate to slf4j+logback: exclude slf4j-log4j12 from spark-core, add log4j-over-slf4j, and use a logback.xml with a com.cwbase.logback.RedisAppender? This looks like it will be problematic with dependencies. Will it hide the log4j.rootLoggers already defined in log4j.properties? (See the logback sketch after this list.)

  • Anything else I missed?
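For the first option, a bootstrap action could append an appender to the existing log4j.properties instead of replacing the file, which leaves the rootLoggers that are already there untouched. This is only a sketch: the appender class and property names are assumptions based on my reading of the ryantenney/log4j-redis-appender and logstash/log4j-jsonevent-layout READMEs, and the Redis endpoint is a placeholder.

    #!/bin/bash
    # Hypothetical EMR bootstrap action: append a Redis appender to the
    # existing log4j.properties rather than overwriting the whole file.
    CONF=/home/hadoop/spark/conf/log4j.properties

    cat >> "$CONF" <<'EOF'
    # Ship log events to Redis as logstash-compatible JSON
    # (class/property names assumed from the appenders' READMEs -- verify)
    log4j.appender.redis=com.ryantenney.log4j.RedisAppender
    log4j.appender.redis.host=my-redis-host.example.com
    log4j.appender.redis.port=6379
    log4j.appender.redis.key=logstash
    log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1
    EOF

    # Attach the appender to the existing root logger (Spark's default
    # file uses log4j.rootCategory) without dropping its current appenders.
    sed -i 's/^log4j.rootCategory=\(.*\)$/log4j.rootCategory=\1, redis/' "$CONF"

The appender and layout JARs would still have to end up on the driver and executor classpath, e.g. copied into Spark's lib directory by the same bootstrap action.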
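For the second option, the logback side would look roughly like the following. The element names follow the com.cwbase:logback-redis-appender README as I understand it; the host, key, and source/type metadata are placeholders to adjust.

    <!-- logback.xml: a sketch that ships log events to Redis for
         logstash to consume; verify element names against the docs -->
    <configuration>
      <appender name="REDIS" class="com.cwbase.logback.RedisAppender">
        <host>my-redis-host.example.com</host>
        <port>6379</port>
        <key>logstash</key>            <!-- Redis list the indexer reads -->
        <source>spark</source>         <!-- free-form event metadata -->
        <sourceHost>${HOSTNAME}</sourceHost>
        <type>spark-app</type>
      </appender>
      <root level="INFO">
        <appender-ref ref="REDIS" />
      </root>
    </configuration>

As for hiding the existing configuration: once log4j-over-slf4j is on the classpath, log4j API calls are routed into slf4j/logback, and log4j.properties is no longer consulted, so logback.xml becomes the only logging configuration that matters.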

What are your thoughts on this?

Update

It looks like I can't get the second option to work. Running tests is fine, but using spark-submit (with --conf spark.driver.userClassPathFirst=true) always ends with the dreaded "Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the class path, preempting StackOverflowError."


Answer 1:


I would set up an extra daemon for that on the cluster.
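If that daemon is, say, a local Logstash instance tailing the Spark log files and pushing them into Redis, its pipeline could look like the sketch below. The log path is a hypothetical placeholder; adjust it to wherever your EMR/Spark logs actually land.

    # logstash.conf -- tail Spark log files and push them into a Redis list
    input {
      file {
        path => "/home/hadoop/spark/logs/*.log"   # placeholder path
        start_position => "beginning"
      }
    }

    output {
      redis {
        host => "my-redis-host.example.com"
        data_type => "list"
        key => "logstash"    # the key the logstash indexer reads from
      }
    }

This keeps Spark's own logging configuration untouched, which sidesteps both the symlinked log4j.properties and the slf4j classpath conflict described in the update above.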



Source: https://stackoverflow.com/questions/31790944/best-way-to-send-apache-spark-loggin-to-redis-logstash-on-an-amazon-emr-cluster
