I have a Scala Maven project that uses Spark, and I am trying to implement logging using Logback. I am compiling my application to a jar and deploying it to an EC2 instance.
I had to modify the solution presented by Atais to get it working in cluster mode. This worked for me:
libs="/absolute/path/to/libs/*"
spark-submit \
--master yarn \
--deploy-mode cluster \
... \
--jars $libs \
--conf spark.driver.extraClassPath=log4j-over-slf4j-1.7.25.jar:logback-classic-1.2.3.jar:logback-core-1.2.3.jar:logstash-logback-encoder-6.4.jar \
--conf spark.executor.extraClassPath=log4j-over-slf4j-1.7.25.jar:logback-classic-1.2.3.jar:logback-core-1.2.3.jar:logstash-logback-encoder-6.4.jar \
/my/application/application-fat.jar \
param1 param2
The underlying reason was that the jars were not available on all of the nodes and had to be made available explicitly (even after submitting them with --jars).
Update: I refined the solution further. You can also pass the jars as a comma-separated list of URLs, e.g. --jars url1,url2,url3. The jars still have to be added to the class path so that they take precedence over log4j.
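For example, a sketch of the same submission with the jars served from HDFS (the hdfs:///libs/... URLs are placeholders, not paths from my setup). The class path entries still reference the bare file names, since --jars ships the jars into each container's working directory on YARN:
spark-submit \
--master yarn \
--deploy-mode cluster \
... \
--jars hdfs:///libs/log4j-over-slf4j-1.7.25.jar,hdfs:///libs/logback-classic-1.2.3.jar,hdfs:///libs/logback-core-1.2.3.jar,hdfs:///libs/logstash-logback-encoder-6.4.jar \
--conf spark.driver.extraClassPath=log4j-over-slf4j-1.7.25.jar:logback-classic-1.2.3.jar:logback-core-1.2.3.jar:logstash-logback-encoder-6.4.jar \
--conf spark.executor.extraClassPath=log4j-over-slf4j-1.7.25.jar:logback-classic-1.2.3.jar:logback-core-1.2.3.jar:logstash-logback-encoder-6.4.jar \
/my/application/application-fat.jar \
param1 param2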