I have a Python Spark program which I run with spark-submit. I want to put logging statements in it.
logging.info(\"This is an informative messa
You need to get the logger for Spark itself; by default getLogger() will return the logger for your own module. Try something like:
logger = logging.getLogger('py4j')
logger.info("My test info statement")
It might also be 'pyspark' instead of 'py4j'.
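For instance, a minimal driver script could look like the sketch below (the app name and messages are placeholders; it assumes the default setup where the 'py4j' logger already writes to the console):

import logging
from pyspark import SparkContext

logger = logging.getLogger('py4j')   # or 'pyspark', depending on your version
logger.setLevel(logging.INFO)

sc = SparkContext(appName="logging-example")   # hypothetical app name
rdd = sc.parallelize([1, 2, 3])
logger.info("Count is %s", rdd.count())   # shows up in the driver's log output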
If the function that does the logging is defined in the same module as your main function, Spark may raise a serialization error when it tries to ship that function to the executors.
This is explained here, and an example by the same person is given here.
I also tested this on Spark 1.3.1.
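A minimal sketch of one way around that (the module name and function are purely illustrative): keep the function in its own module, or get the logger inside the function, so nothing unpicklable is captured in the closure Spark has to serialize.

# my_udfs.py -- hypothetical helper module imported by the driver script
import logging

def add_one(x):
    # Get the logger inside the function so it is not part of the
    # closure that gets pickled and shipped to the executors.
    logger = logging.getLogger('py4j')
    logger.info("processing %s", x)
    return x + 1

In the driver you would then call rdd.map(add_one) instead of mapping a function defined right next to your main.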
EDIT:
To change logging from STDERR to STDOUT, you will have to remove the current StreamHandler and add a new one.
Find the existing StreamHandler (this line can be removed once you are done):
print(logger.handlers)
# will look something like [<logging.StreamHandler object at 0x...>]
There will probably be only a single handler, but if there are more you will have to adjust the index.
logger.removeHandler(logger.handlers[0])
Add a new handler for sys.stdout:
import sys # Put at top if not already there
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
logger.addHandler(sh)
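Putting the whole swap together, a sketch that assumes the 'py4j' logger currently has the single stderr StreamHandler shown above (the formatter is optional):

import logging
import sys

logger = logging.getLogger('py4j')

# Drop the existing handler(s) -- typically the single stderr StreamHandler --
# then attach one that writes to stdout instead.
for handler in list(logger.handlers):
    logger.removeHandler(handler)

sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
sh.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.addHandler(sh)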