Question
I would like more information when debugging my spark notebook. I have found some log files:
!ls $HOME/notebook/logs/
The files are:
bootstrap-nnnnnnnn_nnnnnn.log
jupyter-nnnnnnnn_nnnnnn.log
kernel-pyspark-nnnnnnnn_nnnnnn.log
kernel-scala-nnnnnnnn_nnnnnn.log
logs-nnnnnnnn.tgz
monitor-nnnnnnnn_nnnnnn.log
spark160master-ego.log
Which applications log to these files and what information is written to each of these files?
Answer 1:
When debugging notebooks, the kernel-*-*.log files are the ones you're looking for.
In logical order...
bootstrap-*.log is written when the service starts. One file for each start; the timestamp indicates when that happened. Contains output from the startup script, which initializes the user environment, creates kernel specs, prepares the Spark config, and the like.
bootstrap-*_allday.log has a record for each service start and stop on that day.
jupyter-*.log contains output from the Jupyter server. After the initializations from bootstrap-*.log are done, the Jupyter server is started; that's when this file is created. You'll see log entries when notebook kernels are started or stopped, and when a notebook is saved.
monitor-*.log contains output from a monitoring script that is started with the service. The monitoring script has to detect on which port the Jupyter server is listening. Afterwards, it keeps an eye on service activity and shuts down the service when it has been idle for too long.
kernel-*-*.log contains output from the notebook kernels. Every kernel gets a separate log file; the timestamp indicates when the kernel started. The second word in the filename indicates the type of kernel.
spark*-ego.log contains output from Spark job scheduling. It's used by the monitoring script to detect whether Spark is active even though the notebook kernels are idle.
logs-*.tgz contains the archived logs of the respective day. They'll be deleted automatically after a few days.
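Since the timestamps are part of the filenames, the most recent kernel log sorts last, which makes it easy to pick out and scan for problems. A minimal sketch of that workflow; the directory and log content below are fabricated for illustration, and only the kernel-pyspark-*.log naming pattern comes from the question:

```shell
# Demo fixture: a fake log directory standing in for $HOME/notebook/logs/.
LOG_DIR=$(mktemp -d)
cat > "$LOG_DIR/kernel-pyspark-20160701_154500.log" <<'EOF'
INFO  SparkContext: Running Spark version 1.6.0
WARN  TaskSetManager: Lost task 0.0 in stage 1.0
ERROR Executor: Exception in task 0.0 in stage 1.0
EOF

# The timestamps in the filenames sort chronologically, so the most
# recent kernel log is simply the last one in ls order.
latest=$(ls "$LOG_DIR"/kernel-pyspark-*.log | tail -n 1)

# Surface warnings and errors, the usual first step when debugging.
grep -E 'WARN|ERROR' "$latest"
```

In a live notebook you would point LOG_DIR at $HOME/notebook/logs/ instead, and possibly follow the file with tail -f while re-running the failing cell.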
Answer 2:
With the recently enabled "environment" feature in DSX, the logs have moved to the directory /var/pod/logs/. You will still see the kernel-*-*.log and jupyter-*.log files for your current session. However, they're not useful for debugging.
In the Spark as a Service backend, each kernel has a Spark driver process which logs to the kernel-*-*.log file. The environment feature comes without Spark, and the kernel itself does not generate output for the log file.
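Because the log location differs between the classic Spark backend and the environment feature, a small helper can report whichever directory actually exists. This is a sketch: find_log_dir is a hypothetical name, and the two candidate paths are simply the ones mentioned in the question and this answer, not guaranteed on other deployments.

```shell
# Hypothetical helper: print the first existing directory from a list
# of candidates, returning nonzero if none of them exist.
find_log_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Usage in a notebook cell, with the paths from the two answers:
find_log_dir /var/pod/logs "$HOME/notebook/logs" || echo "no log directory found"
```

From there, an ls of the reported directory shows which of the log files described above are present for the current session.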
Source: https://stackoverflow.com/questions/38207917/bluemix-analytics-for-apache-spark-log-file-information-required