Bluemix Analytics for Apache Spark log file information required


Question


I would like more information when debugging my Spark notebook. I have found some log files:

!ls $HOME/notebook/logs/

The files are:

bootstrap-nnnnnnnn_nnnnnn.log
jupyter-nnnnnnnn_nnnnnn.log   
kernel-pyspark-nnnnnnnn_nnnnnn.log
kernel-scala-nnnnnnnn_nnnnnn.log
logs-nnnnnnnn.tgz
monitor-nnnnnnnn_nnnnnn.log
spark160master-ego.log

Which applications log to these files, and what information is written to each?


Answer 1:


When debugging notebooks, the kernel-*-*.log files are the ones you're looking for (a quick way to inspect them is sketched after the list below).

In logical order...

  1. bootstrap-*.log is written when the service starts. There is one file per start; the timestamp indicates when that start happened. It contains output from the startup script, which initializes the user environment, creates the kernel specs, prepares the Spark configuration, and the like.

  2. bootstrap-*_allday.log has a record for each service start and stop on that day.

  3. jupyter-*.log contains output from the Jupyter server. After the initializations from bootstrap-*.log are done, the Jupyter server is started. That's when this file is created. You'll see log entries when notebook kernels are started or stopped, and when a notebook is saved.

  4. monitor-*.log contains output from a monitoring script that is started along with the service. The monitoring script first detects which port the Jupyter server is listening on; afterwards, it keeps an eye on service activity and shuts the service down when it has been idle for too long.

  5. kernel-*-*.log contains output from notebook kernels. Every kernel gets a separate log file; the timestamp indicates when the kernel started. The second word in the file name indicates the kernel type.

  6. spark*-ego.log contains output from Spark job scheduling. It is used by the monitoring script to detect whether Spark is still active even though the notebook kernels are idle.

  7. logs-*.tgz contains the archived logs of the respective day. The archives are deleted automatically after a few days.
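
For quick inspection, the most recent kernel log can be tailed directly from a notebook cell. A minimal sketch, assuming a PySpark kernel (adjust the glob for a Scala kernel; file names and timestamps will differ on your instance):

!ls -t $HOME/notebook/logs/kernel-pyspark-*.log | head -1    # newest PySpark kernel log
!tail -n 50 $(ls -t $HOME/notebook/logs/kernel-pyspark-*.log | head -1)    # show its last 50 lines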




Answer 2:


With the recently enabled "environment" feature in DSX, the logs have moved to the directory /var/pod/logs/. You will still see the kernel-*-*.log and jupyter-*.log files for your current session, but they are not useful for debugging.
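
To check what is actually present in your session, a quick listing from a notebook cell (a sketch; the exact set of files depends on your environment):

!ls -lt /var/pod/logs/    # list log files, newest first
!tail -n 20 /var/pod/logs/jupyter-*.log    # peek at the Jupyter server log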

In the Spark as a Service backend, each kernel has a Spark driver process which logs to the kernel-*-*.log file. The environment feature comes without Spark, and the kernel itself does not generate output for the log file.



Source: https://stackoverflow.com/questions/38207917/bluemix-analytics-for-apache-spark-log-file-information-required
