How to debug Spark application on Spark Standalone?

北恋 2020-12-13 20:02

I am trying to debug a Spark application on a cluster with a master and several worker nodes. I have been successful at setting up the master node and worker nodes using the Spark standalone cluster manager.

4 Answers
  •  无人及你 2020-12-13 20:56

    It's important to distinguish between debugging the driver program and debugging one of the executors; they require different options to be passed to spark-submit.

    To debug the driver, you can add the following to your spark-submit command. Then set your remote debugger to connect to the node on which you launched your driver program.

    --driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
    

    In this example port 5005 was specified, but you may need to customize that if something is already running on that port.
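    For illustration, a complete spark-submit invocation for driver debugging might look roughly like the following; the master URL, class name, and JAR name are placeholders for your own application. With suspend=y the driver JVM pauses at startup until a debugger attaches on port 5005.

    # driver JVM listens on port 5005 and waits for a debugger to attach before running
    spark-submit \
      --master spark://your-master-host:7077 \
      --class com.example.YourApp \
      --driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005 \
      your-app.jar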

    Connecting to an executor is similar: add the following options to your spark-submit command.

    --num-executors 1 --executor-cores 1 --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=wm1b0-8ab.yourcomputer.org:5005,suspend=n"
    

    Replace the address with your local computer's address. (It's a good idea to test that it is reachable from your Spark cluster.)

    In this case, start your debugger in listening mode, then start your Spark program and wait for the executor to attach to your debugger. It's important to set the number of executors to 1, or multiple executors will all try to connect to your debugger, likely causing problems.
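    Putting it together, the full command might look roughly like the following (this follows the yarn-client setup described below; the class, JAR, and debugger address are placeholders). Start your debugger listening on port 5005 first, since server=n makes the executor connect out to it.

    # single executor dials out to a debugger already listening on your workstation
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --class com.example.YourApp \
      --num-executors 1 --executor-cores 1 \
      --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=your-workstation:5005,suspend=n" \
      your-app.jar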

    These examples are for running with the Spark master set to yarn-client, although they may also work when running under Mesos. If you're running in yarn-cluster mode, you may have to have the driver attach to your debugger rather than attaching your debugger to the driver, since you won't necessarily know in advance which node the driver will be executing on.
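    For example (an untested sketch based on the same JDWP options; your-workstation is a placeholder), in yarn-cluster mode you could start your debugger in listening mode and have the driver dial out to it instead:

    --driver-java-options -agentlib:jdwp=transport=dt_socket,server=n,address=your-workstation:5005,suspend=n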
