Pyspark - How to inspect variables within RDD operations

Submitted by 白昼怎懂夜的黑 on 2019-12-24 11:44:28

Question


I used to develop Spark applications in Scala using IntelliJ, where I could inspect variable contents in debug mode by setting a breakpoint. Like this: (screenshot omitted)

I recently started a new project using PySpark with PyCharm. I found that the code does not stop at breakpoints inside Spark operations, like below: (screenshot omitted)
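One common workaround (a sketch, not from the original post): functions passed to `map` execute in Spark's Python worker processes, not in the driver, so a breakpoint set in the driver script won't fire inside them. Pulling a small sample back to the driver with `take()` and applying the function directly makes it debuggable. The `sc`/`rdd` lines below are shown as comments since they assume a running SparkContext:

```python
# Assumed PySpark setup (hypothetical, requires a SparkContext `sc`):
# rdd = sc.parallelize(range(10))
# sample = rdd.take(3)  # pull a few elements back into the driver

def transform(x):
    # Breakpoints here fire only when this runs in the driver process,
    # not when Spark ships it to worker processes via rdd.map(transform).
    return x * x

sample = list(range(3))  # stand-in for rdd.take(3)
debugged = [transform(x) for x in sample]  # plain driver-side call: debugger works
print(debugged)
```

Once the function behaves correctly on the sample, the same function can be handed to `rdd.map(transform)` unchanged.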

Another question: the completion prompt does not give the right hints for the result of the "map" function. The IDE does not seem to know that the variable returned by "map" is still an RDD; my guess is that this is because the Python function does not declare a return type.
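The guess above can be illustrated in plain Python (a generic sketch, not PySpark-specific): when a function has no return annotation, many IDEs cannot infer the type of its result, so completion hints are lost; an explicit annotation restores them. With PySpark installed, the same idea would apply by annotating a variable or return value as `pyspark.rdd.RDD`:

```python
def parse(line):
    # No return annotation: the IDE may show the result type as unknown.
    return line.split(",")

def parse_typed(line: str) -> list:
    # Explicit annotation: the IDE knows the result is a list
    # and can offer list-method completions on it.
    return line.split(",")

fields = parse_typed("a,b,c")
print(fields)
```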

These may be naive questions for PySpark developers. Any help would be great, thank you!


Answer 1:


"...code does not stop at break point in Spark operations, like below..." - Could you please clarify your PyCharm version and OS?

"And another question is the prompt hint does not give right hint for instance from "map" function. Seems IDE does not know the variable from "map" function is still rdd..." - I believe it is related to this feature request https://youtrack.jetbrains.com/issue/PY-29811



Source: https://stackoverflow.com/questions/52452981/pyspark-how-to-inspect-variables-within-rdd-operations
