How do I get pyspark working in Jupyter Notebook in a virtual environment on Windows?

Submitted by 落爺英雄遲暮 on 2021-01-29 07:09:40

Question


I'm receiving the dreaded 'Exception: Java gateway process exited before sending its port number' error. I've followed every fix I can find and it's still not working. The worst part is that I swear this setup worked last week and somehow it doesn't anymore.

I can run pyspark perfectly fine from the command line, both inside the virtual environment (I'm using Pipenv) and outside it, so it must be something to do with Jupyter Notebook. Has anyone solved this problem on Windows who can help me?


Answer 1:


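Set the JAVA_HOME environment variable in your Python script, before the Spark context is created. It should point to the Java installation directory (the folder that contains bin), not to the java executable itself:

import os
os.environ['JAVA_HOME'] = '/path/to/your/java'

If that doesn't work, try adding Java's bin directory to PATH too. Use os.pathsep rather than a hard-coded ":", because the PATH separator on Windows is ";":

os.environ["PATH"] = os.path.join(os.environ["JAVA_HOME"], "bin") + os.pathsep + os.environ["PATH"]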
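
The two assignments above can be wrapped in one helper and run in the first notebook cell. This is only a minimal sketch: the function name use_java and the example JDK folder are hypothetical, and it assumes the cell runs before any SparkSession or SparkContext is created in that kernel. The return value shows which java executable, if any, the kernel now finds on PATH, which is a quick way to check whether the setting took effect.

```python
import os
import shutil


def use_java(java_home):
    """Point the current process at a JDK (hypothetical helper).

    java_home is the JDK installation directory, i.e. the folder
    that contains the bin directory.
    """
    java_bin = os.path.join(java_home, "bin")
    os.environ["JAVA_HOME"] = java_home
    # os.pathsep is ';' on Windows and ':' on Linux/macOS
    os.environ["PATH"] = java_bin + os.pathsep + os.environ.get("PATH", "")
    # Full path of the java executable found on PATH, or None if there is none
    return shutil.which("java")


# Hypothetical install location; substitute your own JDK folder.
# print(use_java(r"C:\Program Files\Java\jdk1.8.0_271"))
```

If the call returns None, the folder passed in most likely does not contain bin\java.exe.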




Answer 2:


I figured out a fix from here. My JAVA, SPARK_HOME, and HADOOP_HOME environment variables were configured properly but I added

PYSPARK_DRIVER_PYTHON = jupyter
PYSPARK_DRIVER_PYTHON_OPTS = notebook

and it's working for now.
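
For anyone who prefers not to change the system-wide settings, the same two variables can be set for a single launch. The sketch below is an assumption-laden illustration, not the answerer's method: notebook_driver_env is a hypothetical helper, and the commented-out launch line assumes the pyspark launcher script is on PATH. These variables are read by the pyspark launcher, so they matter when Jupyter is started through pyspark rather than the other way round.

```python
import os


def notebook_driver_env(base_env=None):
    """Return a copy of the environment with the two variables from the answer added."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env


env = notebook_driver_env()

# Assumed launch step (needs Spark installed and pyspark on PATH):
# import subprocess
# subprocess.run("pyspark", env=env, shell=True)
```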



Source: https://stackoverflow.com/questions/65129742/how-do-i-get-pyspark-working-in-jupyter-notebook-in-a-virtual-environment-on-win
