Question
I'm getting the dreaded 'Exception: Java gateway process exited before sending its port number' error. I've tried every fix I can find and it still doesn't work. The worst part is that I swear this setup worked last week and somehow doesn't anymore.
I can run pyspark perfectly fine from the command line, both inside the virtual environment (I'm using Pipenv) and outside it, so it must be something to do with Jupyter Notebook. Has anyone solved this problem on Windows who can help me?
Answer 1:
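Set the JAVA_HOME environment variable in your Python script, before Spark is started. It should point to the Java installation directory (the folder that contains bin), not to the java executable itself:
import os
os.environ['JAVA_HOME'] = '/path/to/your/java/home'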
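If that doesn't work, try adding Java's bin directory to PATH too:
# os.pathsep is ';' on Windows and ':' on Linux/macOS
os.environ["PATH"] = os.path.join(os.environ["JAVA_HOME"], "bin") + os.pathsep + os.environ["PATH"]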
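The two assignments above can be wrapped in a small helper that also checks whether a java executable is actually found afterwards. This is only a sketch: the function names configure_java and java_on_path are hypothetical, and the path passed in is a placeholder for your own Java installation directory.

```python
import os
import shutil


def configure_java(java_home, env=os.environ):
    """Point JAVA_HOME at a Java install root and put its bin directory first on PATH."""
    java_bin = os.path.join(java_home, "bin")
    env["JAVA_HOME"] = java_home
    # os.pathsep is ";" on Windows and ":" on Linux/macOS.
    env["PATH"] = java_bin + os.pathsep + env.get("PATH", "")
    return java_bin


def java_on_path(env=os.environ):
    """Return the full path of the java executable found on PATH, or None."""
    return shutil.which("java", path=env.get("PATH", ""))
```

Run configure_java('/path/to/your/java/home') in the first notebook cell, before creating the SparkContext or SparkSession. If java_on_path() still returns None afterwards, the directory you passed in probably does not contain bin/java (bin\java.exe on Windows).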
Answer 2:
I figured out a fix from here. My JAVA, SPARK_HOME, and HADOOP_HOME environment variables were configured properly but I added
PYSPARK_DRIVER_PYTHON = jupyter
PYSPARK_DRIVER_PYTHON_OPTS = notebook
and it's working for now.
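Since pyspark works from the command line but not from the notebook, it can help to see which of these variables the notebook kernel actually has. The following is a minimal diagnostic sketch, not part of the original fix: report_spark_env is a hypothetical helper, and it assumes that "JAVA" above refers to JAVA_HOME.

```python
import os

# Variables named in the answers above; JAVA_HOME is assumed to be what "JAVA" refers to.
SPARK_ENV_VARS = (
    "JAVA_HOME",
    "SPARK_HOME",
    "HADOOP_HOME",
    "PYSPARK_DRIVER_PYTHON",
    "PYSPARK_DRIVER_PYTHON_OPTS",
)


def report_spark_env(env=os.environ):
    """Map each variable name to its value, or None when it is unset."""
    return {name: env.get(name) for name in SPARK_ENV_VARS}


if __name__ == "__main__":
    for name, value in report_spark_env().items():
        print(f"{name} = {value if value is not None else '<not set>'}")
```

Run it once in a notebook cell and once in the terminal where pyspark works, then compare the two outputs. A variable that is set in the terminal but missing in the notebook is a likely cause of the gateway error.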
Source: https://stackoverflow.com/questions/65129742/how-do-i-get-pyspark-working-in-jupyter-notebook-in-a-virtual-environment-on-win