可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'm trying to run pyspark on my macbook air. When i try starting it up I get the error:

Exception: Java gateway process exited before sending the driver its port number

when sc = SparkContext() is being called upon startup. I have tried running the following commands:

./bin/pyspark ./bin/spark-shell export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

with no avail. I have also looked here:

Spark + Python - Java gateway process exited before sending the driver its port number?

but the question has never been answered. Please help! Thanks.

回答1:

this should help you

One solution is adding pyspark-shell to the shell environment variable PYSPARK_SUBMIT_ARGS:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

There is a change in python/pyspark/java_gateway.py , which requires PYSPARK_SUBMIT_ARGS includes pyspark-shell if a PYSPARK_SUBMIT_ARGS variable is set by a user.

回答2:

One possible reason is JAVA_HOME is not set because java is not installed.

I encountered the same issue. It says

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 51.0     at java.lang.ClassLoader.defineClass1(Native Method)     at java.lang.ClassLoader.defineClass(ClassLoader.java:643)     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)     at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)     at java.net.URLClassLoader.access$000(URLClassLoader.java:73)     at java.net.URLClassLoader$1.run(URLClassLoader.java:212)     at java.security.AccessController.doPrivileged(Native Method)     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:296)     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:406) Traceback (most recent call last):   File "", line 1, in    File "/opt/spark/python/pyspark/conf.py", line 104, in __init__     SparkContext._ensure_initialized()   File "/opt/spark/python/pyspark/context.py", line 243, in _ensure_initialized     SparkContext._gateway = gateway or launch_gateway()   File "/opt/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway     raise Exception("Java gateway process exited before sending the driver its port number") Exception: Java gateway process exited before sending the driver its port number

at sc = pyspark.SparkConf(). I solved it by running

sudo add-apt-repository ppa:webupd8team/java sudo apt-get update sudo apt-get install oracle-java8-installer

which is from https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04

回答3:

Had the same issue with my iphython notebook (IPython 3.2.1) on Linux (ubuntu).

What was missing in my case was setting the master URL in the $PYSPARK_SUBMIT_ARGS environment like this (assuming you use bash):

export PYSPARK_SUBMIT_ARGS="--master spark://:"

e.g.

export PYSPARK_SUBMIT_ARGS="--master spark://192.168.2.40:7077"

You can put this into your .bashrc file. You get the correct URL in the log for the spark master (the location for this log is reported when you start the master with /sbin/start_master.sh).

回答4:

I got the same Java gateway process exited......port number exception even though I set PYSPARK_SUBMIT_ARGS properly. I'm running Spark 1.6 and trying to get pyspark to work with IPython4/Jupyter (OS: ubuntu as VM guest).

While I got this exception, I noticed an hs_err_*.log was generated and it started with:

There is insufficient memory for the Java Runtime Environment to continue. Native memory allocation (malloc) failed to allocate 715849728 bytes for committing reserved memory.

So I increased the memory allocated for my ubuntu via VirtualBox Setting and restarted the guest ubuntu. Then this Java gateway exception goes away and everything worked out fine.

回答5:

I got the same Exception: Java gateway process exited before sending the driver its port number in Cloudera VM when trying to start IPython with CSV support with a syntax error:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10.1.4.0

will throw the error, while:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.4.0

will not.

The difference is in that last colon in the last (working) example, seperating the Scala version number from the package version number.

回答6:

In my case this error came for the script which was running fine before. So I figured out that this might be due to my JAVA update. Before I was using java 1.8 but I had accidentally updated to java 1.9. When I switched back to java 1.8 the error disappeared and everything is running fine. For those, who get this error for the same reason but do not know how to switch back to older java version on ubuntu: run

sudo update-alternatives --config java

and make the selection for java version

回答7:

I got this error because I was running low on disk space.

回答8:

Had same issue, after installing java using below lines solved the issue !

sudo add-apt-repository ppa:webupd8team/java sudo apt-get update sudo apt-get install oracle-java8-installer

回答9:

I had the same exception: installing java jdk worked for me.

文章来源: Pyspark: Exception: Java gateway process exited before sending the driver its port number

标签

gateway

spark