Whenever I try to execute simple processing in PySpark, it fails to open a socket.
>>> myRDD = sc.parallelize(range(6), 3)
>>> sc
It's not the ideal solution, but now I am aware of the cause: PySpark is unable to create the JVM socket with the 64-bit JDK 1.8, so I just pointed my Java path at JDK 1.7 and it worked.
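If you would rather do this per session than change the system default, one option is to point JAVA_HOME at a JDK 1.7 install before the first SparkContext is created, since the JVM gateway inherits the Python process's environment. This is only a sketch; the path below is a placeholder for wherever JDK 1.7 lives on your machine.

import os
# Must run before the first SparkContext, which is when the JVM gateway starts.
# Placeholder path; substitute your own JDK 1.7 location.
os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home"

from pyspark import SparkContext
sc = SparkContext("local[*]", "jdk17-test")
print(sc.parallelize(range(6), 3).collect())   # [0, 1, 2, 3, 4, 5] once the socket opens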
Finally, I solved my problem.
When I started pyspark, I noticed a warning that might be connected to the issue:
WARN Utils:66 - Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 172.16.20.244 instead (on interface en0)
2020-09-27 17:26:10 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
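The second line of the warning already points at an alternative that does not require touching /etc/hosts: set SPARK_LOCAL_IP before Spark starts. A minimal sketch of that approach, using the address reported in the warning (substitute your own, and set it before the SparkContext is created):

import os
# Bind Spark explicitly to the non-loopback address from the warning above.
os.environ["SPARK_LOCAL_IP"] = "172.16.20.244"

from pyspark import SparkContext
sc = SparkContext("local[*]", "bind-test")
print(sc.parallelize(range(6), 3).collect())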
Then I changed /etc/hosts, commenting out the 127.0.0.1 entries and adding a new line to solve the loopback problem, like this:
#127.0.0.1 localhost
#255.255.255.255 broadcasthost
#::1 localhost
172.16.20.244 localhost
It worked.
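As a quick sanity check that the change took effect, the hostname should no longer resolve to 127.0.0.1. A short sketch using only the standard library:

import socket
# What does this machine's hostname resolve to now?
hostname = socket.gethostname()
print(hostname, "->", socket.gethostbyname(hostname))
# If this still prints 127.0.0.1, Spark will keep falling back to the loopback address.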
I hope this helps those who have had a lot of pain solving this problem with similar warnings.
I was having the exact same error. I tried JDK 1.7 and it didn't work, then I edited the /etc/hosts file and realized I had the following lines:
127.0.0.1 mbp.local localhost
127.0.0.1 localhost
I just commented out the line with my computer's local name and it worked:
#127.0.0.1 mbp.local localhost
127.0.0.1 localhost
Tested on PySpark 1.6.3 and 2.0.2 with JDK 1.8
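If you are not sure which hosts entry carries your machine's local name (mbp.local in the file above), Python can show you which name the machine reports for itself, so you know which line to comment out. A short sketch:

import socket
# The name this machine reports for itself; look for this name in /etc/hosts.
print(socket.gethostname())   # e.g. mbp.local
print(socket.getfqdn())       # fully qualified variant, if different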