I'm trying to run this code:

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master("local") \
    .app
I had the same issue, and none of the settings above worked for me. I actually had SPARK_HOME already set. In the end, the problem was that I had simply installed PySpark with pip install pyspark without checking the version.
After a lot of debugging inside the code, I figured out that java_gateway.py in
anaconda3/lib/python3.7/site-packages/pyspark/
did not define the _PYSPARK_DRIVER_CALLBACK_HOST variable, whereas older versions of pyspark do have it. (I am using Anaconda, hence this file path; the exact location of the file may differ for others.)
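If you want to check whether your installed copy of java_gateway.py contains that variable without opening the file by hand, a small helper like this works (file_mentions is a made-up name for illustration, and the path in the commented-out call is just my Anaconda install; yours may differ):

```python
def file_mentions(path, name):
    """Return True if the source file at `path` contains the string `name`."""
    with open(path, "r", encoding="utf-8") as f:
        return name in f.read()

# Hypothetical usage -- point it at your own pyspark install:
# file_mentions(
#     "anaconda3/lib/python3.7/site-packages/pyspark/java_gateway.py",
#     "_PYSPARK_DRIVER_CALLBACK_HOST",
# )
```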
I finally concluded that it was a version mismatch. It seems obvious in hindsight, but I hope this saves others a lot of debugging time.
The solution is to find out which Spark version is installed, e.g. 2.3.0, and then install the matching pyspark version: pip install pyspark==2.3.0. After this it worked like a charm.
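To make that mismatch easy to spot, you can compare the two version strings directly. This is just a sketch (versions_match is a made-up helper): in practice you'd feed it pyspark.__version__ and the version reported by spark-submit --version.

```python
def versions_match(pyspark_version, spark_version):
    """Compare the major.minor.patch components of two version strings."""
    return pyspark_version.split(".")[:3] == spark_version.split(".")[:3]

print(versions_match("2.3.0", "2.3.0"))  # matching install -> True
print(versions_match("2.4.4", "2.3.0"))  # a mismatch like mine -> False
```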
Note: this issue occurred only when I called SparkSession.builder.appName from within Python. Everything worked fine even with the version mismatch when using the pyspark and spark-submit commands, which is why it was easy to overlook that a version mismatch could be the cause.