I am trying to write a very simple program using Spark in PyCharm, and my OS is Windows 8. I have run into several problems, which I somehow managed to fix, except for one.
I have faced this problem too; it is caused by Python version conflicts on different nodes of the cluster. It can be solved by running
export PYSPARK_PYTHON=/usr/bin/python
which must point to the same Python version on all nodes, and then starting:
pyspark
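The same fix can be applied from inside a driver script instead of the shell. This is a minimal sketch, assuming the interpreter path `/usr/bin/python` from the answer above; the variables must be set before any Spark code runs:

```python
import os

# Point both the workers and the driver at the same interpreter,
# so every node of the cluster runs the same Python version.
# (Paths are examples; adjust to your own installation.)
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python"
```

Spark reads these variables at startup, so setting them after the SparkContext is created has no effect.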
After struggling with this for two days, I figured out what the problem was. I added the following to the "PATH" Windows environment variable:
C:/Spark/spark-1.4.1-bin-hadoop2.6/python/pyspark
C:\Python27
Remember, you need to change these directories to wherever Spark and Python are installed on your machine. I should also mention that I am using a prebuilt version of Spark, which includes Hadoop.
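To check whether the entries above actually made it into PATH, a small hypothetical helper like this can be used (`missing_from_path` is a name I made up; the directories are the examples from this answer):

```python
import os

def missing_from_path(dirs, path=None):
    """Return the directories from `dirs` that do not appear in PATH.

    `path` defaults to the current process's PATH variable;
    it can be passed explicitly for testing.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    entries = path.split(os.pathsep)
    return [d for d in dirs if d not in entries]

# Example check against the directories added above.
needed = [r"C:/Spark/spark-1.4.1-bin-hadoop2.6/python/pyspark", r"C:\Python27"]
print(missing_from_path(needed))
```

An empty list means everything is on PATH; anything printed still needs to be added through the Windows environment-variable dialog (and PyCharm restarted so it picks up the change).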
Best of luck to you all.
I had the same problem as you, and then I made the following changes: I set PYSPARK_PYTHON as an environment variable pointing to python.exe in PyCharm's Edit Configurations dialog. Here is my example:
PYSPARK_PYTHON = D:\Anaconda3\python.exe
SPARK_HOME = D:\spark-1.6.3-bin-hadoop2.6
PYTHONUNBUFFERED = 1
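If you prefer not to touch PyCharm's run configuration, the same two Spark-related variables can be set at the top of the script instead. This is a sketch assuming the example paths from this answer; they must run before Spark is imported:

```python
import os

# Equivalent to the Edit Configurations settings above, done in code.
# (Paths are examples from this answer; adjust to your own installation.)
os.environ["PYSPARK_PYTHON"] = r"D:\Anaconda3\python.exe"
os.environ["SPARK_HOME"] = r"D:\spark-1.6.3-bin-hadoop2.6"

# With SPARK_HOME set, the third-party findspark package can then
# locate the Spark installation:
#   import findspark
#   findspark.init()
```

PYTHONUNBUFFERED=1 is unrelated to Spark itself; it only makes PyCharm flush console output immediately, which helps when reading Spark logs.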
I had to set SPARK_PYTHONPATH as an environment variable pointing to the python.exe file, in addition to the PYTHONPATH and SPARK_HOME variables:
SPARK_PYTHONPATH=C:\Python27\python.exe