I am using Spark on EMR and writing a PySpark script. I am getting an error when trying to:
from pyspark import SparkContext
sc = SparkContext()
You need to set the following environment variables to set the Spark path and the Py4j path.
For example, in ~/.bashrc:
export SPARK_HOME=/home/hadoop/spark-2.1.0-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/bin:$SPARK_HOME/python:$PATH
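After reloading the shell (for example with source ~/.bashrc), you can do a quick sanity check from Python that the variables are visible to the interpreter; this is just a verification sketch, not part of the Spark API:

import os

# Both values should match what was exported in ~/.bashrc;
# None here means the shell environment was not reloaded.
print(os.environ.get("SPARK_HOME"))
print(os.environ.get("PYTHONPATH"))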
And use findspark at the top of your file:
import findspark
findspark.init()
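Putting it together, a minimal script might look like the sketch below. It assumes SPARK_HOME points at the install above; the app name and the toy job are arbitrary placeholders:

import findspark
findspark.init()  # uses SPARK_HOME to locate Spark and add it to sys.path

from pyspark import SparkContext

sc = SparkContext(appName="example")
# Trivial job to confirm the context works: sums 0..9 and prints 45.
print(sc.parallelize(range(10)).sum())
sc.stop()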