Question
I am new to Spark and I am trying to install PySpark by referring to the site below.
http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/
I tried both installing the prebuilt package and building the Spark package through SBT.
When I try to run Python code in the IPython Notebook, I get the error below.
NameError Traceback (most recent call last)
<ipython-input-1-f7aa330f6984> in <module>()
1 # Check that Spark is working
----> 2 largeRange = sc.parallelize(xrange(100000))
3 reduceTest = largeRange.reduce(lambda a, b: a + b)
4 filterReduceTest = largeRange.filter(lambda x: x % 7 == 0).sum()
5
NameError: name 'sc' is not defined
In the command window, I can see the error below.
Failed to find Spark assembly JAR.
You need to build Spark before running this program.
Note that I get a Scala prompt when I execute the spark-shell command.
Update:
With the help of a friend, I was able to fix the issue related to the Spark assembly JAR by correcting the contents of the .ipython/profile_pyspark/startup/00-pyspark-setup.py file.
Now only the problem of the SparkContext variable remains. I am changing the title to appropriately reflect my current issue.
Answer 1:
One solution is adding pyspark-shell to the shell environment variable PYSPARK_SUBMIT_ARGS:
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
There was a change in python/pyspark/java_gateway.py that requires PYSPARK_SUBMIT_ARGS to include pyspark-shell if a PYSPARK_SUBMIT_ARGS variable is set by the user.
Answer 2:
You need to do the following after you have pyspark on your path:
from pyspark import SparkContext
sc = SparkContext()
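Once sc exists, the test from the question can serve as a quick sanity check. A minimal sketch (range is used instead of xrange so it runs on both Python 2 and 3):
from pyspark import SparkContext

sc = SparkContext()

# Sanity check from the question: sum 0..99999 and sum the multiples of 7
largeRange = sc.parallelize(range(100000))
reduceTest = largeRange.reduce(lambda a, b: a + b)
filterReduceTest = largeRange.filter(lambda x: x % 7 == 0).sum()
print(reduceTest)        # 4999950000
print(filterReduceTest)  # 714264285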
Answer 3:
You have to create an instance of SparkContext, like the following:
Import:
from pyspark import SparkContext
and then:
sc = SparkContext.getOrCreate()
NB: sc = SparkContext.getOrCreate() works better than sc = SparkContext().
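A minimal sketch of why getOrCreate() is the more forgiving choice in a notebook (the app name and master below are illustrative assumptions):
from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("notebook-test").setMaster("local[2]")  # hypothetical settings
# getOrCreate() returns the SparkContext that is already running (for example,
# one started in an earlier notebook cell); calling SparkContext(conf=conf)
# again would instead fail with "Cannot run multiple SparkContexts at once".
sc = SparkContext.getOrCreate(conf)
print(sc.appName)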
Answer 4:
Just a little improvement: add the following at the top of your Python script file.
#!/usr/bin/env python
from pyspark import SparkContext, SparkConf
sc = SparkContext()
# your code starts here
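Since SparkConf is imported above but not used, here is one way it might be wired in. This is only a sketch; the app name and master URL are assumptions to adjust for your setup:
#!/usr/bin/env python
from pyspark import SparkContext, SparkConf

# Illustrative configuration; change "local[2]" and the app name as needed
conf = SparkConf().setAppName("my-script").setMaster("local[2]")
sc = SparkContext(conf=conf)
# your code starts here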
Answer 5:
This worked for me with Spark version 2.3.1:
from pyspark import SparkContext
sc = SparkContext()
Answer 6:
I added the lines below, provided by Venu:
from pyspark import SparkContext
sc = SparkContext()
Then the subsequent error below was resolved by removing the environment variable PYSPARK_SUBMIT_ARGS.
C:\Spark\spark-1.3.1-bin-hadoop2.6\python\pyspark\java_gateway.pyc in launch_gateway()
     77         callback_socket.close()
     78     if gateway_port is None:
---> 79         raise Exception("Java gateway process exited before sending the driver its port number")
     80
     81     # In Windows, ensure the Java child processes do not linger after Python has exited.

Exception: Java gateway process exited before sending the driver its port number
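If you prefer to clear the variable from Python instead of editing the Windows environment settings, here is a minimal sketch (it must run before pyspark is imported):
import os

# Drop PYSPARK_SUBMIT_ARGS if it is set, so the Java gateway is not launched
# with stale arguments; this has to happen before importing pyspark.
os.environ.pop("PYSPARK_SUBMIT_ARGS", None)

from pyspark import SparkContext
sc = SparkContext()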
Answer 7:
I also encountered the "Java gateway process exited before sending the driver its port number" error message.
I was able to solve that problem by downloading one of the versions that are prebuilt for Hadoop (I used the one for Hadoop 2.4). As I do not use Hadoop, I have no idea why this changed anything, but it now works flawlessly for me...
Answer 8:
I was getting a similar error trying to get PySpark working via PyCharm, and I noticed in the log that just before this error I was getting:
env: not found
I traced this down to the fact that I did not have a JAVA_HOME environment variable set, so I added os.environ['JAVA_HOME'] = "/usr/java/jdk1.7.0_67-cloudera" to my script (I am aware that this is probably not the best place for it); the error went away and my Spark object was created.
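Putting that together, a minimal sketch (the JDK path is the one from this answer and will differ on your machine; the master and app name are illustrative):
import os

# Set JAVA_HOME before creating the SparkContext, which launches the JVM gateway
os.environ["JAVA_HOME"] = "/usr/java/jdk1.7.0_67-cloudera"  # example path from this answer

from pyspark import SparkContext
sc = SparkContext("local[2]", "pycharm-test")  # hypothetical master and app name
print(sc.version)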
Answer 9:
Spark on my Mac is 1.6.0, so adding pyspark-shell alone did not solve the problem.
What worked for me is following the answer given here by @karenyng:
import os

pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
if "pyspark-shell" not in pyspark_submit_args:
    pyspark_submit_args += " pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
Answer 10:
I had the same problem; in my case, the problem was that another notebook was running (in recent versions they are shown in green). I selected and shut down one of them, and it worked fine.
Sorry for reviving an old thread, but it may help someone :)
Answer 11:
This script worked for me (on Linux):
#!/bin/bash
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="--pylab -c 'from pyspark import SparkContext; sc=SparkContext()' -i"
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
pyspark
To call pyspark the way I call it there, I am assuming that the "spark/bin" installation path is in the PATH variable. If not, call /path/to/spark/bin/pyspark instead.
Source: https://stackoverflow.com/questions/30763951/spark-context-sc-not-defined