Unable to run pyspark

被撕碎了的回忆 2020-12-13 10:32

I installed Spark on Windows, and I'm unable to start pyspark. When I type in c:\Spark\bin\pyspark, I get the following error:

5 Answers
  • 2020-12-13 11:22

    Spark <= 2.1.0 is not compatible with Python 3.6. See this issue, which also claims that this will be fixed with the upcoming Spark release.
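    For background, here is a minimal sketch of what actually breaks (my illustration of the cause, based on the fix shown in the other answers, not code from the linked issue): pyspark's serializers.py copies collections.namedtuple with types.FunctionType, which does not carry over __kwdefaults__, and in Python 3.6 namedtuple's verbose/rename/module parameters became keyword-only.

    import collections
    import types

    def _copy_func(f):
        # same style of copy that pyspark's _hijack_namedtuple uses;
        # note that keyword-only defaults (__kwdefaults__) are not copied
        return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                                  f.__defaults__, f.__closure__)

    copied = _copy_func(collections.namedtuple)
    print(collections.namedtuple.__kwdefaults__)  # {'verbose': False, 'rename': False, 'module': None} on 3.6
    print(copied.__kwdefaults__)                  # None -- the defaults are gone

    # On Python 3.6 the next call raises:
    # TypeError: namedtuple() missing 3 required keyword-only arguments:
    #     'verbose', 'rename', and 'module'
    copied("Point", ["x", "y"])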

  • 2020-12-13 11:22

    I resolved this issue with one change in a Python script.

    Edit the Python script named serializers.py, located at c:\your-installation-dir\spark-2.0.2-bin-hadoop-2.7\python\pyspark\, and replace the line at line number 381 with the code below:

    cls = _old_namedtuple(*args, **kwargs, verbose=False, rename=False, module=None)
    

    Then run pyspark from your command line and it will work.

  • 2020-12-13 11:23

    The possible issues when running Spark on Windows are not giving the proper path, or using Python 3.x to run Spark.

    So,

    1. Check whether the path given for Spark (e.g. /usr/local/spark) is proper or not.
    2. Set the Python path to Python 2.x (remove Python 3.x); a sketch of the environment variables follows below.
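    On Windows, a minimal sketch of that setup (the paths below are placeholders for your own installation, not taken from this answer; SPARK_HOME and PYSPARK_PYTHON are standard Spark environment variables):

    >set SPARK_HOME=C:\Spark
    >set PYSPARK_PYTHON=C:\Python27\python.exe
    >%SPARK_HOME%\bin\pyspark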
  • 2020-12-13 11:24

    I wanted to expand on Indrajeet's answer, since he mentioned line numbers rather than the exact location of the code. Please see this in addition to his answer for further clarification.

    cls = _old_namedtuple(*args, **kwargs)
    is the line referred to in his answer that was changed. In context it looks like this:

    def _hijack_namedtuple():
        """ Hack namedtuple() to make it picklable """
        # hijack only one time
        if hasattr(collections.namedtuple, "__hijack"):
            return

        global _old_namedtuple  # or it will put in closure

        def _copy_func(f):
            return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                                      f.__defaults__, f.__closure__)

        _old_namedtuple = _copy_func(collections.namedtuple)

        def namedtuple(*args, **kwargs):
            # cls = _old_namedtuple(*args, **kwargs)
            cls = _old_namedtuple(*args, **kwargs, verbose=False, rename=False, module=None)
            return _hack_namedtuple(cls)
    

    !!! EDIT 6th Mar 2017 !!! This did fix the original issue, but I don't think it makes Spark 2.1 compatible with Python 3.6 yet; there were more collisions further down. As a result I used conda to create a Python 3.5 virtual environment and it worked like a charm.

    (Windows, assuming you have env variables in place)

    >conda create -n py35 python=3.5
    >activate py35 
    >pyspark
    
  • 2020-12-13 11:27

    Spark 2.1.0 doesn't support Python 3.6.0. To solve this, change the Python version in your Anaconda environment. Run the following commands in your Anaconda env:

    conda create -n py35 python=3.5 anaconda
    activate py35
    