Unable to run pyspark

被撕碎了的回忆 2020-12-13 10:32

I installed Spark on Windows, and I'm unable to start pyspark. When I type in c:\Spark\bin\pyspark, I get the following error:

5 Answers
  • 2020-12-13 11:22

    Spark <= 2.1.0 is not compatible with Python 3.6. See this issue, which also claims that this will be fixed with the upcoming Spark release.
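    For background, here is a minimal sketch of what actually breaks (my illustration of the cause, based on the fix shown in the other answers, not code from the linked issue): pyspark's serializers.py copies collections.namedtuple with types.FunctionType, which does not carry over __kwdefaults__, and in Python 3.6 namedtuple's verbose/rename/module parameters became keyword-only.

    import collections
    import types

    def _copy_func(f):
        # same style of copy that pyspark's _hijack_namedtuple uses;
        # note that keyword-only defaults (__kwdefaults__) are not copied
        return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                                  f.__defaults__, f.__closure__)

    copied = _copy_func(collections.namedtuple)
    print(collections.namedtuple.__kwdefaults__)  # {'verbose': False, 'rename': False, 'module': None} on 3.6
    print(copied.__kwdefaults__)                  # None -- the defaults are gone

    # On Python 3.6 the next call raises:
    # TypeError: namedtuple() missing 3 required keyword-only arguments:
    #     'verbose', 'rename', and 'module'
    copied("Point", ["x", "y"])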

  • 2020-12-13 11:22

    I resolved this issue with one change in a Python script.

    Edit the Python script named serializers.py, located at c:\your-installation-dir\spark-2.0.2-bin-hadoop-2.7\python\pyspark\, and replace the line at line number 381 with the code below:

    cls = _old_namedtuple(*args, **kwargs, verbose=False, rename=False, module=None)
    

    Then run pyspark from your command line and it will work.

  • 2020-12-13 11:23

    The possible issues when running Spark on Windows are not giving the proper path, or using Python 3.x to run Spark.

    So,

    1. Check whether the path given for Spark (e.g. /usr/local/spark) is proper or not.
    2. Set the Python path to Python 2.x (remove Python 3.x); a sketch of the environment variables follows below.
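    On Windows, a minimal sketch of that setup (the paths below are placeholders for your own installation, not taken from this answer; SPARK_HOME and PYSPARK_PYTHON are standard Spark environment variables):

    >set SPARK_HOME=C:\Spark
    >set PYSPARK_PYTHON=C:\Python27\python.exe
    >%SPARK_HOME%\bin\pyspark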
  • 2020-12-13 11:24

    I wanted to expand on Indrajeet's answer, since he mentioned line numbers rather than the exact location of the code. Please see this in addition to his answer for further clarification.

    cls = _old_namedtuple(*args, **kwargs)
    is the line referred to in his answer that was changed. In context it looks like this:

    def _hijack_namedtuple():
        """ Hack namedtuple() to make it picklable """
        # hijack only one time
        if hasattr(collections.namedtuple, "__hijack"):
            return

        global _old_namedtuple  # or it will put in closure

        def _copy_func(f):
            return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                                      f.__defaults__, f.__closure__)

        _old_namedtuple = _copy_func(collections.namedtuple)

        def namedtuple(*args, **kwargs):
            # cls = _old_namedtuple(*args, **kwargs)
            cls = _old_namedtuple(*args, **kwargs, verbose=False, rename=False, module=None)
            return _hack_namedtuple(cls)
    

    !!! EDIT 6th Mar 2017 !!! This did fix the original issue, but I don't think it makes Spark 2.1 compatible with Python 3.6 yet; there were more collisions further down. As a result I used conda to create a Python 3.5 virtual environment and it worked like a charm.

    (Windows, assuming you have env variables in place)

    >conda create -n py35 python=3.5
    >activate py35 
    >pyspark
    
  • 2020-12-13 11:27

    Spark 2.1.0 doesn't support Python 3.6.0. To solve this, change the Python version in your Anaconda environment. Run the following commands in your Anaconda env:

    conda create -n py35 python=3.5 anaconda
    activate py35
    