How to fix 'TypeError: an integer is required (got type bytes)' error when trying to run pyspark after installing spark 2.4.4

温柔的废话 · 2020-12-28 12:05

I've installed OpenJDK 13.0.1, Python 3.8, and Spark 2.4.4. The instructions for testing the install say to run .\bin\pyspark from the root of the Spark installation. I'm not …

4 answers
  • 2020-12-28 12:27

    It's a Python/PySpark version mismatch, as John rightly pointed out. For a newer Python version you can try:

    pip install --upgrade pyspark
    

    That will upgrade the package, if a newer one is available. If this doesn't help, you might have to downgrade to a compatible version of Python.


    The pyspark package documentation clearly states:

    NOTE: If you are using this with a Spark standalone cluster you must ensure that the version (including minor version) matches or you may experience odd errors.
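
    To confirm the mismatch before up- or downgrading, it can help to print both versions (a minimal sketch; on Python 3.8 the import itself already fails with the error from the question, which the try/except makes visible):

        import sys

        print("Python :", sys.version.split()[0])

        try:
            import pyspark
            print("PySpark:", pyspark.__version__)
        except TypeError as exc:
            # With PySpark 2.4.x on Python 3.8 the import itself raises
            # "an integer is required (got type bytes)".
            print("PySpark import failed:", exc)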

  • 2020-12-28 12:38

    As a dirty workaround, one can replace _cell_set_template_code with the Python 3-only implementation suggested by the docstring of the _make_cell_set_template_code function:

    Notes
    -----
    In Python 3, we could use an easier function:
    
    .. code-block:: python
    
       def f():
           cell = None
    
           def _stub(value):
               nonlocal cell
               cell = value
    
           return _stub
    
       _cell_set_template_code = f()
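
    For context, the trick in that snippet is that _stub rewrites the contents of its own closure cell via nonlocal. A standalone demonstration of the same mechanism (hypothetical names, independent of cloudpickle):

        def make_cell_setter():
            """Return a function whose closure cell can be rewritten via nonlocal."""
            cell = None

            def _stub(value):
                nonlocal cell
                cell = value

            return _stub

        setter = make_cell_setter()
        setter(42)
        print(setter.__closure__[0].cell_contents)  # -> 42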
    

    Here is a patch for spark v2.4.5: https://gist.github.com/ei-grad/d311d0f34b60ebef96841a3a39103622

    Apply it by running:

    git apply <(curl https://gist.githubusercontent.com/ei-grad/d311d0f34b60ebef96841a3a39103622/raw)
    

    This fixes the problem with ./bin/pyspark, but ./bin/spark-submit uses the bundled pyspark.zip with its own copy of cloudpickle.py. And even if it were fixed there, it would still fail with the same error while unpickling some object in pyspark/serializers.py.

    But it looks like Python 3.8 support has already arrived in Spark v3.0.0-preview2, so one can try that. Or stick to Python 3.7, as the accepted answer suggests.

  • 2020-12-28 12:47

    This is happening because you're using Python 3.8. The latest pip release of PySpark (pyspark 2.4.4 at the time of writing) doesn't support Python 3.8. Downgrade to Python 3.7 for now, and you should be fine.
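
    If you want a clearer failure than the cryptic TypeError, a small guard before the import can help (a sketch; the version cutoff reflects PySpark 2.4.x as discussed in this thread):

        import sys

        # PySpark 2.4.x is only known to work on Python 3.7 and below.
        if sys.version_info >= (3, 8):
            raise RuntimeError(
                "PySpark 2.4.x does not support Python "
                f"{sys.version_info.major}.{sys.version_info.minor}; "
                "use Python 3.7 or upgrade to Spark 3.x."
            )

        import pyspark  # safe on a supported interpreter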

  • 2020-12-28 12:48

    Try installing the latest development version of pyinstaller, which is compatible with Python 3.8, using this command:

    pip install https://github.com/pyinstaller/pyinstaller/archive/develop.tar.gz
    

    reference:
    https://github.com/pyinstaller/pyinstaller/issues/4265
