Question
I've installed OpenJDK 13.0.1, Python 3.8, and Spark 2.4.4. The instructions for testing the install are to run .\bin\pyspark from the root of the Spark installation. I'm not sure if I missed a step in the Spark installation, like setting some environment variable, but I can't find any further detailed instructions.
I can run the Python interpreter on my machine, so I'm confident it is installed correctly, and running "java -version" gives the expected response, so I don't think the problem is with either of those.
I get a stack trace of errors from cloudpickle.py:
Traceback (most recent call last):
  File "C:\software\spark-2.4.4-bin-hadoop2.7\bin\..\python\pyspark\shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\context.py", line 31, in <module>
    from pyspark import accumulators
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\serializers.py", line 71, in <module>
    from pyspark import cloudpickle
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)
Answer 1:
This is happening because you're using Python 3.8. The latest pip release of PySpark (pyspark 2.4.4 at the time of writing) doesn't support Python 3.8. Downgrade to Python 3.7 for now and you should be fine.
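The root cause is a signature change: Python 3.8 inserted a new posonlyargcount positional parameter into types.CodeType, so every argument passed by 3.7-era callers such as cloudpickle shifts by one slot, and the co_code bytes land in the int-typed flags field, which is exactly this TypeError. A minimal standalone sketch of the incompatibility and of the CodeType.replace() API that 3.8 added to avoid it (plain CPython, no pyspark involved; the "renamed" name is just for illustration):

import sys
import types

def template():
    pass

code = template.__code__

if sys.version_info >= (3, 8):
    # CodeType.replace() (new in 3.8) copies a code object without
    # relying on the constructor's argument order.
    patched = code.replace(co_name="renamed")
else:
    # The 3.7 constructor: no posonlyargcount parameter. Passing this
    # argument list positionally on 3.8 is what raises "an integer is
    # required (got type bytes)" -- co_code shifts into the flags slot.
    patched = types.CodeType(
        code.co_argcount, code.co_kwonlyargcount, code.co_nlocals,
        code.co_stacksize, code.co_flags, code.co_code, code.co_consts,
        code.co_names, code.co_varnames, code.co_filename, "renamed",
        code.co_firstlineno, code.co_lnotab, code.co_freevars,
        code.co_cellvars,
    )

print(patched.co_name)  # -> renamed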
Answer 2:
Try installing the latest development version of PyInstaller, which is compatible with Python 3.8, using this command:
pip install https://github.com/pyinstaller/pyinstaller/archive/develop.tar.gz
reference:
https://github.com/pyinstaller/pyinstaller/issues/4265
Answer 3:
As a dirty workaround, one can replace _cell_set_template_code with the Python-3-only implementation suggested by the docstring of the _make_cell_set_template_code function:
Notes
-----
In Python 3, we could use an easier function:

.. code-block:: python

   def f():
       cell = None

       def _stub(value):
           nonlocal cell
           cell = value

       return _stub

   _cell_set_template_code = f()
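Concretely, the patched helper could look like the following; this is a sketch assuming the patch simply applies the docstring's suggestion, not a verbatim copy of the gist:

# Sketch of _make_cell_set_template_code rewritten per the docstring's
# suggestion (assumed shape of the fix; the actual gist may differ).
# The nonlocal assignment makes the compiler emit the STORE_DEREF that
# cloudpickle needs, with no call to types.CodeType(...) at all.
def _make_cell_set_template_code():
    def f():
        cell = None

        def _stub(value):
            nonlocal cell
            cell = value

        return _stub

    # _stub's code object carries `cell` as a free variable -- the
    # template cloudpickle instantiates to write into closure cells.
    return f().__code__

_cell_set_template_code = _make_cell_set_template_code()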
Here is a patch for Spark v2.4.5: https://gist.github.com/ei-grad/d311d0f34b60ebef96841a3a39103622
Apply it by:
git apply <(curl https://gist.githubusercontent.com/ei-grad/d311d0f34b60ebef96841a3a39103622/raw)
This fixes the problem with ./bin/pyspark, but ./bin/spark-submit uses a bundled pyspark.zip with its own copy of cloudpickle.py. And even if it were fixed there, it still wouldn't work: the same error would show up while unpickling some object in pyspark/serializers.py.
But it looks like Python 3.8 support has already arrived in Spark v3.0.0-preview2, so one can try that. Or stick with Python 3.7, as the accepted answer suggests.
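Whichever route you take, a quick smoke test, assuming pyspark is importable from your Python environment:

# Minimal smoke test: importing SparkConf pulls in cloudpickle.py (see
# the traceback above), so getting past the import and running a job
# means the CodeType bug is gone.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[1]").setAppName("py38-check")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())  # should print 45
sc.stop()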
Source: https://stackoverflow.com/questions/58700384/how-to-fix-typeerror-an-integer-is-required-got-type-bytes-error-when-tryin