What is causing 'unicode' object has no attribute 'toordinal' in pyspark?

Submitted by 筅森魡賤 on 2020-02-03 08:20:26

Question


I got this error but I don't know what causes it. My Python code ran in pyspark. The stack trace is long, so I am only showing part of it. None of the stack trace shows my own code, so I don't know where to look. What could be causing this error?

/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    306                 raise Py4JJavaError(
    307                     "An error occurred while calling {0}{1}{2}.\n".
--> 308                     format(target_id, ".", name), value)
    309             else:
    310                 raise Py4JError(

Py4JJavaError: An error occurred while calling o107.parquet.

...
File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 435, in toInternal
    return self.dataType.toInternal(obj)
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 172, in toInternal
    return d.toordinal() - self.EPOCH_ORDINAL
AttributeError: 'unicode' object has no attribute 'toordinal'

Thanks,


Answer 1:


The specific exception is caused by trying to store a unicode value in a date datatype that is part of a struct. The conversion from the Python type to Spark's internal representation expects to be able to call the date.toordinal() method on the value.

Presumably you have a dataframe schema somewhere that consists of a struct type with a date field, and something tried to stuff a string into that.
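Because the traceback does not point at your own code, one way to narrow this down is to scan the data before it reaches Spark. This is a minimal sketch, assuming the rows are plain dicts and that you know which fields the schema declares as dates. The names find_bad_dates, rows, and joined are hypothetical and only serve the illustration:

```python
import datetime


def find_bad_dates(rows, date_fields):
    """Yield (row_index, field, value) where a date field holds a non-date."""
    for i, row in enumerate(rows):
        for field in date_fields:
            value = row.get(field)
            # None is allowed; anything else must be a datetime.date
            # (datetime.datetime is a subclass, so it passes too).
            if value is not None and not isinstance(value, datetime.date):
                yield i, field, value


# Hypothetical sample data: the second row carries a string, not a date.
rows = [
    {"name": "alice", "joined": datetime.date(2016, 9, 28)},
    {"name": "bob", "joined": u"2016-09-28"},
]

bad = list(find_bad_dates(rows, ["joined"]))
print(bad)
```

Any entry reported here is a value that would fail in the same way once Spark tries to convert it.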

You can trace this based on the traceback you do have. The Apache Spark source code is hosted on GitHub, and your traceback points to the pyspark/sql/types.py file. The lines point to the StructField.toInternal() method, which delegates to the self.dataType.toInternal() method:

class StructField(DataType):
    # ...
    def toInternal(self, obj):
        return self.dataType.toInternal(obj)

which in your traceback ends up at the DateType.toInternal() method:

class DateType(AtomicType):
    # ...
    def toInternal(self, d):
        if d is not None:
            return d.toordinal() - self.EPOCH_ORDINAL
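The failure can be reproduced without Spark at all. The sketch below uses a stand-in class, FakeDateType, which is hypothetical and only mimics the method quoted above; the EPOCH_ORDINAL value is assumed to be the ordinal of 1970-01-01. On Python 3 the message names 'str' rather than 'unicode':

```python
import datetime


class FakeDateType(object):
    # Stand-in for illustration only; this is not pyspark's DateType.
    # Assumption: the epoch is 1970-01-01.
    EPOCH_ORDINAL = datetime.date(1970, 1, 1).toordinal()

    def toInternal(self, d):
        if d is not None:
            return d.toordinal() - self.EPOCH_ORDINAL


dt = FakeDateType()

# A real date converts to the number of days since the epoch.
ok = dt.toInternal(datetime.date(1970, 1, 2))

# A string has no toordinal() method, so the same call fails.
try:
    dt.toInternal(u"1970-01-02")
    error = None
except AttributeError as exc:
    error = str(exc)

print(ok)
print(error)
```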

So we know this is about a date field in a struct. DateType.fromInternal() shows you what Python type is produced in the opposite direction:

def fromInternal(self, v):
    if v is not None:
        return datetime.date.fromordinal(v + self.EPOCH_ORDINAL)

It is safe to assume that toInternal() expects the same type, datetime.date, when converting in the other direction.
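One possible fix is therefore to turn date strings into datetime.date objects before the rows are handed to Spark. This is a sketch under assumptions: the helper parse_date, the sample rows, and the "%Y-%m-%d" format are hypothetical, and your data may use a different layout or date format:

```python
import datetime


def parse_date(value, fmt="%Y-%m-%d"):
    """Return a datetime.date for a string; pass dates and None through."""
    if value is None or isinstance(value, datetime.date):
        return value
    return datetime.datetime.strptime(value, fmt).date()


# Hypothetical rows of (name, date string) as they might arrive from a file.
rows = [(u"alice", u"2016-09-28"), (u"bob", None)]

# Convert the date column so each value has a toordinal() method (or is None).
converted = [(name, parse_date(day)) for name, day in rows]
print(converted)
```

A string that does not match the format raises ValueError at this point, which surfaces the bad value in your own code instead of deep inside the Spark conversion.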



Source: https://stackoverflow.com/questions/39757591/what-is-causing-unicode-object-has-no-attribute-toordinal-in-pyspark
