How to force inferSchema for CSV to consider integers as dates (with “dateFormat” option)?

暗喜 2020-12-05 21:32

I use Spark 2.2.0.

I am reading a CSV file as follows:

val dataFrame = spark.read.option("inferSchema", "

2 Answers
  •  感动是毒
    2020-12-05 22:02

    If my understanding is correct, Spark's CSV schema inference code (`CSVInferSchema`) tries the following types in this order, with the types earlier in the list checked first:

    • NullType
    • IntegerType
    • LongType
    • DecimalType
    • DoubleType
    • TimestampType
    • BooleanType
    • StringType

    With that, I think the issue is that `20171001` matches IntegerType before TimestampType is even considered (and TimestampType inference uses the `timestampFormat` option, not `dateFormat`).
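    To make the ordering concrete, here is a minimal Scala sketch that models it with plain Scala parsers. It is a simplified, hypothetical stand-in for Spark's logic, not Spark's code: the object name `InferenceOrderSketch`, the type tags, and the use of `java.time` with a pattern for the timestamp step are my assumptions, and it ignores how Spark merges the types inferred for different rows. The point it illustrates is that the first parser that accepts the value wins, so an integer-looking date never reaches the timestamp step.

    ```scala
    import scala.util.Try
    import java.time.LocalDate
    import java.time.format.DateTimeFormatter

    // Hypothetical, simplified model of the inference order listed above.
    // This is NOT Spark's CSVInferSchema; it only mimics "first parser that succeeds wins".
    object InferenceOrderSketch {
      sealed trait InferredType
      case object NullT extends InferredType
      case object IntegerT extends InferredType
      case object LongT extends InferredType
      case object DecimalT extends InferredType
      case object DoubleT extends InferredType
      case object TimestampT extends InferredType
      case object BooleanT extends InferredType
      case object StringT extends InferredType

      // Stand-in for the timestamp step (assumption): accept the value if it parses with the pattern.
      private def parsesWithPattern(value: String, pattern: String): Boolean =
        Try(LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern))).isSuccess

      def inferField(value: String, timestampPattern: String): InferredType =
        if (value == null || value.isEmpty) NullT
        else if (Try(value.toInt).isSuccess) IntegerT
        else if (Try(value.toLong).isSuccess) LongT
        // Integral values too large for a Long (no fractional part) become decimals.
        else if (Try(BigDecimal(value)).filter(_.scale <= 0).isSuccess) DecimalT
        else if (Try(value.toDouble).isSuccess) DoubleT
        else if (parsesWithPattern(value, timestampPattern)) TimestampT
        else if (Try(value.toBoolean).isSuccess) BooleanT
        else StringT
    }
    ```

    For example, `InferenceOrderSketch.inferField("20171001", "yyyyMMdd")` returns `IntegerT`, because `20171001` fits in an `Int` and the pattern is never consulted, while `inferField("2017-10-01", "yyyy-MM-dd")` falls through the numeric steps and returns `TimestampT`.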

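    One solution would be to define the schema yourself and pass it to the `schema` method of `DataFrameReader`, or to let Spark SQL infer the schema and then convert the column afterwards (for example with the `cast` operator combined with a date-parsing function such as `to_date`).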

    I'd choose the former if the number of fields is small.
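    A minimal sketch of both approaches, assuming spark-sql 2.2 or later is on the classpath. The object and method names, the `id`/`date` column names, and the file layout (a header `id,date` with rows like `1,20171001`) are hypothetical. The first method declares `date` as `DateType`, so nothing is inferred for it and `dateFormat` is what parses it; the second keeps `inferSchema` and converts the inferred integer column afterwards. As far as I know, Spark rejects a direct cast from an integer to `DateType`, so the sketch casts to a string first and parses it with the two-argument `to_date` (available since Spark 2.2.0).

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, to_date}
    import org.apache.spark.sql.types.{DateType, IntegerType, StructField, StructType}

    object CsvDateSchemaSketch {
      // Hypothetical schema for a CSV with header "id,date" and rows like "1,20171001".
      val schema: StructType = StructType(Seq(
        StructField("id", IntegerType, nullable = true),
        StructField("date", DateType, nullable = true)))

      // Approach 1: explicit schema; dateFormat tells the CSV reader how to parse DateType columns.
      def readWithSchema(spark: SparkSession, path: String): DataFrame =
        spark.read
          .option("header", "true")
          .option("dateFormat", "yyyyMMdd")
          .schema(schema)
          .csv(path)

      // Approach 2: let inference pick IntegerType for "date", then convert it.
      // Goes int -> string -> date, because casting an int straight to a date is rejected.
      def readInferredThenConvert(spark: SparkSession, path: String): DataFrame =
        spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(path)
          .withColumn("date", to_date(col("date").cast("string"), "yyyyMMdd"))
    }
    ```

    A side benefit of the explicit schema is that Spark skips the extra pass over the data that `inferSchema` requires.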
