PySpark: StructField(…, …, False) always returns `nullable=true` instead of `nullable=false`

Asked 2020-12-16 00:05 by 礼貌的吻别

I'm new to PySpark and am facing a strange problem. I'm trying to set some columns to non-nullable while loading a CSV dataset. I can reproduce my case with a very small dataset.

1 Answer
  • 2020-12-16 00:55

    While Spark's behavior here (the silent switch from `False` to `True`) is confusing, there is nothing fundamentally wrong going on. The `nullable` argument is not a constraint but a reflection of the source and type semantics, which enables certain types of optimization.

    You state that you want to avoid null values in your data. For this you should use the `na.drop` method.

    df.na.drop()
    

    For other ways of handling nulls, please take a look at the DataFrameNaFunctions documentation (exposed using the `DataFrame.na` property).
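    As a short sketch of those functions (the data and column names below are invented for illustration): `na.drop` can be restricted to a subset of columns, and `na.fill` substitutes a default instead of dropping the row.

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Toy frame with one null in "age" (made-up data).
    df = spark.createDataFrame([("Alice", 30), ("Bob", None)], ["name", "age"])

    dropped = df.na.drop(subset=["age"])   # keep only rows where "age" is non-null
    filled = df.na.fill({"age": 0})        # or replace nulls with a per-column default
    ```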

    The CSV format doesn't provide any tools for specifying data constraints, so by definition the reader cannot assume that the input is not null, and your data does indeed contain nulls.
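    If you need the strict schema recorded after cleaning the data, one possible workaround (a sketch, with invented data; not from the original answer) is to drop the nulls and then rebuild the DataFrame from its RDD with the non-nullable schema, which `createDataFrame` keeps as-is:

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    spark = SparkSession.builder.getOrCreate()

    strict_schema = StructType([
        StructField("name", StringType(), False),
        StructField("age", LongType(), False),
    ])

    # Toy frame standing in for the CSV read (made-up data).
    df = spark.createDataFrame([("Alice", 30), ("Bob", None)], ["name", "age"])

    # Drop the nulls first so the data really satisfies the constraint,
    # then reapply the strict schema via the RDD round-trip.
    strict = spark.createDataFrame(df.na.drop().rdd, strict_schema)
    strict.printSchema()  # fields now show nullable = false
    ```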
