Replace empty strings with None/null values in DataFrame

野趣味 2020-12-13 10:01

I have a Spark 1.5.0 DataFrame with a mix of null and empty strings in the same column. I want to convert all empty strings in all columns to null.

5 answers
  •  暗喜 (original poster)
    2020-12-13 10:18

    This solution works for any number of columns, which I think makes it better than the other solutions I've seen so far. See the small function below:

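      import org.apache.spark.sql.DataFrame
      import org.apache.spark.sql.functions.{col, length, lit, when}
      import org.apache.spark.sql.types.StringType

      // Replace empty strings with null in every string column;
      // columns of any other type are passed through unchanged.
      private def setEmptyToNull(df: DataFrame): DataFrame = {
        val exprs = df.schema.map { f =>
          f.dataType match {
            case StringType =>
              when(length(col(f.name)) === 0, lit(null: String).cast(StringType))
                .otherwise(col(f.name))
                .as(f.name)
            case _ => col(f.name)
          }
        }

        df.select(exprs: _*)
      }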
    

    You can easily rewrite the function above in Python.

    I learned this trick from @liancheng.
