Why do columns change to nullable in Apache Spark SQL?

Frontend · Unresolved · 2 answers · 1766 views
孤独总比滥情好 · 2020-12-06 18:04

Why does a column's schema change to nullable = true after certain functions are executed, even though the original DataFrame contains no NaN or null values?
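One way to reproduce the behavior being asked about: a primitive-typed column built from a local Seq starts out non-nullable, but applying a UDF marks the result nullable, because Spark cannot prove the function never returns null. A minimal sketch (the local SparkSession and column names are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[*]").appName("nullable-demo").getOrCreate()
import spark.implicits._

// Int is a primitive type, so "x" is created with nullable = false
val df = Seq(1, 2, 3).toDF("x")
df.schema("x").nullable   // false

// The UDF result is marked nullable = true: Spark cannot
// guarantee an arbitrary Scala function never returns null
val inc = udf((i: Int) => i + 1)
val df2 = df.withColumn("y", inc($"x"))
df2.schema("y").nullable  // true
```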


2 Answers
  •  半阙折子戏
    2020-12-06 18:26

    You can change the schema of a DataFrame quite quickly as well; something like this would do the job:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.types.{StructField, StructType}

    def setNullableStateForAllColumns(df: DataFrame, columnMap: Map[String, Boolean]): DataFrame = {
      // Rebuild the schema, overriding nullability for any column listed in columnMap;
      // columns absent from the map keep their current nullability
      val newSchema = StructType(df.schema.map {
        case StructField(name, dataType, nullable, metadata) =>
          StructField(name, dataType, columnMap.getOrElse(name, nullable), metadata)
      })
      // Re-create the DataFrame over the same rows with the adjusted schema
      df.sparkSession.createDataFrame(df.rdd, newSchema)
    }
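For reference, a hypothetical usage of the helper above (the column name "x" and the local SparkSession are assumptions, not part of the original answer):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StructField, StructType}

def setNullableStateForAllColumns(df: DataFrame, columnMap: Map[String, Boolean]): DataFrame = {
  val newSchema = StructType(df.schema.map {
    case StructField(name, dataType, nullable, metadata) =>
      StructField(name, dataType, columnMap.getOrElse(name, nullable), metadata)
  })
  df.sparkSession.createDataFrame(df.rdd, newSchema)
}

val spark = SparkSession.builder().master("local[*]").appName("nullable-usage").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("x")  // "x" starts as nullable = false (Int is primitive)

// Relax "x" to nullable = true; any column not in the map is left unchanged
val relaxed = setNullableStateForAllColumns(df, Map("x" -> true))
relaxed.schema("x").nullable     // true
```

Note that this rebuilds the DataFrame through `df.rdd`, which forces a conversion out of the optimized internal representation, so it is best applied once rather than inside a hot path.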
