How to get column names with all values null?

再見小時候 2020-12-10 22:34

I don't know how to get the names of the columns whose values are all null.

For example,

case class A(name: String, id: String, email: String, company: String)
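
The idea can be illustrated without Spark at all. A minimal plain-Scala sketch (the sample values below are made up for illustration): treat each record as a `Product`, and keep the field names at whose position every record holds `null`.

```scala
// Plain-Scala sketch of "find the fields that are null in every record".
case class A(name: String, id: String, email: String, company: String)

val rows = Seq(
  A("alice", null, "a@x.com", null),
  A("bob",   null, "b@x.com", null),
  A("carol", null, "c@x.com", null)
)

// Field names in declaration order (Scala 2.13+).
val fieldNames = rows.head.productElementNames.toSeq

// Keep a field name if its value is null in every record.
val allNull = fieldNames.zipWithIndex.collect {
  case (name, i) if rows.forall(_.productElement(i) == null) => name
}
// allNull: Seq(id, company)
```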


        
2 Answers
  • 2020-12-10 22:58

    You can do a simple count on all your columns, then using the indices of the columns that return a count of 0, you subset df.columns:

    import org.apache.spark.sql.functions.{count, col}
    // count(col) ignores nulls, so a count of 0 means the column is entirely null.
    // Get the indices of those columns
    val col_inds = df.select(df.columns.map(c => count(col(c)).alias(c)): _*)
                     .collect()(0)
                     .toSeq.zipWithIndex
                     .filter(_._1 == 0).map(_._2)
    // Subset column names using the indices
    col_inds.map(i => df.columns.apply(i))
    // Seq[String] = ArrayBuffer(id, company)
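
    The zip-and-filter step can be exercised without a Spark session. A small sketch that simulates the collected counts row (the counts below are illustrative, standing in for `collect()(0).toSeq` on a hypothetical 3-row DataFrame). Note that the elements come back as boxed `Long`s typed as `Any`, and the comparison `== 0` still works because of Scala's cooperative numeric equality:

    ```scala
    // Simulated output of the per-column count aggregation: non-null counts per column.
    val columns = Seq("name", "id", "email", "company")
    val counts: Seq[Any] = Seq(3L, 0L, 3L, 0L) // stand-in for collect()(0).toSeq

    // Same logic as the Spark snippet: indices with count 0, then map back to names.
    val colInds = counts.zipWithIndex.filter(_._1 == 0).map(_._2)
    val allNullCols = colInds.map(columns.apply)
    // allNullCols: Seq(id, company)
    ```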
    
  • 2020-12-10 23:08

    An alternative solution could be as follows (though I'm afraid the performance might not be satisfactory, since it runs one job per column):

    val ids = Seq(
      ("1", null: String), 
      ("1", null: String),
      ("10", null: String)
    ).toDF("id", "all_nulls")
    
    scala> ids.show
    +---+---------+
    | id|all_nulls|
    +---+---------+
    |  1|     null|
    |  1|     null|
    | 10|     null|
    +---+---------+
    
    val s = ids.columns.
      map { c => 
        (c, ids.select(c).dropDuplicates(c).na.drop.count) }. // <-- performance here!
      collect { case (c, cnt) if cnt == 0 => c }
    scala> s.foreach(println)
    all_nulls
    