Apply same function to all fields of spark dataframe row

礼貌的吻别 2020-12-09 11:16

I have a DataFrame with a large and variable number of columns (in the thousands).

I want to make all values upper case.

Here is the approach I have thought of, can you sug

2 Answers
  • 2020-12-09 12:10

    If you simply want to apply the same function to all columns, something like this should be enough:

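    import org.apache.spark.sql.functions.{col, upper}
    import spark.implicits._  // enables .toDF (already imported in spark-shell)
    
    val df = sc.parallelize(
      Seq(("a", "B", "c"), ("D", "e", "F"))).toDF("x", "y", "z")
    // One upper(...) expression per column, aliased back to the original name,
    // all passed to a single select.
    df.select(df.columns.map(c => upper(col(c)).alias(c)): _*).show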
    
    // +---+---+---+
    // |  x|  y|  z|
    // +---+---+---+
    // |  A|  B|  C|
    // |  D|  E|  F|
    // +---+---+---+
    

    Or in Python:

    from pyspark.sql.functions import col, upper
    
    df = sc.parallelize([("a", "B", "c"), ("D", "e", "F")]).toDF(("x", "y", "z"))
    # One upper(...) expression per column, aliased back to the original name
    df.select(*(upper(col(c)).alias(c) for c in df.columns)).show()
    
    ##  +---+---+---+
    ##  |  x|  y|  z|
    ##  +---+---+---+
    ##  |  A|  B|  C|
    ##  |  D|  E|  F|
    ##  +---+---+---+
    

    See also: Spark SQL: apply aggregate functions to a list of columns

  • 2020-12-09 12:12

    I needed to do something similar, but had to write my own function to convert empty strings in a DataFrame to null. This is what I did:

    import org.apache.spark.sql.functions.{col, udf}
    import spark.implicits._
    
    // Returns None (stored by Spark as null) for null, empty or whitespace-only strings.
    def emptyToNull(_str: String): Option[String] = {
      _str match {
        case s if s == null || s.trim.isEmpty => None
        case s => Some(s)
      }
    }
    val emptyToNullUdf = udf(emptyToNull(_: String))
    
    val df = Seq(("a", "B", "c"), ("D", "e ", ""), ("", "", null)).toDF("x", "y", "z")
    df.select(df.columns.map(c => emptyToNullUdf(col(c)).alias(c)): _*).show
    
    // +----+----+----+
    // |   x|   y|   z|
    // +----+----+----+
    // |   a|   B|   c|
    // |   D|  e |null|
    // |null|null|null|
    // +----+----+----+
    

    Here is a more refined version of emptyToNull that uses Option instead of explicit null checks:

    def emptyToNull(_str: String): Option[String] = Option(_str) match {
      case ret @ Some(s) if (s.trim.nonEmpty) => ret
      case _ => None
    }
    