I have a DataFrame with thousands of columns (the exact number varies).
I want to convert all values to upper case.
Here is the approach I have thought of, can you sug
I needed to do something similar, but had to write my own function to convert empty strings in a DataFrame to null. This is what I did:
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._
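def emptyToNull(_str: String): Option[String] = {
  _str match {
    // null, empty, and whitespace-only strings all map to None (null in the DataFrame)
    case s if s == null || s.trim.isEmpty => None
    case _ => Some(_str)
  }
}
// Wrap the function as a UDF so it can be applied to Column expressions
val emptyToNullUdf = udf(emptyToNull(_: String))
val df = Seq(("a", "B", "c"), ("D", "e ", ""), ("", "", null)).toDF("x", "y", "z")
// Apply the UDF to every column, keeping the original column names
df.select(df.columns.map(c => emptyToNullUdf(col(c)).alias(c)): _*).show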
+----+----+----+
|   x|   y|   z|
+----+----+----+
|   a|   B|   c|
|   D|  e |null|
|null|null|null|
+----+----+----+
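If a UDF is not required, the same conversion can probably be expressed with Spark's built-in column functions, which usually optimize better than UDFs. The following is only a sketch: it assumes every column is a string column, and the local SparkSession setup and the name `cleaned` are illustrative (in spark-shell, `spark` already exists and the builder lines can be skipped).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, trim, when}

// Illustrative local session; not needed inside spark-shell
val spark = SparkSession.builder().master("local[*]").appName("emptyToNull").getOrCreate()
import spark.implicits._

val df = Seq(("a", "B", "c"), ("D", "e ", ""), ("", "", null)).toDF("x", "y", "z")

// Assumption: all columns are strings.
// For a null value, trim(...) is null and the comparison is null, so `when`
// does not match and `otherwise` returns the original (null) value unchanged.
val cleaned = df.select(df.columns.map { c =>
  when(trim(col(c)) === "", lit(null).cast("string")).otherwise(col(c)).alias(c)
}: _*)

cleaned.show()
```

Values that contain non-whitespace characters, such as "e ", are kept as they are, matching the UDF version above.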
Here's a more refined version of emptyToNull that wraps the input in an Option instead of checking for null explicitly.
def emptyToNull(_str: String): Option[String] = Option(_str) match {
  case ret @ Some(s) if (s.trim.nonEmpty) => ret
  case _ => None
}
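For the upper-casing asked about in the question, the same select-over-all-columns pattern should work with the built-in `upper` function, so no UDF is needed. This is a minimal sketch rather than a definitive solution: the sample data, the local SparkSession setup, and the name `upperCased` are assumptions for illustration, and non-string columns are passed through untouched.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.StringType

// Illustrative local session; not needed inside spark-shell
val spark = SparkSession.builder().master("local[*]").appName("upperAll").getOrCreate()
import spark.implicits._

// Hypothetical sample data with two string columns and one integer column
val df = Seq(("a", "B", 1), ("D", "e ", 2)).toDF("x", "y", "n")

// Upper-case every string column and leave other column types as they are
val upperCased = df.select(df.schema.fields.map { f =>
  if (f.dataType == StringType) upper(col(f.name)).alias(f.name) else col(f.name)
}: _*)

upperCased.show()
```

Because the column list is built from `df.schema`, this does not depend on how many columns the DataFrame has.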