问题
How to lower the case of column names of a data frame but not its values? using RAW Spark SQL and Dataframe methods ?
Input data frame (Imagine I have 100's of these columns in uppercase)
NAME | COUNTRY | SRC | CITY | DEBIT
---------------------------------------------
"foo"| "NZ" | salary | "Auckland" | 15.0
"bar"| "Aus" | investment | "Melbourne"| 12.5
taget dataframe
name | country | src | city | debit
------------------------------------------------
"foo"| "NZ" | salary | "Auckland" | 15.0
"bar"| "Aus" | investment | "Melbourne"| 12.5
回答1:
Java 8
solution to convert the column names to lower case.
import static org.apache.spark.sql.functions.col;
import org.apache.spark.sql.Column;
df.select(Arrays.asList(df.columns()).stream().map(x -> col(x).as(x.toLowerCase())).toArray(size -> new Column[size])).show(false);
回答2:
If you are using scala, you can simply do the following
import org.apache.spark.sql.functions._
df.select(df.columns.map(x => col(x).as(x.toLowerCase)): _*).show(false)
And if you are using pyspark, you can simply do the following
from pyspark.sql import functions as F
df.select([F.col(x).alias(x.lower()) for x in df.columns]).show()
回答3:
How about this:
Some fake data:
scala> val df = spark.sql("select 'A' as AA, 'B' as BB")
df: org.apache.spark.sql.DataFrame = [AA: string, BB: string]
scala> df.show()
+---+---+
| AA| BB|
+---+---+
| A| B|
+---+---+
Now re-select all columns with a new name, which is just their lower-case version:
scala> val cols = df.columns.map(c => s"$c as ${c.toLowerCase}")
cols: Array[String] = Array(AA as aa, BB as bb)
scala> val lowerDf = df.selectExpr(cols:_*)
lowerDf: org.apache.spark.sql.DataFrame = [aa: string, bb: string]
scala> lowerDf.show()
+---+---+
| aa| bb|
+---+---+
| A| B|
+---+---+
Note: I use Scala. If you use PySpark and are not familiar with the Scala syntax, then df.columns.map(c => s"$c as ${c.toLowerCase}")
is map(lambda c: c.lower(), df.columns)
in Python and cols:_*
becomes *cols
. Please note I didn't run this translation.
回答4:
You can use df.withColumnRenamed(col_name,col_name.lower()) for spark dataframe in python
来源:https://stackoverflow.com/questions/48675036/how-to-lower-the-case-of-column-names-of-a-data-frame-but-not-its-values