Is it possible to change the position of a column in a DataFrame, that is, to change the order of the columns in its schema?
Precisely if I have got a dataframe like [f
The spark-daria library has a reorderColumns method that makes it easy to reorder the columns in a DataFrame.
import com.github.mrpowers.spark.daria.sql.DataFrameExt._
val actualDF = sourceDF.reorderColumns(
  Seq("field1", "field3", "field2")
)
The reorderColumns method uses @Rockie Yang's solution under the hood.
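If you would rather not add a dependency, the same effect can be had with a plain select. The snippet below is a minimal sketch of that idea, not spark-daria's actual source: the reorder helper and the sample data are hypothetical, and the local SparkSession is only there so the example runs on its own.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Local session for illustration only; getOrCreate() reuses an existing session if there is one.
val spark = SparkSession.builder().master("local[*]").appName("reorder-sketch").getOrCreate()

// Hypothetical sample data using the column names from the example above.
val sourceDF = spark.createDataFrame(Seq((1, "a", true))).toDF("field1", "field2", "field3")

// Hypothetical helper (not part of spark-daria): select the columns in the requested order.
def reorder(df: DataFrame, order: Seq[String]): DataFrame =
  df.select(order.map(col): _*)

val actualDF = reorder(sourceDF, Seq("field1", "field3", "field2"))
// actualDF.columns is now Array(field1, field3, field2)
```

Any column left out of the sequence is dropped from the result, so pass the full list if you only want to change the order.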
If you want to get the column ordering of df1 to equal the column ordering of df2, something like this should work better than hardcoding all the columns:
df1.reorderColumns(df2.columns)
The spark-daria library also defines a sortColumns transformation to sort columns in ascending or descending order (if you don't want to specify all the columns in a sequence).
import com.github.mrpowers.spark.daria.sql.transformations._
df.transform(sortColumns("asc"))
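For comparison, sorting the columns by name can also be sketched in plain Spark. This is an assumption about how such a transformation might look, not the library's implementation: sortColumnsSketch, its handling of the order argument, and the sample data are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Local session for illustration only; getOrCreate() reuses an existing session if there is one.
val spark = SparkSession.builder().master("local[*]").appName("sort-columns-sketch").getOrCreate()

// Hypothetical sample data with deliberately unsorted column names.
val df = spark.createDataFrame(Seq((1, 2, 3))).toDF("b", "c", "a")

// Hypothetical helper (not spark-daria's code): sort column names, then select in that order.
// Any value other than "desc" is treated as ascending in this sketch.
def sortColumnsSketch(order: String)(input: DataFrame): DataFrame = {
  val names = input.columns.sorted
  val ordered = if (order == "desc") names.reverse else names
  input.select(ordered.map(col): _*)
}

val sortedAsc = df.transform(sortColumnsSketch("asc"))   // columns: a, b, c
val sortedDesc = df.transform(sortColumnsSketch("desc")) // columns: c, b, a
```

The curried signature is what lets the helper plug into df.transform in the same way as the library call above.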