Question
I have a dataset and I want to transpose its columns (a dynamic number of columns) into rows, always ending up with two columns (column name and column values), using Spark and Java.
Sample input:
+-------+-------+---------+
|titanic|IronMan|Juglebook|
+-------+-------+---------+
| 101| test1| 10|
| 102| test2| 20|
| 103| test3| 30|
+-------+-------+---------+
Sample Output:
+---------+-----------------+
|colname  |colvalue         |
+---------+-----------------+
|titanic  |101,102,103      |
|IronMan  |test1,test2,test3|
|Juglebook|10,20,30         |
+---------+-----------------+
I tried with Spark SQL, but the column names end up hardcoded.
Answer 1:
Considering your request for transposing the columns to rows, one issue you might face is that all the values need to be strings, not Int, so first cast all of your values to string. Assuming that part is done, here is how you can transpose, using struct to get what you want.
Below is a Scala implementation of it:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

def transpose(transDF: DataFrame): DataFrame = {
  // Read the column names dynamically, so nothing is hardcoded.
  val colNames = transDF.dtypes.map(_._1)
  // Build one (column_name, column_value) struct per column, then explode
  // the array of structs so each column becomes its own row.
  val kvs = explode(array(colNames.map(c =>
    struct(lit(c).alias("column_name"),
           col(c).cast("string").alias("column_value"))): _*))
  transDF.select(kvs.alias("_kvs"))
}
You can call the function from your main; it will return the transposed (column name, value) pairs. Then you can just use groupBy and agg to get the data in your desired format.
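To illustrate the groupBy/agg step, here is a minimal end-to-end sketch of the same idea; the object name TransposeExample and the use of collect_list with concat_ws are my choices, not from the original answer. Note that collect_list does not guarantee row order after a shuffle, though on a small single-partition dataset the input order is typically preserved.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object TransposeExample {
  // Explode each row into (column_name, column_value) pairs, then
  // group by column name and join the values into one comma list.
  def transpose(df: DataFrame): DataFrame = {
    val kvs = explode(array(df.columns.map(c =>
      struct(lit(c).alias("column_name"),
             col(c).cast("string").alias("column_value"))): _*))
    df.select(kvs.alias("_kvs"))
      .select(col("_kvs.column_name"), col("_kvs.column_value"))
      .groupBy("column_name")
      .agg(concat_ws(",", collect_list("column_value")).alias("colvalue"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]").appName("transpose").getOrCreate()
    import spark.implicits._
    val df = Seq((101, "test1", 10), (102, "test2", 20), (103, "test3", 30))
      .toDF("titanic", "IronMan", "Juglebook")
    transpose(df).show(false)
    spark.stop()
  }
}
```

The same function works unchanged from Java, since it only relies on DataFrame column names rather than a case class schema.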
Source: https://stackoverflow.com/questions/55628129/how-to-get-transpose-of-dynamic-dataset-for-below-sample-input-using-spark-and-j