How to get transpose of dynamic dataset for below sample input using Spark and Java

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-08 13:54:09

Question


I have a dataset with a dynamic number of columns, and I want to transpose it using Spark and Java so that the output always has exactly two columns: the original column name and its values.

Sample input:

+-------+-------+---------+
|titanic|IronMan|Juglebook|
+-------+-------+---------+
|    101|  test1|       10|
|    102|  test2|       20|
|    103|  test3|       30|
+-------+-------+---------+

Sample output:

+---------+-----------------+
|  Colname|         colvalue|
+---------+-----------------+
|  titanic|      101,102,103|
|  IronMan|test1,test2,test3|
|Juglebook|         10,20,30|
+---------+-----------------+

I tried this with Spark SQL, but the column names end up hardcoded.


Answer 1:


When transposing columns to rows, one issue you may run into is that the values need to be strings rather than ints: the per-column structs are collected into a single array, and every element of an array must have the same type. So first cast all of your values to string. Assuming that part is done, here is how you can transpose the data, using struct to get the shape you want.
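The cast mentioned above can be done for every column at once, without naming any of them. This is only a minimal sketch: the helper name castAllToString is hypothetical, and it assumes plain top-level column names (no dots or nested fields).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: cast every column to string while keeping its name.
// Assumes simple top-level column names (no dots, no nested fields).
def castAllToString(df: DataFrame): DataFrame =
  df.select(df.columns.map(c => col(c).cast("string").alias(c)): _*)
```

Because it iterates over df.columns, nothing is hardcoded and the same call should work for any number of columns.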

Below is a Scala implementation of the transpose step:

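import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Assumes every column of transDF has already been cast to string.
def transpose(transDF: DataFrame): DataFrame = {
  // Column names, taken from the (name, type) pairs of the schema
  val (colNames, _) = transDF.dtypes.unzip
  // Build one (column_name, column_value) struct per column,
  // put them in an array, and explode the array into one row per struct
  val kvs = explode(
    array(colNames.map(c =>
      struct(lit(c).alias("column_name"), col(c).alias("column_value"))
    ): _*))
  transDF.select(kvs.alias("_kvs"))
}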

You can call this function from your main method; it returns the transposed data as one row per (column name, value) pair. You can then use groupBy and agg to get the data into your desired format.
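The groupBy and agg step is not spelled out in the answer, so the following is one possible sketch rather than the author's own code. It assumes the input has the single struct column _kvs with fields column_name and column_value as produced by the transpose function; the helper name collapse and the output names Colname and colvalue (taken from the question's sample output) are illustrative choices.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

// Hypothetical helper: collapse the exploded (name, value) rows into
// one row per original column, with the values joined by commas.
def collapse(kvDF: DataFrame): DataFrame =
  kvDF
    .groupBy(col("_kvs.column_name").alias("Colname"))
    .agg(concat_ws(",", collect_list(col("_kvs.column_value"))).alias("colvalue"))
```

Note that collect_list does not guarantee the order of the collected values after a shuffle, so if the original row order matters (101,102,103 rather than some other permutation), an explicit ordering column would be needed.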



Source: https://stackoverflow.com/questions/55628129/how-to-get-transpose-of-dynamic-dataset-for-below-sample-input-using-spark-and-j
