Question
Given this input data:
+------------+------------+------------+-------+-------------------+--------+
|BASE_CAP_RET|BASE_INC_RET|BASE_TOT_RET|acct_cd|eff_date           |id      |
+------------+------------+------------+-------+-------------------+--------+
|0.1         |0.2         |0.1         |acc1   |2004-01-01T00:00:00|10018069|
|0.2         |0.2         |0.1         |acc1   |2004-01-01T00:00:00|10018069|
|0.3         |0.2         |0.1         |acc1   |2004-01-02T00:00:00|10018069|
+------------+------------+------------+-------+-------------------+--------+
How do I multiply all rows for the columns BASE_CAP_RET, BASE_INC_RET, and BASE_TOT_RET, so that the output looks like this?
+------------+------------+------------+-------+-------------------+--------+
|BASE_CAP_RET|BASE_INC_RET|BASE_TOT_RET|acct_cd|eff_date           |id      |
+------------+------------+------------+-------+-------------------+--------+
|0.6         |0.8         |0.1         |acc1   |2004-01-01T00:00:00|10018069|
+------------+------------+------------+-------+-------------------+--------+
I am able to do this in Scala using:
scala> import org.apache.spark.sql.functions._
scala> val sec_returns = returns.withColumn("returns", explode($"returns")).select("returns.*")
scala> val prod = udf((vals: Seq[Double]) => vals.reduce(_ * _))
scala> val ret_columns = sec_returns.columns.filter(col => col.contains("RET")).map(p => prod(collect_list(p)))
scala> sec_returns.groupBy(col("acct_cd"), col("id")).agg(ret_columns.head, ret_columns.tail: _*).show()
How do I do it in Java?
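A rough Java translation of the Scala approach above might look like the following. This is a sketch, not a tested answer: it assumes the Spark 2.x Java API, where a Java UDF applied to a `collect_list` column receives the values as a `scala.collection.Seq`, and the class and method names (`MultiplyReturns`, `productByGroup`) are illustrative.

```java
import static org.apache.spark.sql.functions.*;

import java.util.Arrays;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.types.DataTypes;
import scala.collection.Seq;

public class MultiplyReturns {

    public static Dataset<Row> productByGroup(Dataset<Row> secReturns) {
        // Java equivalent of vals.reduce(_ * _): multiply every value
        // collected for a group. The cast to UDF1 disambiguates between
        // the Scala and Java overloads of functions.udf.
        UserDefinedFunction prod = udf(
            (UDF1<Seq<Double>, Double>) vals -> {
                double result = 1.0;
                for (int i = 0; i < vals.size(); i++) {
                    result *= vals.apply(i);
                }
                return result;
            },
            DataTypes.DoubleType);

        // Pick every column whose name contains "RET" and wrap it in
        // prod(collect_list(col)), keeping the original column name.
        Column[] retColumns = Arrays.stream(secReturns.columns())
            .filter(c -> c.contains("RET"))
            .map(c -> prod.apply(collect_list(c)).alias(c))
            .toArray(Column[]::new);

        // groupBy/agg mirror the Scala head/tail split: agg takes the
        // first Column plus a varargs array of the rest.
        return secReturns
            .groupBy(col("acct_cd"), col("id"))
            .agg(retColumns[0],
                 Arrays.copyOfRange(retColumns, 1, retColumns.length));
    }
}
```

On Spark 2.4+ one could also avoid the UDF entirely with the SQL higher-order function, e.g. `expr("aggregate(collect_list(BASE_CAP_RET), cast(1.0 as double), (acc, x) -> acc * x)")`, applied per RET column.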
Source: https://stackoverflow.com/questions/58632733/java-spark-multiply-all-rows-for-column