Spark Select with a List of Columns Scala

喜欢而已 posted on 2021-01-21 04:22:35

Question


I am trying to find a good way of doing a Spark select with a List[Column]. I am exploding a column and then passing back all the columns I am interested in along with my exploded column.

var columns = getColumns(x) // Returns a List[Column]
tempDf.select(columns)   // what I am trying to get working

I am looking for a good way of doing this. I know that if it were a list of strings, I could do something like:

val result = dataframe.select(columnNames.head, columnNames.tail: _*)

Answer 1:


For Spark 2.0, it seems that you have two options. Both depend on how you manage your columns (as Strings or as Columns).

Spark code (spark-sql_2.11/org/apache/spark/sql/Dataset.scala):

def select(cols: Column*): DataFrame = withPlan {
  Project(cols.map(_.named), logicalPlan)
}

def select(col: String, cols: String*): DataFrame = select((col +: cols).map(Column(_)) : _*)

You can see that, internally, Spark converts your head and tail into a list of Columns and then calls select again.
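
The delegation between the two overloads can be sketched in plain Scala, without Spark on the classpath. Col and SelectSketch below are hypothetical stand-ins invented for this illustration (they are not Spark classes); only the shape of the two select signatures is taken from the Spark source quoted above.

```scala
// Stand-in sketch: Col and SelectSketch are hypothetical, NOT Spark's API.
object SelectSketch {
  final case class Col(name: String)

  // Same shape as select(cols: Column*): takes any number of columns.
  def select(cols: Col*): Seq[String] = cols.map(_.name)

  // Same shape as select(col: String, cols: String*):
  // prepend the head, wrap every name in a Col, then delegate
  // to the varargs overload by expanding the sequence with `: _*`.
  def select(col: String, cols: String*): Seq[String] =
    select((col +: cols).map(Col(_)): _*)

  def main(args: Array[String]): Unit = {
    val names = List("id", "name", "age")

    // String route: head and tail, as in the question.
    println(select(names.head, names.tail: _*))

    // Column route: convert first, then expand the whole list.
    println(select(names.map(Col(_)): _*))
  }
}
```

Both calls end up in the varargs overload, which is why passing a plain List without `: _*` does not compile: neither overload accepts a List as a single argument.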

So, in that case, if you want clear code, I would recommend:

If columns: List[String]:

import org.apache.spark.sql.functions.col
df.select(columns.map(col): _*)

Otherwise, if columns: List[Column]:

df.select(columns: _*)
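
Applied to the explode scenario from the question, this could look like the following minimal sketch. It assumes spark-sql is on the classpath and runs with a local master; the sample data, the column names id, tags and tag, and the helper selectColumns are made up for illustration and stand in for the asker's getColumns(x).

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object SelectListOfColumns {
  // Hypothetical helper: expands a List[Column] into select's varargs.
  def selectColumns(df: DataFrame, columns: List[Column]): DataFrame =
    df.select(columns: _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-list-of-columns")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with one array column to explode.
    val tempDf = Seq((1, Seq("a", "b")), (2, Seq("c"))).toDF("id", "tags")

    // The columns to keep, plus the exploded column, as a List[Column].
    val columns: List[Column] =
      List(col("id"), explode(col("tags")).as("tag"))

    selectColumns(tempDf, columns).show()

    spark.stop()
  }
}
```

Because explode is just another Column expression, it can sit in the same list as the pass-through columns, so no separate withColumn step is needed.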


Source: https://stackoverflow.com/questions/39909863/spark-select-with-a-list-of-columns-scala
