I have a Spark DataFrame df with five columns. I want to add another column with its values being the tuple of the first and second columns. When u
You can use a User-defined function udf to achieve what you want.
object TupleUDFs {
import org.apache.spark.sql.functions.udf
// type tag is required, as we have a generic udf
import scala.reflect.runtime.universe.{TypeTag, typeTag}
def toTuple2[S: TypeTag, T: TypeTag] =
udf[(S, T), S, T]((x: S, y: T) => (x, y))
}
df.withColumn(
"tuple_col", TupleUDFs.toTuple2[Int, Int].apply(df("a"), df("b"))
)
assuming "a" and "b" are the columns of type Int you want to put in a tuple.