I want to add a column in a DataFrame
with some arbitrary value (that is the same for each row). I get an error when I use withColumn
as follows:>
In spark 2.2 there are two ways to add constant value in a column in DataFrame:
1) Using lit
2) Using typedLit
.
The difference between the two is that typedLit
can also handle parameterized scala types e.g. List, Seq, and Map
Sample DataFrame:
val df = spark.createDataFrame(Seq((0,"a"),(1,"b"),(2,"c"))).toDF("id", "col1")
+---+----+
| id|col1|
+---+----+
| 0| a|
| 1| b|
+---+----+
1) Using lit
: Adding constant string value in new column named newcol:
import org.apache.spark.sql.functions.lit
val newdf = df.withColumn("newcol",lit("myval"))
Result:
+---+----+------+
| id|col1|newcol|
+---+----+------+
| 0| a| myval|
| 1| b| myval|
+---+----+------+
2) Using typedLit
:
import org.apache.spark.sql.functions.typedLit
df.withColumn("newcol", typedLit(("sample", 10, .044)))
Result:
+---+----+-----------------+
| id|col1| newcol|
+---+----+-----------------+
| 0| a|[sample,10,0.044]|
| 1| b|[sample,10,0.044]|
| 2| c|[sample,10,0.044]|
+---+----+-----------------+