I want to filter a DataFrame using a condition related to the length of a column. This question might be very easy, but I didn't find any related question in th
@AlbertoBonsanto: the code below filters rows based on array size:
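// Run in spark-shell, where sc and the $"..." implicits are already in scope
import org.apache.spark.sql.functions.{size, split}

val input = Seq("a1,a2,a3,a4,a5", "a1,a2,a3,a4", "a1,a2,a3", "a1,a2", "a1")
val df = sc.parallelize(input).toDF("tokens")
// Split each comma-separated string into an array column
val tokensArrayDf = df.withColumn("tokens", split($"tokens", ","))
tokensArrayDf.show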
+--------------------+
|              tokens|
+--------------------+
|[a1, a2, a3, a4, a5]|
|    [a1, a2, a3, a4]|
|        [a1, a2, a3]|
|            [a1, a2]|
|                [a1]|
+--------------------+
// Keep only the rows whose tokens array has more than 3 elements
tokensArrayDf.filter(size($"tokens") > 3).show
+--------------------+
|              tokens|
+--------------------+
|[a1, a2, a3, a4, a5]|
|    [a1, a2, a3, a4]|
+--------------------+
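If the column holds plain strings rather than arrays, the same pattern should work with `length` from `org.apache.spark.sql.functions` in place of `size`. The following is only a minimal sketch under some assumptions: it builds its own local `SparkSession` (in spark-shell, `spark` already exists and `getOrCreate` returns it), and the threshold of 8 characters and the names `stringDf` and `longStrings` are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.length

// Assumption: a local session for illustration; spark-shell provides `spark` already
val spark = SparkSession.builder().master("local[*]").appName("filter-by-length").getOrCreate()
import spark.implicits._

// Same sample data as above, kept as comma-separated strings (no split)
val stringDf = Seq("a1,a2,a3,a4,a5", "a1,a2,a3,a4", "a1,a2,a3", "a1,a2", "a1").toDF("tokens")

// length() counts the characters of a string column, so this keeps
// only the rows whose string is longer than 8 characters
val longStrings = stringDf.filter(length($"tokens") > 8)
longStrings.show()
```

Here `size` answers "how many elements does the array have", while `length` answers "how many characters does the string have", so which one to use depends on the column type.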