Filtering DataFrame using the length of a column

耶瑟儿~ 2020-12-02 22:28

I want to filter a DataFrame using a condition related to the length of a column. This question might be very easy, but I didn't find any related question in th

3 answers
  •  失恋的感觉
    2020-12-02 22:50

    @AlbertoBonsanto: the code below filters rows based on the size of an array column:

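    // Assumes spark-shell, where sc and the toDF/$ implicits are already in scope
    import org.apache.spark.sql.functions.{size, split}

    val input = Seq("a1,a2,a3,a4,a5", "a1,a2,a3,a4", "a1,a2,a3", "a1,a2", "a1")
    val df = sc.parallelize(input).toDF("tokens")
    // Split each comma-separated string into an array column
    val tokensArrayDf = df.withColumn("tokens", split($"tokens", ","))
    tokensArrayDf.show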
    +--------------------+
    |              tokens|
    +--------------------+
    |[a1, a2, a3, a4, a5]|
    |    [a1, a2, a3, a4]|
    |        [a1, a2, a3]|
    |            [a1, a2]|
    |                [a1]|
    +--------------------+
    
    tokensArrayDf.filter(size($"tokens") > 3).show
    +--------------------+
    |              tokens|
    +--------------------+
    |[a1, a2, a3, a4, a5]|
    |    [a1, a2, a3, a4]|
    +--------------------+
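
    If the column holds plain strings rather than arrays, the same kind of filter can likely be written with length() instead of size(). Below is a minimal sketch, assuming a standalone program with a local SparkSession; the object name FilterByLength and the threshold of 8 characters are illustrative choices, not part of the answer above.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.length

    // Hypothetical wrapper object, used only to make the sketch self-contained
    object FilterByLength {
      def main(args: Array[String]): Unit = {
        // Local session for illustration; in spark-shell `spark` already exists
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("filter-by-length")
          .getOrCreate()
        import spark.implicits._

        val df = Seq("a1,a2,a3,a4,a5", "a1,a2,a3,a4", "a1,a2,a3", "a1,a2", "a1")
          .toDF("tokens")

        // length() counts the characters of a string column;
        // only the 14- and 11-character rows are longer than 8
        df.filter(length($"tokens") > 8).show()

        spark.stop()
      }
    }
    ```

    In short, size() applies to array (or map) columns and length() to string columns, so the choice depends on the column type.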
    
