Filtering rows based on column values in a Spark DataFrame (Scala)

时光说笑 2020-12-09 21:16

I have a Spark DataFrame:

id  value 
3     0
3     1
3     0
4     1
4     0
4     0

I want to create a new DataFrame:

3 0         


        
4 Answers
  •  盖世英雄少女心
    2020-12-09 21:44

    Use the isin method together with filter, as shown below:
    
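    // Assumes a SparkSession named spark is in scope, as in spark-shell.
    import spark.implicits._  // enables toDF and the $"col" syntax
    val data = Seq((3,0,2),(3,1,3),(3,0,1),(4,1,6),(4,0,5),(4,0,4),(1,0,7),(1,1,8),(1,0,9),(2,1,10),(2,0,11),(2,0,12)).toDF("id", "value", "sorted")
    val idFilter = List(1, 2)
    // Keep only the rows whose id appears in idFilter
    data.filter($"id".isin(idFilter: _*)).show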
    +---+-----+------+
    | id|value|sorted|
    +---+-----+------+
    |  1|    0|     7|
    |  1|    1|     8|
    |  1|    0|     9|
    |  2|    1|    10|
    |  2|    0|    11|
    |  2|    0|    12|
    +---+-----+------+
    
    Example: filter on the value column instead:
    val valFilter = List(0)
    data.filter($"value".isin(valFilter:_*)).show
    +---+-----+------+
    | id|value|sorted|
    +---+-----+------+
    |  3|    0|     2|
    |  3|    0|     1|
    |  4|    0|     5|
    |  4|    0|     4|
    |  1|    0|     7|
    |  1|    0|     9|
    |  2|    0|    11|
    |  2|    0|    12|
    +---+-----+------+
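
For anyone running this outside spark-shell, here is a minimal self-contained sketch of the same isin-based filtering, applied to the two-column data from the question. The object name IsinFilterSketch and the helpers questionData and keepValues are made up for illustration, and the local[*] master is an assumption for a local test run. The desired output in the question is cut off, so this only shows the mechanism (filtering, combining conditions, negating) and may not be the exact result the asker wanted.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; the names here are illustrative only.
object IsinFilterSketch {

  // Builds the (id, value) rows shown in the question.
  def questionData(spark: SparkSession): DataFrame = {
    import spark.implicits._ // needed for toDF on a local Seq
    Seq((3, 0), (3, 1), (3, 0), (4, 1), (4, 0), (4, 0)).toDF("id", "value")
  }

  // Keeps the rows whose `column` value is one of `allowed`.
  def keepValues(df: DataFrame, column: String, allowed: Seq[Any]): DataFrame =
    df.filter(col(column).isin(allowed: _*))

  def main(args: Array[String]): Unit = {
    // Assumption: a local run; on a cluster the master is normally set by spark-submit.
    val spark = SparkSession.builder()
      .appName("isin-filter-sketch")
      .master("local[*]")
      .getOrCreate()

    val df = questionData(spark)

    // Rows with value 0: (3,0), (3,0), (4,0), (4,0)
    val zeros = keepValues(df, "value", Seq(0))
    zeros.show()

    // Conditions can be combined with &&: rows with id 3 and value 0
    val id3Zeros = df.filter((col("id") === 3) && col("value").isin(0))
    id3Zeros.show()

    // Negate with ! to drop the listed values: rows with value 1
    val nonZeros = df.filter(!col("value").isin(0))
    nonZeros.show()

    spark.stop()
  }
}
```

Passing the list through `allowed: _*` is needed because isin takes varargs; for a single literal, `isin(0)` and `=== 0` select the same rows.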
    
