How to find common elements among two array columns?

执笔经年 2021-01-25 08:49

I have two comma-separated string columns (sourceAuthors and targetAuthors).

val df = Seq(
  ("Author1,Author2,Author3","Author2,Aut


        
3 Answers
  •  耶瑟儿~
    2021-01-25 09:42

    Unless I misunderstood your problem, Spark's standard functions can do this without a UDF: split turns each comma-separated string into an array, and array_intersect (available since Spark 2.4) returns the elements the two arrays have in common.

    Given the following dataset:

    val df = Seq(("Author1,Author2,Author3","Author2,Author3"))
      .toDF("source","target")
    scala> df.show(false)
    +-----------------------+---------------+
    |source                 |target         |
    +-----------------------+---------------+
    |Author1,Author2,Author3|Author2,Author3|
    +-----------------------+---------------+
    

    You could write the following structured query:

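    // split and array_intersect live in org.apache.spark.sql.functions
    import org.apache.spark.sql.functions.{array_intersect, split}
    val intersect = array_intersect(split($"source", ","), split($"target", ","))
    val solution = df.select(intersect as "common_elements")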
    scala> solution.show(false)
    +------------------+
    |common_elements   |
    +------------------+
    |[Author2, Author3]|
    +------------------+
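
    To keep the original columns next to the result, the same two functions can also be used with withColumn. What follows is a minimal standalone sketch rather than part of the query above: the object name CommonAuthors, the helper withCommonElements, the local SparkSession settings, and the split pattern that tolerates spaces around commas are all assumptions made for illustration.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{array_intersect, col, split}

    // Hypothetical wrapper object, used only for this illustration.
    object CommonAuthors {
      // Assumed separator: a comma with optional whitespace on either side.
      private val separator = "\\s*,\\s*"

      // Appends a `common_elements` array column and keeps `source` and `target`.
      def withCommonElements(df: DataFrame): DataFrame =
        df.withColumn(
          "common_elements",
          array_intersect(split(col("source"), separator), split(col("target"), separator)))

      def main(args: Array[String]): Unit = {
        // Local session for experimentation; a real job would configure this differently.
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("common-authors")
          .getOrCreate()
        import spark.implicits._ // needed for toDF outside spark-shell

        val df = Seq(("Author1,Author2,Author3", "Author2,Author3"))
          .toDF("source", "target")

        withCommonElements(df).show(false)
        spark.stop()
      }
    }
    ```

    For the sample row, the new column should hold the same [Author2, Author3] array as in the output above. Note that array_intersect removes duplicates, so an author repeated within one string appears only once in the result.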
    
