How to find duplicates in a list?

梦毁少年i 2020-12-09 15:25

I have a list of unsorted integers and I want to find those elements which have duplicates.

val dup = List(1,1,1,2,3,4,5,5,6,100,101,101,102)


        
5 Answers
  •  -上瘾入骨i
    2020-12-09 15:44

    Summary: I've written a single-pass function which returns both the deduplicated List (the same result as List.distinct) and a List of each element that appeared more than once, paired with the index at which each repeat occurred.

    Details: If you need more information about the duplicates themselves, as I did, here is a more general function. It iterates across a List exactly once (ordering was significant for me) and returns a Tuple2. The first element is the original List deduped (every occurrence after the first is removed, the same as invoking distinct). The second element is a List of each duplicate paired with the Int index at which it occurred in the original List.

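    I have implemented the function twice, based on the general performance characteristics of the Scala collections: filterDupesL (L for Linear, because each contains check on a List is linear in the number of distinct items seen so far) and filterDupesEc (Ec for Effectively Constant, because each contains check on a Set is effectively constant time).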

    Here's the "Linear" function:

    def filterDupesL[A](items: List[A]): (List[A], List[(A, Int)]) = {
      def recursive(
          remaining: List[A]
        , index: Int =
            0
        , accumulator: (List[A], List[(A, Int)]) =
            (Nil, Nil)
      ): (List[A], List[(A, Int)]) =
        if (remaining.isEmpty)
          accumulator
        else
          recursive(
              remaining.tail
            , index + 1
            , if (accumulator._1.contains(remaining.head)) //contains is linear
              (accumulator._1, (remaining.head, index) :: accumulator._2)
            else
              (remaining.head :: accumulator._1, accumulator._2)
          )
      val (distinct, dupes) = recursive(items)
      (distinct.reverse, dupes.reverse)
    }
    

    Below is an example which might make it a bit more intuitive. Given this List of String values:

    val withDupes =
      List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")
    

    ...and then performing the following:

    val (deduped, dupeAndIndexs) =
      filterDupesL(withDupes)
    

    ...the results are:

    deduped: List[String] = List(a.b, a.c, b.a, b.b, c.a, d.b)
    dupeAndIndexs: List[(String, Int)] = List((a.c,4), (a.c,6), (a.b,8))
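
    To see where those indices come from, the same result can be cross-checked with standard library calls only. This is a minimal sketch, not the author's method: the name dupeAndIndexsCheck is hypothetical, and because indexOf rescans the List for every element, it is only suitable for small inputs.

```scala
val withDupes =
  List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")

// Pair each element with its position, then keep every occurrence
// whose first appearance (indexOf) is at an earlier position.
val dupeAndIndexsCheck: List[(String, Int)] =
  withDupes.zipWithIndex.filter { case (item, index) =>
    withDupes.indexOf(item) < index
  }

// The deduped half of the result is what distinct returns.
val dedupedCheck: List[String] = withDupes.distinct
```

    Every element whose first occurrence is earlier than its own position is reported, which is why "a.c" shows up twice (indices 4 and 6) and "a.b" once (index 8).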
    

    And if you just want the duplicates, you simply map across dupeAndIndexs and invoke distinct:

    val dupesOnly =
      dupeAndIndexs.map(_._1).distinct
    

    ...or all in a single call:

    val dupesOnly =
      filterDupesL(withDupes)._2.map(_._1).distinct
    

    ...or if a Set is preferred, skip distinct and invoke toSet...

    val dupesOnly2 =
      dupeAndIndexs.map(_._1).toSet
    

    ...or all in a single call:

    val dupesOnly2 =
      filterDupesL(withDupes)._2.map(_._1).toSet
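
    If neither the indices nor the original ordering matter, a shorter alternative is to group equal elements and keep the groups with more than one member. This is a sketch of a different technique (groupBy), not the filterDupes approach above; the name duplicated is hypothetical, and dup is the list from the question.

```scala
val dup = List(1, 1, 1, 2, 3, 4, 5, 5, 6, 100, 101, 101, 102)

// groupBy(identity) builds a Map from each value to all of its occurrences;
// collect keeps only the values whose group has more than one member.
val duplicated: Set[Int] =
  dup
    .groupBy(identity)
    .collect { case (value, occurrences) if occurrences.size > 1 => value }
    .toSet
```

    The trade-off is that groupBy goes through a Map, so the order in which duplicates first appeared is not preserved.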
    

    For very large Lists, consider using this more efficient version, which keeps an additional Set of the items already seen so that each contains check runs in effectively constant time.

    Here's the "Effectively Constant" function:

    def filterDupesEc[A](items: List[A]): (List[A], List[(A, Int)]) = {
      def recursive(
          remaining: List[A]
        , index: Int =
            0
        , seenAs: Set[A] =
            Set()
        , accumulator: (List[A], List[(A, Int)]) =
            (Nil, Nil)
      ): (List[A], List[(A, Int)]) =
        if (remaining.isEmpty)
          accumulator
        else {
          val (isInSeenAs, seenAsNext) = {
            val isInSeenA =
              seenAs.contains(remaining.head) //contains is effectively constant
            (
                isInSeenA
              , if (!isInSeenA)
                  seenAs + remaining.head
                else
                  seenAs
            )
          }
          recursive(
              remaining.tail
            , index + 1
            , seenAsNext
            , if (isInSeenAs)
              (accumulator._1, (remaining.head, index) :: accumulator._2)
            else
              (remaining.head :: accumulator._1, accumulator._2)
          )
        }
      val (distinct, dupes) = recursive(items)
      (distinct.reverse, dupes.reverse)
    }
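
    For comparison, the same Set-based idea can be written as a single foldLeft and applied to the list of integers from the question. This is a minimal sketch under stated assumptions, not the ScalaOlio implementation: dupesWithIndex, distinctInts, intDupes and intDupesOnly are hypothetical names.

```scala
def dupesWithIndex[A](items: List[A]): (List[A], List[(A, Int)]) = {
  // Fold state: (items seen so far, distinct items reversed, dupes reversed)
  val start = (Set.empty[A], List.empty[A], List.empty[(A, Int)])
  val (_, distinctRev, dupesRev) =
    items.zipWithIndex.foldLeft(start) {
      case ((seen, distinct, dupes), (item, index)) =>
        if (seen.contains(item)) // Set lookup is effectively constant
          (seen, distinct, (item, index) :: dupes)
        else
          (seen + item, item :: distinct, dupes)
    }
  // Both lists were built by prepending, so restore the original order
  (distinctRev.reverse, dupesRev.reverse)
}

val dup = List(1, 1, 1, 2, 3, 4, 5, 5, 6, 100, 101, 101, 102)
val (distinctInts, intDupes) = dupesWithIndex(dup)
val intDupesOnly = intDupes.map(_._1).distinct
```

    For the question's input, intDupesOnly contains 1, 5 and 101 in order of first repeat, while intDupes also records that 1 repeats at indices 1 and 2, 5 at index 7, and 101 at index 11.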
    

    Both of the above functions are adaptations of the filterDupes function in my open source Scala library, ScalaOlio. It's located at org.scalaolio.collection.immutable.List_._.
