Finding duplicates in a List ignoring a field

我寻月下人不归 2021-01-22 12:38

I've got a List of Persons and I want to find duplicate entries, considering all fields except id. So using the equals()-method (and in

4 answers
  •  萌比男神i
    2021-01-22 13:07

    I would advise against using a Comparator to do this. It is quite difficult to write a legal compare() method based on the other fields.

    I think a better solution would be to create a class PersonWithoutId like so:

    public class PersonWithoutId {
      private String firstname, lastname;
      private int age;
      // no id field
      public PersonWithoutId(Person original) { /* copy fields from Person */ }
      @Override public boolean equals(Object o) { /* compare these 3 fields */ }
      @Override public int hashCode() { /* hash these 3 fields */ }
    }
    

    Then, given a List called people you can do this:

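    // add() returns false when an equal key is already present,
    // so every later duplicate is removed from the list
    Set<PersonWithoutId> seen = new HashSet<>();
    for (Iterator<Person> i = people.iterator(); i.hasNext();)
        if (!seen.add(new PersonWithoutId(i.next())))
            i.remove();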
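
    Put together, the approach above might look like the following minimal sketch. The Person class is hypothetical (the question does not show it); it is assumed here to have final fields firstname, lastname, age and id, and DedupDemo is just an illustrative name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical Person: the original post does not show this class.
class Person {
    final String firstname, lastname;
    final int age;
    final long id;

    Person(String firstname, String lastname, int age, long id) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
        this.id = id;
    }
}

// Key object holding every field of Person except id.
final class PersonWithoutId {
    private final String firstname, lastname;
    private final int age;

    PersonWithoutId(Person original) {
        this.firstname = original.firstname;
        this.lastname = original.lastname;
        this.age = original.age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PersonWithoutId)) return false;
        PersonWithoutId other = (PersonWithoutId) o;
        return age == other.age
                && Objects.equals(firstname, other.firstname)
                && Objects.equals(lastname, other.lastname);
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstname, lastname, age);
    }
}

class DedupDemo {
    // Removes later duplicates in place, keeping the first occurrence.
    static void removeDuplicates(List<Person> people) {
        Set<PersonWithoutId> seen = new HashSet<>();
        for (Iterator<Person> i = people.iterator(); i.hasNext();) {
            if (!seen.add(new PersonWithoutId(i.next()))) {
                i.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("Ann", "Lee", 30, 1L));
        people.add(new Person("Ann", "Lee", 30, 2L)); // same as above except for id
        people.add(new Person("Bob", "Ray", 41, 3L));
        removeDuplicates(people);
        System.out.println(people.size()); // prints 2
    }
}
```

    Note that the list must support Iterator.remove() (an ArrayList does; a list from List.of() does not).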
    

    Edit

    As others have pointed out in the comments, this solution is not ideal as it creates a load of objects for the garbage collector to deal with. But this solution is much faster than a solution using a Comparator and a TreeSet. Keeping a Set in order takes time and it has nothing to do with the original problem. I tested this on Lists of 1,000,000 instances of Person constructed using

    new Person(
        "" + rand.nextInt(500),  // firstname 
        "" + rand.nextInt(500),  // lastname
        rand.nextInt(100),       // age
        rand.nextLong())         // id
    

    and found this solution to be roughly twice as fast as a solution using a TreeSet. (Admittedly I used System.nanoTime() rather than proper benchmarking).
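
    For reference, the TreeSet alternative being compared against could be sketched as below. This is not the code that was actually timed; it assumes a hypothetical Person with non-null firstname and lastname fields, and TreeSetDedup and IGNORING_ID are illustrative names.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical Person: the original post does not show this class.
class Person {
    final String firstname, lastname;
    final int age;
    final long id;

    Person(String firstname, String lastname, int age, long id) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
        this.id = id;
    }
}

class TreeSetDedup {
    // Orders by every field except id; compare() returns 0 exactly when
    // firstname, lastname and age all match. Assumes non-null names.
    static final Comparator<Person> IGNORING_ID =
            Comparator.comparing((Person p) -> p.firstname)
                    .thenComparing((Person p) -> p.lastname)
                    .thenComparingInt((Person p) -> p.age);

    // Removes later duplicates in place. Each add() costs O(log n)
    // comparisons because the TreeSet keeps its elements sorted.
    static void removeDuplicates(List<Person> people) {
        Set<Person> seen = new TreeSet<>(IGNORING_ID);
        for (Iterator<Person> i = people.iterator(); i.hasNext();) {
            if (!seen.add(i.next())) {
                i.remove();
            }
        }
    }
}
```

    The sorted order is never used by the caller, which is the overhead the paragraph above refers to.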

    So how can you do this efficiently without creating loads of unnecessary objects? Java doesn't make it easy. One way would be to write two new methods in Person:

    boolean equalsIgnoringId(Person other) { ... }
    
    int hashCodeIgnoringId() { ... }
    

    and then to write a custom implementation of Set where you basically cut and paste the code for HashSet, except you replace equals() and hashCode() by equalsIgnoringId() and hashCodeIgnoringId().
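
    As a rough illustration of that idea, here is a minimal add-only set built on the two methods. It is not a copy of HashSet: it uses open addressing with linear probing to stay short, and it does not implement the Set interface. The Person class, the method bodies, and the names IdIgnoringPersonSet and CustomSetDedup are all assumptions made for this sketch.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical Person with the two extra methods suggested above.
class Person {
    final String firstname, lastname;
    final int age;
    final long id;

    Person(String firstname, String lastname, int age, long id) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
        this.id = id;
    }

    boolean equalsIgnoringId(Person other) {
        return other != null
                && age == other.age
                && Objects.equals(firstname, other.firstname)
                && Objects.equals(lastname, other.lastname);
    }

    int hashCodeIgnoringId() {
        int h = Objects.hashCode(firstname);
        h = 31 * h + Objects.hashCode(lastname);
        return 31 * h + age;
    }
}

// Add-only hash set of Person that ignores id. No wrapper object is
// allocated per element; persons are stored directly in the table.
final class IdIgnoringPersonSet {
    private Person[] table = new Person[16]; // length stays a power of two
    private int size;

    // Returns true if no person equal (ignoring id) was already present.
    boolean add(Person p) {
        // Keep the table at most half full so probing always finds a free slot.
        if ((size + 1) * 2 > table.length) {
            resize();
        }
        if (!insert(table, p)) {
            return false;
        }
        size++;
        return true;
    }

    int size() {
        return size;
    }

    private static boolean insert(Person[] slots, Person p) {
        int mask = slots.length - 1;
        int h = p.hashCodeIgnoringId();
        int i = (h ^ (h >>> 16)) & mask; // mix high bits into the index
        while (slots[i] != null) {
            if (slots[i].equalsIgnoringId(p)) {
                return false;
            }
            i = (i + 1) & mask; // linear probing
        }
        slots[i] = p;
        return true;
    }

    private void resize() {
        Person[] bigger = new Person[table.length * 2];
        for (Person p : table) {
            if (p != null) {
                insert(bigger, p);
            }
        }
        table = bigger;
    }
}

class CustomSetDedup {
    // Removes later duplicates in place, keeping the first occurrence.
    static void removeDuplicates(List<Person> people) {
        IdIgnoringPersonSet seen = new IdIgnoringPersonSet();
        for (Iterator<Person> i = people.iterator(); i.hasNext();) {
            if (!seen.add(i.next())) {
                i.remove();
            }
        }
    }
}
```

    Whether this actually beats the PersonWithoutId version would need to be measured; the sketch only shows that the wrapper allocations can be avoided.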

    In my humble opinion, the fact that you can create a TreeSet that uses a Comparator, but not a HashSet that uses custom versions of equals/hashCode is a serious flaw in the language.
