I've got a List of Persons and I want to find duplicate entries, considering all fields except id. So using the equals()-method (and in
I would advise against using a Comparator to do this. It is quite difficult to write a legal compare() method (one that satisfies the Comparator contract) based on the other fields.
I think a better solution would be to create a class PersonWithoutId like so:
public class PersonWithoutId {
    private String firstname, lastname;
    private int age;
    // no id field
    public PersonWithoutId(Person original) { /* copy fields from Person */ }
    @Override public boolean equals(Object o) { /* compare these 3 fields */ }
    @Override public int hashCode() { /* hash these 3 fields */ }
}
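A fuller version of the class might look like the sketch below. The post never shows Person, so its fields, the constructor order (firstname, lastname, age, id) and the getters are assumptions. Objects.equals and Objects.hash are used so that null names don't throw, and hashCode() hashes exactly the fields that equals() compares.

```java
import java.util.Objects;

// Hypothetical Person: field names, constructor order and getters are assumptions.
class Person {
    private final String firstname, lastname;
    private final int age;
    private final long id;

    Person(String firstname, String lastname, int age, long id) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
        this.id = id;
    }

    String getFirstname() { return firstname; }
    String getLastname() { return lastname; }
    int getAge() { return age; }
    long getId() { return id; }
}

// Key object that copies every field of Person except id.
final class PersonWithoutId {
    private final String firstname, lastname;
    private final int age;

    PersonWithoutId(Person original) {
        this.firstname = original.getFirstname();
        this.lastname = original.getLastname();
        this.age = original.getAge();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PersonWithoutId)) return false;
        PersonWithoutId other = (PersonWithoutId) o;
        // Objects.equals tolerates null names
        return age == other.age
                && Objects.equals(firstname, other.firstname)
                && Objects.equals(lastname, other.lastname);
    }

    @Override
    public int hashCode() {
        // Must agree with equals(): hash exactly the same three fields
        return Objects.hash(firstname, lastname, age);
    }
}
```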
Then, given a List<Person> called people, you can do this:
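Set<PersonWithoutId> set = new HashSet<>();
for (Iterator<Person> i = people.iterator(); i.hasNext();) {
    if (!set.add(new PersonWithoutId(i.next()))) {
        i.remove(); // an equal PersonWithoutId was already seen, so this is a duplicate
    }
}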
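On Java 16 or later, a record can stand in for the hand-written key class, because the compiler generates equals() and hashCode() over exactly the record's components. This is a swapped-in variant of the same HashSet technique, not the code from the post: Person's components are again hypothetical, and removeIf replaces the explicit iterator loop.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Assumption: Java 16+ records; Person's components are hypothetical.
record Person(String firstname, String lastname, int age, long id) {}

// The generated equals()/hashCode() cover firstname, lastname and age, but not id.
record PersonWithoutId(String firstname, String lastname, int age) {
    PersonWithoutId(Person p) {
        this(p.firstname(), p.lastname(), p.age());
    }
}

class Dedup {
    // Removes every person whose non-id fields match an earlier person's.
    static void removeDuplicates(List<Person> people) {
        Set<PersonWithoutId> seen = new HashSet<>();
        // HashSet.add returns false when an equal key is already present.
        people.removeIf(p -> !seen.add(new PersonWithoutId(p)));
    }
}
```

Because each key is added the first time it is seen, the first occurrence of every duplicate group is the one that stays in the list.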
Edit
As others have pointed out in the comments, this solution is not ideal, as it creates a lot of short-lived objects for the garbage collector to deal with. However, it is much faster than a solution using a Comparator and a TreeSet. Keeping a Set sorted costs O(log n) comparisons per insertion, and the ordering has nothing to do with the original problem. I tested this on Lists of 1,000,000 instances of Person constructed using
new Person(
        "" + rand.nextInt(500), // firstname
        "" + rand.nextInt(500), // lastname
        rand.nextInt(100),      // age
        rand.nextLong())        // id
and found this solution to be roughly twice as fast as a solution using a TreeSet. (Admittedly, I timed it with System.nanoTime() rather than with proper benchmarking.)
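For reference, the TreeSet alternative that the timing was compared against might look roughly like this. The post does not show that code, so this is an assumed reconstruction using Comparator.comparing and thenComparing; Person and its field names are hypothetical, and names are assumed to be non-null (otherwise each key comparison would need Comparator.nullsFirst).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical Person; field names are assumptions.
class Person {
    final String firstname, lastname;
    final int age;
    final long id;

    Person(String firstname, String lastname, int age, long id) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
        this.id = id;
    }
}

class TreeSetDedup {
    // Orders by every field except id, so compare() returns 0 exactly for duplicates.
    // Assumes non-null names.
    static final Comparator<Person> IGNORING_ID =
            Comparator.comparing((Person p) -> p.firstname)
                    .thenComparing(p -> p.lastname)
                    .thenComparingInt(p -> p.age);

    static void removeDuplicates(List<Person> people) {
        // A TreeSet keeps its elements sorted, paying O(log n) comparisons per add.
        Set<Person> seen = new TreeSet<>(IGNORING_ID);
        for (Iterator<Person> i = people.iterator(); i.hasNext();) {
            if (!seen.add(i.next())) { // false when the comparator reports a match
                i.remove();
            }
        }
    }
}
```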
So how can you do this efficiently without creating loads of unnecessary objects? Java doesn't make it easy. One way would be to write two new methods in Person:
boolean equalsIgnoringId(Person other) { ... }
int hashCodeIgnoringId() { ... }
and then to write a custom implementation of Set where you basically cut and paste the code for HashSet, except that you replace equals() and hashCode() with equalsIgnoringId() and hashCodeIgnoringId().
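Rather than copying all of HashSet, a minimal add-only set is enough for the duplicate-removal loop. The sketch below is a hypothetical illustration (PersonSetIgnoringId, Dedup and the Person fields are invented names): it uses open addressing with linear probing, stores the Person references directly, and calls equalsIgnoringId()/hashCodeIgnoringId() instead of equals()/hashCode(), so no wrapper object is created per element.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical Person with the two extra methods suggested above.
class Person {
    final String firstname, lastname;
    final int age;
    final long id;

    Person(String firstname, String lastname, int age, long id) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
        this.id = id;
    }

    boolean equalsIgnoringId(Person other) {
        return age == other.age
                && Objects.equals(firstname, other.firstname)
                && Objects.equals(lastname, other.lastname);
    }

    int hashCodeIgnoringId() {
        int h = Objects.hashCode(firstname);
        h = 31 * h + Objects.hashCode(lastname);
        return 31 * h + age;
    }
}

// Minimal add-only hash set (open addressing, linear probing) that never
// calls Person.equals() or Person.hashCode().
class PersonSetIgnoringId {
    private Person[] table = new Person[16]; // length is always a power of two
    private int size;

    // Returns false if an equal-ignoring-id person is already present.
    boolean add(Person p) {
        if (2 * (size + 1) > table.length) {
            resize(); // keep the load factor at or below 0.5
        }
        if (insert(table, p)) {
            size++;
            return true;
        }
        return false;
    }

    private static boolean insert(Person[] tab, Person p) {
        int mask = tab.length - 1;
        int h = p.hashCodeIgnoringId();
        int i = (h ^ (h >>> 16)) & mask; // mix in the high bits, as HashMap does
        while (tab[i] != null) {
            if (tab[i].equalsIgnoringId(p)) {
                return false;
            }
            i = (i + 1) & mask; // linear probing
        }
        tab[i] = p;
        return true;
    }

    private void resize() {
        Person[] bigger = new Person[table.length * 2];
        for (Person q : table) {
            if (q != null) {
                insert(bigger, q); // entries are already distinct
            }
        }
        table = bigger;
    }
}

class Dedup {
    static void removeDuplicates(List<Person> people) {
        PersonSetIgnoringId seen = new PersonSetIgnoringId();
        for (Iterator<Person> i = people.iterator(); i.hasNext();) {
            if (!seen.add(i.next())) {
                i.remove();
            }
        }
    }
}
```

This supports only add(); a complete Set would also need remove(), iteration and the rest, which is where cutting and pasting HashSet comes in.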
In my humble opinion, the fact that you can create a TreeSet that uses a Comparator, but not a HashSet that uses custom versions of equals/hashCode, is a serious flaw in the language.