I've got a List of Persons and I want to find duplicate entries, considering all fields except id. So using the equals()-method (and in
I would advise against using a Comparator to do this. It is quite difficult to write a legal compare() method based on the other fields.
I think a better solution would be to create a class PersonWithoutId like so:
public class PersonWithoutId {
    private String firstname, lastname;
    private int age;
    // no id field
    public PersonWithoutId(Person original) { /* copy fields from Person */ }
    @Override public boolean equals(Object other) { /* compare these 3 fields */ }
    @Override public int hashCode() { /* hash these 3 fields */ }
}
Then, given a List called people you can do this:
Set<PersonWithoutId> set = new HashSet<>();
for (Iterator<Person> i = people.iterator(); i.hasNext();)
    if (!set.add(new PersonWithoutId(i.next())))
        i.remove();
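For completeness, here is a minimal runnable sketch of the whole approach. The Person class is my assumption: its constructor order follows the (firstname, lastname, age, id) call used in the test further down, and the package-private fields and the names DedupDemo and removeDuplicatesIgnoringId are purely illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Assumed shape of Person; only the constructor order comes from the answer.
class Person {
    final String firstname;
    final String lastname;
    final int age;
    final long id;

    Person(String firstname, String lastname, int age, long id) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
        this.id = id;
    }
}

// Key object that carries every field except id.
final class PersonWithoutId {
    private final String firstname;
    private final String lastname;
    private final int age;

    PersonWithoutId(Person original) {
        this.firstname = original.firstname;
        this.lastname = original.lastname;
        this.age = original.age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PersonWithoutId)) return false;
        PersonWithoutId other = (PersonWithoutId) o;
        return age == other.age
                && Objects.equals(firstname, other.firstname)
                && Objects.equals(lastname, other.lastname);
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstname, lastname, age);
    }
}

class DedupDemo {
    // Removes every Person whose non-id fields were already seen; keeps the first occurrence.
    static void removeDuplicatesIgnoringId(List<Person> people) {
        Set<PersonWithoutId> seen = new HashSet<>();
        for (Iterator<Person> i = people.iterator(); i.hasNext();) {
            if (!seen.add(new PersonWithoutId(i.next()))) {
                i.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>(Arrays.asList(
                new Person("Ann", "Lee", 30, 1L),
                new Person("Ann", "Lee", 30, 2L),   // duplicate of the first, ignoring id
                new Person("Bob", "Lee", 30, 3L)));
        removeDuplicatesIgnoringId(people);
        System.out.println(people.size()); // prints 2
    }
}
```

Note that i.remove() on an ArrayList shifts the tail of the list each time, so for very large lists with many duplicates it may be cheaper to collect the survivors into a new list instead.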
Edit
As others have pointed out in the comments, this solution is not ideal as it creates a load of objects for the garbage collector to deal with. But this solution is much faster than a solution using a Comparator and a TreeSet. Keeping a Set in order takes time and it has nothing to do with the original problem. I tested this on Lists of 1,000,000 instances of Person constructed using
new Person(
"" + rand.nextInt(500), // firstname
"" + rand.nextInt(500), // lastname
rand.nextInt(100), // age
rand.nextLong()) // id
and found this solution to be roughly twice as fast as a solution using a TreeSet. (Admittedly I used System.nanoTime() rather than proper benchmarking).
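If you want to repeat the comparison, something along these lines should do. It is only a rough sketch, not the exact code I ran: the Person shape, the fixed seed, the IGNORING_ID comparator and the class name DedupTiming are all assumptions, and it counts distinct entries instead of removing them from the list so that the cost of List removal does not distort the timings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Assumed shape of Person; constructor order as in the snippet above.
class Person {
    final String firstname;
    final String lastname;
    final int age;
    final long id;

    Person(String firstname, String lastname, int age, long id) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
        this.id = id;
    }
}

final class PersonWithoutId {
    private final String firstname;
    private final String lastname;
    private final int age;

    PersonWithoutId(Person p) {
        this.firstname = p.firstname;
        this.lastname = p.lastname;
        this.age = p.age;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PersonWithoutId)) return false;
        PersonWithoutId other = (PersonWithoutId) o;
        return age == other.age
                && Objects.equals(firstname, other.firstname)
                && Objects.equals(lastname, other.lastname);
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstname, lastname, age);
    }
}

class DedupTiming {
    // Orders by firstname, then lastname, then age; id is ignored.
    static final Comparator<Person> IGNORING_ID = (a, b) -> {
        int c = a.firstname.compareTo(b.firstname);
        if (c != 0) return c;
        c = a.lastname.compareTo(b.lastname);
        if (c != 0) return c;
        return Integer.compare(a.age, b.age);
    };

    static List<Person> randomPeople(int n, long seed) {
        Random rand = new Random(seed);
        List<Person> people = new ArrayList<>(n);
        for (int k = 0; k < n; k++) {
            people.add(new Person(
                    "" + rand.nextInt(500),  // firstname
                    "" + rand.nextInt(500),  // lastname
                    rand.nextInt(100),       // age
                    rand.nextLong()));       // id
        }
        return people;
    }

    static int countDistinctWithHashSet(List<Person> people) {
        Set<PersonWithoutId> seen = new HashSet<>();
        for (Person p : people) seen.add(new PersonWithoutId(p));
        return seen.size();
    }

    static int countDistinctWithTreeSet(List<Person> people) {
        Set<Person> seen = new TreeSet<>(IGNORING_ID);
        for (Person p : people) seen.add(p);
        return seen.size();
    }

    public static void main(String[] args) {
        List<Person> people = randomPeople(1_000_000, 42L);
        long t0 = System.nanoTime();
        int viaHash = countDistinctWithHashSet(people);
        long t1 = System.nanoTime();
        int viaTree = countDistinctWithTreeSet(people);
        long t2 = System.nanoTime();
        System.out.printf("HashSet: %d distinct, %d ms%n", viaHash, (t1 - t0) / 1_000_000);
        System.out.printf("TreeSet: %d distinct, %d ms%n", viaTree, (t2 - t1) / 1_000_000);
    }
}
```

A single nanoTime() run like this ignores JIT warm-up and GC pauses, so treat the numbers as a rough indication only; a harness such as JMH would give more trustworthy figures.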
So how can you do this efficiently without creating loads of unnecessary objects? Java doesn't make it easy. One way would be to write two new methods in Person
boolean equalsIgnoringId(Person other) { ... }
int hashCodeIgnoringId() { ... }
and then to write a custom implementation of Set where you basically cut and paste the code for HashSet, except that you replace the calls to equals() and hashCode() with equalsIgnoringId() and hashCodeIgnoringId().
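To give an idea of what that could look like, here is a deliberately small sketch. It is not a copy of HashSet: it is a fixed-size, open-addressing table (linear probing) with only an add() method, and the names PersonSetIgnoringId and CustomSetDemo, as well as the Person field layout, are my own assumptions. The hash is computed by hand so that no temporary objects are allocated per element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Assumed shape of Person, extended with the two methods described above.
class Person {
    final String firstname;
    final String lastname;
    final int age;
    final long id;

    Person(String firstname, String lastname, int age, long id) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
        this.id = id;
    }

    boolean equalsIgnoringId(Person other) {
        return other != null
                && age == other.age
                && Objects.equals(firstname, other.firstname)
                && Objects.equals(lastname, other.lastname);
    }

    int hashCodeIgnoringId() {
        int h = Objects.hashCode(firstname);
        h = 31 * h + Objects.hashCode(lastname);
        h = 31 * h + age;
        return h;
    }
}

// Minimal add-only set; no resizing, no removal, no iteration.
class PersonSetIgnoringId {
    private final Person[] slots;
    private int size;

    PersonSetIgnoringId(int expectedSize) {
        // Keep the table at most half full so probe sequences stay short.
        slots = new Person[Math.max(16, expectedSize * 2)];
    }

    /** Returns true if no person with the same non-id fields was present. */
    boolean add(Person p) {
        if (size >= slots.length - 1) {
            throw new IllegalStateException("table full; this sketch does not resize");
        }
        int i = Math.floorMod(p.hashCodeIgnoringId(), slots.length);
        while (slots[i] != null) {
            if (slots[i].equalsIgnoringId(p)) return false;
            i = (i + 1) % slots.length; // linear probing
        }
        slots[i] = p;
        size++;
        return true;
    }
}

class CustomSetDemo {
    static void removeDuplicatesIgnoringId(List<Person> people) {
        PersonSetIgnoringId seen = new PersonSetIgnoringId(people.size());
        for (Iterator<Person> i = people.iterator(); i.hasNext();) {
            if (!seen.add(i.next())) i.remove();
        }
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>(Arrays.asList(
                new Person("Ann", "Lee", 30, 1L),
                new Person("Ann", "Lee", 30, 2L),
                new Person("Bob", "Lee", 30, 3L)));
        removeDuplicatesIgnoringId(people);
        System.out.println(people.size()); // prints 2
    }
}
```

The deduplication loop is the same as before; the only difference is that the Person instances themselves are stored, so no wrapper object is created per element.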
In my humble opinion, the fact that you can create a TreeSet that uses a Comparator, but not a HashSet that uses custom versions of equals/hashCode, is a serious flaw in Java's collections library.