Comparing two lists and removing duplicates from one

落爺英雄遲暮 posted on 2019-12-19 08:43:09


I have an object called FormObject that contains two ArrayLists - oldBooks and newBooks - both of which contain Book objects.

oldBooks is allowed to contain duplicate Book objects. newBooks is not allowed to contain duplicate Book objects within itself, and it cannot include a duplicate of any Book object in the oldBooks list.

The definition of a duplicate Book is complex and I can't override the equals method as the definition is not universal across all uses of the Book object.

I plan to have a method on the FormObject class called removeDuplicateNewBooks which will perform the above functionality.

How would you go about implementing this? My first thought was to use HashSets to eliminate the duplicates but not being able to override equals on the Book object means it won't work.


You can use a TreeSet with a custom Comparator<Book>:

  • construct the TreeSet with a Comparator implementing the custom logic you want
  • use set.addAll(bookList)

Now the Set contains only unique books.
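
A minimal sketch of how that could look for both requirements (no duplicates inside newBooks, none shared with oldBooks). The Book class, its title and isbn fields, and the rule "same ISBN and same title, ignoring case" are assumptions made up for illustration; the real duplicate definition is not given in the question. Note that a TreeSet treats two elements as duplicates exactly when the comparator returns 0, so the comparator has to be a consistent ordering, not just a yes/no equality test.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical Book: the real class and its duplicate rule are not shown in the question.
class Book {
    final String title;
    final String isbn;

    Book(String title, String isbn) {
        this.title = title;
        this.isbn = isbn;
    }
}

class TreeSetDedupSketch {

    // Assumed duplicate rule: same ISBN and same title, ignoring case.
    // Both fields are assumed to be non-null.
    static final Comparator<Book> DUPLICATE_RULE =
            Comparator.comparing((Book b) -> b.isbn)
                      .thenComparing(b -> b.title, String.CASE_INSENSITIVE_ORDER);

    static List<Book> removeDuplicateNewBooks(List<Book> oldBooks, List<Book> newBooks) {
        // Index the old books under the custom rule; their own duplicates collapse here.
        Set<Book> old = new TreeSet<>(DUPLICATE_RULE);
        old.addAll(oldBooks);

        Set<Book> seen = new TreeSet<>(DUPLICATE_RULE);
        List<Book> result = new ArrayList<>();
        for (Book book : newBooks) {
            // add() returns false when an equivalent book is already in the set.
            if (!old.contains(book) && seen.add(book)) {
                result.add(book);
            }
        }
        return result;
    }
}
```

The result keeps the first occurrence of each new book in its original order, and oldBooks itself is left untouched.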


For making the new books unique:

Create a wrapper class around Book and declare its equals / hashCode methods based on the enclosed book object:

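import java.util.Arrays;

public class Wrapper{

    private final Book book;

    public Wrapper(final Book book){
        assert book != null;
        this.book = book;
    }

    public Book getBook(){
        return this.book;
    }

    @Override
    public boolean equals(final Object other){
        return other instanceof Wrapper ?
            Arrays.equals(
                this.getBookInfo(),
                ((Wrapper) other).getBookInfo()
            ) : false;
    }

    @Override
    public int hashCode(){
        return Arrays.hashCode(this.getBookInfo());
    }

    // The values that define a duplicate. The getters below are placeholders
    // (assumed); list whichever Book properties make up your definition.
    private String[] getBookInfo(){
        return new String[] {
            this.book.getAuthor(),
            this.book.getTitle(),
            this.book.getIsbn()
        };
    }

}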

EDIT: Optimized equals and hashCode and fixed a bug in hashCode.

Now use a set to remove duplicates:

Set<Wrapper> wrappers = new HashSet<Wrapper>();
for(Book book: newBooks){
    wrappers.add(new Wrapper(book));
}
newBooks.clear();
for(Wrapper wrapper: wrappers){
    newBooks.add(wrapper.getBook());
}

(But of course the TreeSet answer with the custom comparator is more elegant because you can use the Book class itself)
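
The loop above only makes newBooks unique within itself. One possible way to also drop books that already appear in oldBooks, and to keep the original order of newBooks, is to seed the set with the old books first. This is a sketch under assumptions: the Book fields (author, title) are placeholders for the real duplicate criteria, and the Wrapper here is a shortened variant of the one above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical Book; the fields stand in for the real duplicate criteria.
class Book {
    final String author;
    final String title;

    Book(String author, String title) {
        this.author = author;
        this.title = title;
    }
}

// Shortened variant of the Wrapper above: equality lives outside of Book.
final class Wrapper {
    private final Book book;

    Wrapper(Book book) {
        this.book = book;
    }

    private String[] info() {
        return new String[] { book.author, book.title };
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Wrapper
                && Arrays.equals(info(), ((Wrapper) other).info());
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(info());
    }
}

class FormObject {
    final List<Book> oldBooks = new ArrayList<>();
    final List<Book> newBooks = new ArrayList<>();

    void removeDuplicateNewBooks() {
        // Seed the set with every old book so new books matching them are rejected.
        Set<Wrapper> seen = new HashSet<>();
        for (Book book : oldBooks) {
            seen.add(new Wrapper(book));
        }
        // add() returns false for a book equivalent to one already seen,
        // whether that one came from oldBooks or from earlier in newBooks.
        List<Book> kept = new ArrayList<>();
        for (Book book : newBooks) {
            if (seen.add(new Wrapper(book))) {
                kept.add(book);
            }
        }
        newBooks.clear();
        newBooks.addAll(kept);
    }
}
```

A single set handles both requirements in one pass, and building the kept list explicitly avoids the arbitrary iteration order of a HashSet.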

EDIT: (removed reference to apache commons because my improved equals / hashCode methods are better)


HashingStrategy is the concept you're looking for. It's a strategy interface that lets you define custom implementations of equals and hashCode outside the Book class.

public interface HashingStrategy<E> {
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}

Eclipse Collections includes hash tables as well as iteration patterns based on hashing strategies. First, you'd create your own HashingStrategy to answer whether two Books are equal.
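
As a rough illustration of what such a strategy might look like: the sketch below declares a local copy of the interface so that it compiles without the library on the classpath (with Eclipse Collections you would import org.eclipse.collections.api.block.HashingStrategy instead). The Book class and the choice of isbn plus title as the duplicate criteria are assumptions, since the real definition is not shown.

```java
import java.util.Objects;

// Local stand-in for the library interface, same two methods as shown above.
interface HashingStrategy<E> {
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}

// Hypothetical Book that does NOT override equals/hashCode itself.
class Book {
    final String title;
    final String isbn;

    Book(String title, String isbn) {
        this.title = title;
        this.isbn = isbn;
    }
}

// Assumed duplicate rule: two books are duplicates when isbn and title match.
class BookHashingStrategy implements HashingStrategy<Book> {

    @Override
    public int computeHashCode(Book book) {
        // Must hash exactly the fields that equals() compares.
        return Objects.hash(book.isbn, book.title);
    }

    @Override
    public boolean equals(Book first, Book second) {
        return Objects.equals(first.isbn, second.isbn)
                && Objects.equals(first.title, second.title);
    }
}
```

Because the rule lives in its own class, other parts of the application can keep using Book's default equality or supply a different strategy.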

Next, you'd use distinct() to remove duplicates within newBooks and a UnifiedSetWithHashingStrategy to eliminate duplicates across the lists.

List<Book> oldBooks = ...;
List<Book> newBooks = ...;
HashingStrategy<Book> hashingStrategy = new HashingStrategy<Book>() { ... };
Set<Book> set = new UnifiedSetWithHashingStrategy<>(hashingStrategy, oldBooks);
List<Book> result = ListIterate.distinct(newBooks, hashingStrategy).reject(set::contains);

The distinct() method returns only the unique items according to the hashing strategy. It returns a list, not a set, preserving the original order. The call to reject() returns another new list without the elements that the set contains, according to the same hashing strategy.

If you can change newBooks to implement an Eclipse Collections interface, then you can call the distinct() method directly.

MutableList<Book> newBooks = ...;
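// Reject against the hashing-strategy set built from oldBooks above;
// oldBooks::contains would fall back to Book.equals.
MutableList<Book> result = newBooks.distinct(hashingStrategy).reject(set::contains);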

Note: I am a committer for Eclipse Collections.

