I have to remove duplicate objects from a List. It is a List of Blog objects, and the class looks like this:
public class Blog {
private String title;
priv
Make sure Blog has methods equals(Object) and hashCode() defined, and then addAll(list) to a new HashSet(), or to a new LinkedHashSet() if the order is important.
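A minimal sketch of that approach follows. The Blog class here is a hypothetical stand-in: the four fields are the ones named elsewhere in this thread, their String types are an assumption, and BlogDedup is an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for Blog; the String field types are assumed.
final class Blog {
    private final String title;
    private final String author;
    private final String url;
    private final String description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Blog)) return false; // also rejects null
        Blog other = (Blog) o;
        return Objects.equals(title, other.title)
                && Objects.equals(author, other.author)
                && Objects.equals(url, other.url)
                && Objects.equals(description, other.description);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals()
        return Objects.hash(title, author, url, description);
    }
}

final class BlogDedup {
    // LinkedHashSet drops duplicates and keeps first-seen order;
    // a plain HashSet would also work if order does not matter.
    static List<Blog> withoutDuplicates(List<Blog> blogs) {
        return new ArrayList<>(new LinkedHashSet<>(blogs));
    }
}
```

Copying the set back into an ArrayList is only needed if the rest of the code really requires a List.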
Better yet, use a Set instead of a List from the start. Since you obviously don't want duplicates, it's better that your data model reflects that, rather than having to remove them after the fact.
Override hashCode() and equals(..) using those 4 fields.
Use new HashSet<Blog>(blogList) - this will give you a Set which has no duplicates by definition.
Update: Since you can't change the class, here's an O(n^2) solution:
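The code that originally accompanied this answer is not reproduced here. As a rough sketch of what a quadratic approach could look like when Blog cannot be modified, assuming the class exposes getters for its four fields (the getter names, the String types, and the QuadraticDedup helper are all hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical unmodifiable Blog: getters only, no equals()/hashCode().
final class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

final class QuadraticDedup {
    // Field-by-field comparison kept outside the class, since Blog cannot change.
    static boolean sameFields(Blog a, Blog b) {
        return Objects.equals(a.getTitle(), b.getTitle())
                && Objects.equals(a.getAuthor(), b.getAuthor())
                && Objects.equals(a.getUrl(), b.getUrl())
                && Objects.equals(a.getDescription(), b.getDescription());
    }

    // O(n^2): every candidate is compared against everything kept so far.
    static List<Blog> withoutDuplicates(List<Blog> blogs) {
        List<Blog> result = new ArrayList<>();
        for (Blog candidate : blogs) {
            boolean seen = false;
            for (Blog kept : result) {
                if (sameFields(candidate, kept)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) result.add(candidate);
        }
        return result;
    }
}
```

Building a new list avoids removing elements from the list being iterated.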
You can make this more efficient if you provide a HashSet data structure with externalized hashCode() and equals(..) methods.
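One possible way to externalize equality without a custom hash set is to key each object by a List of its fields, since List already defines content-based equals() and hashCode(). This is only a sketch of that idea, not necessarily what the answer had in mind; the Blog getters, the String types, and the KeyedDedup helper are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical unmodifiable Blog: getters only, no equals()/hashCode().
final class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

final class KeyedDedup {
    // The key's List.equals()/hashCode() stand in for the methods Blog lacks.
    static List<String> keyOf(Blog b) {
        return Arrays.asList(b.getTitle(), b.getAuthor(), b.getUrl(), b.getDescription());
    }

    // Expected O(n): one hash lookup per element; first occurrence wins.
    static List<Blog> withoutDuplicates(List<Blog> blogs) {
        Map<List<String>, Blog> firstByKey = new LinkedHashMap<>();
        for (Blog b : blogs) {
            firstByKey.putIfAbsent(keyOf(b), b);
        }
        return new ArrayList<>(firstByKey.values());
    }
}
```

The LinkedHashMap keeps first-seen order; a plain HashMap would do if order is irrelevant.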
The first step is to implement the equals method and compare your fields. After that, the steps vary.
You could create a new empty list and loop over the original, using if(!list2.contains(item)) and then doing an add.
Another quick way is to cram them all into a Set and pull them back into a List. This works because Sets do not allow duplicates to begin with (for a HashSet, implement hashCode as well).
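The contains-based loop can be sketched generically as below. ContainsDedup and withoutDuplicates are illustrative names, and Strings are used only because they already implement equals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ContainsDedup {
    // Keeps the first occurrence of each element, in original order.
    static <T> List<T> withoutDuplicates(List<T> original) {
        List<T> list2 = new ArrayList<>();
        for (T item : original) {
            // contains() calls equals(), so the element type must implement it
            if (!list2.contains(item)) {
                list2.add(item);
            }
        }
        return list2;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(withoutDuplicates(Arrays.asList("a", "b", "a", "c", "b")));
    }
}
```

Each contains() call scans list2, so this is quadratic in the worst case; it is fine for small lists.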
I tried several ways of removing duplicates from a list of Java objects. Some of them are:
1. Override the equals and hashCode methods, convert the list to a set by passing the list to the set class constructor, then do a remove-all and add-all on the list.
2. Run 2 pointers and remove the duplicates manually with 2 for loops, one inside the other, like we used to do in C for arrays.
3. Write an anonymous Comparator class for the bean, do a Collections.sort, and then run 2 pointers to remove duplicates in the forward direction.
Moreover, my requirement was to remove almost 1 million duplicates from almost 5 million objects.
After many trials I ended up with the third option, which I feel is the most efficient and effective way: it finished within seconds, whereas the other 2 options took almost 10 to 15 minutes.
In my tests the first and second options were very inefficient, because the time taken to remove the duplicates grew steeply as the number of objects increased.
So, finally, the third option is the best.
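A minimal sketch of the third option, sorting and then dropping adjacent duplicates in one forward pass. The Blog getters, the String field types, the assumption that no field is null, and the SortDedup helper are all hypothetical; it uses Comparator.comparing rather than an anonymous class, and it does not preserve the original order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical Blog with getters; fields are assumed to be non-null Strings.
final class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

final class SortDedup {
    // Orders by all four fields, so equal objects end up next to each other.
    static final Comparator<Blog> BY_FIELDS = Comparator.comparing(Blog::getTitle)
            .thenComparing(Blog::getAuthor)
            .thenComparing(Blog::getUrl)
            .thenComparing(Blog::getDescription);

    // O(n log n) sort plus one linear pass; the result is in sorted order.
    static List<Blog> withoutDuplicates(List<Blog> blogs) {
        List<Blog> sorted = new ArrayList<>(blogs);
        sorted.sort(BY_FIELDS);
        List<Blog> result = new ArrayList<>();
        for (Blog b : sorted) {
            // Keep b only if it differs from the last element kept
            if (result.isEmpty() || BY_FIELDS.compare(result.get(result.size() - 1), b) != 0) {
                result.add(b);
            }
        }
        return result;
    }
}
```

Appending to a new list avoids repeated removals from the middle of an ArrayList, which would themselves be linear each.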
First override the equals() method:
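@Override
public boolean equals(Object obj)
{
    if (this == obj) return true;
    if (!(obj instanceof MyObject)) return false; // also covers null
    MyObject other = (MyObject) obj; // cast needed: Object has no getTitle() etc.
    // Compare contents with java.util.Objects.equals, not ==, which only compares references
    return Objects.equals(getTitle(), other.getTitle())
        && Objects.equals(getAuthor(), other.getAuthor())
        && Objects.equals(getURL(), other.getURL())
        && Objects.equals(getDescription(), other.getDescription());
}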
and then use:
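List<MyObject> list = new ArrayList<MyObject>();
// ... populate list ...
// Iterate by index: removing inside a for-each loop throws ConcurrentModificationException
for (int i = 0; i < list.size(); i++)
{
    // Start after i so an element is never compared with itself;
    // walk backwards so removals do not shift elements still to be checked
    for (int j = list.size() - 1; j > i; j--)
    {
        if (list.get(i).equals(list.get(j))) list.remove(j);
    }
}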
You could override the equals() method, using title, author, url and description (and hashCode() too, since if you override one you should override the other). Then use a HashSet of type <Blog>.