I have to remove duplicate objects from a List. It is a List of Blog objects, and the class looks like this:
public class Blog {
private String title;
priv
If for some reason you don't want to override the equals method and you want to remove duplicates based on multiple properties, you can create a generic method to do that.
We can write it in 2 versions:
1. Modify the original list:
@SafeVarargs
public static <T> void removeDuplicatesFromList(List<T> list, Function<T, ?>... keyFunctions) {
    Set<List<?>> set = new HashSet<>();
    ListIterator<T> iter = list.listIterator();
    while (iter.hasNext()) {
        T element = iter.next();
        // Collect the values of all key properties into one list that acts as a composite key
        List<?> functionResults = Arrays.stream(keyFunctions)
                .map(function -> function.apply(element))
                .collect(Collectors.toList());
        // add() returns false if an equal key was already seen
        if (!set.add(functionResults)) {
            iter.remove();
        }
    }
}
2. Return a new list:
@SafeVarargs
public static <T> List<T> getListWithoutDuplicates(List<T> list, Function<T, ?>... keyFunctions) {
    List<T> result = new ArrayList<>();
    Set<List<?>> set = new HashSet<>();
    for (T element : list) {
        // Collect the values of all key properties into one list that acts as a composite key
        List<?> functionResults = Arrays.stream(keyFunctions)
                .map(function -> function.apply(element))
                .collect(Collectors.toList());
        // add() returns true only the first time a key is seen
        if (set.add(functionResults)) {
            result.add(element);
        }
    }
    return result;
}
In both cases we can consider any number of properties.
For example, to remove duplicates based on the 4 properties title, author, url and description:
removeDuplicatesFromList(blogs, Blog::getTitle, Blog::getAuthor, Blog::getUrl, Blog::getDescription);
The methods work by leveraging the equals method of List, which checks the equality of its elements. In our case the elements of functionResults are the values retrieved from the passed getters, and we can use that list as an element of the Set to check for duplicates.
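The list-as-key idea can be seen in isolation with a small sketch. The class name ListKeyDemo and the helper isNew are hypothetical and only serve the illustration; the point is that two lists with equal elements in the same order are equal and have the same hash code, so a HashSet treats them as the same key.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of the answer above.
class ListKeyDemo {

    // Wraps the given values in a List and tries to add it to the set.
    // Set.add returns true only if no equal list was present before.
    static boolean isNew(Set<List<?>> seen, Object... values) {
        return seen.add(Arrays.asList(values));
    }

    public static void main(String[] args) {
        Set<List<?>> seen = new HashSet<>();
        System.out.println(isNew(seen, "a", "a")); // true: first occurrence
        System.out.println(isNew(seen, "a", "b")); // true: differs in the second value
        System.out.println(isNew(seen, "a", "a")); // false: equal list already in the set
    }
}
```

Since List.equals and List.hashCode also handle null elements, a getter that returns null should not break the comparison.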
Complete example:
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Duplicates {

    // removeDuplicatesFromList and getListWithoutDuplicates from above go here

    public static void main(String[] args) {
        List<Blog> blogs = new ArrayList<>();
        blogs.add(new Blog("a", "a", "a", "a"));
        blogs.add(new Blog("b", "b", "b", "b"));
        blogs.add(new Blog("a", "a", "a", "a")); // duplicate
        blogs.add(new Blog("a", "a", "b", "b"));
        blogs.add(new Blog("a", "b", "b", "b"));
        blogs.add(new Blog("a", "a", "b", "b")); // duplicate

        List<Blog> blogsWithoutDuplicates = getListWithoutDuplicates(blogs,
                Blog::getTitle, Blog::getAuthor, Blog::getUrl, Blog::getDescription);
        System.out.println(blogsWithoutDuplicates); // [a a a a, b b b b, a a b b, a b b b]

        removeDuplicatesFromList(blogs,
                Blog::getTitle, Blog::getAuthor, Blog::getUrl, Blog::getDescription);
        System.out.println(blogs); // [a a a a, b b b b, a a b b, a b b b]
    }

    private static class Blog {
        private String title;
        private String author;
        private String url;
        private String description;

        public Blog(String title, String author, String url, String description) {
            this.title = title;
            this.author = author;
            this.url = url;
            this.description = description;
        }

        public String getTitle() {
            return title;
        }

        public String getAuthor() {
            return author;
        }

        public String getUrl() {
            return url;
        }

        public String getDescription() {
            return description;
        }

        @Override
        public String toString() {
            return String.join(" ", title, author, url, description);
        }
    }
}
If you can't edit the source of the class (why not?), then you need to iterate over the list and compare each item based on the four criteria mentioned (title, author, url and description).
To do this in a performant way, I would create a new class, something like BlogKey, which contains those four elements and properly implements equals() and hashCode(). You can then iterate over the original list, constructing a BlogKey for each element and adding it to a HashMap:
Map<BlogKey, Blog> map = new HashMap<BlogKey, Blog>();
for (Blog blog : blogs) {
    BlogKey key = createKey(blog);
    // Keep only the first Blog seen for each key
    if (!map.containsKey(key)) {
        map.put(key, blog);
    }
}
Collection<Blog> uniqueBlogs = map.values();
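The answer does not show BlogKey itself, so here is one way it might look. This is a sketch under assumptions: the Blog class below is a stand-in with the four getters from the question, the BlogKey constructor plays the role of createKey, and the helper uniqueBlogs and the class BlogKeyDemo are hypothetical names. It also swaps HashMap for LinkedHashMap so that the result keeps the original list order, which a plain HashMap does not guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Stand-in for the Blog class from the question (assumed to expose these four getters).
class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

// Hypothetical key class: holds the four properties and defines equality over them.
final class BlogKey {
    private final String title, author, url, description;

    BlogKey(Blog blog) {
        this.title = blog.getTitle();
        this.author = blog.getAuthor();
        this.url = blog.getUrl();
        this.description = blog.getDescription();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BlogKey)) return false;
        BlogKey other = (BlogKey) o;
        return Objects.equals(title, other.title)
                && Objects.equals(author, other.author)
                && Objects.equals(url, other.url)
                && Objects.equals(description, other.description);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, author, url, description);
    }
}

class BlogKeyDemo {
    // Keeps the first Blog for each distinct key, in encounter order.
    static List<Blog> uniqueBlogs(List<Blog> blogs) {
        Map<BlogKey, Blog> map = new LinkedHashMap<>();
        for (Blog blog : blogs) {
            map.putIfAbsent(new BlogKey(blog), blog);
        }
        return new ArrayList<>(map.values());
    }
}
```

Objects.equals and Objects.hash are null-safe, so blogs with a missing property can still be compared.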
However, by far the simplest thing is to just edit the original source code of Blog so that it correctly implements equals() and hashCode().
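As a rough illustration of that simplest route, the sketch below assumes Blog has the four String fields from the question and that all four should take part in equality. The class EqualsDemo and the method withoutDuplicates are hypothetical names. Once equals() and hashCode() are in place, a LinkedHashSet is enough to drop duplicates while keeping the original order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

// Assumed shape of Blog: four String properties, all of which define equality.
class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Blog)) return false;
        Blog other = (Blog) o;
        return Objects.equals(title, other.title)
                && Objects.equals(author, other.author)
                && Objects.equals(url, other.url)
                && Objects.equals(description, other.description);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals()
        return Objects.hash(title, author, url, description);
    }
}

// Hypothetical helper class for the illustration.
class EqualsDemo {
    // The set discards equal elements; LinkedHashSet preserves insertion order.
    static List<Blog> withoutDuplicates(List<Blog> blogs) {
        return new ArrayList<>(new LinkedHashSet<>(blogs));
    }
}
```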
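You can use distinct() to remove duplicates. Note that it compares elements with equals(), so Blog must override equals() and hashCode() for this to work, and the result has to be assigned because the original list is not modified:
List<Blog> blogList = ....// add your list here
List<Blog> uniqueBlogs = blogList.stream().distinct().collect(Collectors.toList());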
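Whether distinct() removes anything depends on how the element class defines equality. The sketch below uses two hypothetical classes, Plain (no equals() override) and WithEquals (equality by title), to show the difference; neither comes from the question, and the single title field is a simplification.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical class that inherits Object.equals, i.e. identity comparison.
class Plain {
    final String title;
    Plain(String title) { this.title = title; }
}

// Hypothetical class that defines equality by title.
class WithEquals {
    final String title;
    WithEquals(String title) { this.title = title; }

    @Override
    public boolean equals(Object o) {
        return o instanceof WithEquals && Objects.equals(title, ((WithEquals) o).title);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(title);
    }
}

class DistinctDemo {
    // Returns a new list; the input list is left unchanged.
    static <T> List<T> distinct(List<T> items) {
        return items.stream().distinct().collect(Collectors.toList());
    }
}
```

With two Plain instances that share a title, distinct() keeps both, because they are different objects. With two WithEquals instances that share a title, only the first one is kept.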