How to get duplicate items from a list using LINQ?

旧城冷巷雨未停 提交于 2019-11-26 17:10:31
Lee
var duplicates = lst.GroupBy(s => s)
    .SelectMany(grp => grp.Skip(1));

Note that this will return all duplicates, so if you only want to know which items are duplicated in the source list, you could apply Distinct to the resulting sequence or use the solution given by Mark Byers.

Here is one way to do it:

List<String> duplicates = lst.GroupBy(x => x)
                             .Where(g => g.Count() > 1)
                             .Select(g => g.Key)
                             .ToList();

The GroupBy groups the elements that are the same together, and the Where filters out those that only appear once, leaving you with only the duplicates.

Here's another option:

var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };

var set = new HashSet<string>();
var duplicates = list.Where(x => !set.Add(x));

I know it's not the answer to the original question, but you may find yourself here with this problem.

If you want all of the duplicate items in your results, the following works.

var duplicates = list
    .GroupBy( x => x )               // group matching items
    .Where( g => g.Skip(1).Any() )   // where the group contains more than one item
    .SelectMany( g => g );           // re-expand the groups with more than one item

In my situation I need all duplicates so that I can mark them in the UI as being errors.

Michael

I wrote this extension method based off @Lee's response to the OP. Note, a default parameter was used (requiring C# 4.0). However, an overloaded method call in C# 3.0 would suffice.

/// <summary>
/// Method that returns all the duplicates (distinct) in the collection.
/// </summary>
/// <typeparam name="T">The type of the collection.</typeparam>
/// <param name="source">The source collection to detect for duplicates</param>
/// <param name="distinct">Specify <b>true</b> to only return distinct elements.</param>
/// <returns>A distinct list of duplicates found in the source collection.</returns>
/// <remarks>This is an extension method to IEnumerable&lt;T&gt;</remarks>
public static IEnumerable<T> Duplicates<T>
         (this IEnumerable<T> source, bool distinct = true)
{
     if (source == null)
     {
        throw new ArgumentNullException("source");
     }

     // select the elements that are repeated
     IEnumerable<T> result = source.GroupBy(a => a).SelectMany(a => a.Skip(1));

     // distinct?
     if (distinct == true)
     {
        // deferred execution helps us here
        result = result.Distinct();
     }

     return result;
}

Hope this wil help

int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };

var duplicates = listOfItems 
    .GroupBy(i => i)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key);

foreach (var d in duplicates)
    Console.WriteLine(d);
  List<String> list = new List<String> { "6", "1", "2", "4", "6", "5", "1" };

    var q = from s in list
            group s by s into g
            where g.Count() > 1
            select g.First();

    foreach (var item in q)
    {
        Console.WriteLine(item);

    }
Jamie L.

I was trying to solve the same with a list of objects and was having issues because I was trying to repack the list of groups into the original list. So I came up with looping through the groups to repack the original List with items that have duplicates.

public List<MediaFileInfo> GetDuplicatePictures()
{
    List<MediaFileInfo> dupes = new List<MediaFileInfo>();
    var grpDupes = from f in _fileRepo
                   group f by f.Length into grps
                   where grps.Count() >1
                   select grps;
    foreach (var item in grpDupes)
    {
        foreach (var thing in item)
        {
            dupes.Add(thing);
        }
    }
    return dupes;
}

All mentioned solutions until now perform a GroupBy. Even if I only need the first Duplicate all elements of the collections are enumerated at least once.

The following extension function stops enumerating as soon as a duplicate has been found. It continues if a next duplicate is requested.

As always in LINQ there are two versions, one with IEqualityComparer and one without it.

public static IEnumerable<TSource> ExtractDuplicates(this IEnumerable<TSource> source)
{
    return source.ExtractDuplicates(null);
}
public static IEnumerable<TSource> ExtractDuplicates(this IEnumerable<TSource source,
    IEqualityComparer<TSource> comparer);
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (comparer == null)
        comparer = EqualityCompare<TSource>.Default;

    HashSet<TSource> foundElements = new HashSet<TSource>(comparer);
    foreach (TSource sourceItem in source)
    {
        if (!foundElements.Contains(sourceItem))
        {   // we've not seen this sourceItem before. Add to the foundElements
            foundElements.Add(sourceItem);
        }
        else
        {   // we've seen this item before. It is a duplicate!
            yield return sourceItem;
        }
    }
}

Usage:

IEnumerable<MyClass> myObjects = ...

// check if has duplicates:
bool hasDuplicates = myObjects.ExtractDuplicates().Any();

// or find the first three duplicates:
IEnumerable<MyClass> first3Duplicates = myObjects.ExtractDuplicates().Take(3)

// or find the first 5 duplicates that have a Name = "MyName"
IEnumerable<MyClass> myNameDuplicates = myObjects.ExtractDuplicates()
    .Where(duplicate => duplicate.Name == "MyName")
    .Take(5);

For all these linq statements the collection is only parsed until the requested items are found. The rest of the sequence is not interpreted.

IMHO that is an efficiency boost to consider.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!