C# distinct List<string> by substring

Question

I want to remove duplicates from a list of strings. I am using Distinct for this, but I want to ignore the first character when comparing.

I already have working code that removes the duplicates, but it also strips the first character from every string.

List<string> mylist = new List<string>();

List<string> newlist = 
  mylist.Select(e => e.Substring(1, e.Length - 1)).Distinct().ToList();

Input: "1A","1B","2A","3C","4D"

Output: "A","B","C","D"

Desired output: "1A","1B","3C","4D" (it doesn't matter whether "1A" or "2A" is removed)

I guess I am pretty close but.... any input is highly appreciated!

As always a solution should work as fast as possible ;)

Answer 1:

You can GroupBy all but the first character and take the first of every group:

List<string> result = mylist.GroupBy(s => s.Length < 2 ? s : s.Substring(1))
                            .Select(g => g.First())
                            .ToList();

Result:

Console.Write(string.Join(",", result)); // 1A,1B,3C,4D

it doesn't matter if "1A" or "2A" will be deleted

If you change your mind, replace g.First() with the new selection logic.
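
As a minimal sketch of swapping in different logic, the variant below keeps the last string of each group instead of the first. The class and method names (SuffixDedup, KeepLastPerSuffix) are made up for illustration and are not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SuffixDedup
{
    // Groups by everything after the first character and keeps the LAST
    // element of each group (g.Last() instead of g.First()).
    // Strings shorter than two characters are used whole as their own key,
    // exactly as in the GroupBy call above.
    public static List<string> KeepLastPerSuffix(IEnumerable<string> source)
    {
        return source.GroupBy(s => s.Length < 2 ? s : s.Substring(1))
                     .Select(g => g.Last())
                     .ToList();
    }
}
```

With the input from the question, this keeps "2A" rather than "1A"; the other entries are unaffected because their groups contain a single element.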

However, if performance really matters and it never matters which duplicate is removed, you should prefer Selman's approach, which is to write a custom IEqualityComparer<string>. That will be more efficient than my GroupBy approach if its GetHashCode is implemented like this:

return (s.Length < 2 ? s : s.Substring(1)).GetHashCode();

Answer 2:

You can implement an IEqualityComparer<string> that compares your strings while ignoring the first character, then pass it to the Distinct method.

myList.Distinct(new MyComparer());

There is also an example on MSDN that shows you how to implement and use a custom comparer with Distinct.
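
For reference, here is one way such a comparer might look. It is only a sketch: it reuses the key expression suggested in Answer 1 (strings shorter than two characters are compared whole), assumes an ordinal comparison is wanted, and the null handling is my own addition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of a comparer that treats two strings as equal when they match
// after the first character is dropped.
public sealed class MyComparer : IEqualityComparer<string>
{
    // Strings shorter than two characters are used whole as their own key.
    private static string Key(string s) => s.Length < 2 ? s : s.Substring(1);

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(Key(x), Key(y), StringComparison.Ordinal);
    }

    // Must agree with Equals: equal keys produce equal hash codes.
    public int GetHashCode(string s) => s is null ? 0 : Key(s).GetHashCode();
}
```

Note that Key allocates a new string for every comparison and every hash computation, so this version favors brevity over speed.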

Answer 3:

I'm going to suggest a simple extension method that you can reuse in similar situations:

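public static class EnumerableExtensions
{
    // Yields each item whose key has not been seen before, preserving source order.
    public static IEnumerable<T> DistinctBy<T, U>(this IEnumerable<T> source, Func<T, U> keySelector)
    {
        var seen = new HashSet<U>();
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the key is already present.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }
}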

This is basically how Distinct is implemented in LINQ.

Usage:

List<string> newlist = 
  mylist.DistinctBy(e => e.Substring(1, e.Length - 1)).ToList();
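
On .NET 6 and later, System.Linq ships its own Enumerable.DistinctBy, so the hand-written extension may be unnecessary there (and, depending on how namespaces are imported, a same-named extension can make the call ambiguous). A minimal sketch, assuming .NET 6 or later; the wrapper class and method names are illustrative only, and the key selector guards against empty strings, which Substring(1, e.Length - 1) would reject:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BuiltInDistinctByDemo
{
    // Assumes .NET 6 or later, where Enumerable.DistinctBy is part of System.Linq.
    public static List<string> DistinctIgnoringFirstChar(IEnumerable<string> source)
    {
        // Empty strings are used as their own key to avoid an out-of-range Substring call.
        return source.DistinctBy(s => s.Length == 0 ? s : s.Substring(1)).ToList();
    }
}
```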

Answer 4:

I realise the answer has already been given, but since I was working on this answer anyway I'm still going to post it, in case it's any use.

If you really want the fastest solution for large lists, then something like this might be optimal. You would need to do some accurate timings to be sure, though!

This approach does not make any additional string copies when comparing or computing the hash codes:

using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    internal static class Program
    {
        static void Main()
        {
            var myList = new List<string>
            {
                "1A",
                "1B",
                "2A",
                "3C",
                "4D"
            };

            var newList = myList.Distinct(new MyComparer());
            Console.WriteLine(string.Join("\n", newList));
        }

        sealed class MyComparer: IEqualityComparer<string>
        {
            public bool Equals(string x, string y)
            {
                if (x.Length != y.Length)
                    return false;

                if (x.Length == 0)
                    return true;

                // Ordinal comparison of everything after the first char, consistent with GetHashCode.
                return string.CompareOrdinal(x, 1, y, 1, x.Length - 1) == 0;
            }

            public int GetHashCode(string s)
            {
                if (s.Length <= 1)
                    return 0;

                int result = 17;

                unchecked
                {
                    bool first = true;

                    foreach (char c in s)
                    {
                        if (first)
                            first = false;
                        else
                            result = result*23 + c;
                    }
                }

                return result;
            }
        }
    }
}
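
On newer runtimes the same no-copy idea can be written more compactly with spans. The following is only a sketch, assuming .NET Core 3.0 or later (where the static string.GetHashCode(ReadOnlySpan<char>) overload is available); the class name SpanSuffixComparer is made up, and I have not timed it against the version above. Unlike that version, it treats an empty string and a one-character string as equal, since both have an empty tail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumes .NET Core 3.0 or later.
public sealed class SpanSuffixComparer : IEqualityComparer<string>
{
    // View of the string without its first character; no new string is allocated.
    private static ReadOnlySpan<char> Tail(string s) =>
        s.Length == 0 ? ReadOnlySpan<char>.Empty : s.AsSpan(1);

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        // Ordinal, character-by-character comparison of the two tails.
        return Tail(x).SequenceEqual(Tail(y));
    }

    // Hashes only the tail, so strings that are equal under Equals hash alike.
    public int GetHashCode(string s) =>
        s is null ? 0 : string.GetHashCode(Tail(s));
}
```

It is used the same way: myList.Distinct(new SpanSuffixComparer()).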

Source: https://stackoverflow.com/questions/25421170/c-sharp-distinct-liststring-by-substring

Tags

list

distinct
