C# distinct List<string> by substring

Posted by 筅森魡賤 on 2019-12-06 10:41:23

Question


I want to remove duplicates from a list of strings. I do this with Distinct, but I want to ignore the first character when comparing.

I already have working code that removes the duplicates, but it also strips the first character from every string.

List<string> mylist = new List<string>();

List<string> newlist = 
  mylist.Select(e => e.Substring(1, e.Length - 1)).Distinct().ToList();

Input: "1A","1B","2A","3C","4D"

Output: "A","B","C","D"

Expected output: "1A","1B","3C","4D" (it doesn't matter whether "1A" or "2A" is removed)

I guess I am pretty close, but any input is highly appreciated!

As always, a solution should run as fast as possible ;)


Answer 1:


You can GroupBy all but the first character and take the first of every group:

List<string> result = mylist.GroupBy(s => s.Length < 2 ? s : s.Substring(1))
                            .Select(g => g.First())
                            .ToList();

Result:

Console.Write(string.Join(",", result)); // 1A,1B,3C,4D

You wrote that it doesn't matter whether "1A" or "2A" is deleted. If you change your mind, replace g.First() with the selection logic you need.
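
As a rough sketch of what different selection logic could look like, the variant below keeps, from each group, the string that sorts last under ordinal comparison instead of the first one seen. The names Dedup and KeepLargestPrefix are made up for this illustration, and the choice of ordinal ordering is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Dedup
{
    // Hypothetical helper: groups by everything after the first character
    // and keeps, per group, the string that sorts last ordinally.
    public static List<string> KeepLargestPrefix(IEnumerable<string> source)
    {
        return source
            .GroupBy(s => s.Length < 2 ? s : s.Substring(1))
            .Select(g => g.OrderByDescending(s => s, StringComparer.Ordinal).First())
            .ToList();
    }
}

static class Program
{
    static void Main()
    {
        var mylist = new List<string> { "1A", "1B", "2A", "3C", "4D" };
        Console.WriteLine(string.Join(",", Dedup.KeepLargestPrefix(mylist))); // 2A,1B,3C,4D
    }
}
```

GroupBy yields groups in the order their keys first appear, so the relative order of the surviving items follows the input.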

However, if performance really matters and it never matters which duplicate is removed, you should prefer Selman's approach, which suggests writing a custom IEqualityComparer<string>. That will be more efficient than my GroupBy approach if its GetHashCode is implemented like this:

return (s.Length < 2 ? s : s.Substring(1)).GetHashCode();



Answer 2:


You can implement an IEqualityComparer<string> that compares your strings while ignoring the first character, then pass it to the Distinct method.

myList.Distinct(new MyComparer());

There is also an example on MSDN that shows you how to implement and use a custom comparer with Distinct.
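
A minimal sketch of such a comparer might look like the following. It is only one possible implementation: the private Key helper, the null handling, and the rule that strings shorter than two characters are compared as-is are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of a comparer that ignores the first character of each string.
sealed class MyComparer : IEqualityComparer<string>
{
    // Everything after the first character; strings shorter than two
    // characters are used unchanged (an assumption of this sketch).
    private static string Key(string s) => s.Length < 2 ? s : s.Substring(1);

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(Key(x), Key(y), StringComparison.Ordinal);
    }

    // Must agree with Equals: equal keys produce equal hash codes.
    public int GetHashCode(string s) => s is null ? 0 : Key(s).GetHashCode();
}

static class Program
{
    static void Main()
    {
        var myList = new List<string> { "1A", "1B", "2A", "3C", "4D" };
        List<string> newList = myList.Distinct(new MyComparer()).ToList();
        Console.WriteLine(string.Join(",", newList)); // 1A,1B,3C,4D
    }
}
```

This version allocates a substring for every comparison and hash computation, so it trades some speed for brevity.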




Answer 3:


I'm going to suggest a simple extension method that you can reuse in similar situations:

public static IEnumerable<T> DistinctBy<T, U>(this IEnumerable<T> source, Func<T, U> keySelector)
{
    // Tracks the keys already seen; Add returns false for a repeated key.
    var set = new HashSet<U>();
    foreach (var item in source)
    {
        if (set.Add(keySelector(item)))
            yield return item;
    }
}

This is basically how Distinct is implemented in LINQ.

Usage:

List<string> newlist = 
  mylist.DistinctBy(e => e.Substring(1, e.Length - 1)).ToList();
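
On .NET 6 or later a hand-written extension may not be necessary, because System.Linq ships its own Enumerable.DistinctBy. The sketch below assumes that runtime; the Dedup class, the IgnoreFirstChar method, and the length guard for short strings are additions for this example rather than part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Dedup
{
    // Hypothetical wrapper around the built-in Enumerable.DistinctBy (.NET 6+).
    // The length guard avoids an exception on empty strings.
    public static List<string> IgnoreFirstChar(IEnumerable<string> source)
    {
        return source
            .DistinctBy(e => e.Length < 2 ? e : e.Substring(1))
            .ToList();
    }
}

static class Program
{
    static void Main()
    {
        var mylist = new List<string> { "1A", "1B", "2A", "3C", "4D" };
        Console.WriteLine(string.Join(",", Dedup.IgnoreFirstChar(mylist))); // 1A,1B,3C,4D
    }
}
```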



Answer 4:


I realise the answer has already been given, but since I was working on this answer anyway I'm still going to post it, in case it's any use.

If you really want the fastest solution for large lists, then something like this might be optimal. You would need to do some accurate timings to be sure, though!

This approach does not make any additional string copies when comparing or computing the hash codes:

using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    internal static class Program
    {
        static void Main()
        {
            var myList = new List<string>
            {
                "1A",
                "1B",
                "2A",
                "3C",
                "4D"
            };

            var newList = myList.Distinct(new MyComparer());
            Console.WriteLine(string.Join("\n", newList));
        }

        sealed class MyComparer: IEqualityComparer<string>
        {
            public bool Equals(string x, string y)
            {
                if (x.Length != y.Length)
                    return false;

                if (x.Length == 0)
                    return true;

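                // Ordinal comparison of everything after the first character.
                // The culture-sensitive string.Compare is slower and can
                // disagree with the character-based GetHashCode.
                return string.CompareOrdinal(x, 1, y, 1, x.Length - 1) == 0;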
            }

            public int GetHashCode(string s)
            {
                if (s.Length <= 1)
                    return 0;

                int result = 17;

                unchecked
                {
                    // Start at index 1 so the first character is skipped.
                    for (int i = 1; i < s.Length; i++)
                        result = result*23 + s[i];
                }

                return result;
            }
        }
    }
}
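
To get a first impression of the relative cost, a crude Stopwatch comparison along these lines could be used. Everything here is an assumption made for illustration: the TimingSketch class, the synthetic data, and the list size are arbitrary, and a single timed run after one warm-up is far less reliable than a dedicated benchmarking harness.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class TimingSketch
{
    static string Key(string s) => s.Length < 2 ? s : s.Substring(1);

    // Runs the action once and returns the elapsed wall-clock milliseconds.
    public static long Measure(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static List<string> ViaGroupBy(List<string> source) =>
        source.GroupBy(s => Key(s)).Select(g => g.First()).ToList();

    public static List<string> ViaHashSet(List<string> source)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (string s in source)
        {
            if (seen.Add(Key(s)))
                result.Add(s);
        }
        return result;
    }

    static void Main()
    {
        // Synthetic data: 1,000,000 strings sharing 1,000 distinct suffixes.
        var data = Enumerable.Range(0, 1_000_000)
            .Select(i => (char)('0' + i % 10) + (i % 1000).ToString())
            .ToList();

        // Warm-up so JIT compilation is not included in the timings.
        ViaGroupBy(data);
        ViaHashSet(data);

        Console.WriteLine("GroupBy: " + Measure(() => ViaGroupBy(data)) + " ms");
        Console.WriteLine("HashSet: " + Measure(() => ViaHashSet(data)) + " ms");
    }
}
```

The numbers depend heavily on the data and the runtime, so they should be read as a rough indication only.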


Source: https://stackoverflow.com/questions/25421170/c-sharp-distinct-liststring-by-substring
