Question
I have a text document of emails such as:
Google12@gmail.com,
MyUSERNAME@me.com,
ME@you.com,
ratonabat@co.co,
iamcool@asd.com,
ratonabat@co.co,
I need to check said document for duplicates and create a unique array from it (so if "ratonabat@co.co" appears 500 times in the document, it appears only once in the new array).
Edit: For example:
username1@hotmail.com
username2@hotmail.com
username1@hotmail.com
username1@hotmail.com
username1@hotmail.com
username1@hotmail.com
This is my "data" (either in an array or text document, I can handle that)
I want to be able to see if there's a duplicate in that, and move the duplicate ONCE to another array. So the output would be
username1@hotmail.com
Answer 1:
You can simply use LINQ's Distinct extension method:
var input = new string[] { ... };
var output = input.Distinct().ToArray();
You may also want to consider refactoring your code to use a HashSet<string> instead of a simple array, as it will gracefully handle duplicates.
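As a minimal sketch of the HashSet<string> idea (the class and method names here are hypothetical, and the sample addresses are taken from the question): HashSet<T>.Add returns false and leaves the set unchanged when the item is already present, so duplicates never need to be filtered out afterwards.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetSketch
{
    // Hypothetical helper: collects the addresses into a set.
    // Add() returns false (and changes nothing) for an item that is already present.
    public static HashSet<string> ToUniqueSet(IEnumerable<string> emails)
    {
        var unique = new HashSet<string>();
        foreach (var email in emails)
        {
            unique.Add(email);
        }
        return unique;
    }

    public static void Main()
    {
        var input = new[]
        {
            "Google12@gmail.com",
            "ratonabat@co.co",
            "iamcool@asd.com",
            "ratonabat@co.co",
        };
        var unique = ToUniqueSet(input);
        Console.WriteLine(unique.Count); // 3
    }
}
```

Note that a HashSet<string> does not promise any particular enumeration order, so this fits best when the order of the addresses does not matter.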
To get an array containing only those records which are duplicates, it's a little more complex, but you can still do it with a little LINQ:
var output = input.GroupBy(x => x)
                  .Where(g => g.Skip(1).Any())
                  .Select(g => g.Key)
                  .ToArray();
Explanation:
.GroupBy: group identical strings together.
.Where: filter the groups by the following criterion.
.Skip(1).Any(): return true if there are 2 or more items in the group. This is equivalent to .Count() > 1, but it's slightly more efficient because it stops counting after it finds a second item.
.Select: return a set consisting only of a single string (rather than the group).
.ToArray: convert the result set to an array.
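To make those steps concrete, here is a small sketch that runs the same query on a shortened version of the sample data from the question's edit. The wrapper class and the method name FindDuplicates are hypothetical, added only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByDuplicatesSketch
{
    // Hypothetical wrapper around the query above: returns each string
    // that occurs at least twice, listed once.
    public static string[] FindDuplicates(IEnumerable<string> input)
    {
        return input.GroupBy(x => x)              // one group per distinct string
                    .Where(g => g.Skip(1).Any())  // keep groups that have a second element
                    .Select(g => g.Key)           // reduce each group to its string
                    .ToArray();
    }

    public static void Main()
    {
        var input = new[]
        {
            "username1@hotmail.com",
            "username2@hotmail.com",
            "username1@hotmail.com",
            "username1@hotmail.com",
        };
        foreach (var email in FindDuplicates(input))
        {
            Console.WriteLine(email); // prints username1@hotmail.com
        }
    }
}
```

username2@hotmail.com forms a group of one, so Skip(1).Any() is false for it and it is dropped.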
Here's another solution using a custom extension method:
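using System.Collections.Generic;

public static class MyExtensions
{
    // Yields each item that occurs more than once, exactly once,
    // at the point where its second occurrence is seen.
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
    {
        var a = new HashSet<T>(); // every item seen so far
        var b = new HashSet<T>(); // items already reported as duplicates
        foreach (var x in input)
        {
            // a.Add fails when x was seen before; b.Add succeeds only the first time x is reported.
            if (!a.Add(x) && b.Add(x))
                yield return x;
        }
    }
}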
And then you can call this method like this:
var output = input.Duplicates().ToArray();
I haven't benchmarked this, but it should be more efficient than the previous method.
Answer 2:
You can use the built-in .Distinct() method. By default the comparisons are case sensitive; if you want them to be case insensitive, use the overload that takes a comparer and pass in a case-insensitive string comparer.
List<string> emailAddresses = GetListOfEmailAddresses();
string[] uniqueEmailAddresses = emailAddresses.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();
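A short sketch of the difference the comparer makes. The class and method names are hypothetical, and the second spelling of the address is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaseInsensitiveDistinctSketch
{
    // Hypothetical helper: treats addresses that differ only in letter case as equal.
    public static string[] UniqueIgnoringCase(IEnumerable<string> emails)
    {
        return emails.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();
    }

    public static void Main()
    {
        var input = new[] { "ME@you.com", "me@you.com", "iamcool@asd.com" };

        // The default comparer is case sensitive, so both spellings survive.
        Console.WriteLine(input.Distinct().Count());          // 3

        // With OrdinalIgnoreCase the two spellings collapse into one entry.
        Console.WriteLine(UniqueIgnoringCase(input).Length);  // 2
    }
}
```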
EDIT: After your clarification, I see that you only want to list the duplicates.
string[] duplicateAddresses = emailAddresses.GroupBy(address => address,
                                                     (key, rows) => new { Key = key, Count = rows.Count() },
                                                     StringComparer.OrdinalIgnoreCase)
                                            .Where(row => row.Count > 1)
                                            .Select(row => row.Key)
                                            .ToArray();
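Answer 3:
To select emails which occur more than once:
var dupEmails = from emails in File.ReadAllText(path)
                    .Split(new[] { ',', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries) // the sample has line breaks after the commas
                    .Select(x => x.Trim())
                    .GroupBy(x => x)
                where emails.Count() > 1
                select emails.Key;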
Source: https://stackoverflow.com/questions/19852273/check-array-for-duplicates-return-only-items-which-appear-more-than-once