Question
I've got a HashSet,
var universe = new HashSet<int>();
And a bunch of subsets,
var sets = new List<HashSet<int>>(numSets);
I want to subtract a chunk, which I can do like this:
universe.ExceptWith(sets[0]);
But ExceptWith works in-place. I don't want to modify the universe. Should I clone it first, or is there a better way?
Answer 1:
I guess I should clone it first? How do I do that?
var universe = new HashSet<int>();
var subset = new HashSet<int>();
...
// clone the universe
var remaining = new HashSet<int>(universe);
remaining.ExceptWith(subset);
Not as simple as with the Except extension method, but probably faster (you should run a few performance tests to make sure).
Answer 2:
How about Except()?
var x = new HashSet<int>();
var y = new HashSet<int>();
var xminusy = new HashSet<int>(x.Except(y));
Answer 3:
I benchmarked LINQ's Except method against cloning and using the HashSet-native function ExceptWith. Here are the results.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class Program
{
    // Copies any sequence into a new HashSet.
    public static HashSet<T> ToSet<T>(this IEnumerable<T> collection)
    {
        return new HashSet<T>(collection);
    }

    // Clones the set, then removes the other elements from the clone.
    public static HashSet<T> Subtract<T>(this HashSet<T> set, IEnumerable<T> other)
    {
        var clone = set.ToSet();
        clone.ExceptWith(other);
        return clone;
    }

    static void Main(string[] args)
    {
        var A = new HashSet<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        var B = new HashSet<int> { 2, 4, 6, 8, 10 };
        var sw = new Stopwatch();

        // LINQ Except followed by a copy into a new set
        sw.Restart();
        for (int i = 0; i < 1000000; ++i)
        {
            var C = A.Except(B).ToSet();
        }
        sw.Stop();
        Console.WriteLine("Linq: {0} ms", sw.ElapsedMilliseconds);

        // Clone followed by the in-place ExceptWith
        sw.Restart();
        for (int i = 0; i < 1000000; ++i)
        {
            var C = A.Subtract(B);
        }
        sw.Stop();
        Console.WriteLine("Native: {0} ms", sw.ElapsedMilliseconds);

        Console.ReadLine();
    }
}
Linq: 1297 ms
Native: 762 ms
http://programanddesign.com/cs/subtracting-sets/
Answer 4:
A hash set has to track its hashing bookkeeping and its overflow bins in addition to the elements themselves. Elements of reference types are held by reference. Creating a new set with the copy constructor, as Thomas Levesque suggests, makes a shallow copy of this overhead and should be quite fast. Using Except() the way James McNellis suggests first produces an intermediate sequence and then passes it to the constructor, which has to add the elements to the new set one by one. As Thomas said, you might run a few performance tests, but in theory his answer should beat James's.
By the way, to my way of thinking a shallow copy is not a clone, since I believe a clone implies that the underlying elements are also copied. The copied set does get its own storage, though, so modifying it leaves the original set untouched.
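The shallow-copy point can be illustrated with a minimal sketch. The Box class and the ShallowCopyDemo name are hypothetical, introduced only for this example: the copy has its own storage, so removing from it does not touch the original, while the element objects themselves are shared between both sets.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type, introduced only for this illustration.
class Box
{
    public int Value;
}

static class ShallowCopyDemo
{
    static void Main()
    {
        var a = new Box { Value = 1 };
        var b = new Box { Value = 2 };
        var universe = new HashSet<Box> { a, b };

        // Copy constructor: a new set holding the same element references.
        var remaining = new HashSet<Box>(universe);
        remaining.ExceptWith(new[] { b });

        Console.WriteLine(universe.Count);   // 2: the original is unchanged
        Console.WriteLine(remaining.Count);  // 1: only the copy lost an element

        // The surviving element is the very same object in both sets,
        // so a change made through one reference is visible in the copy.
        a.Value = 42;
        foreach (var box in remaining)
            Console.WriteLine(box.Value);    // 42
    }
}
```

If the elements also need to be independent, each one would have to be copied explicitly before being added to the new set.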
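Answer 5:
A very late answer, but it may be useful sometimes.
@mpen answered using LINQ's Except(IEnumerable<>), which makes LINQ loop through the IEnumerable to check whether it contains each element.
Since setB is already a HashSet, whose Contains is a hash lookup, how about: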
setA.Where(i => !setB.Contains(i))
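The Where expression above is lazy and returns an IEnumerable, so it still has to be materialized to get a set back. A minimal sketch, assuming a hypothetical extension method named Without (not part of the framework or of the thread), could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SetExtensions
{
    // Hypothetical helper: returns a new set and leaves 'source' untouched.
    // The result uses the default equality comparer for T.
    public static HashSet<T> Without<T>(this HashSet<T> source, HashSet<T> minus)
    {
        return new HashSet<T>(source.Where(item => !minus.Contains(item)));
    }
}

static class WithoutDemo
{
    static void Main()
    {
        var setA = new HashSet<int> { 1, 2, 3, 4, 5 };
        var setB = new HashSet<int> { 2, 4 };

        var result = setA.Without(setB);

        Console.WriteLine(string.Join(", ", result.OrderBy(x => x))); // 1, 3, 5
        Console.WriteLine(setA.Count);                                // 5
    }
}
```

Whether this beats clone-then-ExceptWith depends on the sizes of the two sets, so it is worth measuring with your own data.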
Answer 6:
Obviously in a few cases 'manually' adding items in a loop is more efficient than copying the whole set and then removing items. One I can think of ...
// No more set ops planned? Then returning a list is an option.
public static List<T> ExceptWith<T>(HashSet<T> allObjects, HashSet<T> minus)
{
    // Set capacity of list (allObjects.Count - minus.Count?)
    var retlst = new List<T>(allObjects.Count);
    foreach (var obj in allObjects)
    {
        if (!minus.Contains(obj))
            retlst.Add(obj);
    }
    return retlst;
}

// Special case where the quantity of copying will be high,
// more expensive in fact than just adding.
// (An alternative to the method above, not an overload:
// C# cannot overload on return type alone.)
public static HashSet<T> ExceptWith<T>(HashSet<T> allObjects, HashSet<T> minus)
{
    if (minus.Count > allObjects.Count * 7 / 8)
    {
        var retHash = new HashSet<T>();
        foreach (var obj in allObjects)
        {
            if (!minus.Contains(obj))
                retHash.Add(obj);
        }
        return retHash;
    }
    else
    {
        // usual clone and remove
        var clone = new HashSet<T>(allObjects);
        clone.ExceptWith(minus);
        return clone;
    }
}
Source: https://stackoverflow.com/questions/3897568/subtract-hashsets-and-return-a-copy