Question
As part of this question, it was pointed out repeatedly that I had an O(n^2) problem using code similar to this...
public class Foo
{
    public string IdentityValue { get; set; }
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
}
List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
List<Foo> itemSet2 = GenerateLargeItemSet();
foreach (var itemFromSet1 in itemSet1)
{
    //does a corresponding item exist in itemSet2?
    var itemSet2Item = itemSet2.FirstOrDefault(i => i.IdentityValue == itemFromSet1.IdentityValue);
    if (itemSet2Item != null)
    {
        //do stuff to create item in the persistent store
    }
    else
    {
        //do stuff to update item in the persistent store
    }
}
Excusing the string comparison and parallelization considerations, is there a cheap and generic (objects may be type T, and the Identity property may be something else) way to reduce the O(n^2) nature of this?
Answer 1:
One solution is to use the Enumerable.Join method, which has O(n) complexity:
List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
List<Foo> itemSet2 = GenerateLargeItemSet();
// O(n)
var joinedSet = itemSet1.Join(itemSet2, s1 => s1.IdentityValue, s2 => s2.IdentityValue, (f1, f2) => f1).ToList();
// O(n)
foreach (var joinedItem in joinedSet)
{
    //do stuff to create item in the persistent store
}
// O(n)
var unjoinedSet = itemSet1.Except(joinedSet);
// O(n)
foreach (var unjoinedItem in unjoinedSet)
{
    //do stuff to update item in the persistent store
}
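Join is roughly O(n) because it hashes the inner sequence once and then streams the outer sequence, probing the hash per item. Conceptually it behaves like this ToLookup-based sketch (an illustration of the idea, not the actual BCL implementation):
var lookup = itemSet2.ToLookup(s2 => s2.IdentityValue); // O(n) to build the hash-based lookup
foreach (var itemFromSet1 in itemSet1)                  // O(n) iterations
{
    //the inner loop is empty when there is no match; each probe is O(1) on average
    foreach (var match in lookup[itemFromSet1.IdentityValue])
    {
        //do stuff to create item in the persistent store
    }
}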
Answer 2:
A well-known way of improving database query speed is creating an index, and the same principle can be applied here. But what is an index? It's just a data structure that allows fast searching. In the BCL such a structure is called a Dictionary. So you can use something like the following, which has O(N) time complexity.
If the value is unique inside the set:
var item2Index = itemSet2.ToDictionary(item => item.IdentityValue);
If not:
var item2Index = itemSet2.GroupBy(e => e.IdentityValue)
.ToDictionary(g => g.Key, g => g.First());
and then:
foreach (var itemFromSet1 in itemSet1)
{
    //does a corresponding item exist in itemSet2?
    Foo itemSet2Item;
    if (!item2Index.TryGetValue(itemFromSet1.IdentityValue, out itemSet2Item))
    {
        //do stuff to create item in the persistent store
    }
    else
    {
        //do stuff to update item in the persistent store
    }
}
If you just want to check whether a matching item exists in the second set, but don't actually need the item itself, you can use a simple HashSet (another BCL data structure for fast lookup):
var item2Keys = new HashSet<string>(itemSet2.Select(e => e.IdentityValue));
foreach (var itemFromSet1 in itemSet1)
{
    //does a corresponding item exist in itemSet2?
    if (!item2Keys.Contains(itemFromSet1.IdentityValue))
    {
        //do stuff to create item in the persistent store
    }
    else
    {
        //do stuff to update item in the persistent store
    }
}
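Since both the dictionary and the HashSet above are built from a key selector, the same idea generalizes to the question's requirement that the items may be any type T with some other identity property. A minimal sketch, with an illustrative method name that is not part of the answer:
// Generic helper: returns the items from 'outer' whose key has a match in 'inner'.
// Building the HashSet is O(n); each Contains probe is O(1) on average.
// Place inside a static class; requires System, System.Linq and System.Collections.Generic.
public static IEnumerable<TOuter> WithMatchIn<TOuter, TInner, TKey>(
    IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector)
{
    var innerKeys = new HashSet<TKey>(inner.Select(innerKeySelector));
    return outer.Where(o => innerKeys.Contains(outerKeySelector(o)));
}
// usage: var matched = WithMatchIn(itemSet1, itemSet2, f => f.IdentityValue, f => f.IdentityValue);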
Answer 3:
You can create a Dictionary<TKey, TValue> first and then use its fast lookups, which are close to O(1):
List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
Dictionary<string, Foo> itemSet2 = GenerateLargeItemSet().ToDictionary(i => i.IdentityValue);
//O(N)
foreach (var itemFromSet1 in itemSet1)
{
    //O(1)
    if (!itemSet2.ContainsKey(itemFromSet1.IdentityValue))
    {
        //do stuff to create item in the persistent store
    }
    else
    {
        //do stuff to update item in the persistent store
    }
}
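Note that ToDictionary throws if IdentityValue repeats within the second set; in that case the GroupBy variant from Answer 2 can be used instead, e.g.:
Dictionary<string, Foo> itemSet2 = GenerateLargeItemSet()
    .GroupBy(i => i.IdentityValue)
    .ToDictionary(g => g.Key, g => g.First());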
Answer 4:
You may find a performance improvement with the use of a HashSet.
List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
//note: HashSet<Foo>.Contains uses Foo's equality, so this only finds matches if
//Equals/GetHashCode (or a comparer passed to the constructor) compare by IdentityValue
HashSet<Foo> itemSet2 = new HashSet<Foo>(GenerateLargeItemSet());
foreach (var itemFromSet1 in itemSet1)
{
    //does a corresponding item exist in itemSet2?
    if (itemSet2.Contains(itemFromSet1))
    {
        //do stuff to update item in the persistent store
    }
    else
    {
        //do stuff to create item in the persistent store
    }
}
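A comparer keyed on IdentityValue can supply that equality without modifying Foo. A minimal sketch (FooIdentityComparer is an illustrative name, not part of the answer):
// Compares Foo instances by IdentityValue so HashSet<Foo>.Contains matches items across sets.
public class FooIdentityComparer : IEqualityComparer<Foo>
{
    public bool Equals(Foo x, Foo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.IdentityValue == y.IdentityValue;
    }

    public int GetHashCode(Foo obj)
    {
        return obj.IdentityValue == null ? 0 : obj.IdentityValue.GetHashCode();
    }
}
// usage:
HashSet<Foo> itemSet2 = new HashSet<Foo>(GenerateLargeItemSet(), new FooIdentityComparer());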
Source: https://stackoverflow.com/questions/34437425/alternatives-to-nested-foreach-when-comparing-list-contents-based-on-some-not-a