Question
As part of this question, it was pointed out repeatedly that I had an O(n^2) problem using code similar to this...
public class Foo
{
    public string IdentityValue { get; set; }
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
}
List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
List<Foo> itemSet2 = GenerateLargeItemSet();
foreach (var itemFromSet1 in itemSet1)
{
    //does a corresponding item exist in itemSet2?
    var itemSet2Item = itemSet2.FirstOrDefault(i => i.IdentityValue == itemFromSet1.IdentityValue);
    if (itemSet2Item != null)
    {
        //do stuff to create item in the persistent store
    }
    else
    {
        //do stuff to update item in the persistent store
    }
}
Excusing the string comparison and parallelization considerations, is there a cheap and generic (objects may be type T, and the Identity property may be something else) way to reduce the O(n^2) nature of this?
Answer 1:
One solution is to use the Enumerable.Join method, which has O(n) complexity:
List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
List<Foo> itemSet2 = GenerateLargeItemSet();
// O(n)
var joinedSet = itemSet1.Join(itemSet2, s1 => s1.IdentityValue, s2 => s2.IdentityValue, (f1, f2) => f1).ToList();
// O(n)
foreach (var joinedItem in joinedSet)
{
    //do stuff to create item in the persistent store
}
// O(n)
var unjoinedSet = itemSet1.Except(joinedSet);
// O(n)
foreach (var unjoinedItem in unjoinedSet)
{
    //do stuff to update item in the persistent store
}
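Join is roughly O(n) because it hashes the inner sequence once and then streams the outer sequence, probing the hash per item. Conceptually it behaves like this ToLookup-based sketch (an illustration of the idea, not the actual BCL implementation):
var lookup = itemSet2.ToLookup(s2 => s2.IdentityValue); // O(n) to build the hash-based lookup
foreach (var itemFromSet1 in itemSet1)                  // O(n) iterations
{
    //the inner loop is empty when there is no match; each probe is O(1) on average
    foreach (var match in lookup[itemFromSet1.IdentityValue])
    {
        //do stuff to create item in the persistent store
    }
}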
Answer 2:
A well-known way of improving database query speed is creating an index, and the same principle can be applied here. But what is an index? It's just a data structure that allows fast searching. In the BCL such a structure is called a Dictionary. So you can use something like the following, which has O(N) time complexity.
If the value is unique inside the set:
var item2Index = itemSet2.ToDictionary(item => item.IdentityValue);
If not:
var item2Index = itemSet2.GroupBy(e => e.IdentityValue)
.ToDictionary(g => g.Key, g => g.First());
and then:
foreach (var itemFromSet1 in itemSet1)
{
    //does a corresponding item exist in itemSet2?
    Foo itemSet2Item;
    if (!item2Index.TryGetValue(itemFromSet1.IdentityValue, out itemSet2Item))
    {
        //do stuff to create item in the persistent store
    }
    else
    {
        //do stuff to update item in the persistent store
    }
}
If you just want to check whether a matching item exists in the second set, but don't actually need the item itself, you can use a simple HashSet (another BCL data structure for fast lookup):
var item2Keys = new HashSet<string>(itemSet2.Select(e => e.IdentityValue));
foreach (var itemFromSet1 in itemSet1)
{
    //does a corresponding item exist in itemSet2?
    if (!item2Keys.Contains(itemFromSet1.IdentityValue))
    {
        //do stuff to create item in the persistent store
    }
    else
    {
        //do stuff to update item in the persistent store
    }
}
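Since both the dictionary and the HashSet above are built from a key selector, the same idea generalizes to the question's requirement that the items may be any type T with some other identity property. A minimal sketch, with an illustrative method name that is not part of the answer:
// Generic helper: returns the items from 'outer' whose key has a match in 'inner'.
// Building the HashSet is O(n); each Contains probe is O(1) on average.
// Place inside a static class; requires System, System.Linq and System.Collections.Generic.
public static IEnumerable<TOuter> WithMatchIn<TOuter, TInner, TKey>(
    IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector)
{
    var innerKeys = new HashSet<TKey>(inner.Select(innerKeySelector));
    return outer.Where(o => innerKeys.Contains(outerKeySelector(o)));
}
// usage: var matched = WithMatchIn(itemSet1, itemSet2, f => f.IdentityValue, f => f.IdentityValue);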
Answer 3:
You can create a Dictionary<TKey, TValue> first and then use its fast lookups, which are close to O(1):
List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
Dictionary<string, Foo> itemSet2 = GenerateLargeItemSet().ToDictionary(i => i.IdentityValue);
//O(N)
foreach (var itemFromSet1 in itemSet1)
{
    //O(1)
    if (!itemSet2.ContainsKey(itemFromSet1.IdentityValue))
    {
        //do stuff to create item in the persistent store
    }
    else
    {
        //do stuff to update item in the persistent store
    }
}
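Note that ToDictionary throws if IdentityValue repeats within the second set; in that case the GroupBy variant from Answer 2 can be used instead, e.g.:
Dictionary<string, Foo> itemSet2 = GenerateLargeItemSet()
    .GroupBy(i => i.IdentityValue)
    .ToDictionary(g => g.Key, g => g.First());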
Answer 4:
You may find a performance improvement with the use of a HashSet.
List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
//note: HashSet<Foo>.Contains uses Foo's equality, so this only finds matches if
//Equals/GetHashCode (or a comparer passed to the constructor) compare by IdentityValue
HashSet<Foo> itemSet2 = new HashSet<Foo>(GenerateLargeItemSet());
foreach (var itemFromSet1 in itemSet1)
{
    //does a corresponding item exist in itemSet2?
    if (itemSet2.Contains(itemFromSet1))
    {
        //do stuff to update item in the persistent store
    }
    else
    {
        //do stuff to create item in the persistent store
    }
}
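A comparer keyed on IdentityValue can supply that equality without modifying Foo. A minimal sketch (FooIdentityComparer is an illustrative name, not part of the answer):
// Compares Foo instances by IdentityValue so HashSet<Foo>.Contains matches items across sets.
public class FooIdentityComparer : IEqualityComparer<Foo>
{
    public bool Equals(Foo x, Foo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.IdentityValue == y.IdentityValue;
    }

    public int GetHashCode(Foo obj)
    {
        return obj.IdentityValue == null ? 0 : obj.IdentityValue.GetHashCode();
    }
}
// usage:
HashSet<Foo> itemSet2 = new HashSet<Foo>(GenerateLargeItemSet(), new FooIdentityComparer());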
Source: https://stackoverflow.com/questions/34437425/alternatives-to-nested-foreach-when-comparing-list-contents-based-on-some-not-a