Find all intersecting data, not just the unique values

Submitted by 无人久伴 on 2019-12-01 14:16:52

Question


I thought that I understood Intersect, but it turns out I was wrong.

 List<int> list1 = new List<int>() { 1, 2, 3, 2, 3 };
 List<int> list2 = new List<int>() { 2, 3, 4, 3, 4 };

 list1.Intersect(list2); // => 2, 3

 // But what I want is:
 // => 2, 3, 2, 3, 2, 3, 3

I can figure out a way, like this:

 var intersected = list1.Intersect(list2);
 var list3 = new List<int>();
 list3.AddRange(list1.Where(I => intersected.Contains(I)));
 list3.AddRange(list2.Where(I => intersected.Contains(I)));
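One thing worth checking in this workaround: `Intersect` uses deferred execution, so `intersected` is a query rather than a stored result, and each `intersected.Contains(x)` call can re-run the intersection from scratch. A minimal sketch of the same idea that materializes the intersection once into a `HashSet<T>`, assuming the element type has usable `Equals`/`GetHashCode` (the class and method names are hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WorkaroundSketch
{
    // Hypothetical helper: the question's workaround, with the intersection computed only once.
    public static List<T> IntersectKeepingDuplicates<T>(List<T> list1, List<T> list2)
    {
        // Materialize the distinct common values; HashSet lookups are O(1) on average.
        var intersected = new HashSet<T>(list1.Intersect(list2));

        var list3 = new List<T>();
        list3.AddRange(list1.Where(x => intersected.Contains(x)));
        list3.AddRange(list2.Where(x => intersected.Contains(x)));
        return list3;
    }
}
```

With the question's two lists this returns 2, 3, 2, 3, 2, 3, 3.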

Is there an easier way in LINQ to achieve this?

I should add that I do not care about the order of the results.

2,2,2,3,3,3,3 would also be perfectly OK.

The problem is that I am using this on a very large collection, so I need it to be efficient.

We are talking about objects, not ints. The ints were just to keep the example simple, but I realize this can make a difference.
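Since the real elements are objects, it matters how equality is defined: `Intersect`, `HashSet<T>` and `Contains` all fall back to the type's `Equals`/`GetHashCode`, which for a class without overrides means reference equality. A minimal sketch, assuming a hypothetical `Item` class that should be matched by its `Id` property, passes an `IEqualityComparer<T>` instead:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type standing in for the real objects.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical comparer: two Items are considered equal when their Ids match.
public class ItemIdComparer : IEqualityComparer<Item>
{
    public bool Equals(Item a, Item b) =>
        ReferenceEquals(a, b) || (a != null && b != null && a.Id == b.Id);

    public int GetHashCode(Item item) => item == null ? 0 : item.Id.GetHashCode();
}

public static class ItemIntersection
{
    // Every element of list1 and list2 (duplicates kept) whose Id occurs in both lists.
    public static List<Item> IntersectAll(List<Item> list1, List<Item> list2)
    {
        var comparer = new ItemIdComparer();
        // The same comparer goes to Intersect and to the HashSet so both agree on equality.
        var common = new HashSet<Item>(list1.Intersect(list2, comparer), comparer);
        return list1.Concat(list2).Where(common.Contains).ToList();
    }
}
```

Overriding `Equals` and `GetHashCode` on the class itself is the alternative; then the overloads without a comparer behave the same way.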


Answer 1:


Let's see if we can precisely characterize what you want. Correct me if I am wrong. You want: all elements of list 1, in order, that also appear in list 2, followed by all elements of list 2, in order, that also appear in list 1. Yes?

Seems straightforward.

return list1.Where(x=>list2.Contains(x))
     .Concat(list2.Where(y=>list1.Contains(y)))
     .ToList();

Note that this is not efficient for large lists: `List<T>.Contains` scans the whole list, so if the lists have a thousand items each, this does a couple million comparisons. If you're in that situation, then you want to use a more efficient data structure for testing membership:

var list1set = new HashSet<int>(list1);
var list2set = new HashSet<int>(list2);

return list1.Where(x=>list2set.Contains(x))
     .Concat(list2.Where(y=>list1set.Contains(y)))
     .ToList();

which does only a few thousand hash-set operations (building the two sets plus one lookup per element), but uses extra memory for the sets.
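Putting the HashSet variant together as a self-contained generic method might look like the sketch below; the name `IntersectAll` and the optional `comparer` parameter are assumptions, not part of the answer. Building each set is one pass, and each `Contains` on a `HashSet<T>` is O(1) on average, so the total work grows roughly linearly with the combined length of the lists instead of with the product of their lengths.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IntersectAllExtensions
{
    // Hypothetical extension: all elements of first that occur in second, in order,
    // followed by all elements of second that occur in first (duplicates kept).
    public static List<T> IntersectAll<T>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer = null)
    {
        // Materialize once so each input sequence is enumerated only one time.
        var firstList = first.ToList();
        var secondList = second.ToList();

        // A null comparer makes HashSet<T> use EqualityComparer<T>.Default.
        var firstSet = new HashSet<T>(firstList, comparer);
        var secondSet = new HashSet<T>(secondList, comparer);

        return firstList.Where(secondSet.Contains)
            .Concat(secondList.Where(firstSet.Contains))
            .ToList();
    }
}
```

For example, `list1.IntersectAll(list2)` on the question's integer lists gives 2, 3, 2, 3, 2, 3, 3.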




Answer 2:


var set = new HashSet<int>(list1.Intersect(list2));
return list1.Concat(list2).Where(i=>set.Contains(i));



Answer 3:


Maybe this could help: https://gist.github.com/mladenb/b76bcbc4063f138289243fb06d099dda

The built-in Except/Intersect return a collection of unique items, even though their return type doesn't say so (the methods return an IEnumerable, not a HashSet/Set), which is probably the result of a poor design decision. Instead, we can use a more intuitive implementation that returns every occurrence of a matching element from the first sequence, not just one (using Set.Contains).

Furthermore, a mapping function was added to help intersect/except collections of different types.

If you don't need to intersect/except collections of different types, just inspect the source code of Intersect/Except and change the part that iterates through the first sequence to use Set.Contains instead of Set.Add/Set.Remove.
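The change described above can be sketched roughly as follows. This is not the gist's code (which also adds a mapping function), and the names `IntersectAllFromFirst` and `ExceptAllFromFirst` are hypothetical. The set is built from the second sequence, and the first sequence is filtered with `Contains`, so every repeated element of the first sequence is kept:

```csharp
using System.Collections.Generic;

public static class MultisetLinq
{
    // Hypothetical: like Intersect, but keeps every occurrence from first.
    public static IEnumerable<T> IntersectAllFromFirst<T>(
        this IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer = null)
    {
        var set = new HashSet<T>(second, comparer);
        foreach (var element in first)
        {
            // Contains does not modify the set, so later duplicates still match.
            if (set.Contains(element))
                yield return element;
        }
    }

    // Hypothetical: like Except, but keeps every occurrence from first.
    public static IEnumerable<T> ExceptAllFromFirst<T>(
        this IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer = null)
    {
        var set = new HashSet<T>(second, comparer);
        foreach (var element in first)
        {
            if (!set.Contains(element))
                yield return element;
        }
    }
}
```

To get the question's result, concatenate both directions: `list1.IntersectAllFromFirst(list2).Concat(list2.IntersectAllFromFirst(list1))` yields 2, 3, 2, 3, 2, 3, 3.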




Answer 4:


I don't believe this is possible with the built-in APIs alone, but you could use the following to get the result you're looking for.

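using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions {
  // Yields every element of left, then right (duplicates kept), whose value occurs in both.
  public static IEnumerable<T> Intersect2<T>(this IEnumerable<T> left, IEnumerable<T> right) {
    // Distinct values of left -> "also found in right?"; ToDictionary would throw on duplicate keys.
    var map = new Dictionary<T, bool>();
    foreach ( var item in left ) {
      map[item] = false;
    }
    foreach ( var item in right ) {
      if ( map.ContainsKey(item) ) {
        map[item] = true;
      }
    }
    foreach ( var cur in left.Concat(right) ) {
      // Only values flagged true are present in both sequences.
      if ( map.TryGetValue(cur, out var inBoth) && inBoth ) {
        yield return cur;
      }
    }
  }
}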


Source: https://stackoverflow.com/questions/2180054/find-all-intersecting-data-not-just-the-unique-values
