问题
I'm wondering what the most efficient way of doing this is.
I have points that I gather from 2 places.
I am only interested in the points which are common to both places.
My plan is to have 3 std::set<Point>
. First I will add in the points from area A,into set A then points from B into set B and let set C be the intersection of both sets.
However, I'm wondering if there is a better way of doing this that involves maybe less sets?
Thanks
回答1:
Your problem is so common that there even is a (named in an obvious way) standard algorithm set_intersection() for you to use.
回答2:
You can do away with set B: first gather points from A into set A, then gather points from B, but place them into C only if they are also present in A.
If sets A and B are of different (and predictable) sizes, the obvious choice would be to eliminate the larger one.
回答3:
The naive set-based approach (built two sets, and then generate the intersection), has the following steps:
- build
std::set A
from source 1. Say source 1 has N points, this is:- O(N log N) time, O(N) space
- build
std::set B
from source 2. Say this has M points, giving:- O(M log M) time, O(M) space
std::set_intersection
- O(M+N) time and space
The slightly improved set-based approach is:
- build
std::set
A from the first source (same complexity as above) for every point in the second source, add it to the result if (and only if) it's in set A
- O(M log N) time, linear space
- so, you avoided O(M) extra space allocated, and the extra O(M+N) intersection step.
You'd implement this using
std::set
like so:if (a.find(point) != a.end()) result.insert(point);
Note that if you know which point source is going to provide fewer points, you should use that source to build set A for best performance. If your sources provide points in sorted order, you can avoid the sets entirely and save even more space & time.
回答4:
You can have a map<Point, int>
where each point is mapped to the number of times it appears, i.e. 1 or 2. Then delete all points that appeared only once.
回答5:
You can use map<Point,int>
to store the points occuring 1 or 2 times. You will also have to overload the <
operator in that case.
回答6:
I think approach with three sets and std::set_intersection is the most logical and efficient. Although if you really want to have less sets you can just iterate through set B and remove points which are not in A then iterate through set A and remove points which are not in B. This should do.
回答7:
If you define a lexicographical operator<
on points, i.e. (extrapolate for >2 dimensions if needed)
bool operator<(Point const &a, Point const &b)
{
if (a.x != b.x) return a.x < b.x;
return a.y < b.y;
}
Then it's possible that std::set
will have good enough performance to compute the intersection efficiently. Assuming your two groups are already in arrays, or generated on the fly, you need 2 sets:
A = { all points from A }
C = { }
for all points in B:
if point is in A:
add point to C
If you're using floating-point coordinates, then you probably need to allow for floating-point errors and search for all points in A
near each point in B
. Then you want a spatial partition structure such as a k-d tree or quadtree/octree,. That would involve bringing in a new library or coding up a fairly complex data structure, so I would avoid it if possible.
来源:https://stackoverflow.com/questions/20979892/efficient-intersection-of-sets