I have N rectangles with sides parallel to the x- and y-axes. There is another rectangle, model. I need to create an algorithm that can tell whether the model rectangle is completely covered by the union of the N rectangles.
You're on the right track with the sweep line. Conceptually, we want to detect when the intersection of the model with the sweep line is not covered by the other rectangles. The high-level template is to break each rectangle into a "left edge" and a "right edge" event, sort the events by x coordinate (putting lefts before rights if the rectangles are closed and rights before lefts if they are open), and then process each event in O(log n) time. This is basically homework, so I will say no more.
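A minimal sketch of that event/slab template in Python, assuming rectangles are (x1, y1, x2, y2) tuples; for brevity it re-merges the active y-intervals at each slab, which is O(n) per event, rather than maintaining the O(log n) structure hinted at above:

    # Sketch: sweep the x-slabs between consecutive vertical edges and
    # check that the active rectangles cover the model's y-extent there.
    def model_is_covered(model, rects):
        mx1, my1, mx2, my2 = model
        # only rectangles overlapping the model can contribute
        rects = [r for r in rects
                 if r[0] < mx2 and r[2] > mx1 and r[1] < my2 and r[3] > my1]
        # event coordinates (vertical edges), clamped to the model's x-range
        xs = sorted({mx1, mx2} | {x for r in rects
                                  for x in (r[0], r[2]) if mx1 < x < mx2})
        for a, b in zip(xs, xs[1:]):            # consecutive x-slabs
            active = sorted((r[1], r[3]) for r in rects
                            if r[0] <= a and r[2] >= b)
            reach = my1                         # greedy 1-D coverage check
            for y1, y2 in active:
                if y1 > reach:
                    break
                reach = max(reach, y2)
            if reach < my2:
                return False                    # found an uncovered strip
        return True

For example, model_is_covered((0, 0, 10, 10), [(0, 0, 10, 6), (0, 5, 10, 10)]) returns True.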
Okay, now it seems I can't even sleep at night because I think about this problem... but it also seems I finally got an O(n log n) solution, in the average case (but with reduced chances of hitting a pathological input compared to @lVlad's).
We all know the Divide and Conquer technique. To ensure O(n log n) when using it, we usually focus on 2 points:
- the divide and merge steps must each run in O(n)
- the partitions must be (roughly) balanced
With these constraints in mind I have elaborated the following algorithm, which is reminiscent of qsort, and thus suffers the same pitfalls (namely, fractal inputs).
Algorithm
- Select the red rectangles that intersect with blue; insert them into a HashSet to remove duplicates --> O(n)
- Pick a Pivot rectangle and split blue into partitions around it --> O(n)
- Distribute the red rectangles into the partitions, applying the Clipping technique; note that a given red might end up contributing several chunks to different partitions
- Recurse into each partition
The Pivot choice is the cornerstone of the algorithm: if the partition is ill-tailored we cannot achieve the required complexity. Also, it must be accomplished in O(n). I have 2 proposals so far:
- Maximum Area: use the rectangle with the greatest area, because it means that the partitions will have the smallest area afterward; however, it suffers from being easy to defeat with a crafted input
- Median of 3: based on qsort, pick 3 elements and select the median (the one whose center is closest to the barycenter of the 3 centers)
I propose a mix of the two.
Another aspect of implementation is the tail of the recursion. Like qsort, the algorithm is probably inefficient for small n. Instead of going all the way down to 1, I propose to use the introsort trick: if n is smaller than, say, 12, then use the following algorithm instead:
- project the red rectangles on the axes (only the edges) and sort them (a la introsort)
Dimension agnostic
The algorithm is formally defined to be applicable in any dimension with the same asymptotic complexity, O(n log n) on average. This gives us the opportunity to test it in dimension 1 to identify pathological inputs.
Pathological input
Like qsort, on which it is modelled, it is sensitive to fractal inputs. By fractal inputs I mean:
1.......6...9.11.13
whenever you pick the average of your interval, you have all the elements on one side of it.
With such an input even choosing the median of 3 is unlikely to yield a very good cut.
EDIT:
I am going to show the partition idea with a little scheme, as @lVlad noted it was kind of fuzzy.
+----------------+----+---------+
|        1       |  2 |    3    |
+----------------+----+---------+
|        8       |  P |    4    |
+----------------+----+---------+
|        7       |  6 |    5    |
+----------------+----+---------+
Okay, so the rectangle we check for coverage is now split into 9 subrectangles. We ignore P, it's our pivot.
Now, we would really like each subrectangle to be covered by fewer red rectangles than N. The pivot is chosen as a barycenter of the centers, so if our "random" choice held true, there are about as many red centers on each side of the pivot in every direction.
It's kind of fuzzy there because some special configurations might yield little gain at one step (for example, all rectangles have the same center and we just picked the smallest one), but it will create chaos and thus the following step will be different.
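Here is a rough sketch of the split-and-clip step, assuming rectangles are (x1, y1, x2, y2) tuples: the blue rectangle is cut along the pivot's four edges, the pivot cell is dropped, and each red is clipped into the remaining cells it overlaps.

    def clip(r, cell):
        # intersection of rectangle r with cell, or None if it is empty
        x1, y1 = max(r[0], cell[0]), max(r[1], cell[1])
        x2, y2 = min(r[2], cell[2]), min(r[3], cell[3])
        return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

    def partition(blue, pivot, reds):
        # cut blue along the pivot's edges; clip each red into the 8 outer cells
        bx1, by1, bx2, by2 = blue
        px1, py1, px2, py2 = pivot
        xs = [bx1, max(bx1, px1), min(bx2, px2), bx2]
        ys = [by1, max(by1, py1), min(by2, py2), by2]
        for i in range(3):
            for j in range(3):
                if i == 1 and j == 1:
                    continue                    # the pivot cell is already covered
                cell = (xs[i], ys[j], xs[i + 1], ys[j + 1])
                if cell[0] >= cell[2] or cell[1] >= cell[3]:
                    continue                    # degenerate (empty) cell
                chunks = [c for c in (clip(r, cell) for r in reds) if c]
                yield cell, chunks              # recurse on (cell, chunks)

Each level does O(n) clipping work; whether the eight cells come out balanced is exactly the pivot-choice question above.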
I'd be happy if anyone can formalize that; I am an engineer, not a computer scientist, and my maths lag behind...
Hard to know what you are looking for but it sounds to me like an R-tree might work?
There is a trivial O(N^2) approach that is similar to the "raster" approach that has been brought up. Since all the rectangles are axis-parallel, there can be at most 2N distinct x coordinates and 2N distinct y coordinates. Sort all the x's and y's and remap each coordinate to its rank: x_i -> i. So now you have a 2N x 2N matrix on which you can easily use the naive O(N^2) algorithm.
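A sketch of just the remap step, assuming (x1, y1, x2, y2) rectangles (the painting itself then runs on index ranges instead of raw coordinates):

    def compress(rects):
        # map each distinct coordinate to its rank: the scene then fits in
        # an index grid of at most 2N x 2N cells
        xs = sorted({x for r in rects for x in (r[0], r[2])})
        ys = sorted({y for r in rects for y in (r[1], r[3])})
        xi = {x: i for i, x in enumerate(xs)}
        yi = {y: i for i, y in enumerate(ys)}
        remapped = [(xi[x1], yi[y1], xi[x2], yi[y2]) for x1, y1, x2, y2 in rects]
        return remapped, xs, ys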
OK, I've asked enough questions, here's something of an answer ...
If the data is represented as a raster, one algorithm is trivial: paint each Red rectangle onto the raster, then check whether any cell of the Blue rectangle remains unpainted.
If the data is vector it's a little more complicated. First define a function which returns the rectangle representing the intersection (if any) of two rectangles. This is simple. Then proceed:
1. Set UnCoveredRectangle to be the Blue rectangle.
2. Again, only bother with the Red rectangles which intersect the Blue one. For each Red rectangle, compute the intersection of the rectangle with the UnCoveredRectangle (if there is no intersection, move on to the next Red rectangle). The intersection will result in one of the following situations:
2.1 The intersection equals the UnCoveredRectangle. The job is finished.
2.2 The intersection 'bites' a rectangular chunk out of the UnCoveredRectangle. The remainder will be either a rectangle, an L-shaped piece, or a U-shaped piece. If it is a rectangle, set UnCoveredRectangle to be that rectangle and go to step 2. If it is L- or U-shaped, split it into 2 or 3 rectangles and recurse from step 2 for each of them.
If you run out of Red rectangles while some part of the UnCoveredRectangle still has positive area, you know the Blue rectangle is not completely covered.
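A rough sketch of this idea, assuming (x1, y1, x2, y2) rectangles; it keeps a worklist of uncovered pieces rather than recursing explicitly, and splits the L- or U-shaped remainder into at most four rectangles:

    def subtract(piece, red):
        # parts of `piece` not covered by `red` (between 0 and 4 rectangles)
        px1, py1, px2, py2 = piece
        ix1, iy1 = max(px1, red[0]), max(py1, red[1])
        ix2, iy2 = min(px2, red[2]), min(py2, red[3])
        if ix1 >= ix2 or iy1 >= iy2:
            return [piece]                      # no overlap: piece unchanged
        parts = []
        if py1 < iy1: parts.append((px1, py1, px2, iy1))   # strip below the bite
        if iy2 < py2: parts.append((px1, iy2, px2, py2))   # strip above the bite
        if px1 < ix1: parts.append((px1, iy1, ix1, iy2))   # strip left of the bite
        if ix2 < px2: parts.append((ix2, iy1, px2, iy2))   # strip right of the bite
        return parts

    def blue_is_covered(blue, reds):
        uncovered = [blue]                      # worklist of uncovered pieces
        for red in reds:
            uncovered = [part for piece in uncovered for part in subtract(piece, red)]
            if not uncovered:
                return True                     # case 2.1: job finished
        return not uncovered

Each subtraction leaves at most four pieces, matching the rectangle / L-shaped / U-shaped cases above.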
OK I haven't got a clue about the complexity of this algorithm, but unless the number of rectangles is huge, I'm not too concerned, though perhaps @den is. And I've omitted a lot of details. And I can't draw nice diagrams like @den did, so you'll have to picture it for yourselves.
Here's an O(n lg n) runtime approach using some memory.
Using the example:
We're only interested in the subpart of the scene that contains the 'model' rectangle; in this example, the 'model' rectangle is 1,1 -> 6,6
  1   2   3   4   5   6
1 +---+---+
  |       |
2 +   A   +---+---+
  |       | B     |
3 +       +   +---+---+
  |       |   |   |   |
4 +---+---+---+---+   +
              |       |
5             +   C   +
              |       |
6             +---+---+
1) collect all the x coordinates that are within the bounds of the model rectangle (both left and right) into a list, then sort it and remove duplicates
1 3 4 5 6
2) collect all the y coordinates that are within the bounds of the model rectangle (both top and bottom) into a list, then sort it and remove duplicates
1 2 3 4 6
3) create a 2D array sized by the number of gaps between the unique x coordinates * the number of gaps between the unique y coordinates. This can use a single bit per cell; consider using, say, C++'s std::vector<bool> (the old STL bit_vector) for a compact representation.
4 * 4
4) paint all the rectangles into this grid, marking each cell a rectangle covers:
  1   3   4   5   6
1 +---+
  | 1 | 0   0   0
2 +---+---+---+
  | 1 | 1 | 1 | 0
3 +---+---+---+---+
  | 1 | 1 | 2 | 1 |
4 +---+---+---+---+
    0   0 | 1 | 1 |
6         +---+---+
5) Should any cells remain unpainted, you know your model is not completely occluded (or whatever you are testing).
Gathering the coordinates is O(n) and sorting them is O(n lg n); the painting step costs one write per grid cell a rectangle covers, so it is cheap when rectangles span few cells but can dominate when they span many.
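Putting steps 1)-5) together, a sketch along these lines (assuming (x1, y1, x2, y2) rectangles, and using plain booleans rather than the counts shown in the picture) could look like:

    from bisect import bisect_left

    def model_fully_painted(model, rects):
        mx1, my1, mx2, my2 = model
        # steps 1-2: unique, sorted coordinates inside the model bounds
        xs = sorted({mx1, mx2} | {x for r in rects
                                  for x in (r[0], r[2]) if mx1 < x < mx2})
        ys = sorted({my1, my2} | {y for r in rects
                                  for y in (r[1], r[3]) if my1 < y < my2})
        # step 3: one cell per gap between consecutive coordinates
        painted = [[False] * (len(ys) - 1) for _ in range(len(xs) - 1)]
        # step 4: paint each rectangle over the cells it spans
        for x1, y1, x2, y2 in rects:
            i1, i2 = bisect_left(xs, max(x1, mx1)), bisect_left(xs, min(x2, mx2))
            j1, j2 = bisect_left(ys, max(y1, my1)), bisect_left(ys, min(y2, my2))
            for i in range(i1, i2):
                for j in range(j1, j2):
                    painted[i][j] = True
        # step 5: any unpainted cell means the model is not fully occluded
        return all(all(row) for row in painted)

Reading the rectangles off the picture as A = (1, 1, 3, 4), B = (3, 2, 5, 4), C = (4, 3, 6, 6) and the model as (1, 1, 6, 6), this returns False, matching the 0 cells in the grid above.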
This is adapted from one of my answers to: What is an Efficient algorithm to find Area of Overlapping Rectangles