I have N rectangles with sides parallel to the x- and y-axes. There is another rectangle, model. I need to create an algorithm that can tell whether the
Here's how to make a sweepline work in O(n lg n). I will focus on the tricky part of how the BST works.
Keep a balanced BST that contains the start and end of each rectangle that intersects the current sweepline. Each node of the BST contains two auxiliary fields: minOverlap and deltaOverlap. The field minOverlap generally stores the minimum number of rectangles overlapping any point in the interval covered by the subtree of that node. However, for some nodes the value is slightly off. We maintain an invariant that minOverlap plus the sum of deltaOverlap for every node up to the root has the true minimum number of rectangles overlapping a region in the subtree of the node.
When we insert a rectangle-starting node, we always insert at a leaf (and possibly rebalance). As we traverse down the insertion path we "push down" any non-zero deltaOverlap values to the children of the access path of the inserted node, updating the minOverlap of the nodes on the access path. Then, we need to increment every node to the 'right' of the inserted node in the tree in O(lg n) time. This is accomplished by incrementing the minOverlap field of all the right ancestors of the inserted node and incrementing the deltaOverlap of all the right children of the right ancestors of the inserted node. An analogous process is carried out for the insertion of the node that ends the rectangle, as well as the deletion of points. A rebalancing operation can be performed by modifying only the fields of the nodes involved in the rotation. All you have to do is check the root at each point in the sweep to see if the minOverlap is 0.
I've left out details of handling things like duplicate coordinates (one simple solution is just to order the open-rectangle nodes before any close-rectangle nodes of the same coordinate), but hopefully it gives you the idea, and is reasonably convincing.
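To make the bookkeeping concrete, here is a minimal sketch of the sweep in Python. It is not the balanced BST described above: it swaps in a fixed segment tree over compressed y-coordinates, which keeps the same two fields (`min_overlap`, `delta_overlap`) but needs no rotations. The names `MinCoverTree` and `is_covered`, and the `(x1, y1, x2, y2)` tuple layout, are assumptions made for illustration.

```python
class MinCoverTree:
    """Range-add / global-min segment tree over elementary y-slots.

    min_overlap[node] already includes the node's own delta_overlap, so the
    true minimum of a subtree is min_overlap plus the deltas of its strict
    ancestors. At the root that is just min_overlap[1].
    """

    def __init__(self, size):
        self.size = size
        self.min_overlap = [0] * (4 * size)
        self.delta_overlap = [0] * (4 * size)

    def add(self, lo, hi, value, node=1, left=0, right=None):
        """Add value to every slot in the half-open index range [lo, hi)."""
        if right is None:
            right = self.size
        if hi <= left or right <= lo:
            return
        if lo <= left and right <= hi:
            self.min_overlap[node] += value
            self.delta_overlap[node] += value
            return
        mid = (left + right) // 2
        self.add(lo, hi, value, 2 * node, left, mid)
        self.add(lo, hi, value, 2 * node + 1, mid, right)
        self.min_overlap[node] = self.delta_overlap[node] + min(
            self.min_overlap[2 * node], self.min_overlap[2 * node + 1])

    def minimum(self):
        return self.min_overlap[1]


def is_covered(model, rects):
    """True if the union of rects covers model; all are (x1, y1, x2, y2)."""
    mx1, my1, mx2, my2 = model
    clipped = []
    for x1, y1, x2, y2 in rects:
        cx1, cy1 = max(x1, mx1), max(y1, my1)
        cx2, cy2 = min(x2, mx2), min(y2, my2)
        if cx1 < cx2 and cy1 < cy2:  # keep only the part inside the model
            clipped.append((cx1, cy1, cx2, cy2))

    ys = sorted({my1, my2} | {y for r in clipped for y in (r[1], r[3])})
    index = {y: i for i, y in enumerate(ys)}
    tree = MinCoverTree(len(ys) - 1)

    events = []
    for x1, y1, x2, y2 in clipped:
        events.append((x1, +1, index[y1], index[y2]))  # rectangle opens
        events.append((x2, -1, index[y1], index[y2]))  # rectangle closes
    events.sort(key=lambda e: e[0])

    pos, i = mx1, 0
    while i < len(events):
        x = events[i][0]
        # The strip [pos, x) has non-zero width: it needs overlap >= 1 everywhere.
        if x > pos and tree.minimum() == 0:
            return False
        while i < len(events) and events[i][0] == x:
            _, delta, lo, hi = events[i]
            tree.add(lo, hi, delta)
            i += 1
        pos = x
    return pos == mx2  # the sweep must have reached the model's right edge
```

Because all events at one x are applied before the next strip is checked, the open-before-close ordering for duplicate coordinates is not needed in this variant.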
Here's a generic algorithm
Now the question is how to do the above efficiently. The above can be done in a single loop over all polygons, so I think you are looking at O(n) time.
If you need an efficient algorithm that tests multiple models, or if you must optimize for the fastest possible answer (at the expense of preparing the data), then you are looking for a structure that quickly answers whether a rectangle intersects (or contains) another rectangle.
If you use, for example, kd-trees, I believe this can be answered in O(log n) time, but the number of rectangles found, k, is also an important variable. You will end up with something like O(k + log n), and you will also need to do the union part to test whether the model is covered.
My guess is that the union could be computed in O(k log k), so if k is small then you are looking at O(log n) and if k is comparable to n then it should be O(k log k).
See also this question.
EDIT: In response to complexity of intersections and unions.
In more detail, we have two scenarios depending on whether k << n or k is comparable to n
a) in case of k << n and assuming polynomial complexity for intersection/union (so here I give up the guess O(k log k) ) we have:
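The total is O(k + log n + f(k)), where f(k) is the (polynomial) cost of the intersection/union step. When k is small enough that k and f(k) are dominated by log n, this reduces to O(log n), since big O keeps only the fastest growing term.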
In this case the more significant part of the algorithm is finding the polygons.
b) in the case of k comparable to n we must take a look at the complexity of intersection and union algorithms
(notice the parallel with how an RDBMS, depending on selectivity, might use an index or do a table scan; it is a similar choice to what we have here).
If k is big enough the algorithm becomes less of an algorithm for finding rectangles that intersect with the rectangle and becomes more of an algorithm for calculating the union of polygons.
But I believe that the kd-tree can also help in finding the intersections of segments (which are necessary to build the union), so even this part of the algorithm might not need k^2 time. At this point I would investigate efficient algorithms for calculating the area of unions.
Here's a way to do this without using rasterization, that is, using only pure numbers.
Note: This is not O(n log n), more like O(n^2). It is, however, a solution. Whether it is an answer, probably not if O(n log n) is a requirement.
The output should thus be:
Let me illustrate the process so far
+-------------------+
|A                  |
|        +----------+-----+
|        |C         |     |
|  +-----+----+     |     |
|  |B    |    |     |     |
|  |     +----+-----+-----+
|  |          |     |
+--+----------+-----+
   |          |
   +----------+
^  ^     ^    ^     ^     ^
1  2     3    4     5     6  <-- X-coordinates
Associated rectangles:
You now create an empty list, L=[], and start processing the coordinates and associated rectangles:
X=1: The list is empty, nothing to process. Nothing to remove. Add A: L=[ A ]
X=2: The list contains rectangles; process them as rectangles that have a left edge of X=1 and a right edge of X=2 (the two coordinates we've processed so far), using their original top and bottom coordinates. Nothing to remove. Add B: L=[ A, B ]
X=3: The list contains rectangles; process both A and B the same way, treating them as temporarily having left and right coordinates of X=2 and X=3, and using their original top and bottom coordinates. Nothing to remove. Add C: L=[ A, B, C ]
X=4: Process the three rectangles the same way as above; the temporary left and right coordinates are X=3 and X=4. Remove B: L=[ A, C ]. Nothing to add.
Process these in the exact same manner.
This means you will end up with "strips" of rectangles, like this (I've pulled them a bit apart to illustrate more clearly that they are strips, but they are located side by side, continuously, like in the original diagram):
+--+ +-----+ +----+ +-----+
|A | |     | |    | |     |
|  | |     | +----+ +-----+ +-----+
|  | |     | |C   | |     | |     |
|  | +-----+ +----+ |     | |     |
|  | |B    | |    | |     | |     |
|  | |     | +----+ +-----+ +-----+
|  | |     | |    | |     |
+--+ +-----+ +----+ +-----+
     |     | |    |
     +-----+ +----+
^  ^ ^     ^ ^    ^ ^     ^ ^     ^
1  2 2     3 3    4 4     5 5     6
Ok, so now you have your output, a collection of coordinate-pairs, each pair having an associated list of rectangles.
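The X pass described so far can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `slice_into_strips` and the `(left, top, right, bottom)` tuple layout are made up here, and the coordinates for A, B and C are read off the diagrams (they are not stated numerically in the text).

```python
def slice_into_strips(rects):
    """Split named rectangles into vertical strips.

    rects maps a name to (left, top, right, bottom).
    Returns a list of ((x_from, x_to), [names active in that strip]).
    """
    starts, ends = {}, {}
    for name, (left, _top, right, _bottom) in rects.items():
        starts.setdefault(left, []).append(name)
        ends.setdefault(right, []).append(name)

    active, strips, previous = [], [], None
    for x in sorted(set(starts) | set(ends)):
        if active:                      # "process" the list for [previous, x]
            strips.append(((previous, x), sorted(active)))
        for name in ends.get(x, []):    # remove rectangles that end here
            active.remove(name)
        active.extend(starts.get(x, []))  # add rectangles that start here
        previous = x
    return strips


# Coordinates assumed from the diagram: X ticks 1..6, Y ticks 1..6.
example = {"A": (1, 1, 5, 5), "B": (2, 3, 4, 6), "C": (3, 2, 6, 4)}
strips = slice_into_strips(example)
```

With these assumed coordinates, `strips` pairs (1, 2) with A, (2, 3) with A and B, (3, 4) with A, B and C, (4, 5) with A and C, and (5, 6) with C, matching the walkthrough.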
Now we do a trick. We process the vertical strip in the exact same manner, only this time we use the Y coordinates as the delimiters. Let's handle the strip between 3 and 4 above. Remember that the strip has a left and right coordinates of 3 and 4.
Strip looks like this:
^     +----+ <-- 1
A     |    |
|   ^ +----+ <-- 2
|   C |    |
| ^ | +----+ <-- 3
| B | |    |
| | V +----+ <-- 4
| |   |    |
V |   +----+ <-- 5
  |   |    |
  V   +----+ <-- 6
List of coordinates with rectangles:
New empty list L=[]
Process the coordinates:
Y=1: Nothing to process (L=[]). Add A to list, L=[ A ]
Y=2: Process A with temporary top and bottom coordinates of Y=1 and Y=2 (and remember that it also has temporary left and right coordinates of X=3 and X=4). Add C, L=[ A, C ]
Y=3: Process A and C, both now having temporary coordinates of (3, 2)-(4, 3). Add B, L=[ A, B, C ]
Y=4: Process A, B and C, all having coordinates of (3, 3)-(4, 4). Remove C, L=[ A, B ]
Y=5: Process A and B, coordinates (3, 4)-(4, 5). Remove A, L=[ B ]
Y=6: Process B, coordinates (3, 5)-(4, 6)
Final output:
pairs of Y-coordinates, with rectangles associated with them (that also have temporarily got new X-coordinates):
Now, suppose you want to ask the question: is B fully covered by any combination of the other rectangles?
The answer can be worked out as follows:
In the above example, we see that the 3rd and 4th rectangles in the final list contain B but also contain other rectangles, hence those parts of B are covered. The final rectangle in the list also contains B, but no other rectangle, hence this part is not covered.
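As a rough sketch of that check in Python: slice along X, slice each strip along Y, and report every cell whose only associated rectangle is the one being asked about. The helper names (`slice_axis`, `uncovered_cells`), the `(left, top, right, bottom)` layout and the numeric coordinates for A, B and C are assumptions for illustration. The per-slice scans are deliberately naive, so this is no faster than the quadratic bound mentioned earlier.

```python
def slice_axis(spans):
    """spans maps name -> (start, end) on one axis.

    Yields ((lo, hi), names) for each slice between consecutive
    coordinates that at least one span fully crosses.
    """
    cuts = sorted({c for span in spans.values() for c in span})
    for lo, hi in zip(cuts, cuts[1:]):
        names = sorted(n for n, (s, e) in spans.items() if s <= lo and hi <= e)
        if names:
            yield (lo, hi), names


def uncovered_cells(rects, target):
    """Cells (x0, y0, x1, y1) inside rects[target] that no other rectangle touches."""
    holes = []
    x_spans = {n: (r[0], r[2]) for n, r in rects.items()}
    for (x0, x1), in_strip in slice_axis(x_spans):
        # Second pass: same slicing, but on Y, restricted to this strip.
        y_spans = {n: (rects[n][1], rects[n][3]) for n in in_strip}
        for (y0, y1), in_cell in slice_axis(y_spans):
            if in_cell == [target]:
                holes.append((x0, y0, x1, y1))
    return holes


# Coordinates assumed from the diagrams (left, top, right, bottom).
example = {"A": (1, 1, 5, 5), "B": (2, 3, 4, 6), "C": (3, 2, 6, 4)}
holes_in_b = uncovered_cells(example, "B")
```

An empty result means the target is fully covered. For B the sketch reports two cells, (2, 5)-(3, 6) and (3, 5)-(4, 6), because the vertical strips split the uncovered part at X=3.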
According to the original diagram, the final result would include rectangles as follows (the letters inside each denote which original rectangle is associated with this new rectangle):
+--+-----+----+-----+
|A |A    |A   |A    |
|  |     +----+-----+-----+
|  |     |AC  |AC   |C    |
|  +-----+----+     |     |
|  |AB   |ABC |     |     |
|  |     +----+-----+-----+
|  |     |AB  |A    |
+--+-----+----+-----+
   |B    |B   |
   +-----+----+
If we now take a new look at the original diagram, I have shaded out the parts that the above algorithm would find contains B, but no other rectangle:
+-------------------+
|A                  |
|        +----------+-----+
|        |C         |     |
|  +-----+----+     |     |
|  |B    |    |     |     |
|  |     +----+-----+-----+
|  |          |     |
+--+-----+----+-----+
   |#####|####|
   +-----+----+
The vertical bar in the middle there is to illustrate that the part would be returned as two rectangles, split at that location, due to the way the vertical strips were worked out.
I seriously hope I made myself understood here. I have some code that can help you with the implementation of each run through the lists of coordinates, but it's 01:21 past midnight here and I'm going to bed, but leave a comment if you wish to see some actual code for this.
Or it would be a great exercise to implement it yourself :)
Here's the link to the class containing the method in question: RangeExtensions.cs.
The method is the Slice method, just search the page for it. The type in question, Range, is basically a range from one value to another, so there's a bit of data construction and maintenance in addition to the above algorithm.
Here's a divide and conquer algorithm. AVERAGE case complexity is very good and I'd say it's something like O(n log MaxCoords). The worst case could be quadratic, though I think such a test would be pretty difficult to create. To make it even harder, make the order of the recursive function calls random. Maybe what @Larry suggested can get this to O(n log n) on average.
I can't figure out a sweep line solution, but for the tests I've tried this is very fast.
Basically, use a recursive function that works on the blue rectangle. First check if the blue rectangle is completely covered by one of the other rectangles. If yes, we're done. If not, split it into 4 quadrants, and recursively call the function on those quadrants. All 4 recursive calls must return true. I'm including some C# code that draws the rectangles. You can change it to work with larger values as well, but do remove the drawing procedures in that case, as those will take forever. I've tested it with a million rectangles and a square of side one billion, generated such that it isn't covered, and the provided answer (false) took about a second on a quad-core.
I've tested this on random data mostly, but it seems correct. If it turns out it isn't I'll just delete this, but maybe it'll get you on the right path.
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Windows.Forms;

public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    // Rects[0] is the model (blue); the others are the covering rectangles (red).
    List<Rectangle> Rects = new List<Rectangle>();
    private const int maxRects = 20;

    // Fills Rects with random rectangles that fit inside panel1.
    private void InitRects()
    {
        Random rand = new Random();
        for (int i = 0; i < maxRects; ++i) // Rects[0] is the model
        {
            int x = rand.Next(panel1.Width);
            int y = rand.Next(panel1.Height);
            Rects.Add(new Rectangle(new Point(x, y), new Size(rand.Next(panel1.Width - x), rand.Next(panel1.Height - y))));
        }
    }

    private void DrawRects(Graphics g)
    {
        g.DrawRectangle(Pens.Blue, Rects[0]);
        for (int i = 1; i < Rects.Count; ++i)
        {
            g.DrawRectangle(Pens.Red, Rects[i]);
        }
    }

    // Returns true if R is covered by the union of Rects[1..].
    private bool Solve(Rectangle R)
    {
        // if there is a rectangle containing R
        for (int i = 1; i < Rects.Count; ++i)
        {
            if (Rects[i].Contains(R))
            {
                return true;
            }
        }

        // Too small to split further and still not contained: report uncovered.
        if (R.Width <= 3 && R.Height <= 3)
        {
            return false;
        }

        // Split into four quadrants; all of them must be covered.
        Rectangle UpperLeft = new Rectangle(new Point(R.X, R.Y), new Size(R.Width / 2, R.Height / 2));
        Rectangle UpperRight = new Rectangle(new Point(R.X + R.Width / 2 + 1, R.Y), new Size(R.Width / 2, R.Height / 2));
        Rectangle LowerLeft = new Rectangle(new Point(R.X, R.Y + R.Height / 2 + 1), new Size(R.Width / 2, R.Height / 2));
        Rectangle LowerRight = new Rectangle(new Point(R.X + R.Width / 2 + 1, R.Y + R.Height / 2 + 1), new Size(R.Width / 2, R.Height / 2));

        return Solve(UpperLeft) && Solve(UpperRight) && Solve(LowerLeft) && Solve(LowerRight);
    }

    private void Go_Click(object sender, EventArgs e)
    {
        Graphics g = panel1.CreateGraphics();
        panel1.Hide();
        panel1.Show();
        Rects.Clear();

        InitRects();
        DrawRects(g);

        textBox1.Text = Solve(Rects[0]).ToString();
    }
}
I've been thinking about it and I think I finally understood what @algorithmist meant by sweep line. However, the very presence of sorting operations means that I have O(n log n) on average and O(n**2) in the worst case.
Sweep Line
First of all, we need some sorting, because our sweep line will consist of walking an ordered set.
This ordered set will feature the top and bottom line of each of the reds, as long as they are between the top and bottom of blue. This divides our space into (at most) n*2 horizontal strips.
Now, we need to make sure that in each strip the whole of blue is covered, and this operation cannot have more than O(log n) complexity. This could be done if we had (for each strip) a list of the covered intervals. This is the 'event' @algorithmist is speaking of.
To handle this event, we'll use a binary tree, described below, which handles adding or removing a rectangle in exactly O(log n) operations and yields the rightmost interval covered by the tree, which we use to tell if the strip of blue is covered or not.
Binary Tree
First of all, I am going to index the red rectangles. We sort them using this function:
def __lt__(lhs, rhs):
    # Order by left edge first, then by right edge.
    return (lhs.left < rhs.left
            or (lhs.left == rhs.left and lhs.right < rhs.right))
I am then going to create a dedicated binary tree. It has N leaves, each representing a red rectangle and indicating its presence or absence. They are ordered according to the index.
class Node:
    def __init__(self):
        self.interval = []
        self.left = None
        self.right = None
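Now we have two possibilities. Let's say the children cover [a,b] and [c,d]: if c <= b, then the node holds [a,d] (strictly [a, max(b,d)], since a rectangle that starts later can still end earlier); otherwise it holds [c,d].
Why does it work? Let's take an example using 4 leaves: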
        _ [1,9] _
       /         \
   [1,7]         [6,9] <-- Special node merge
   /   \         /   \
  /     \       /     \
[1,3] [2,7]   [3,5] [6,9]
The special node ignores [3,5] because it's not the rightmost interval. The reasoning is that the rectangles are ordered:
- No rectangle to the right of [6,9] will cover the missing [5,6] interval, since they all begin at or after 6.
- Rectangles to the left of [3,5] begin at or before 3, so if they cover the missing [5,6] they'll cover [3,5] anyway.
The root may not indicate the exact set of intervals covered: only the rightmost interval covered. However, it's perfectly sufficient for us to tell if blue is completely covered or not!
There are 2 operations available on this tree:
Each is similar:
The recursive bit takes O(log n). It's a classic property of balanced binary trees. Once it's done, we have the rightmost interval covered by the root, which is sufficient to tell whether or not the blue segment is entirely covered.
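Here is one way the tree could look in Python, as a sketch rather than the exact structure intended above. It uses an array-based complete binary tree instead of linked `Node` objects, merges with `max(b, d)` on the right end, and assumes every interval has already been clipped to the blue segment (otherwise an interval lying entirely to the right of blue would hide real coverage). The class name `RightmostIntervalTree` and its method names are invented for this example.

```python
class RightmostIntervalTree:
    """Each node stores the rightmost connected interval of its present leaves."""

    def __init__(self, intervals):
        # Leaves follow the sort order from the text: left end, then right end.
        self.intervals = sorted(intervals)
        self.size = 1
        while self.size < len(self.intervals):
            self.size *= 2
        self.nodes = [None] * (2 * self.size)  # None means "nothing present"

    @staticmethod
    def _merge(left, right):
        if left is None or right is None:
            return left if right is None else right
        (a, b), (c, d) = left, right
        if c <= b:                 # overlapping or touching: one interval
            return (a, max(b, d))
        return (c, d)              # gap: keep only the rightmost interval

    def set_present(self, index, present):
        """Mark the index-th interval (in sorted order) present or absent."""
        node = self.size + index
        self.nodes[node] = self.intervals[index] if present else None
        node //= 2
        while node >= 1:           # recompute the O(log n) ancestors
            self.nodes[node] = self._merge(self.nodes[2 * node],
                                           self.nodes[2 * node + 1])
            node //= 2

    def covers(self, lo, hi):
        """True if the root interval contains [lo, hi]."""
        root = self.nodes[1]
        return root is not None and root[0] <= lo and hi <= root[1]


tree = RightmostIntervalTree([(1, 3), (2, 7), (3, 5), (6, 9)])
for i in range(4):
    tree.set_present(i, True)
all_present = tree.covers(1, 9)
tree.set_present(1, False)         # drop [2,7]: the gap (5,6) opens up
after_removal = tree.covers(1, 9)
```

With the four leaves from the example, the root becomes [1,9] while everything is present and falls back to [6,9] once [2,7] is removed.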
Complexity
The complexity of the algorithm is simple: we have O(n) events, and each event is handled in O(log n), which yields O(n log n) for the core part.
However, we should not forget that we also have 2 sort operations: one to order the sweep-line events and one to index the red rectangles. Each takes O(n log n) on average, but may degenerate into O(n**2) in the worst case, depending on the sorting algorithm used.
Do you know the usual worst-case O(n log n) algorithm for the area of the union of rectangles?
All you need to do here is to compute the two areas:
If these areas are equal, the model is totally covered, otherwise it isn't.
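A small Python sketch of this comparison follows. Two things here are assumptions: the two areas are taken to be the union area of the N rectangles and the union area of the N rectangles plus the model, and the area routine is a simple coordinate-compression loop, not the worst-case O(n log n) algorithm referred to above. The function names and the `(x1, y1, x2, y2)` layout are hypothetical.

```python
def union_area(rects):
    """Area of the union of (x1, y1, x2, y2) rectangles.

    Naive coordinate compression: every grid cell between consecutive
    coordinates is either fully inside some rectangle or fully outside all.
    """
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    area = 0
    for x_lo, x_hi in zip(xs, xs[1:]):
        for y_lo, y_hi in zip(ys, ys[1:]):
            if any(r[0] <= x_lo and x_hi <= r[2] and
                   r[1] <= y_lo and y_hi <= r[3] for r in rects):
                area += (x_hi - x_lo) * (y_hi - y_lo)
    return area


def model_is_covered(model, rects):
    """Adding the model changes the union area only if part of it was uncovered."""
    rects = list(rects)
    return union_area(rects) == union_area(rects + [model])
```

Integer coordinates keep the equality test exact; with floating-point input a tolerance would be needed.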