Unique assignment of closest points between two tables

随声附和 提交于 2019-12-05 12:22:04
Erwin Brandstetter

Table schema

To enforce your rule simply declare pvanlagen.buildid UNIQUE:

ALTER TABLE pvanlagen ADD CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid);

building.gid is the PK, as your update revealed. To also enforce referential integrity add a FOREIGN KEY constraint to buildings.gid.

You have implemented both by now. But it would be more efficient to run the big UPDATE below before you add these constraints.

There is a lot more that should be improved in your table definition. For one, buildings.gid as well as pvanlagen.buildid should be type integer (or possibly bigint if you burn a lot of PK values). numeric is expensive nonsense.

Let's focus on the core problem:

Basic Query to find closest building

The case is not as simple as it may seem. It's a "nearest neighbour" problem, with the additional complication of unique assignment.

This query finds the nearest one building for each PV (short for PV Anlage - row in pvanlagen), where neither is assigned, yet:

SELECT pv_gid, b_gid, dist
FROM  (
   SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467
   FROM   pvanlagen
   WHERE  buildid IS NULL  -- not assigned yet
   ) p
     , LATERAL (
   SELECT b.gid AS b_gid
        , round(ST_Distance(p.geom31467
                      , ST_Transform(b.centroid, 31467))::numeric, 2) AS dist  -- see below
   FROM   buildings b
   LEFT   JOIN pvanlagen p1 ON p1.buildid = b.gid  -- also not assigned ...
   WHERE  p1.buildid IS NULL                       -- ... yet  
   -- AND    p.gemname = b.gemname                 -- not needed for performance, see below
   ORDER  BY p.geom31467 <-> ST_Transform(b.centroid, 31467)
   LIMIT  1
   ) b;

To make this query fast, you need a spatial, functional GiST index on buildings to make it much faster:

CREATE INDEX build_centroid_gix ON buildings USING gist (ST_Transform(centroid, 31467));

Not sure why you don't

Related answers with more explanation:

Further reading:

With the index in place, we don't need to restrict matches to the same gemname for performance. Only do this if it's an actual rule to enforced. If it has to be observed at all times, include the column in the FK constraint:

Remaining Problem

We can use the above query it in an UPDATE statement. Each PV is only used once, but more than one PV might still find the same building to be closest. You only allow one PV per building. So how would you resolve that?

In other words, how would you assign objects here?

Simple solution

One simple solution would be:

UPDATE pvanlagen p1
SET    buildid = sub.b_gid
     , dist    = sub.dist  -- actual distance
FROM  (
   SELECT DISTINCT ON (b_gid)
          pv_gid, b_gid, dist
   FROM  (
      SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467
      FROM   pvanlagen
      WHERE  buildid IS NULL  -- not assigned yet
      ) p
        , LATERAL (
      SELECT b.gid AS b_gid
           , round(ST_Distance(p.geom31467
                         , ST_Transform(b.centroid, 31467))::numeric, 2) AS dist  -- see below
      FROM   buildings      b
      LEFT   JOIN pvanlagen p1 ON p1.buildid = b.gid  -- also not assigned ...
      WHERE  p1.buildid IS NULL                       -- ... yet  
      -- AND    p.gemname = b.gemname                 -- not needed for performance, see below
      ORDER  BY p.geom31467 <-> ST_Transform(b.centroid, 31467)
      LIMIT  1
      ) b
   ORDER  BY b_gid, dist, pv_gid  -- tie breaker
   ) sub
WHERE   p1.gid = sub.pv_gid;

I use DISTINCT ON (b_gid) to reduce to exactly one row per building, picking the PV with the shortest distance. Details:

For any building that is closest for more one PV, only the closest PV is assigned. The PK column gid (alias pv_gid) serves as tiebreaker if two are equally close. In such a case, some PV are dropped from the update and remain unassigned. Repeat the query until all PV are assigned.

This is still a simplistic algorithm, though. Looking at my diagram above, this assigns building 4 to PV 4 and building 5 to PV 5, while 4-5 and 5-4 would probably be a better solution overall ...

Aside: type for dist column

Currently you use numeric for it. your original query assigned a constant integer, no point making in numeric.

In my new query ST_Distance() returns the actual distance in meters as double precision. If we simply assign that we get 15 or so fractional digits in the numeric data type, and the number is not that exact to begin with. I seriously doubt you want to waste the storage.

I would rather save the original double precision from the calculation. or, better yet, round as needed. If meters are exact enough, just cast to and save an integer (rounding the number automatically). Or multiply with 100 first to save cm:

(ST_Distance(...) * 100)::int
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!