Procedurally transform subquery into join

前端 未结 7 1482
有刺的猬
有刺的猬 2020-12-05 14:34

Is there a generalized procedure or algorithm for transforming a SQL subquery into a join, or vice versa? That is, is there a set of typographic operations that can be appli

7条回答
  •  隐瞒了意图╮
    2020-12-05 15:13

    This question relies on a basic knowledge of Relational Algebra. You need to ask yourself what kind of join is being performed. For example, an LEFT ANTI SEMI JOIN is like a WHERE NOT EXISTS clause.

    Some joins do not allow duplicating of data, some do not allow eliminating data. Others allow extra fields to be available. I discuss this in my blog at http://msmvps.com/blogs/robfarley/archive/2008/11/09/join-simplification-in-sql-server.aspx

    Also, please don't feel you need to do everything in JOINs. The Query Optimizer should take care of all of this for you, and you can often make your queries much harder to maintain this way. You may find yourself using an extensive GROUP BY clause, and having interesting WHERE .. IS NULL filters, which will only serve to disconnect the business logic from the query design.

    A subquery in the SELECT clause (essentially a lookup) only provides an extra field, not duplication or elimination. Therefore, you would need to make sure that you enforce GROUP BY or DISTINCT values in your JOIN, and use an OUTER JOIN to guarantee that behaviour is the same.

    A subquery in the WHERE clause can never duplicate data, or provide extra columns to the SELECT clause, so you should use GROUP BY / DISTINCT to check this. WHERE EXISTS is similar. (This the LEFT SEMI JOIN)

    WHERE NOT EXISTS (LEFT ANTI SEMI JOIN) doesn't provide data, and doesn't duplicate rows, but can eliminate... for this you need to do LEFT JOINs and look for NULLs.

    But the Query Optimizer should handle all this for you. I actually like having occasional subqueries in the SELECT clause, because it makes it very clear that I am not duplicating or eliminating rows. The QO can tidy it for me, but if I'm using a view or inline table-valued function, I want to make it clear to those who come after me that the QO can simplify it down a lot. Have a look at the Execution Plans of your original query, and you'll see that the system is providing the INNER/OUTER/SEMI joins for you.

    The thing you really need to be avoiding (at least in SQL Server) are functions that use BEGIN and END (such as Scalar Functions). They may feel like they simplify your code, but they will actually be executed in a separate context, as the system sees them as procedural (not simplifiable).

    I did a session on this kind of thing at the recent SQLBits V conference. It was recorded, so you should be able to watch it at some point (if you can put up with my jokes!)

提交回复
热议问题