Is NATURAL (JOIN) considered harmful in production environment?

徘徊边缘 提交于 2019-11-27 15:36:43

NATURAL JOIN syntax is anti-pattern:

  • The purpose of the query is less obvious;
    • the columns used by the application is not clear
    • the columns used can change "unexpectedly"
  • The syntax goes against the modularity rule, about using strict typing whenever possible. Explicit is almost universally better.

Because of this, I don't recommend the syntax in any environment.
I also don't recommend mixing syntax (IE: using both NATURAL JOIN and explicit INNER/OUTER JOIN syntax) - keep a consistent codebase format.

These "traps", which seem to argue against natural joins, cut both ways. Suppose you add a new column to table A, fully expecting it to be used in joining with table B. If you know that every join of A and B is a natural join, then you're done. If every join explicitly uses USING, then you have to track them all down and change them. Miss one and there's a bug.

Use NATURAL joins when the semantics of the tables suggests that this is the right thing to do. Use explicit join criteria when you want to make sure the join is done in a specific way, regardless of how the table definitions might evolve.

One thing that completely destroys NATURAL for me is that most of my tables have an id column, which are obviously semantically all different. You could argue that having a user_id makes more sense than id, but then you end up writing things like user.user_id, a violation of DRY. Also, by the same logic, you would also have columns like user_first_name, user_last_name, user_age... (which also kind of makes sense in view that it would be different from, for example, session_age)... The horror.

I'll stick to my JOIN ... ON ..., thankyouverymuch. :)

I agree with the other posters that an explicit join should be used for reasons of clarity and also to easily allow a switch to an "OUTER" join should your requirements change.

However most of your "traps" have nothing to do with joins but rather the evils of using "SELECT *" instead of explicitly naming the columns you require "SELECT a.col1, a.col2, b.col1, b.col2". These traps occurs whenever a wildcard column list is used.

Adding an extra reason not listed in any of the answers above. In postgres (not sure if this the case for other databases) if no column names are found in common between the two tables when using NATURAL JOIN then a CROSS JOIN is performed. This means that if you had an existing query and then you were to subsequently change one of the column names in a table, you would still get a set of rows returned from the query rather than an error. If instead you used the JOIN ... USING(...) syntax you would get an error if the joining column was no longer there.

The postgres documentation has a note to this effect:

Note: USING is reasonably safe from column changes in the joined relations since only the listed columns are combined. NATURAL is considerably more risky since any schema changes to either relation that cause a new matching column name to be present will cause the join to combine that new column as well.

Daniel Williams

Do you mean the syntax like this:

SELECT * 
  FROM t1, t2, t3 ON t1.id = t2.id 
                 AND t2.id = t3.id

Versus this:

         SELECT *  
           FROM t1 
LEFT OUTER JOIN t2 ON t1.id = t2.id 
                  AND t2.id = t3.id

I prefer the 2nd syntax and also format it differently:

         SELECT *
           FROM T1
LEFT OUTER JOIN T2 ON T2.id = T1.id
LEFT OUTER JOIN T3 ON T3.id = T2.id

In this case, it is very clear what tables I am joining and what ON clause I am using to join them. By using that first syntax is just too easy to not put in the proper JOIN and get a huge result set. I do this because I am prone to typos, and this is my insurance against that. Plus, it is visually easier to debug.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!