I have the following tables and am trying to look up county codes for a list of several hundred thousand cities.
create table counties (
zip_code_from cha
Months later, this has reared its head again, and I decided to test some of my theories.
The original query:
select
ci.city, ci.zip_code, co.fips_code
from
cities ci
join counties co on
ci.zip_code between co.from_zip_code and co.thru_zip_code
This does in fact implement a Cartesian join. The query returns 34,000 rows and takes 597 seconds.
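For anyone who wants to check this on their own data, one way to see how the join is executed is to ask the planner directly. This is only a sketch, and it assumes PostgreSQL (which the use of generate_series further down suggests); the table and column names are the ones from the query above.

```sql
-- Show the plan the planner picked, plus actual row counts and timings.
-- Note: ANALYZE really executes the query, so this takes as long as the query itself.
explain (analyze, buffers)
select
  ci.city, ci.zip_code, co.fips_code
from
  cities ci
  join counties co on
    ci.zip_code between co.from_zip_code and co.thru_zip_code;
```

A Nested Loop node with the between condition listed as a Join Filter, and a large "Rows Removed by Join Filter" count, would be consistent with every city being compared against every county range.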
If I "pre-explode" the zip code ranges into discrete records:
with exploded_zip as (
select
generate_series (
cast (from_zip_code as int),
cast (thru_zip_code as int)
)::text as zip_code,
*
from counties
)
select
ci.city, ci.zip_code, co.fips_code
from
cities ci
join exploded_zip co on
ci.zip_code = co.zip_code
The query returns the exact same rows but finishes in 2.8 seconds.
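One thing to watch for with this approach: casting to int and back to text drops leading zeros, so a zip code stored as '01234' would come out of the CTE as '1234' and no longer match. The rows matched on my data, but as a precaution, here is a variant that pads the value back out. It is a sketch only, and it assumes five-character zip codes stored as text, which is my assumption and not something the table definitions above confirm.

```sql
with exploded_zip as (
  select
    -- pad back to five characters so '01234' survives the int round trip
    lpad(z::text, 5, '0') as zip_code,
    co.fips_code
  from counties co
  -- one output row per zip code in each county's range
  cross join lateral generate_series(
    cast(co.from_zip_code as int),
    cast(co.thru_zip_code as int)
  ) as z
)
select
  ci.city, ci.zip_code, co.fips_code
from
  cities ci
  join exploded_zip co on
    ci.zip_code = co.zip_code;
```

Because the join condition is still a plain equality, the planner remains free to use a hash or merge join here, just as with the version above.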
So it seems the bottom line is that using a between (or any inequality) as the join condition is a really bad idea, at least with this many rows.