Question
Introducing an ORDER BY clause in a query increases the total time because of the extra work the database has to do to sort the result set:
- copy the resulting tuples in some temporary memory
- sort them (hopefully in memory, otherwise using the disk)
- stream the result to the client
What I don't understand is why merely adding a column from a joined table to the ORDER BY produces such a big difference in performance.
Query 1
EXPLAIN ANALYZE
SELECT p.*
FROM product_product p
JOIN django_site d ON (p.site_id = d.id)
WHERE (p.active = true AND p.site_id = 1 )
ORDER BY d.domain, p.ordering, p.name
Query plan
Sort (cost=3909.83..3952.21 rows=16954 width=1086) (actual time=1120.618..1143.922 rows=16946 loops=1)
Sort Key: django_site.domain, product_product.ordering, product_product.name
Sort Method: quicksort Memory: 25517kB
-> Nested Loop (cost=0.00..2718.86 rows=16954 width=1086) (actual time=0.053..87.396 rows=16946 loops=1)
-> Seq Scan on django_site (cost=0.00..1.01 rows=1 width=24) (actual time=0.010..0.012 rows=1 loops=1)
Filter: (id = 1)
-> Seq Scan on product_product (cost=0.00..2548.31 rows=16954 width=1066) (actual time=0.036..44.138 rows=16946 loops=1)
Filter: (product_product.active AND (product_product.site_id = 1))
Total runtime: 1182.515 ms
Query 2
Same as above, but without sorting by django_site.domain
Query plan
Sort (cost=3909.83..3952.21 rows=16954 width=1066) (actual time=257.094..278.905 rows=16946 loops=1)
Sort Key: product_product.ordering, product_product.name
Sort Method: quicksort Memory: 25161kB
-> Nested Loop (cost=0.00..2718.86 rows=16954 width=1066) (actual time=0.075..86.120 rows=16946 loops=1)
-> Seq Scan on django_site (cost=0.00..1.01 rows=1 width=4) (actual time=0.015..0.017 rows=1 loops=1)
Filter: (id = 1)
-> Seq Scan on product_product (cost=0.00..2548.31 rows=16954 width=1066) (actual time=0.052..44.024 rows=16946 loops=1)
Filter: (product_product.active AND (product_product.site_id = 1))
Total runtime: 305.392 ms
This question could be related.
Edit: More details added
Table "public.product_product"
  Column   |          Type          |                          Modifiers
-----------+------------------------+--------------------------------------------------------------
 id        | integer                | not null default nextval('product_product_id_seq'::regclass)
 site_id   | integer                | not null
 name      | character varying(255) | not null
 slug      | character varying(255) | not null
 sku       | character varying(255) |
 ordering  | integer                | not null
 [snip some columns]
Indexes:
"product_product_pkey" PRIMARY KEY, btree (id)
"product_product_site_id_key" UNIQUE, btree (site_id, sku)
"product_product_site_id_key1" UNIQUE, btree (site_id, slug)
"product_product_site_id" btree (site_id)
"product_product_slug" btree (slug)
"product_product_slug_like" btree (slug varchar_pattern_ops)
Table "public.django_site"
 Column |          Type          |                        Modifiers
--------+------------------------+----------------------------------------------------------
 id     | integer                | not null default nextval('django_site_id_seq'::regclass)
 domain | character varying(100) | not null
 name   | character varying(50)  | not null
Indexes:
"django_site_pkey" PRIMARY KEY, btree (id)
The Postgres version is 8.4.
Some table stats:
# select count(*) from django_site;
count
-------
1
# select count(*) from product_product;
count
-------
17540
# select active, count(*) from product_product group by active;
active | count
--------+-------
f | 591
t | 16949
# select site_id, count(*) from product_product group by site_id;
site_id | count
---------+-------
1 | 17540
Answer 1:
The output of EXPLAIN ANALYZE is identical up to the sort operation, so sorting makes the difference.
In both queries you return all rows of product_product, but in the first case you sort by a column of django_site, so django_site.domain has to be retrieved in addition. That costs extra, but it would not explain the big difference.
There is a good chance that the physical order of the rows in product_product already matches the column ordering, which makes the sort in case 2 very cheap and the sort in case 1 expensive.
After "more details added":
It is also considerably more expensive to sort by a character varying(100) column than by an integer column. In addition to integer being much smaller, there is also the collation support that slows you down. To verify, try ordering with COLLATE "C". Read more about collation support in the manual.
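Applied to the query above, that could look like this (a sketch; the per-expression COLLATE clause requires PostgreSQL 9.1 or later):

```sql
SELECT p.*
FROM product_product p
JOIN django_site d ON (p.site_id = d.id)
WHERE p.active
AND p.site_id = 1
ORDER BY d.domain COLLATE "C", p.ordering, p.name;
```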
That would help if you were running PostgreSQL 9.1. I see now that you have PostgreSQL 8.4, where the per-expression COLLATE clause is not available.
Obviously, all rows in the query output have the same value for django_site.domain, since you filter on p.site_id = 1. If the query planner were smarter, it might skip the first ORDER BY column to begin with.
The query planner of 9.1 has become considerably more intelligent; upgrading might change the situation, but I can't say for certain.
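Until an upgrade, a possible workaround on 8.4 (a sketch): since every returned row carries the same d.domain under the filter p.site_id = 1, dropping that column from the ORDER BY produces the same ordering without the expensive varchar sort:

```sql
SELECT p.*
FROM product_product p
JOIN django_site d ON (p.site_id = d.id)
WHERE p.active
AND p.site_id = 1
-- d.domain is constant for site_id = 1, so it can be omitted:
ORDER BY p.ordering, p.name;
```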
To verify my theory about physical ordering, you could try and make a copy of your big table with the rows inserted in random order and then run the queries again. Like this:
CREATE TABLE p AS
SELECT *
FROM public.product_product
ORDER BY random();
And then:
EXPLAIN ANALYZE
SELECT p.*
FROM p
JOIN django_site d ON (p.site_id = d.id)
WHERE p.active
AND p.site_id = 1
ORDER BY d.domain, p.ordering, p.name;
Any difference? --> Obviously that doesn't explain it ...
OK, to test whether the varchar(100)
makes the difference, I recreated your scenario. See the separate answer with a detailed test case and benchmark. This answer is overloaded already.
To sum it up:
Turns out, my other explanation fits. The main reason for the slowdown is obviously sorting by a varchar(100)
column according to a locale (LC_COLLATE).
I added some explanation and links to the test case. The results should speak for themselves.
Answer 2:
Test Case
PostgreSQL 9.1. Test database with limited resources, but more than enough for this small case. The locale for collation will be relevant:
SHOW LC_COLLATE;
de_AT.UTF-8
Step 1) Reconstruct raw test environment
-- DROP SCHEMA x CASCADE;
CREATE SCHEMA x; -- test schema
-- DROP TABLE x.django_site;
CREATE TABLE x.django_site (
id serial primary key
,domain character varying(100) not null
,int_col int not null
);
INSERT INTO x.django_site values (1,'www.testsite.com/foodir/', 3);
-- DROP TABLE x.product;
CREATE TABLE x.product (
id serial primary key
,site_id integer not null
,name character varying(255) not null
,slug character varying(255) not null
,sku character varying(255)
,ordering integer not null
,active boolean not null
);
INSERT INTO x.product (site_id, name, slug, sku, ordering, active)
SELECT 1
,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
,i -- ordering in sequence
,NOT (random()* 0.5174346569119122)::int::bool
FROM generate_series(1, 17540) AS x(i);
-- SELECT ((591::float8 / 17540)* 0.5) / (1 - (591::float8 / 17540))
-- = 0.5174346569119122
CREATE INDEX product_site_id on x.product(site_id);
Step 2) ANALYZE
ANALYZE x.product;
ANALYZE x.django_site;
Step 3) Reorder BY random()
-- DROP TABLE x.p;
CREATE TABLE x.p AS
SELECT *
FROM x.product
ORDER BY random();
ANALYZE x.p;
Results
EXPLAIN ANALYZE
SELECT p.*
FROM x.p
JOIN x.django_site d ON (p.site_id = d.id)
WHERE p.active
AND p.site_id = 1
-- ORDER BY d.domain, p.ordering, p.name
-- ORDER BY p.ordering, p.name
-- ORDER BY d.id, p.ordering, p.name
-- ORDER BY d.int_col, p.ordering, p.name
-- ORDER BY p.name COLLATE "C"
-- ORDER BY d.domain COLLATE "C", p.ordering, p.name -- dvd's final solution
1) Pre ANALYZE (-> bitmap index scan)
2) Post ANALYZE (-> seq scan)
3) Re-order by random(), ANALYZE
ORDER BY d.domain, p.ordering, p.name
1) Total runtime: 1253.543 ms
2) Total runtime: 1250.351 ms
3) Total runtime: 1283.111 ms
ORDER BY p.ordering, p.name
1) Total runtime: 177.266 ms
2) Total runtime: 174.556 ms
3) Total runtime: 177.797 ms
ORDER BY d.id, p.ordering, p.name
1) Total runtime: 176.628 ms
2) Total runtime: 176.811 ms
3) Total runtime: 178.150 ms
The planner obviously factors in that d.id is functionally dependent.
ORDER BY d.int_col, p.ordering, p.name -- integer column in other table
1) Total runtime: 242.218 ms -- !!
2) Total runtime: 245.234 ms
3) Total runtime: 254.581 ms
The planner obviously misses that d.int_col (NOT NULL) is just as functionally dependent. But sorting by an integer column is cheap.
ORDER BY p.name -- varchar(255) in same table
1) Total runtime: 2259.171 ms -- !!
2) Total runtime: 2257.650 ms
3) Total runtime: 2258.282 ms
Sorting by a (long) varchar or text column is expensive ...
ORDER BY p.name COLLATE "C"
1) Total runtime: 327.516 ms -- !!
2) Total runtime: 325.103 ms
3) Total runtime: 327.206 ms
... but not as expensive if done without locale.
With the locale out of the way, sorting by a varchar column is not quite, but almost, as fast. Locale "C" is effectively "no locale, just order by byte value". I quote the manual:
If you want the system to behave as if it had no locale support, use the special locale name C, or equivalently POSIX.
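A quick way to see what "order by byte value" means (a sketch; the exact default ordering depends on the locale, here de_AT.UTF-8):

```sql
-- In ASCII, upper-case letters precede lower-case ones,
-- so with COLLATE "C" the row 'B' sorts before 'a':
SELECT x FROM (VALUES ('a'), ('B')) t(x) ORDER BY x COLLATE "C";
-- Without COLLATE "C", a typical UTF-8 locale sorts 'a' before 'B'.
```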
Putting it all together, @dvd chose:
ORDER BY d.domain COLLATE "C", p.ordering, p.name
...
3) Total runtime: 275.854 ms
That should do.
Answer 3:
As far as I can see, you need some indexes:
- create index product_product_idx01 on product_product(active, site_id); — this will probably speed up your query.
- Why do you order by domain? It's meaningless in your query.
Source: https://stackoverflow.com/questions/9888096/slow-query-ordering-by-a-column-in-a-joined-table