Why is PostgreSQL (9.1) not using index for simple equality select?

问题

My table lead has an index:

\d lead
...
Indexes:
    "lead_pkey" PRIMARY KEY, btree (id)
    "lead_account__c" btree (account__c)
    ...
    "lead_email" btree (email)
    "lead_id_prefix" btree (id text_pattern_ops)

Why doesn't PG (9.1) use the index for this straightforward equality selection? Emails are almost all unique....

db=> explain select * from lead where email = 'blah';
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on lead  (cost=0.00..319599.38 rows=1 width=5108)
   Filter: (email = 'blah'::text)
(2 rows)

Other index-hitting queries seem to be OK (though I don't know why this one doesn't just use the pkey index):

db=> explain select * from lead where id = '';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using lead_id_prefix on lead  (cost=0.00..8.57 rows=1 width=5108)
   Index Cond: (id = ''::text)
(2 rows)

db=> explain select * from lead where account__c = '';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using lead_account__c on lead  (cost=0.00..201.05 rows=49 width=5108)
   Index Cond: (account__c = ''::text)
(2 rows)

At first I thought it may be due to not enough distinct values of email. For instance, if the stats claim that email is blah for most of the table, then a seq scan is faster. But that's not the case:

db=> select count(*), count(distinct email) from lead;
 count  | count
--------+--------
 749148 | 733416
(1 row)

Even if I force seq scans to be off, the planner behaves as if it has no other choice:

db=> set enable_seqscan = off;
SET
db=> show enable_seqscan;
 enable_seqscan
----------------
 off
(1 row)

db=> explain select * from lead where email = 'foo@blah.com';
                            QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on lead  (cost=10000000000.00..10000319599.38 rows=1 width=5108)
   Filter: (email = 'foo@blah.com'::text)
(2 rows)

Also tried EXPLAIN ANALYZE:

db=> explain analyze select * from lead where email = 'foo@blah.com';
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on lead  (cost=10000000000.00..10000319732.76 rows=1 width=5102) (actual time=77845.244..77845.244 rows=0 loops=1)
   Filter: (email = 'foo@blah.com'::text)
 Total runtime: 77857.215 ms
(3 rows)

Here is the \d output (sorry, have to obscure the column names, and cropped to fit in SO's limits; see uncropped version at http://pastebin.com/ve3gzJpY):

                                 Table "lead"
                   Column                   |            Type             | Modifiers 
--------------------------------------------+-----------------------------+-----------
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean                     | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 email                                      | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean                     | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 account__c                                 | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 id                                         | text                        | not null
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real                        | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | timestamp without time zone | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real                        | 
Indexes:
    "lead_pkey" PRIMARY KEY, btree (id)
    "lead_account__c" btree (account__c)
    "lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
    "lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
    "lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
    "lead_email" btree (email)
    "lead_id_prefix" btree (id text_pattern_ops)

Here is pg_dump --schema-only -t lead (again see uncropped at http://pastebin.com/ve3gzJpY, with unique column names as well in case it helps reproducibility):

--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: lead; Type: TABLE; Schema: public; Owner: pod; Tablespace: 
--

CREATE TABLE lead (
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX boolean,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX date,
    ...
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    account__c text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    ...
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    id text NOT NULL,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
    ...
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX timestamp without time zone,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real
);


ALTER TABLE lead OWNER TO pod;

--
-- Name: lead_pkey; Type: CONSTRAINT; Schema: public; Owner: pod; Tablespace: 
--

ALTER TABLE ONLY lead
    ADD CONSTRAINT lead_pkey PRIMARY KEY (id);


--
-- Name: lead_account__c; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_account__c ON lead USING btree (account__c);


--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);


--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);


--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);


--
-- Name: lead_email; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_email ON lead USING btree (email);


--
-- Name: lead_id_prefix; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_id_prefix ON lead USING btree (id text_pattern_ops);


--
-- PostgreSQL database dump complete
--

Some PG catalog incantation:

db=> select * from pg_index where indexrelid = 'lead_email'::regclass;
 indexrelid | indrelid  | indnatts | indisunique | indisprimary | indisexclusion | indimmediate | indisclustered | indisvalid | indcheckxmin | indisready | indkey | indcollation | indclass | indoption | indexprs | indpred
------------+-----------+----------+-------------+--------------+----------------+--------------+----------------+------------+--------------+------------+--------+--------------+----------+-----------+----------+---------
  215251995 | 101034456 |        1 | f           | f            | f              | t            | f              | t          | t            | t          | 101    | 100          | 10043    | 0         | ¤        | ¤
(1 row)

Some locale info:

db=> show lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

db=> show lc_ctype;
  lc_ctype   
-------------
 en_US.UTF-8
(1 row)

I searched over a good number of past SO questions but none were about a simple equality query like this one.

回答1:

To troubleshoot these it is imperative that you run VACUUM ANALYSE on the table between troubleshooting steps to see what works. Otherwise you may not know exactly what changed where. So try that and run again first and see if it fixes the problem.

The next steps to run (run vacuum analyse and atest case between each one) are:

ALTER TABLE lead ALTER COLUMN email SET STATISTICS 1000;

Maybe that will fix it. Maybe not.

If that doesn't fix it, take a close look at the pg_stat view:

SELECT * FROM pg_stat WHERE table_name = 'lead';

Please read the following thoroughly and see what you can see that is amiss in pg_stat;

http://www.postgresql.org/docs/9.0/static/planner-stats.html

EDIT: To be very clear, vacuum analyse is not the whole of the troubleshooting. However it MUST be run in between troubleshooting steps because otherwise you can't be sure the planner is taking into account correct data.

回答2:

CREATE INDEX lead_id_prefix ON lead USING btree (id text_pattern_ops);

The use of text_pattern_ops seems odd here. If your ID is some kind of integer, I'd try dropping this index as a test. (I wouldn't hesitate to drop this index on a development server.) Since you have another btree index on "lead.id", I'd expect dropping this index to cajole the optimizer into using the other index on "lead.id".

If that proves to be true, then I'll try to dig deeper into the causes.

来源：https://stackoverflow.com/questions/15965785/why-is-postgresql-9-1-not-using-index-for-simple-equality-select

标签

postgresql

indexing

postgresql-9.1