I'm running the following two queries quite frequently on a table that essentially gathers up logging information. Both select distinct values from a huge number of rows.
I have the same problem with tables of more than 300 million records and an indexed field with only a few distinct values. I couldn't get rid of the sequential scan, so I wrote this function to simulate a DISTINCT search using the index, if one exists. If your table has a number of distinct values proportional to the total number of records, this function isn't a good fit. It would also need to be adapted for multi-column distinct values. Warning: this function builds SQL by string concatenation and is wide open to SQL injection, so it should only be used in a secured environment.
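For reference, this is the shape of query the function replaces; the table and column names (`observations`, `id_source`) are taken from the call samples below and are just illustrative:

```sql
-- Even with an index on id_source, the planner typically chooses a
-- full sequential scan here when the table is very large.
SELECT DISTINCT id_source FROM observations;
```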
EXPLAIN ANALYZE results:
Query with normal SELECT DISTINCT: Total runtime: 598310.705 ms
Query with SELECT small_distinct(...): Total runtime: 1.156 ms
CREATE OR REPLACE FUNCTION small_distinct(
    tableName varchar, fieldName varchar, sample anyelement = ''::varchar)
-- Search a few distinct values in a possibly huge table
-- Parameters: tableName or query expression, fieldName,
--             sample: any value to specify result type (default is varchar)
-- Author: T.Husson, 2012-09-17, distribute/use freely
RETURNS TABLE ( result anyelement ) AS
$BODY$
BEGIN
    -- Seed with the smallest value, using the index on fieldName
    EXECUTE 'SELECT '||fieldName||' FROM '||tableName||' ORDER BY '||fieldName
        ||' LIMIT 1' INTO result;
    WHILE result IS NOT NULL LOOP
        RETURN NEXT;
        -- Jump to the next distinct value via an index range scan
        EXECUTE 'SELECT '||fieldName||' FROM '||tableName
            ||' WHERE '||fieldName||' > $1 ORDER BY '||fieldName||' LIMIT 1'
            INTO result USING result;
    END LOOP;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;
Call samples:
SELECT small_distinct('observations','id_source',1);
SELECT small_distinct('(select * from obs where id_obs > 12345) as temp',
'date_valid','2000-01-01'::timestamp);
SELECT small_distinct('addresses','state');
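On PostgreSQL 8.4+, the same "loose index scan" trick can be written without dynamic SQL (and without the injection risk) as a recursive CTE. This sketch assumes the `observations` table with an indexed `id_source` column from the first call sample:

```sql
-- Loose index scan: each recursive step fetches the next distinct
-- id_source via an index range scan, mirroring small_distinct's loop.
WITH RECURSIVE t AS (
    (SELECT id_source FROM observations ORDER BY id_source LIMIT 1)
    UNION ALL
    SELECT (SELECT id_source FROM observations
            WHERE id_source > t.id_source
            ORDER BY id_source LIMIT 1)
    FROM t
    WHERE t.id_source IS NOT NULL
)
SELECT id_source FROM t WHERE id_source IS NOT NULL;
```

The trade-off is that the CTE is hard-wired to one table and column, whereas the function takes them as parameters.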