How to improve speed of query?

Posted by 随声附和 on 2021-01-29 12:21:04

Question


I have the following table:

CREATE TABLE xml_files ( 
    "parsing_status" Character Varying( 150 ),
    "purchaseNumber" Character Varying( 2044 ),
    "docPublishDate" Timestamp With Time Zone );

Sample data:

purchaseNumber  parsing_status  docPublishDate
0373200554017000226 null    2017-07-28 19:00:10.885+03
0373200554017000226 null    2017-07-28 19:08:30.346+03
0373200554017000226 null    2017-07-28 19:24:35.265+03
0373400005317002182 null    2017-07-28 19:45:02.162+03
0348100035117000082 null    2017-07-28 20:08:26.37+03
0373200554017000292 null    2017-07-28 20:10:24.312+03
0373200081217000531 null    2017-07-28 20:13:56.166+03
0373200041517000400 null    2017-07-28 21:23:20.616+03
0373200081217000531 null    2017-07-29 08:18:29.571+03
0373200081217000531 null    2017-07-29 09:34:11.545+03
0373100026117000078 null    2017-07-29 10:37:01.161+03
0573400000117001086 null    2017-07-29 11:25:37.863+03
0573400000117001096 null    2017-07-29 11:30:36.499+03
0373200081217000531 null    2017-07-29 12:14:04.033+03
0573400000117001118 null    2017-07-29 14:50:34+03
0573400000117001118 null    2017-07-29 16:49:12.457+03
0373100026117000080 null    2017-07-29 16:52:02.013+03
0373100026117000080 null    2017-07-29 17:05:40.981+03
0373100026117000080 null    2017-07-29 17:13:29.532+03
0373200554017000226 null    2017-07-29 18:55:47.488+03

The purchaseNumber column contains duplicates. I need to select the latest record for each purchaseNumber, keeping only those that have not been parsed yet. I am doing it with the following SQL:

SELECT "purchaseNumber", "parsing_status", "docPublishDate"
FROM (
    SELECT DISTINCT ON ("purchaseNumber") x.*
    FROM xml_files x
    ORDER BY "purchaseNumber", "docPublishDate" DESC
) x
WHERE parsing_status IS DISTINCT FROM 'true'
  AND parsing_status IS NULL
ORDER BY "docPublishDate"
LIMIT 100

The problem is that the query takes a very long time on a table with millions of rows. How can I improve its speed? Here is a data sample: https://www.db-fiddle.com/f/vycMHGLYML5K56SN77HLsY/0


Answer 1:


For your query, you want an index on xml_files("purchaseNumber", "docPublishDate" desc):

create index idx_xml_files_2 on xml_files("purchaseNumber", "docPublishDate" desc)

Postgres should use this index for the order by, which facilitates the distinct on.
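
One way to check whether the planner actually uses the index is to look at the plan of the inner distinct on query. This is only a minimal sketch, assuming the idx_xml_files_2 index above has already been created. Note that EXPLAIN with the ANALYZE option really executes the statement, so on a large table it takes as long as the query itself:

```sql
-- Refresh planner statistics after creating the index.
ANALYZE xml_files;

-- Show the actual plan, timings and buffer usage of the inner query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON ("purchaseNumber") x.*
FROM xml_files x
ORDER BY "purchaseNumber", "docPublishDate" DESC;
```

If the plan contains an index scan on idx_xml_files_2 instead of a sequential scan followed by a sort, the index is being used for the ordering. The planner may still prefer the scan-and-sort plan, since the query has to read every row either way.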

Although it won't affect performance, I would also suggest simplifying the where clause, since a value that IS NULL is always distinct from 'true':

where parsing_status IS NULL 
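
Put together, the query from the question with the simplified where clause would look roughly like this. It is a sketch only; apart from the dropped condition, the logic is unchanged:

```sql
SELECT "purchaseNumber", "parsing_status", "docPublishDate"
FROM (
    -- Keep only the latest row for each purchaseNumber.
    SELECT DISTINCT ON ("purchaseNumber") x.*
    FROM xml_files x
    ORDER BY "purchaseNumber", "docPublishDate" DESC
) x
-- Of those latest rows, keep the ones that are still unparsed.
WHERE parsing_status IS NULL
ORDER BY "docPublishDate"
LIMIT 100;
```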



Answer 2:


OK, here are some tips.

1. Improve query

-- Mixed-case column names must be double-quoted in Postgres.
SELECT t1."purchaseNumber", t1."parsing_status", t1."docPublishDate"
FROM xml_files t1
LEFT JOIN xml_files t2
  ON t1."purchaseNumber" = t2."purchaseNumber"
  AND t1."docPublishDate" < t2."docPublishDate"
WHERE t1.parsing_status IS NULL
AND t2.parsing_status IS NULL
-- No newer row exists for this purchaseNumber.
AND t2."docPublishDate" IS NULL
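
The same "no newer row exists" idea can also be written with NOT EXISTS, which some people find easier to read. The sketch below is an assumption about what the final query could look like, not part of the original answer: the ORDER BY and LIMIT are carried over from the question, and the alias newer is made up for illustration.

```sql
SELECT t1."purchaseNumber", t1."parsing_status", t1."docPublishDate"
FROM xml_files t1
WHERE t1.parsing_status IS NULL
  -- Keep the row only if no later row shares its purchaseNumber.
  AND NOT EXISTS (
      SELECT 1
      FROM xml_files newer
      WHERE newer."purchaseNumber" = t1."purchaseNumber"
        AND newer."docPublishDate" > t1."docPublishDate"
  )
ORDER BY t1."docPublishDate"
LIMIT 100;
```

Unlike distinct on, this form returns every row that ties for the latest docPublishDate within a purchaseNumber, so the results can differ if exact duplicates of the timestamp exist.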

2. Improve table

You can also try adding an index, but if the table only contains these 3 columns I am not sure how much it will help. Depending on the data distribution, for example if you know that about half of the parsing_status values are null, you can try:

create index idx_xml_files_2 on xml_files("parsing_status", "purchaseNumber")
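
If most rows have already been parsed, a partial index restricted to the unparsed rows might be worth trying as a variant of the index above. This is a sketch under that assumption; the index name idx_xml_files_unparsed is hypothetical, and whether the planner picks it depends on the data distribution, so it is worth checking with EXPLAIN:

```sql
-- Hypothetical index name; it covers only rows that are still unparsed,
-- so it stays smaller than a full index when few rows have a null status.
CREATE INDEX idx_xml_files_unparsed
    ON xml_files ("purchaseNumber", "docPublishDate")
    WHERE parsing_status IS NULL;

-- Inspect the plan chosen for the anti-join from step 1.
EXPLAIN
SELECT t1."purchaseNumber", t1."parsing_status", t1."docPublishDate"
FROM xml_files t1
LEFT JOIN xml_files t2
  ON t1."purchaseNumber" = t2."purchaseNumber"
  AND t1."docPublishDate" < t2."docPublishDate"
WHERE t1.parsing_status IS NULL
  AND t2."docPublishDate" IS NULL;
```

The lookup on t2 still has to consider rows of any status, so an index covering the whole table on ("purchaseNumber", "docPublishDate") remains useful for that side of the join.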


Source: https://stackoverflow.com/questions/65493027/how-to-improve-speed-of-query
