I want to optimize this query:
select location_id, dept_id,
round(sum(sales),0), sum(qty),
count(distinct tran_id),
now()
from tran_sales
where tran_date <= '2016-12-24'
group by location_id, dept_id;
Currently this query takes around 98 seconds on average (Query took 97.4096 seconds.) on Windows 10, 64-bit, with 16 GB RAM.
Here is the table definition for reference:
CREATE TABLE tran_sales (
tran_date date NOT NULL,
location_id int(11) NOT NULL,
dept_id int(11) NOT NULL,
item_id varchar(25) NOT NULL,
tran_id int(11) NOT NULL,
sales float DEFAULT NULL,
qty int(11) DEFAULT NULL,
update_datetime datetime NOT NULL,
PRIMARY KEY (tran_date,location_id,dept_id,item_id,tran_id),
KEY tran_date (tran_date)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The record count in table tran_sales is 13.5 million.
Note: I tried both with and without the index KEY tran_date (tran_date); the average time is 98 seconds either way.
Please suggest how to speed up the query, either by changing the query itself or by changing some default settings in my.ini, if that helps. Thanks.
Update: the min date in the table is 2016-07-01, and the max date is 2017-07-25.
None of the suggestions so far will help much, because...

- Covering index: that is only slightly smaller than the table, so it is only slightly faster.
- KEY(tran_date) -- a waste; it is better to use the PK, which starts with tran_date.
- PARTITIONing -- no; that is likely to be slower.
- Removing tran_date (or otherwise rearranging the PK) -- this will hurt. The filtering (WHERE) is on tran_date; it is usually best to have that first.

So, why was COUNT(*) fast? Start by looking at the EXPLAIN. It will show that the query used KEY(tran_date) instead of scanning the table. Less data to scan, hence faster.
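To see which access path the optimizer actually picks for the aggregate query, run EXPLAIN on it (this is the standard MySQL diagnostic; with the WHERE covering roughly half the table's date range, expect a scan rather than an index range):

```sql
-- Shows the chosen index in the `key` column and the estimated
-- row count in `rows`; `Extra` reveals whether the index covers.
EXPLAIN
SELECT location_id, dept_id,
       ROUND(SUM(sales), 0), SUM(qty), COUNT(DISTINCT tran_id)
FROM tran_sales
WHERE tran_date <= '2016-12-24'
GROUP BY location_id, dept_id;
```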
The real issue is that you have millions of rows to scan; touching millions of rows takes time.
How to speed it up? Create and maintain a summary table, then query that table (with thousands of rows) instead of the original table (millions of rows). The total count is SUM(counts); the total sum is SUM(sums); the average is SUM(sums)/SUM(counts); etc.
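A minimal sketch of such a summary table (table and column names here are illustrative, not from the original post): one row per (tran_date, location_id, dept_id), refreshed once a day. Caveat: SUM(tran_count) equals COUNT(DISTINCT tran_id) only if a tran_id never spans more than one day, which the PK layout suggests but does not guarantee.

```sql
-- Illustrative daily summary table: thousands of rows instead of millions.
CREATE TABLE tran_sales_daily (
  tran_date   date   NOT NULL,
  location_id int    NOT NULL,
  dept_id     int    NOT NULL,
  sales_sum   double NOT NULL,
  qty_sum     bigint NOT NULL,
  tran_count  int    NOT NULL,  -- distinct tran_ids within that day
  PRIMARY KEY (tran_date, location_id, dept_id)
) ENGINE=InnoDB;

-- Nightly refresh for yesterday (rerun-safe via REPLACE):
REPLACE INTO tran_sales_daily
SELECT tran_date, location_id, dept_id,
       SUM(sales), SUM(qty), COUNT(DISTINCT tran_id)
FROM tran_sales
WHERE tran_date = CURDATE() - INTERVAL 1 DAY
GROUP BY tran_date, location_id, dept_id;

-- The original report, now rolled up from the summary table:
SELECT location_id, dept_id,
       ROUND(SUM(sales_sum), 0), SUM(qty_sum), SUM(tran_count), NOW()
FROM tran_sales_daily
WHERE tran_date <= '2016-12-24'
GROUP BY location_id, dept_id;
```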
For this query:
select location_id, dept_id,
round(sum(sales), 0), sum(qty), count(distinct tran_id),
now()
from tran_sales
where tran_date <= '2016-12-24'
group by location_id, dept_id;
There is not much you can do. One attempt would be a covering index: (tran_date, location_id, dept_id, sales, qty), but I don't think that will help much.
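For reference, the covering-index attempt would look like this (the index name is made up; note that InnoDB implicitly appends the PK columns, including tran_id, to every secondary index, so this index does cover the whole query):

```sql
-- Sketch of the covering index mentioned above; expect only a
-- modest gain, since the index is nearly as large as the table.
ALTER TABLE tran_sales
  ADD INDEX ix_cover (tran_date, location_id, dept_id, sales, qty);
```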
Source: https://stackoverflow.com/questions/47976837/mysql-optimization-suggestion-for-large-table