How to use unique column values as input into another select statement

最后都变了- 提交于 2019-12-12 02:55:58

问题


I have a table (MySQL) that has a column called binID. The values in this column can range from 1 to 70.

What I want to do is select the unique values of this column (should be the numbers from 1 to 70), then iterate over them using each (lets call it theBinID) as a parameter into another SELECT statement such as:

SELECT * FROM MyTable WHERE binID = theBinID ORDER BY createdDate DESC LIMIT 10

Basically, I am looking to get the 10 most recent rows for each binID.

I do not believe there is a way to do this with a basic SQL statement, although I would love that to be the answer, so I have written a stored procedure that creates a cursor on the SELECT DISTINCT of binIDs, and then iterates over it and populates a temp table.

My problem is, this is for an optimization, and if I fetch 100K rows I get 1.7 second average time. Executing my stored procedure to get 700 rows (10 records for 70 bins) is taking 1.4 seconds. I realize that 0.3 seconds could be perceived as considerable improvement, but I am hoping to get this sub-second for 100K rows.

Is there a better way??

The full stored proc is this:

BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE binID INT;
DECLARE cur1 CURSOR FOR SELECT DISTINCT heatmapBinID from MEStressTest ORDER BY heatmapBinID ASC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

DROP TEMPORARY TABLE IF EXISTS TempResults;

CREATE TEMPORARY TABLE TempResults (
    `recordID` text NOT NULL,
    `queryTerm` text NOT NULL,
    `recordCreated` double(11,0) NOT NULL,
    `recordByID` text NOT NULL,
    `recordByName` text NOT NULL,
    `recordText` text NOT NULL,
    `recordSource` text NOT NULL,
    `rerecordCount` int(11) NOT NULL DEFAULT '0',
    `timecodeOffset` int(11) NOT NULL DEFAULT '-1',
    `recordByImageURL` text NOT NULL,
    `canDelete` int(11) NOT NULL DEFAULT '1',
    `heatmapBinID` int(11) DEFAULT NULL,
    `timelineBinID` int(11) DEFAULT NULL,
    PRIMARY KEY (`recordID`(20))
);

OPEN cur1;

read_loop: LOOP
    FETCH cur1 INTO binID;

    IF done THEN
        LEAVE read_loop;
    END IF;

    INSERT INTO TempResults (recordID, queryTerm, recordCreated, recordByID, recordByName, recordText, recordSource, rerecordCount, timecodeOffset, recordByImageURL, canDelete, heatmapBinID, timelineBinID)
    SELECT * FROM MEStressTest WHERE heatmapBinID = binID ORDER BY recordCreated DESC LIMIT numRecordsPerBin;
END LOOP;

CLOSE cur1;

SELECT * FROM TempResults ORDER BY heatmapBinID ASC, recordCreated DESC;

END


回答1:


Try to simulate ROW_NUMBER OVER PARTITION in MySQL: http://www.sqlfiddle.com/#!2/fd8b5/4

Given this data:

create table sentai(
  band varchar(50),
  member_name varchar(50),
  member_year int not null
);

insert into sentai(band, member_name, member_year) values
('BEATLES','JOHN',1960),
('BEATLES','PAUL',1961),
('BEATLES','GEORGE',1962),
('BEATLES','RINGO',1963),
('VOLTES V','STEVE',1970),
('VOLTES V','MARK',1971),
('VOLTES V','BIG BERT',1972),
('VOLTES V','LITTLE JOHN',1973),
('VOLTES V','JAMIE',1964),
('ERASERHEADS','ELY',1990),
('ERASERHEADS','RAYMUND',1991),
('ERASERHEADS','BUDDY',1992),
('ERASERHEADS','MARCUS',1993);

Object, find all three most recent members on each band.

First we have to put a row_number on each member based on most year(via descending order)

select *,

  @rn := @rn + 1 as rn
from (sentai s, (select @rn := 0) as vars)
order by s.band, s.member_year desc;

Output:

|        BAND | MEMBER_NAME | MEMBER_YEAR | @RN := 0 | RN |
|-------------|-------------|-------------|----------|----|
|     BEATLES |       RINGO |        1963 |        0 |  1 |
|     BEATLES |      GEORGE |        1962 |        0 |  2 |
|     BEATLES |        PAUL |        1961 |        0 |  3 |
|     BEATLES |        JOHN |        1960 |        0 |  4 |
| ERASERHEADS |      MARCUS |        1993 |        0 |  5 |
| ERASERHEADS |       BUDDY |        1992 |        0 |  6 |
| ERASERHEADS |     RAYMUND |        1991 |        0 |  7 |
| ERASERHEADS |         ELY |        1990 |        0 |  8 |
|    VOLTES V | LITTLE JOHN |        1973 |        0 |  9 |
|    VOLTES V |    BIG BERT |        1972 |        0 | 10 |
|    VOLTES V |        MARK |        1971 |        0 | 11 |
|    VOLTES V |       STEVE |        1970 |        0 | 12 |
|    VOLTES V |       JAMIE |        1964 |        0 | 13 |

Then we reset the row number when a member is on different band:

select *,

  @rn := IF(@pg = s.band, @rn + 1, 1) as rn,
  @pg := s.band
from (sentai s, (select @pg := null, @rn := 0) as vars)
order by s.band, s.member_year desc;

Output:

|        BAND | MEMBER_NAME | MEMBER_YEAR | @PG := NULL | @RN := 0 | RN | @PG := S.BAND |
|-------------|-------------|-------------|-------------|----------|----|---------------|
|     BEATLES |       RINGO |        1963 |      (null) |        0 |  1 |       BEATLES |
|     BEATLES |      GEORGE |        1962 |      (null) |        0 |  2 |       BEATLES |
|     BEATLES |        PAUL |        1961 |      (null) |        0 |  3 |       BEATLES |
|     BEATLES |        JOHN |        1960 |      (null) |        0 |  4 |       BEATLES |
| ERASERHEADS |      MARCUS |        1993 |      (null) |        0 |  1 |   ERASERHEADS |
| ERASERHEADS |       BUDDY |        1992 |      (null) |        0 |  2 |   ERASERHEADS |
| ERASERHEADS |     RAYMUND |        1991 |      (null) |        0 |  3 |   ERASERHEADS |
| ERASERHEADS |         ELY |        1990 |      (null) |        0 |  4 |   ERASERHEADS |
|    VOLTES V | LITTLE JOHN |        1973 |      (null) |        0 |  1 |      VOLTES V |
|    VOLTES V |    BIG BERT |        1972 |      (null) |        0 |  2 |      VOLTES V |
|    VOLTES V |        MARK |        1971 |      (null) |        0 |  3 |      VOLTES V |
|    VOLTES V |       STEVE |        1970 |      (null) |        0 |  4 |      VOLTES V |
|    VOLTES V |       JAMIE |        1964 |      (null) |        0 |  5 |      VOLTES V |

Then we select only the most recent three members on each band:

select x.band, x.member_name, x.member_year
from
(
  select *,
    @rn := IF(@pg = s.band, @rn + 1, 1) as rn,
    @pg := s.band
  from (sentai s, (select @pg := null, @rn := 0) as vars)
  order by s.band, s.member_year desc
) as x
where x.rn <= 3
order by x.band, x.member_year desc;

Output:

|        BAND | MEMBER_NAME | MEMBER_YEAR |
|-------------|-------------|-------------|
|     BEATLES |       RINGO |        1963 |
|     BEATLES |      GEORGE |        1962 |
|     BEATLES |        PAUL |        1961 |
| ERASERHEADS |      MARCUS |        1993 |
| ERASERHEADS |       BUDDY |        1992 |
| ERASERHEADS |     RAYMUND |        1991 |
|    VOLTES V | LITTLE JOHN |        1973 |
|    VOLTES V |    BIG BERT |        1972 |
|    VOLTES V |        MARK |        1971 |

While windowing function(e.g. ROW_NUMBER OVER PARTITION) is not yet available on MySQL, just simulate it with variables. Please let us know if this is faster than cursor approach


How it looks like on windowing-capable RDBMS: http://www.sqlfiddle.com/#!1/fd8b5/6

with member_recentness as
(
  select row_number() over each_band as recent, *
  from sentai
  window each_band as (partition by band order by member_year desc)
)
select * 
from member_recentness
where recent <= 3;

Output:

| RECENT |        BAND | MEMBER_NAME | MEMBER_YEAR |
|--------|-------------|-------------|-------------|
|      1 |     BEATLES |       RINGO |        1963 |
|      2 |     BEATLES |      GEORGE |        1962 |
|      3 |     BEATLES |        PAUL |        1961 |
|      1 | ERASERHEADS |      MARCUS |        1993 |
|      2 | ERASERHEADS |       BUDDY |        1992 |
|      3 | ERASERHEADS |     RAYMUND |        1991 |
|      1 |    VOLTES V | LITTLE JOHN |        1973 |
|      2 |    VOLTES V |    BIG BERT |        1972 |
|      3 |    VOLTES V |        MARK |        1971 |



回答2:


SELECT * FROM MyTable WHERE binID IN (SELECT DISTINCT(bin_id) FROM mysql_table) ORDER BY createdDate DESC LIMIT 10;

This is not tested, nevermind the syntax.

Add indexes to increase performance.




回答3:


If you try to inner join 2 tables without any joining key, it will be a cartesian product of the 2 tables, i.e.:

SELECT * 
FROM MyTable t 
    INNER JOIN (SELECT DISTINCT binId FROM MyTable) AS u 
WHERE 
    t.binID = theBinID 
ORDER BY t.createdDate DESC LIMIT 10

You may refer to this



来源:https://stackoverflow.com/questions/10762598/how-to-use-unique-column-values-as-input-into-another-select-statement

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!