Working out the SQL to query a priority queue table

I am implementing a small queue to handle which process gets to run first. I am using a table in a database to do this. Here is the structure of the table (I'm mocking it up in SQLite):

        "id" INTEGER PRIMARY KEY  AUTOINCREMENT  NOT NULL ,
        "identifier" VARCHAR NOT NULL ,
        "priority_number" INTEGER DEFAULT 15,
        "timestamp" DATETIME DEFAULT CURRENT_TIMESTAMP,
        "description" VARCHAR

I am trying to write SQL to give me the row of which process can run next. Here is some sample data:

id  identifier  priority_number timestamp   description
1   test1   15  2009-01-20 17:14:49 NULL
2   test2   15  2009-01-20 17:14:56 NULL
3   test3   10  2009-01-20 17:15:03 NULL
4   test4   15  2009-01-20 17:15:08 NULL
5   test5   15  2009-01-20 17:32:23 NULL
6   test6   14  2009-01-20 17:32:30 NULL
7   test7   7   2009-01-20 17:32:38 NULL
8   test8   20  2009-01-20 17:32:57 NULL
9   test9   7   2009-01-21 13:47:30 NULL
10  test10  15  2009-01-21 13:50:52 NULL

If I use this SQL, I can get the data in the proper order:

select * from queue_manager order by priority_number, timestamp;

This will give me the item with the lowest priority number (most important) at the top, and in those priority numbers, the earliest into the queue (by timestamp) at the top.

I could run this query, and only take the first row, but I would rather do this with a SQL query that would give me the one row of the process that is at the top of the queue (in the example data above, the row with id=7).

I tried doing self joins and sub queries, but I must be having a mental block - I just can't seem to get it right.

Thanks in advance!

EDIT

I forgot to mention that I am looking for a database-independent query. I am mocking this up in SQlite, but there is a good possibility I will implement this in DB2 or Oracle. I had thought to use a "limit 1" type operator on my query, but that is different between different database engines.

See if this works:

select * from queue_manager where priority_number = 
(select min(priority_number) from queue_manager) and  
timestamp = (select min(timestamp) 
from queue_manager qm2 
where qm2.priority_number = queue_manager.priority_number)

select * from queue_manager order by priority_number, timestamp LIMIT 1;

As for such called "database independency", it's a myth for most real world tasks. As a rule, you cannot even create schema in database-independent way.

If you want it to be 'concurrent safe' on something like InnoDB do:

1) Add an 'in_progress' field.

2) Turn off AUTOCommit

3) SELECT * FROM queue_manager where in_progress = 0 order by priority_number, timestamp LIMIT 1 FOR UDPATE;

4) UPDATE queue_manager SET in_progress = 1 where id = X;

5) COMMIT

6) Do the job. Then delete the row when its done to satisfaction. Have a 'master process' handle/redelegate/clean up old 'in_progress' jobs.

The best way to do this is database dependent; it's a much simpler thing to have different retrieval procs for the different target DBMSs versus all of the overhead of cursors or other constructs.

Selecting a limited number of rows is done differently in different flavors of SQL, so depending on which you are using there might be a built in way to do it. For example, in MS SQL Server:

SELECT TOP 1
     identifier,
     priority_number,
     timestamp,
     description
FROM
     dbo.Queue_Manager
ORDER BY
     priority_number,
     timestamp

To do this in ANSI compatible SQL, the following methods should work:

    SELECT
         QM1.identifier,
         QM1.priority_number,
         QM1.timestamp,
         QM1.description
    FROM
         Queue_Manager QM1
    LEFT OUTER JOIN Queue_Manager QM2 ON
         QM2.priority_number < QM1.priority_number OR
         (QM2.priority_number = QM1.priority_number AND QM2.timestamp < QM1.timestamp)
    /* If you're concerned that there might be an exact match by priority_number
and timestamp then you might want to add a bit more to the join */
    WHERE
         QM2.identifier IS NULL

Or you can try:

SELECT
     QM1.identifier,
     QM1.priority_number,
     QM1.timestamp,
     QM1.description
FROM
     Queue_Manager QM1
INNER JOIN
     (
          SELECT
               priority_number
               MIN(timestamp) AS timestamp,
          FROM
               Queue_Manager
          WHERE
               priority_number = 
                    (
                         SELECT
                              MIN(priority_number)
                         FROM
                              Queue_Manager
                    )
          GROUP BY
               priority_number
     ) SQ1 ON
          SQ1.priority_number = QM1.priority_number AND
          SQ1.timestamp = QM1.timestamp

Neither method accounts for exact matches in BOTH priority_number and timestamp, so if you think that's possible (and maybe even if you don't) you'll need to add a line or two to go one more level using the identifier or something else that guarantees uniqueness. Or just write your front end to handle the occasional case of getting back two rows (maybe just ignore the second - you'll get it the next time through).

Test each method and see which works better for you.

Also, how large do you expect the queue to get? It could be reasonable to just query with your ORDER BY and only have the front end retrieve the first row.

Read this section and select the variant that gives you the most suitable compatibility. Probably the use of cursors is the only more or less universally compatible way, but has some performance penalty that might not make it worth it (profile!).

Relational databases are not great at managing queues.

Try looking at MSMQ in the windows world, ActiveMQ in the java world or Websphere MQ in the business world.

These products do a single thing, manage queues, but they do it well.

来源：https://stackoverflow.com/questions/465692/working-out-the-sql-to-query-a-priority-queue-table

标签

sql

queue

priority-queue

queue-table