How can I select rows with most recent timestamp for each key value?

Posted by 烈酒焚心 on 2019-11-27 17:23:03

For the sake of completeness, here's another possible solution:

SELECT sensorID,timestamp,sensorField1,sensorField2 
FROM sensorTable s1
WHERE timestamp = (SELECT MAX(timestamp) FROM sensorTable s2 WHERE s1.sensorID = s2.sensorID)
ORDER BY sensorID, timestamp;

Pretty self-explanatory, I think. The MySQL manual has more information and further examples, but the query above should work in any RDBMS that implements the SQL-92 standard.
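To try the correlated-subquery approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the TEXT/REAL column types are assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database; column types and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensorTable ("
    "sensorID INTEGER, timestamp TEXT, sensorField1 REAL, sensorField2 REAL)"
)
conn.executemany(
    "INSERT INTO sensorTable VALUES (?, ?, ?, ?)",
    [
        (1, "2019-01-01 10:00:00", 1.0, 2.0),
        (1, "2019-01-01 11:00:00", 1.5, 2.5),
        (2, "2019-01-01 09:00:00", 3.0, 4.0),
        (2, "2019-01-01 09:30:00", 3.5, 4.5),
        (3, "2019-01-01 08:00:00", 5.0, 6.0),
    ],
)

# Same query as above: keep a row only if its timestamp equals
# the maximum timestamp recorded for that row's sensorID.
latest_rows = conn.execute(
    """
    SELECT sensorID, timestamp, sensorField1, sensorField2
    FROM sensorTable s1
    WHERE timestamp = (SELECT MAX(timestamp) FROM sensorTable s2
                       WHERE s1.sensorID = s2.sensorID)
    ORDER BY sensorID, timestamp
    """
).fetchall()

for row in latest_rows:
    print(row)
```

ISO-formatted timestamp strings sort chronologically, which is why plain TEXT comparison is enough in this sketch.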

This can be done in a relatively elegant way using SELECT DISTINCT ON, as follows:

SELECT DISTINCT ON (sensorID)
sensorID, timestamp, sensorField1, sensorField2 
FROM sensorTable
ORDER BY sensorID, timestamp DESC;

The above works in PostgreSQL. Note that DISTINCT ON is a PostgreSQL extension rather than standard SQL, so engines such as MySQL and SQL Server do not accept it. In case it's not obvious, the query sorts the table by sensor ID and timestamp (newest to oldest) and then returns the first row (i.e. the latest timestamp) for each unique sensor ID.

In my use case I have ~10M readings from ~1K sensors, so trying to join the table with itself on a timestamp-based filter is very resource-intensive; the above takes a couple of seconds.

You can join the table with itself (on sensor ID) and add left.timestamp < right.timestamp as a join condition. Then you pick the rows where the right side's sensorID is null. Voilà, you have the latest entry per sensor.

http://sqlfiddle.com/#!9/45147/37

SELECT L.* FROM sensorTable L
LEFT JOIN sensorTable R ON
L.sensorID = R.sensorID AND
L.timestamp < R.timestamp
WHERE R.sensorID IS NULL

But please note that this will be very resource-intensive if you have few IDs and many values per ID! So I wouldn't recommend it for measurement data where each sensor collects a value every minute. However, in a use case where you need to track "revisions" of something that changes only occasionally, it works fine.
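The self-join can be checked the same way. This is a small sketch with Python's sqlite3 module and made-up sample rows (the column types are also an assumption); it is written with the standard IS NULL test so that it runs outside MySQL as well.

```python
import sqlite3

# In-memory database; sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensorTable ("
    "sensorID INTEGER, timestamp TEXT, sensorField1 REAL, sensorField2 REAL)"
)
conn.executemany(
    "INSERT INTO sensorTable VALUES (?, ?, ?, ?)",
    [
        (1, "2019-01-01 10:00:00", 1.0, 2.0),
        (1, "2019-01-01 11:00:00", 1.1, 2.1),
        (1, "2019-01-01 12:00:00", 1.2, 2.2),
        (2, "2019-01-01 09:00:00", 3.0, 4.0),
    ],
)

# Anti-join: a row L survives only if no row R with the same sensorID
# has a later timestamp, i.e. the LEFT JOIN found no match for it.
anti_join_rows = conn.execute(
    """
    SELECT L.*
    FROM sensorTable L
    LEFT JOIN sensorTable R
      ON L.sensorID = R.sensorID
     AND L.timestamp < R.timestamp
    WHERE R.sensorID IS NULL
    ORDER BY L.sensorID
    """
).fetchall()

for row in anti_join_rows:
    print(row)
```

Every older row of a sensor matches at least one newer row, so the intermediate join result grows with the number of readings per sensor, which is where the cost mentioned above comes from.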

With GROUP BY, you can only select columns that are in the group or used in an aggregate function. You can use a join to get this working:

select s1.* 
from sensorTable s1
inner join 
(
  SELECT sensorID, max(timestamp) as mts
  FROM sensorTable 
  GROUP BY sensorID 
) s2 on s2.sensorID = s1.sensorID and s1.timestamp = s2.mts

WITH SensorTimes As (
   SELECT sensorID, MAX(timestamp) AS LastReading
   FROM sensorTable
   GROUP BY sensorID
)
SELECT s.sensorID,s.timestamp,s.sensorField1,s.sensorField2 
FROM sensorTable s
INNER JOIN SensorTimes t on s.sensorID = t.sensorID and s.timestamp = t.LastReading
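One property of the GROUP BY / MAX join is worth seeing in action: if two rows share the maximum timestamp for a sensor, both are returned. The sketch below uses Python's sqlite3 module with made-up rows (sensor 2 deliberately has a tie); the data and column types are assumptions for illustration.

```python
import sqlite3

# In-memory database; sensor 2 has two rows with the same latest timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensorTable ("
    "sensorID INTEGER, timestamp TEXT, sensorField1 REAL, sensorField2 REAL)"
)
conn.executemany(
    "INSERT INTO sensorTable VALUES (?, ?, ?, ?)",
    [
        (1, "2019-01-01 10:00:00", 1.0, 2.0),
        (1, "2019-01-01 11:00:00", 1.5, 2.5),
        (2, "2019-01-01 09:30:00", 3.0, 4.0),
        (2, "2019-01-01 09:30:00", 3.5, 4.5),
    ],
)

# The CTE computes one maximum timestamp per sensor; the join then
# pulls back every full row that carries that timestamp.
cte_rows = conn.execute(
    """
    WITH SensorTimes AS (
        SELECT sensorID, MAX(timestamp) AS LastReading
        FROM sensorTable
        GROUP BY sensorID
    )
    SELECT s.sensorID, s.timestamp, s.sensorField1, s.sensorField2
    FROM sensorTable s
    INNER JOIN SensorTimes t
      ON s.sensorID = t.sensorID AND s.timestamp = t.LastReading
    ORDER BY s.sensorID, s.sensorField1
    """
).fetchall()

for row in cte_rows:
    print(row)
```

If exactly one row per sensor is required, the table needs a uniqueness guarantee on (sensorID, timestamp) or an additional tie-breaking rule.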

As @fancyPants answered:

SELECT sensorID,timestamp,sensorField1,sensorField2 
FROM sensorTable stmt_outer
WHERE timestamp = (SELECT MAX(timestamp) FROM sensorTable stmt_inner WHERE stmt_outer.sensorID = stmt_inner.sensorID)

This is called a correlated subquery, and it differs from an ordinary nested subquery: conceptually, the subquery is evaluated once for every row of the outer query.
This means that the inner subquery:

(SELECT MAX(timestamp) FROM sensorTable stmt_inner WHERE stmt_outer.sensorID = stmt_inner.sensorID)

is evaluated for each outer row and returns the maximum timestamp for that row's sensorID. That value is then compared with the outer row's timestamp, so only the most recent row for each sensorID is kept.

I had mostly the same problem and ended up with a different solution that makes this type of problem trivial to query.

I have a table of sensor data (1-minute data from about 30 sensors):

SensorReadings->(timestamp,value,idSensor)

and I have a sensor table that has lots of mostly static stuff about the sensor but the relevant fields are these:

Sensors->(idSensor,Description,tvLastUpdate,tvLastValue,...)

The tvLastUpdate and tvLastValue columns are set by a trigger on inserts into the SensorReadings table, so I always have direct access to these values without needing to run any expensive queries. This does denormalize the schema slightly. The query is trivial:

SELECT idSensor,Description,tvLastUpdate,tvLastValue 
FROM Sensors

I use this method for data that is queried often. In my case I have a sensor table, and a large event table, that have data coming in at the minute level AND dozens of machines are updating dashboards and graphs with that data. With my data scenario the trigger-and-cache method works well.
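The answer does not show the trigger itself, so the following is only a sketch of what such a trigger-and-cache setup could look like, written for SQLite through Python's sqlite3 module. The trigger name trg_cache_last_reading, the column types, the sample rows, and the guard against out-of-order inserts are all assumptions, and trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Sensors (
        idSensor INTEGER PRIMARY KEY,
        Description TEXT,
        tvLastUpdate TEXT,
        tvLastValue REAL
    );
    CREATE TABLE SensorReadings (timestamp TEXT, value REAL, idSensor INTEGER);

    -- Hypothetical trigger: copy each new reading into the sensor's cache
    -- columns, unless a newer reading is already cached (assumed guard).
    CREATE TRIGGER trg_cache_last_reading
    AFTER INSERT ON SensorReadings
    BEGIN
        UPDATE Sensors
        SET tvLastUpdate = NEW.timestamp, tvLastValue = NEW.value
        WHERE idSensor = NEW.idSensor
          AND (tvLastUpdate IS NULL OR tvLastUpdate <= NEW.timestamp);
    END;

    INSERT INTO Sensors (idSensor, Description) VALUES (1, 'boiler'), (2, 'pump');
    """
)

conn.executemany(
    "INSERT INTO SensorReadings (timestamp, value, idSensor) VALUES (?, ?, ?)",
    [
        ("2019-01-01 10:00:00", 20.5, 1),
        ("2019-01-01 10:01:00", 21.0, 1),
        ("2019-01-01 09:59:00", 19.0, 1),  # late arrival, must not overwrite
        ("2019-01-01 10:00:00", 7.0, 2),
    ],
)

# Reading the latest values no longer touches the large readings table.
cached = conn.execute(
    "SELECT idSensor, Description, tvLastUpdate, tvLastValue "
    "FROM Sensors ORDER BY idSensor"
).fetchall()

for row in cached:
    print(row)
```

The trade-off is a small extra write per insert in exchange for reads that scan only the Sensors table.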
