window-functions

Cumulative sum with a dynamic base in Postgres

為幸葍努か submitted on 2020-08-05 04:37:25
Question: I have the following scenario in Postgres (I'm using 9.4.1). I have a table of this format:

    create table test(
        id serial,
        val numeric not null,
        created timestamp not null default(current_timestamp),
        fk integer not null
    );

What I then have is a numeric threshold field in another table which should be used to label each row of test: every row whose value pushes the running count to >= threshold should be marked true, and at that point the running count should reset to 0 for the subsequent rows, e.g. Data
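
A window aggregate alone cannot reset a running SUM mid-stream, so the usual Postgres approach is a recursive CTE that carries the running total forward and zeroes it whenever the threshold is hit. A minimal sketch, with the threshold hard-coded as 100 where the real query would join the other table:

    WITH RECURSIVE ordered AS (
        -- number the rows so the recursion can walk them in insertion order
        SELECT id, val, row_number() OVER (ORDER BY created, id) AS rn
        FROM test
    ), running AS (
        SELECT rn, val,
               val >= 100 AS marked,
               CASE WHEN val >= 100 THEN 0 ELSE val END AS total
        FROM ordered
        WHERE rn = 1
        UNION ALL
        -- carry the total forward; zero it on the row that reaches the threshold
        SELECT o.rn, o.val,
               r.total + o.val >= 100,
               CASE WHEN r.total + o.val >= 100 THEN 0 ELSE r.total + o.val END
        FROM running r
        JOIN ordered o ON o.rn = r.rn + 1
    )
    SELECT rn, val, marked, total FROM running ORDER BY rn;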

MySQL: show the sum of the difference of two values

倾然丶 夕夏残阳落幕 submitted on 2020-07-16 05:48:12
Question: Below is my query:

    SELECT n.`name`, n.`customer_id`, m.`msn`, m.kwh,
           m.kwh - LAG(m.kwh) OVER (PARTITION BY n.`customer_id`
                                    ORDER BY m.`data_date_time`) AS kwh_diff
    FROM mdc_node n
    INNER JOIN `mdc_meters_data` m ON n.`customer_id` = m.`cust_id`
    WHERE n.`lft` = 5
      AND n.`icon` NOT IN ('folder')
      AND m.`data_date_time` BETWEEN NOW() - INTERVAL 30 DAY AND NOW()

which gives me the result below. I want to sum up the kwh_diff and show only a one-row record, not multiple, like below: name customer_id msn sum_kwh
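
A window function's result cannot be fed straight into SUM() in the same SELECT, so the usual fix is to wrap the query in a derived table and aggregate over it. A minimal sketch reusing the query above unchanged (MySQL 8+):

    SELECT name, customer_id, msn, SUM(kwh_diff) AS sum_kwh
    FROM (
        -- the original LAG query goes here unchanged
        SELECT n.`name`, n.`customer_id`, m.`msn`,
               m.kwh - LAG(m.kwh) OVER (PARTITION BY n.`customer_id`
                                        ORDER BY m.`data_date_time`) AS kwh_diff
        FROM mdc_node n
        INNER JOIN `mdc_meters_data` m ON n.`customer_id` = m.`cust_id`
        WHERE n.`lft` = 5
          AND n.`icon` NOT IN ('folder')
          AND m.`data_date_time` BETWEEN NOW() - INTERVAL 30 DAY AND NOW()
    ) AS t
    GROUP BY name, customer_id, msn;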

Get the last element of a window in Spark 2.1.1

点点圈 submitted on 2020-07-05 04:44:06
Question: I have a dataframe with subcategories, and I want the last element of each subcategory.

    val windowSpec = Window.partitionBy("name").orderBy("count")
    sqlContext
      .createDataFrame(
        Seq[(String, Int)](
          ("A", 1), ("A", 2), ("A", 3),
          ("B", 10), ("B", 20), ("B", 30)
        ))
      .toDF("name", "count")
      .withColumn("firstCountOfName", first("count").over(windowSpec))
      .withColumn("lastCountOfName", last("count").over(windowSpec))
      .show()

returns me something strange: +----+-----+-------------
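
The odd lastCountOfName comes from the default window frame: when a window has an ORDER BY, the frame ends at the current row, so last() just returns the current row's value. Widening the frame fixes it. A minimal sketch in Spark SQL, assuming the dataframe has been registered with df.createOrReplaceTempView("counts"); the Scala equivalent is windowSpec.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing):

    SELECT name, `count`,
           last(`count`) OVER (
               PARTITION BY name ORDER BY `count`
               -- override the default frame (UNBOUNDED PRECEDING .. CURRENT ROW)
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS lastCountOfName
    FROM counts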

Return duration of an item from its transactions, many to many, SQL

不问归期 submitted on 2020-07-04 00:19:01
Question: Hopefully I can get some help on this. Situation: there are two incoming stations and one outgoing station. Items are scanned in and out, and I need to know how long an item was in the station. Let's consider 'in station' to be the time between its incoming date scan and its outgoing date scan. Problem: an item can be (accidentally) scanned multiple times into either station (for this I was thinking of identifying whether a scan was made on the same day (not looking at hours) and then returning the earliest
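
One common shape for this: collapse the accidental duplicates by keeping the earliest scan per item, direction, and calendar day, then pair each incoming scan with the first outgoing scan after it. A Postgres-flavored sketch, with a hypothetical scans(item_id, direction, scan_time) table and 'in'/'out' values standing in for the real stations and column names, which the excerpt does not give:

    WITH deduped AS (
        -- keep only the earliest scan per item, direction, and calendar day
        SELECT item_id, direction, MIN(scan_time) AS scan_time
        FROM scans
        GROUP BY item_id, direction, CAST(scan_time AS DATE)
    )
    SELECT i.item_id,
           i.scan_time                    AS time_in,
           MIN(o.scan_time)               AS time_out,
           MIN(o.scan_time) - i.scan_time AS duration
    FROM deduped i
    LEFT JOIN deduped o
           ON o.item_id = i.item_id
          AND o.direction = 'out'
          AND o.scan_time > i.scan_time   -- first outgoing scan after this incoming one
    WHERE i.direction = 'in'
    GROUP BY i.item_id, i.scan_time;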

Unexpected data in a typical recursive query

醉酒当歌 submitted on 2020-06-29 03:53:08
Question: It's hard for me to describe this in words, so here's the sample:

    select * into t from (values
        (10, 'A'), (25, 'B'), (30, 'C'), (45, 'D'), (52, 'E'),
        (61, 'F'), (61, 'G'), (61, 'H'), (79, 'I'), (82, 'J')
    ) v(userid, name)

Notice how F, G and H have the same userid. Now consider the following recursive query:

    with tn as (
        select t.userId, t.name,
               row_number() over (order by userid, newid()) as seqnum
        from t
    ), cte as (
        select userId, name, seqnum as seqnum
        from tn
        where seqnum = 1
        union all
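
The likely culprit: in SQL Server a CTE is just an inline view, so tn is re-evaluated each time the recursive member reads it, and the non-deterministic newid() tie-breaker can hand F, G and H different seqnum values on every pass. Materializing the row numbers once makes them stable. A minimal T-SQL sketch:

    -- materialize the row numbers once so newid() cannot reshuffle them
    select t.userid, t.name,
           row_number() over (order by userid, newid()) as seqnum
    into #tn
    from t;

    with cte as (
        select userid, name, seqnum from #tn where seqnum = 1
        union all
        select tn.userid, tn.name, tn.seqnum
        from cte
        join #tn tn on tn.seqnum = cte.seqnum + 1
    )
    select * from cte;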

Select only rows that have a column changed from the rows before them, given a unique ID

ぐ巨炮叔叔 submitted on 2020-06-27 16:57:50
Question: I have a PostgreSQL database where I want to record how a specific column changes for each id over time. Table1:

    personID | status | unixtime | column d | column e | column f
    ---------+--------+----------+----------+----------+---------
    1        | 2      | 213214   | x        | y        | z
    1        | 2      | 213325   | x        | y        | z
    1        | 2      | 213326   | x        | y        | z
    1        | 2      | 213327   | x        | y        | z
    1        | 2      | 213328   | x        | y        | z
    1        | 3      | 214330   | x        | y        | z
    1        | 3      | 214331   | x        | y        | z
    1        | 3      | 214332   | x        | y        | z
    1        | 2      | 324543   | x        | y        | z

I want to track all changes of status over time. So based on this I want a new table, table2, with the following data: personID | status | unixtime | column d | column e |
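
Comparing each row with its predecessor via lag() is the usual way to keep only the rows where status changes. A minimal Postgres sketch against Table1 as shown, assuming the table is literally named table1 and the extra columns are named d, e and f:

    SELECT *
    FROM (
        SELECT t.*,
               LAG(status) OVER (PARTITION BY personID ORDER BY unixtime) AS prev_status
        FROM table1 t
    ) s
    -- IS DISTINCT FROM also keeps the very first row, where prev_status IS NULL
    WHERE prev_status IS DISTINCT FROM status;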

Apply OFFSET and LIMIT in Oracle for complex join queries?

♀尐吖头ヾ submitted on 2020-06-26 12:16:40
Question: I'm using Oracle 11g and have a complex join query. I really want to apply OFFSET and LIMIT to this query so that it can be used effectively with the Spring Batch framework. I went through 'How do I limit the number of rows returned by an Oracle query after ordering?' and 'Alternatives to LIMIT and OFFSET for paging in Oracle', but things are not very clear to me. My query:

    SELECT DEPT.ID rowobjid,
           DEPT.CREATOR createdby,
           DEPT.CREATE_DATE createddate,
           DEPT.UPDATED_BY updatedby,
           DEPT.LAST_UPDATE_DATE
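
Oracle 11g predates the OFFSET ... FETCH syntax added in 12c, so paging is normally emulated by nesting the ordered query and filtering on ROWNUM (or ROW_NUMBER()). A minimal sketch of that pattern; the two inner SELECT columns are just a stand-in for the complex join, and :offset/:limit are bind variables:

    SELECT *
    FROM (
        SELECT q.*, ROWNUM rn
        FROM (
            -- the full complex join, including its ORDER BY, goes here
            SELECT DEPT.ID rowobjid, DEPT.CREATOR createdby
            FROM DEPT
            ORDER BY DEPT.ID
        ) q
        WHERE ROWNUM <= :offset + :limit   -- cap the total while ROWNUM is being assigned
    )
    WHERE rn > :offset                     -- then skip the first :offset rows

The ORDER BY must sit in the innermost query: ROWNUM is assigned before sorting, so filtering on it at the same level as the ORDER BY would page through unsorted rows.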

Finding a percentile per group in Spark-Scala

吃可爱长大的小学妹 submitted on 2020-06-20 15:34:33
Question: I am trying to compute a percentile over a column using a window function, as below. I have referred here to use the ApproxQuantile definition over a group.

    val df1 = Seq(
        (1, 10.0), (1, 20.0), (1, 40.6), (1, 15.6), (1, 17.6), (1, 25.6), (1, 39.6),
        (2, 20.5), (2, 70.3), (2, 69.4), (2, 74.4), (2, 45.4),
        (3, 60.6), (3, 80.6),
        (4, 30.6), (4, 90.6)
    ).toDF("ID", "Count")

    val idBucketMapping = Seq((1, 4), (2, 3), (3, 2), (4, 2))
        .toDF("ID", "Bucket")

    //jpp
    import org.apache.spark.sql.Column
    import org
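
Since Spark 2.1, percentile_approx is available as a built-in SQL aggregate, which sidesteps hand-rolling ApproxQuantile over a window when one value per group is enough. A minimal sketch, assuming df1 has been registered with df1.createOrReplaceTempView("df1"); the 0.5 (the median) stands in for whatever percentile is actually needed:

    SELECT ID,
           percentile_approx(`Count`, 0.5) AS p50  -- any percentage in [0, 1] works
    FROM df1
    GROUP BY ID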
