aggregate

Cumulative adding with dynamic base in Postgres

為{幸葍}努か submitted on 2020-08-05 04:37:25
Question: I have the following scenario in Postgres (I'm using 9.4.1). I have a table of this format: create table test( id serial, val numeric not null, created timestamp not null default(current_timestamp), fk integer not null ); What I then have is a threshold numeric field in another table, which should be used to label each row of test. For every value which is >= threshold, I want that record marked as true, but once a record is true it should reset subsequent counts to 0 at that point, e.g. Data
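The excerpt is cut off before the sample data, so the following is only a minimal sketch of one reading of the requirement: a running total of `val` is compared against the threshold, and the total restarts from 0 after each row that reaches it. The function name `label_with_reset` and the tuple layout are hypothetical, and the logic is shown procedurally in Python rather than as a Postgres query.

```python
from typing import List, Sequence, Tuple


def label_with_reset(values: Sequence[float], threshold: float) -> List[Tuple[float, float, bool]]:
    """Return (value, running_total, reached) for each value in order.

    Assumption: the running total is what gets compared to the threshold,
    and it restarts from 0 on the row after a threshold hit.
    """
    running = 0
    labelled = []
    for value in values:
        running += value
        reached = running >= threshold
        labelled.append((value, running, reached))
        if reached:
            # Dynamic base: the next row starts accumulating from zero again.
            running = 0
    return labelled


if __name__ == "__main__":
    for row in label_with_reset([1, 2, 3, 1, 5], threshold=5):
        print(row)
```

Because each row's base depends on whether an earlier row crossed the threshold, a plain windowed `SUM` cannot express this directly; in SQL it generally calls for an iterative approach such as a recursive CTE or a custom aggregate.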

join datasets with different dimensions - how to aggregate data properly

ⅰ亾dé卋堺 submitted on 2020-08-03 06:16:16
Question: I am working on complex logic where I need to redistribute a quantity from one dataset to another. This question is a continuation of this question. In the example below I am introducing several new dimensions. After aggregating and distributing all the quantities I expect the same total quantity; however, I see some differences. See the example below: package playground import org.apache.log4j.{Level, Logger} import org.apache.spark.sql.SparkSession import org.apache.spark.sql

create pivot table with aggregates without a join

自作多情 submitted on 2020-07-22 21:38:59
Question: I think I am trying to do something that cannot be done. I am trying to create a pivot table that performs two pivots simultaneously, aggregating off two different columns. I have created a much-simplified example to make the point clearer. CREATE TABLE two_aggregate_pivot ( ID INT, category CHAR(1), value INT ) INSERT INTO dbo.two_aggregate_pivot ( ID, category, value ) VALUES (1, 'A', 100), (1, 'B', 97), (1, 'D', NULL), (2, 'A', 86), (2, 'C', 83), (2, 'D', 81) I can pivot to get the

How do you aggregate rows to a factor variable with three levels?

老子叫甜甜 submitted on 2020-07-22 21:33:11
Question: I have a dataset where some participants have multiple rows, and I need to aggregate the data so that every participant has only one row. The dataset contains different variable types (e.g., factors, dates, age). I have written code that works and looks like this: example4 <- SMARTdata_50j_diagc_2016 %>% group_by( Patient_Id ) %>% summarise( Groep = first( Groep ), Ziekenhuis_Nr = first( Ziekenhuis_Nr ), Ziekenhuistype = first( Ziekenhuistype ), aantalDBC = n(), aantalVervolg = sum( as

Avoid nested aggregate error using coalesce()

浪子不回头ぞ submitted on 2020-07-03 13:41:27
Question: I currently have a query using coalesce that worked in SQL Server; however, it is not working in Amazon Redshift. Is there a way I can write this more appropriately for use in Redshift: coalesce(sum(Score)/nullif(sum(ScorePrem),0),0) as percent
Answer 1: Consider running the aggregate query as a subquery or CTE, then handle transformation or secondary calculations in an outer main query. WITH agg AS ( SELECT calendar_month_id ,day_of_month ,month_name ,DaysRemaining ,RPTBRANCH ,0 AS TotalGrp ,SUM
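The pattern the answer describes (aggregate inside a CTE, then apply `COALESCE`/`NULLIF` in the outer query) can be sketched as follows. This is an illustration only: it runs against an in-memory SQLite database through Python's standard `sqlite3` module as a stand-in for Redshift, and the table name `scores`, the grouping column `branch`, and the output alias `pct` are hypothetical; only `Score` and `ScorePrem` come from the question.

```python
import sqlite3
from typing import Iterable, List, Tuple


def percent_by_branch(rows: Iterable[Tuple[str, float, float]]) -> List[Tuple[str, float]]:
    """Aggregate in a CTE, then compute the guarded ratio in the outer query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (branch TEXT, Score REAL, ScorePrem REAL)")
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    query = """
        WITH agg AS (
            -- Step 1: plain aggregates only, no nesting.
            SELECT branch,
                   SUM(Score)     AS total_score,
                   SUM(ScorePrem) AS total_prem
            FROM scores
            GROUP BY branch
        )
        -- Step 2: secondary calculation on the already-aggregated columns.
        SELECT branch,
               COALESCE(total_score / NULLIF(total_prem, 0), 0) AS pct
        FROM agg
        ORDER BY branch
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("A", 1.0, 2.0), ("A", 2.0, 2.0), ("B", 5.0, 0.0)]
    print(percent_by_branch(sample))
```

Splitting the work this way keeps the outer expression free of aggregate calls, so the division guard operates on ordinary columns. Whether this resolves the specific Redshift error depends on the rest of the original query, which is not shown in the excerpt.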

PostgreSQL: subgrouping column based on intervals

ⅰ亾dé卋堺 submitted on 2020-06-27 16:23:18
Question: I have the following tables: SELECT * FROM trajectories LIMIT 10;
 user_id |    session_id     |       timestamp        |    lat    |    lon     | alt
---------+-------------------+------------------------+-----------+------------+------
      11 | 10020071017220238 | 2007-10-18 02:51:38+01 | 37.780927 | 113.677553 | 2160
      11 | 10020071017220238 | 2007-10-18 02:51:39+01 |  37.78093 | 113.677627 | 2160
      11 | 10020071017220238 | 2007-10-18 02:51:40+01 | 37.780932 | 113.677698 | 2160
      11 | 10020071017220238 | 2007-10-18 02:51:41+01 | 37

Inconsistency of na.action between xtabs and aggregate in R

本小妞迷上赌 submitted on 2020-06-27 09:15:58
Question: I have the following data.frame: x <- data.frame(A = c("Y", "Y", "Z", NA), B = c(NA, TRUE, FALSE, TRUE), C = c(TRUE, TRUE, NA, FALSE)) And I need to compute the following table with xtabs:
A      B C
  Y    1 2
  Z    0 0
  <NA> 1 0
I was told to use na.action = NULL, which indeed returns the table I need: xtabs(formula = cbind(B, C) ~ A, data = x, addNA = TRUE, na.action = NULL)
A      B C
  Y    1 2
  Z    0 0
  <NA> 1 0
However, na.action = na.pass returns a different table: xtabs(formula = cbind(B, C) ~ A, data = x,
