aggregate-functions

Group DateTime into 5,15,30 and 60 minute intervals

亡梦爱人 submitted on 2019-11-26 17:46:26
Question: I am trying to group some records into 5-, 15-, 30- and 60-minute intervals:
SELECT AVG(value) as "AvgValue", sample_date/(5*60) as "TimeFive" FROM DATA WHERE id = 123 AND sample_date >= 3/21/2012
I want to run several queries, each of which would group my average values into the desired time increments. So the 5-min query would return results like this:
AvgValue  TimeFive
6.90      1995-01-01 00:05:00
7.15      1995-01-01 00:10:00
8.25      1995-01-01 00:15:00
The 30-min query would result in this: AvgValue
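
A minimal sketch of one way to bucket the samples, using Python's built-in sqlite3 module as a stand-in for the asker's database. It assumes (hypothetically) that sample_date holds integer Unix epoch seconds, which is what dividing by 5*60 in the question suggests. Integer division floors each timestamp to the start of its bucket, so a single query serves every interval width by changing `minutes`; the function name is hypothetical, and buckets are labelled by their start time.

```python
import sqlite3


def bucket_averages(conn: sqlite3.Connection, minutes: int, record_id: int):
    """Return (bucket_start, avg_value) rows for fixed-width time buckets.

    Hypothetical schema: data(id, sample_date, value), where sample_date is
    stored as integer Unix epoch seconds.
    """
    sql = """
        SELECT datetime((sample_date / :width) * :width, 'unixepoch') AS bucket_start,
               AVG(value) AS avg_value
        FROM data
        WHERE id = :id
        GROUP BY bucket_start
        ORDER BY bucket_start
    """
    # Integer / integer is integer division in SQLite, so (t / w) * w floors t
    # to the start of its w-second bucket.
    return conn.execute(sql, {"width": minutes * 60, "id": record_id}).fetchall()
```

Calling it with 5, 15, 30 and 60 gives the four queries the question asks for; add the width once more inside datetime() if end-of-interval labels are preferred.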

How to define a custom aggregation function to sum a column of Vectors?

好久不见. submitted on 2019-11-26 17:36:21
I have a DataFrame of two columns, ID of type Int and Vec of type Vector (org.apache.spark.mllib.linalg.Vector). The DataFrame looks like this:
ID,Vec
1,[0,0,5]
1,[4,0,1]
1,[1,2,1]
2,[7,5,0]
2,[3,3,4]
3,[0,8,1]
3,[0,0,1]
3,[7,7,7]
....
I would like to do a groupBy($"ID") and then apply an aggregation to the rows inside each group by summing the vectors. The desired output for the above example would be:
ID,SumOfVectors
1,[5,2,7]
2,[10,8,4]
3,[7,15,9]
...
The available aggregation functions will not work; e.g. df.groupBy($"ID").agg(sum($"Vec")) leads to a ClassCastException. How to implement
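
In Spark this is typically solved with a custom aggregator (a typed Aggregator, which has zero, reduce, merge and finish steps, or a UserDefinedAggregateFunction), or by dropping to the RDD API and reducing by key. The sketch below is plain Python, not Spark code: it only illustrates the zero/reduce/merge shape such an aggregator needs for element-wise vector sums, with two lists standing in for partitions. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Vector = List[int]


class VectorSum:
    """Element-wise vector sum expressed as zero/reduce/merge steps."""

    def __init__(self, size: int) -> None:
        self.size = size

    def zero(self) -> Vector:
        # Empty buffer: a vector of zeros of the expected length.
        return [0] * self.size

    def reduce(self, buf: Vector, vec: Vector) -> Vector:
        # Fold one input row into the running buffer.
        if len(vec) != self.size:
            raise ValueError("vector length mismatch")
        return [a + b for a, b in zip(buf, vec)]

    def merge(self, left: Vector, right: Vector) -> Vector:
        # Combine two partial buffers, e.g. from different partitions.
        return [a + b for a, b in zip(left, right)]


def sum_by_id(partitions: Iterable[Iterable[Tuple[int, Vector]]], size: int) -> Dict[int, Vector]:
    agg = VectorSum(size)
    result: Dict[int, Vector] = {}
    for part in partitions:
        # Local pass: reduce the rows of one partition per key.
        local: Dict[int, Vector] = {}
        for key, vec in part:
            local[key] = agg.reduce(local.get(key, agg.zero()), vec)
        # Global pass: merge this partition's buffers into the overall result.
        for key, buf in local.items():
            result[key] = agg.merge(result.get(key, agg.zero()), buf)
    return dict(sorted(result.items()))
```

With the question's rows split across two "partitions", the merged result is {1: [5, 2, 7], 2: [10, 8, 4], 3: [7, 15, 9]}, matching the desired output.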

Naming returned columns in Pandas aggregate function? [duplicate]

你说的曾经没有我的故事 submitted on 2019-11-26 17:10:58
This question already has an answer here: Multiple aggregations of the same column using pandas GroupBy.agg() (3 answers).
I'm having trouble with Pandas' groupby functionality. I've read the documentation, but I can't seem to figure out how to apply aggregate functions to multiple columns and have custom names for those columns. This comes very close, but the data structure returned has nested column headings:
data.groupby("Country").agg({"column1": {"foo": sum}, "column2": {"mean": np.mean, "std": np.std}})
(i.e. I want to take the mean and std of column2, but return those columns as "mean"
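
Since pandas 0.25, `agg` accepts named aggregation: keyword arguments of the form `output_name=(column, func)`, which produce flat, custom-named columns instead of nested headings. A minimal sketch using the column names from the question; the `summarize` helper is hypothetical. Note that the string "std" computes pandas' sample standard deviation (ddof=1).

```python
import pandas as pd


def summarize(data: pd.DataFrame) -> pd.DataFrame:
    # Named aggregation: each keyword is the output column name, and the
    # tuple is (input column, aggregation function).
    return data.groupby("Country").agg(
        foo=("column1", "sum"),
        mean=("column2", "mean"),
        std=("column2", "std"),
    )
```

The result is indexed by Country with the single-level columns foo, mean and std.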

Initial array in function to aggregate multi-dimensional array

我的未来我决定 submitted on 2019-11-26 16:38:29
Question: I have a table with arrays of integers. I want to create an aggregate function that will return a 2-dimensional array with all the rows together. It then gets passed to PL/R to do some maths on it. I have:
CREATE OR REPLACE FUNCTION arrayappend(left int[][], right int[]) RETURNS int[] AS $BODY$ SELECT $1 || $2 ; $BODY$ LANGUAGE SQL;
and:
CREATE AGGREGATE array_sum2 (int[]) ( SFUNC = arrayappend, STYPE = int[][], INITCOND = '{}' );
But the return type is int[], not int[][]. How can I
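
The behaviour can be modelled outside PostgreSQL. An aggregate folds its SFUNC over the rows starting from INITCOND, and PostgreSQL does not enforce declared array dimensions (int[] and int[][] are the same type), so `$1 || $2` on two one-dimensional arrays simply concatenates them. The Python sketch below is only an analogy built on functools.reduce: appending the row itself flattens, while wrapping it first nests. A commonly suggested PostgreSQL counterpart of the wrapping step is `array_cat($1, ARRAY[$2])` in the state function; treat that as a hint to verify against your PostgreSQL version. The function names are hypothetical.

```python
from functools import reduce
from typing import List


def flat_append(state: List[int], row: List[int]) -> List[int]:
    # Analogue of `$1 || $2` on two 1-D arrays: the elements are concatenated.
    return state + row


def nested_append(state: List[List[int]], row: List[int]) -> List[List[int]]:
    # Wrap the row first, so it is appended as a new sub-array (one more dimension).
    return state + [row]


def aggregate(sfunc, rows, initcond):
    # Mimics CREATE AGGREGATE: start from INITCOND and apply SFUNC once per row.
    return reduce(sfunc, rows, initcond)


rows = [[1, 2], [3, 4], [5, 6]]
print(aggregate(flat_append, rows, []))    # [1, 2, 3, 4, 5, 6]
print(aggregate(nested_append, rows, []))  # [[1, 2], [3, 4], [5, 6]]
```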

GROUP BY + CASE statement

ⅰ亾dé卋堺 submitted on 2019-11-26 16:35:54
I have a working query that groups data by hardware model and result, but the problem is that there are many "results". I have tried to reduce that down to "if result = 0 then keep as 0, else set it to 1". This generally works, but I end up having:
    day     |  name  | type | case | count
------------+--------+------+------+-------
 2013-11-06 | modelA |    1 |    0 |   972
 2013-11-06 | modelA |    1 |    1 |    42
 2013-11-06 | modelA |    1 |    1 |     2
 2013-11-06 | modelA |    1 |    1 |    11
 2013-11-06 | modelB |    1 |    0 |   456
 2013-11-06 | modelB |    1 |    1 |    16
 2013-11-06 | modelB |    1 |    1 |     8
 2013-11-06 | modelB |    3 |    0 |
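
The repeated case = 1 rows are what you would see if the GROUP BY still lists the raw result column while the SELECT shows the CASE expression; each distinct non-zero result then forms its own group. This is an inference, since the query itself is cut off. A minimal sketch that groups by the CASE expression instead, run through Python's sqlite3 module; the table name `runs` and the column `result` are hypothetical.

```python
import sqlite3

FLAG = "CASE WHEN result = 0 THEN 0 ELSE 1 END"


def counts_by_flag(conn: sqlite3.Connection):
    # Repeat the CASE expression in GROUP BY (not the raw `result` column),
    # so all non-zero results fall into a single group per day/name/type.
    sql = f"""
        SELECT day, name, type, {FLAG} AS result_flag, COUNT(*) AS cnt
        FROM runs
        GROUP BY day, name, type, {FLAG}
        ORDER BY day, name, type, result_flag
    """
    return conn.execute(sql).fetchall()
```

Repeating the expression is the most portable form; some databases also accept the output alias in GROUP BY.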

How to include “zero” / “0” results in COUNT aggregate?

我与影子孤独终老i submitted on 2019-11-26 16:06:59
I've just got myself a little bit stuck with some SQL. I don't think I can phrase the question brilliantly, so let me show you. I have two tables, one called person and one called appointment. I'm trying to return the number of appointments a person has (including if they have zero). Appointment contains the person_id, and there is one person_id per appointment, so COUNT(person_id) is a sensible approach. The query:
SELECT person_id, COUNT(person_id) AS "number_of_appointments" FROM appointment GROUP BY person_id;
correctly returns the number of appointments each person_id has. However, a person
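
The usual fix is to drive the query from person and LEFT JOIN appointment, so people with no appointments still produce a row. Because COUNT(column) skips NULLs, counting a column from the appointment side yields 0 for them (COUNT(*) would count the NULL-extended row as 1). A minimal sketch using Python's sqlite3 module; the person and appointment table names come from the question, while appointment_id and the function name are hypothetical.

```python
import sqlite3


def appointment_counts(conn: sqlite3.Connection):
    # Start from person so every person appears; COUNT(a.person_id) ignores
    # the NULLs produced by the LEFT JOIN, giving 0 for people without rows.
    sql = """
        SELECT p.person_id, COUNT(a.person_id) AS number_of_appointments
        FROM person AS p
        LEFT JOIN appointment AS a ON a.person_id = p.person_id
        GROUP BY p.person_id
        ORDER BY p.person_id
    """
    return conn.execute(sql).fetchall()
```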

Aggregate SQL Function to grab only the first from each group

两盒软妹~` submitted on 2019-11-26 15:47:55
Question: I have 2 tables: an Account table and a Users table. Each account can have multiple users. I have a scenario where I want to execute a single query/join against these two tables, but I want all the Account data (Account.*) and only the first set of user data (specifically their name). Instead of doing a "min" or "max" on my aggregated group, I wanted to do a "first". But apparently there is no "First" aggregate function in T-SQL. Any suggestions on how to go about writing this query?
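
SQL Server has no FIRST aggregate, but ROW_NUMBER() (available since SQL Server 2005) can number the users within each account so that only row 1 is kept. A minimal sketch run through Python's sqlite3 module, which accepts the same window-function syntax from SQLite 3.25 onward. Table and column names are hypothetical, "first" is assumed to mean lowest user_id, and the LEFT JOIN keeps accounts that have no users.

```python
import sqlite3


def accounts_with_first_user(conn: sqlite3.Connection):
    # Number each account's users by user_id, then join only row 1.
    # Putting rn = 1 in the ON clause keeps accounts without any users.
    sql = """
        SELECT a.account_id, a.account_name, u.name
        FROM account AS a
        LEFT JOIN (
            SELECT account_id, name,
                   ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY user_id) AS rn
            FROM users
        ) AS u ON u.account_id = a.account_id AND u.rn = 1
        ORDER BY a.account_id
    """
    return conn.execute(sql).fetchall()
```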

Is there ANY_VALUE capability for mysql 5.6?

别等时光非礼了梦想. submitted on 2019-11-26 15:31:16
Currently I'm working with MySQL 5.7 in development and 5.6 in production. Each time I run a query with a GROUP BY in development, I get an error like "Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY". Here is the query:
SELECT c.id, c.name, i.* FROM countries c, images i WHERE i.country_id = c.id GROUP BY c.id;
Fixed for 5.7:
SELECT c.id, c.name, ANY_VALUE(i.url) url, ANY_VALUE(i.lat) lat, ANY_VALUE(i.lng) lng FROM countries c, images i WHERE i.country_id = c.id GROUP BY c.id;
To solve that I use the MySQL 5.7 function ANY_VALUE, but the main issue is that it's not
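
One way to avoid ANY_VALUE() entirely, so the same query should run on 5.6 and under ONLY_FULL_GROUP_BY on 5.7, is to pick one image per country with an ordinary aggregate and join back to that row. Unlike applying ANY_VALUE() to each column separately, this guarantees url, lat and lng come from the same image. The sketch runs through Python's sqlite3 module for testing; the SQL uses only a derived table and MIN(). Choosing the lowest image id is an arbitrary assumption, and the function name is hypothetical.

```python
import sqlite3


def one_image_per_country(conn: sqlite3.Connection):
    # Choose one image id per country with MIN(), then join back to fetch that
    # image's columns, so every selected column comes from the same row.
    sql = """
        SELECT c.id, c.name, i.url, i.lat, i.lng
        FROM countries AS c
        JOIN (
            SELECT country_id, MIN(id) AS image_id
            FROM images
            GROUP BY country_id
        ) AS pick ON pick.country_id = c.id
        JOIN images AS i ON i.id = pick.image_id
        ORDER BY c.id
    """
    return conn.execute(sql).fetchall()
```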

MySQL dynamic cross tab

北战南征 submitted on 2019-11-26 14:46:17
Question: I have a table like this:
way  stop  time
1    1     00:55
1    2     01:01
1    3     01:07
2    2     01:41
2    3     01:47
2    5     01:49
3    1     04:00
3    2     04:06
3    3     04:12
and I want a table like this:
stop  way_1  way_2  way_3  (way_n)
1     00:55         04:00
2     01:01  01:41  04:06
3     01:07  01:47  04:12
5            01:49
There are many solutions online for a MySQL cross tab (pivot table), but how can I do this if I don't know how many "way" values there are? Thanks
Answer 1: The number and names of columns must be fixed at the time you prepare the query. That's just the
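
Because the column list must be fixed when the query is prepared, the usual approach is two steps: first read the distinct way values, then build one conditional-aggregate column per way and run the generated query. In MySQL this is commonly done with GROUP_CONCAT plus PREPARE/EXECUTE; the sketch below does the same in Python with sqlite3. The table name `schedule` and the function name are hypothetical.

```python
import sqlite3


def pivot_ways(conn: sqlite3.Connection):
    # Step 1: discover which `way` values exist; they become the column list.
    ways = [row[0] for row in conn.execute("SELECT DISTINCT way FROM schedule ORDER BY way")]
    if not ways:
        return ["stop"], []
    # Step 2: one MAX(CASE ...) column per way. The values are integers read
    # from the table itself, so formatting them into the SQL is safe here.
    columns = ", ".join(
        f"MAX(CASE WHEN way = {int(w)} THEN time END) AS way_{int(w)}" for w in ways
    )
    sql = f"SELECT stop, {columns} FROM schedule GROUP BY stop ORDER BY stop"
    cur = conn.execute(sql)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()
```

Missing stop/way combinations come back as NULL (None), which matches the blank cells in the desired table.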

LINQ aggregate and group by periods of time

拜拜、爱过 submitted on 2019-11-26 12:25:14
Question: I'm trying to understand how LINQ can be used to group data by intervals of time, and then ideally aggregate each group. I'm finding numerous examples with explicit date ranges, but I'm trying to group by periods such as 5 minutes, 1 hour or 1 day. For example, I have a class that wraps a DateTime with a value:
public class Sample { public DateTime timestamp; public double value; }
These observations are contained as a series in a List collection:
List<Sample> series;
So, to group by hourly periods
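
A common trick is to compute a grouping key by integer-dividing the timestamp by the period length (in C#, dividing DateTime.Ticks by TimeSpan.Ticks), then aggregate each group. Sketched here in Python with the question's Sample shape; floor_to_period and average_by_period are hypothetical names. Python's datetime.min (0001-01-01 00:00) plays the role of the .NET tick origin, so 5-minute, hourly and daily buckets all start on clean boundaries.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


@dataclass
class Sample:
    timestamp: datetime
    value: float


def floor_to_period(ts: datetime, period: timedelta) -> datetime:
    # timedelta // timedelta yields an int: whole periods elapsed since datetime.min.
    return datetime.min + ((ts - datetime.min) // period) * period


def average_by_period(series: Iterable[Sample], period: timedelta) -> Dict[datetime, float]:
    groups: Dict[datetime, List[float]] = defaultdict(list)
    for s in series:
        groups[floor_to_period(s.timestamp, period)].append(s.value)
    # Average each bucket; keys are inserted in chronological order.
    return {start: sum(vals) / len(vals) for start, vals in sorted(groups.items())}
```

Passing timedelta(minutes=5), timedelta(hours=1) or timedelta(days=1) selects the period without changing the grouping code.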