aggregate-functions

Create two arrays for two fields, keeping sort order of arrays in sync (without subquery)

流过昼夜 posted on 2019-12-21 06:27:20
Question: There is no rhyme or reason for this question other than that I was curious about how one would go about doing this. Platform: while I was hoping for a SQL-standard solution, my main focus is PostgreSQL 8.4+. (I know 9.0+ has some array sorting functions.)

    SELECT id, group, dt FROM foo ORDER BY id;

      id   | group |     dt
    -------+-------+------------
         1 | foo   | 2012-01-01
         1 | bar   | 2012-01-03
         1 | baz   | 2012-01-02
         2 | foo   | 2012-01-01
         3 | bar   | 2012-01-01
         4 | bar   | 2012-01-01
         4 | baz   | 2012-01

SparkSQL: conditional sum using two columns

和自甴很熟 posted on 2019-12-21 05:35:31
Question: I hope you can help me with this. I have a DataFrame as follows:

    val df = sc.parallelize(Seq(
      (1, "a", "2014-12-01", "2015-01-01", 100),
      (2, "a", "2014-12-01", "2015-01-02", 150),
      (3, "a", "2014-12-01", "2015-01-03", 120),
      (4, "b", "2015-12-15", "2015-01-01", 100)
    )).toDF("id", "prodId", "dateIns", "dateTrans", "value")
      .withColumn("dateIns", to_date($"dateIns"))
      .withColumn("dateTrans", to_date($"dateTrans"))

I would love to do a groupBy on prodId and aggregate 'value', summing it for ranges of dates

SQL select query using joins, group by and aggregate functions

﹥>﹥吖頭↗ posted on 2019-12-21 05:21:33
Question: I have two tables with the following fields:

    emp_table: emp_id, emp_name
    salary_increase: emp_id, inc_date, inc_amount

I am required to write a query that gives the employee details, the number of times an employee has received a salary increase, the value of the maximum increase amount, and the date of that increase. Here is what I have so far:

    SELECT e.*, count(i.inc_amount), max(i.inc_amount)
    FROM salary_increase AS i
    RIGHT JOIN emp_table AS e ON i.emp_id = e.emp_id
    GROUP BY e.emp_id;

this

Group/Count list of dictionaries based on value

廉价感情. posted on 2019-12-21 04:37:09
Question: I've got a list of Tokens which looks something like:

    [{ Value: "Blah", StartOffset: 0, EndOffset: 4 }, ... ]

What I want to do is get a count of how many times each value occurs in the list of tokens. In VB.Net I'd do something like:

    Tokens = Tokens.
        GroupBy(Function(x) x.Value).
        Select(Function(g) New With { .Value = g.Key, .Count = g.Count })

What's the equivalent in Python?

Answer 1: IIUC, you can use collections.Counter:

    >>> from collections import Counter
    >>> tokens = [{"Value": "Blah",
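The answer excerpt above is cut off, so the following is only a minimal sketch of the collections.Counter approach it names, not the original answer's code. The sample tokens beyond the first one and the helper name count_values are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample data: only the first token appears in the question;
# the other two are invented so that one value repeats.
tokens = [
    {"Value": "Blah", "StartOffset": 0, "EndOffset": 4},
    {"Value": "Foo", "StartOffset": 5, "EndOffset": 8},
    {"Value": "Blah", "StartOffset": 9, "EndOffset": 13},
]


def count_values(tokens):
    """Count how many times each "Value" occurs in a list of token dicts."""
    return Counter(token["Value"] for token in tokens)


counts = count_values(tokens)

# Reshape into Value/Count records, similar in spirit to the VB.Net projection.
summary = [{"Value": value, "Count": count} for value, count in counts.items()]
```

Counter is a dict subclass, so counts["Blah"] gives the count for a single value and counts.most_common() returns the values ordered by frequency.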

Pairwise array sum aggregate function?

大城市里の小女人 posted on 2019-12-20 21:01:48
Question: I have a table with arrays as one column, and I want to sum the array elements together:

    > create table regres(a int[] not null);
    > insert into regres values ('{1,2,3}'), ('{9, 12, 13}');
    > select * from regres;
         a
    -----------
     {1,2,3}
     {9,12,13}

I want the result to be:

    {10, 14, 16}

that is: {1 + 9, 2 + 12, 3 + 13}. Does such a function already exist somewhere? The intagg extension looked like a good candidate, but it does not provide such a function. The arrays are expected to be between
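To make the requested semantics concrete, here is a small Python sketch of a pairwise (element-wise) array sum. It is not a PostgreSQL solution and not taken from any answer; the function name pairwise_sum and the choice to reject arrays of unequal length are assumptions.

```python
def pairwise_sum(arrays):
    """Sum equally sized arrays position by position.

    Assumption: every array has the same length; anything else is rejected
    rather than padded, because the excerpt does not say how to handle it.
    """
    arrays = [list(a) for a in arrays]
    if not arrays:
        return []
    length = len(arrays[0])
    if any(len(a) != length for a in arrays):
        raise ValueError("all arrays must have the same length")
    # zip(*arrays) yields one tuple per position: (1, 9), (2, 12), (3, 13)
    return [sum(column) for column in zip(*arrays)]


# The two rows from the question's regres table.
result = pairwise_sum([[1, 2, 3], [9, 12, 13]])
```

An SQL aggregate would need the same two ingredients: a per-position accumulator and a rule for arrays whose lengths differ.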

Pairwise array sum aggregate function?

佐手、 posted on 2019-12-20 21:01:16
Question: I have a table with arrays as one column, and I want to sum the array elements together:

    > create table regres(a int[] not null);
    > insert into regres values ('{1,2,3}'), ('{9, 12, 13}');
    > select * from regres;
         a
    -----------
     {1,2,3}
     {9,12,13}

I want the result to be:

    {10, 14, 16}

that is: {1 + 9, 2 + 12, 3 + 13}. Does such a function already exist somewhere? The intagg extension looked like a good candidate, but it does not provide such a function. The arrays are expected to be between

How to use a SQL window function to calculate a percentage of an aggregate

人盡茶涼 posted on 2019-12-20 09:19:41
Question: I need to calculate percentages of various dimensions in a table. I'd like to simplify things by using window functions to calculate the denominator; however, I am having an issue because the numerator has to be an aggregate as well. As a simple example, take the following table:

    create temp table test (d1 text, d2 text, v numeric);
    insert into test values ('a','x',5), ('a','y',5), ('a','y',10), ('b','x',20);

If I just want to calculate the share of each individual row out of d1, then
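The excerpt stops before the actual query, so the following Python sketch only works through the arithmetic that the title suggests: an aggregated numerator (the sum of v per d1, d2 pair) divided by a per-d1 denominator. Reading the question this way is an assumption, the function name group_shares is hypothetical, and this is plain Python rather than a window-function query.

```python
from collections import defaultdict

# The four rows inserted into the question's test table: (d1, d2, v).
rows = [("a", "x", 5), ("a", "y", 5), ("a", "y", 10), ("b", "x", 20)]


def group_shares(rows):
    """Return each (d1, d2) group's total as a fraction of its d1 total."""
    group_totals = defaultdict(int)  # numerator: sum of v per (d1, d2)
    d1_totals = defaultdict(int)     # denominator: sum of v per d1
    for d1, d2, v in rows:
        group_totals[(d1, d2)] += v
        d1_totals[d1] += v
    return {
        key: total / d1_totals[key[0]]
        for key, total in group_totals.items()
    }


shares = group_shares(rows)
```

For the sample rows, the ('a', 'y') group totals 15 out of 20 for d1 = 'a', so its share is 0.75.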

Explain R tapply description

杀马特。学长 韩版系。学妹 posted on 2019-12-20 08:37:31
Question: I understand what tapply() does in R. However, I cannot parse this description of it from the documentation:

    Apply a Function Over a "Ragged" Array

    Description:
        Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors.

    Usage:
        tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

When I think of tapply, I think of GROUP BY in SQL. You group values in X together by its parallel factor levels in INDEX
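The group-by reading described above can be sketched in Python. This is only an analogy for a single grouping factor, not a reimplementation of R's tapply: it ignores the simplify argument and multiple factors, and it returns groups in first-seen order rather than in factor-level order. The function name and sample data are assumptions.

```python
from collections import defaultdict


def tapply(x, index, fun):
    """Group the values of x by the parallel labels in index, then apply fun.

    The groups may have different sizes, which is what makes the array "ragged".
    """
    if len(x) != len(index):
        raise ValueError("x and index must have the same length")
    groups = defaultdict(list)
    for value, level in zip(x, index):
        groups[level].append(value)
    return {level: fun(values) for level, values in groups.items()}


# Hypothetical data: group "a" has three values, group "b" has two.
x = [1, 2, 3, 4, 5]
index = ["a", "b", "a", "b", "a"]
totals = tapply(x, index, sum)
sizes = tapply(x, index, len)
```

Here each "cell" of the ragged array is simply the list of values that share one label, and fun is applied once per cell.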

How to calculate Mean by Date Grouped as Fiscal Quarters

半腔热情 posted on 2019-12-20 07:29:16
Question: I have the following table:

    Date        Country  Class  Value
    6/1/2010    USA      A      45
    6/1/2010    Canada   A      23
    6/1/2010    Brazil   B      65
    9/1/2010    USA      B      47
    9/1/2010    Canada   A      98
    9/1/2010    Brazil   B      25
    12/1/2010   USA      B      14
    12/1/2010   Canada   A      79
    12/1/2010   Brazil   A      23
    3/1/2011    USA      A      84
    3/1/2011    Canada   B      77
    3/1/2011    Brazil   A      43
    6/1/2011    USA      A      45
    6/1/2011    Canada   A      23
    6/1/2011    Brazil   B      65
    9/1/2011    USA      B      47
    9/1/2011    Canada   A      98
    9/1/2011    Brazil   B      25
    12/1/2011   USA      B      14
    12/1/2011   Canada   A      79
    12/1/2011   Brazil   A      23
    3/1/2012    USA      A      84

Count matches between multiple columns and words in a nested array

假如想象 posted on 2019-12-20 06:37:03
Question: My earlier question was resolved. Now I need to develop a related but more complex query. I have a table like this:

    id    description   additional_info
    -------------------------------------------
    123   games         XYD
    124   Festivals     sport swim

And I need to count matches to arrays like this:

    array_content varchar[] := {"Festivals,games","sport,swim"}

If either of the columns description and additional_info contains any of the comma-separated tags, we count that as 1. So each array element (consisting