aggregate-functions | 易学教程

Create two arrays for two fields, keeping sort order of arrays in sync (without subquery)

Question: There is no rhyme or reason for this question other than that I was curious how one would go about doing this. Platform: while I was hoping for a SQL-standard solution, my main focus is PostgreSQL 8.4+. (I know 9.0+ has some array sorting functions.)

    SELECT id, group, dt FROM foo ORDER BY id;

      id | group | dt
    -----+-------+------------
       1 | foo   | 2012-01-01
       1 | bar   | 2012-01-03
       1 | baz   | 2012-01-02
       2 | foo   | 2012-01-01
       3 | bar   | 2012-01-01
       4 | bar   | 2012-01-01
       4 | baz   | 2012-01
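To illustrate what keeping two arrays "in sync" means, here is a minimal Python sketch. It is not SQL and not the asker's or any answerer's solution. It sorts each id's rows once and derives both arrays from that single ordering. Sorting by dt is an assumption, the function name `paired_arrays` is hypothetical, and only the complete rows from the sample are used.

```python
from collections import defaultdict

# Complete rows from the sample output: (id, group, dt).
rows = [
    (1, "foo", "2012-01-01"),
    (1, "bar", "2012-01-03"),
    (1, "baz", "2012-01-02"),
    (2, "foo", "2012-01-01"),
    (3, "bar", "2012-01-01"),
    (4, "bar", "2012-01-01"),
]


def paired_arrays(rows):
    """Return {id: (groups, dates)} with both lists in the same order."""
    by_id = defaultdict(list)
    for id_, group, dt in rows:
        by_id[id_].append((dt, group))

    result = {}
    for id_, pairs in by_id.items():
        # Assumption: order by dt. ISO dates sort correctly as strings.
        pairs.sort()
        # Both lists are read from the same sorted pairs, so they cannot drift apart.
        result[id_] = ([g for _, g in pairs], [d for d, _ in pairs])
    return result


arrays = paired_arrays(rows)
```

Because both lists come from one sorted sequence of pairs, the element at position i in the first list always belongs to the element at position i in the second.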

SparkSQL: conditional sum using two columns

Question: I hope you can help me with this. I have a DataFrame as follows:

    val df = sc.parallelize(Seq(
      (1, "a", "2014-12-01", "2015-01-01", 100),
      (2, "a", "2014-12-01", "2015-01-02", 150),
      (3, "a", "2014-12-01", "2015-01-03", 120),
      (4, "b", "2015-12-15", "2015-01-01", 100)
    )).toDF("id", "prodId", "dateIns", "dateTrans", "value")
      .withColumn("dateIns", to_date($"dateIns"))
      .withColumn("dateTrans", to_date($"dateTrans"))

I would love to do a groupBy on prodId and aggregate 'value', summing it for ranges of dates

SQL select query using joins, group by and aggregate functions

Question: I have two tables with the following fields:

    emp_table: emp_id, emp_name
    salary_increase: emp_id, inc_date, inc_amount

I am required to write a query that returns the employee details, the number of times an employee has received a salary increase, the value of the maximum increase amount, and the date of that increase. Here is what I have so far:

    SELECT e.*, count(i.inc_amount), max(i.inc_amount)
    FROM salary_increase AS i
    RIGHT JOIN emp_table AS e ON i.emp_id = e.emp_id
    GROUP BY e.emp_id;

this
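One possible way to also retrieve the date of the largest increase is a correlated subquery. The sketch below runs against an in-memory SQLite database through Python's `sqlite3` module. The sample rows and employee names are invented for illustration, and a LEFT JOIN from `emp_table` replaces the RIGHT JOIN in the question (the two are equivalent here). It is one approach under these assumptions, not necessarily the one given in the original answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_table (emp_id INTEGER PRIMARY KEY, emp_name TEXT);
    CREATE TABLE salary_increase (emp_id INTEGER, inc_date TEXT, inc_amount INTEGER);
    -- Hypothetical sample data, not from the question.
    INSERT INTO emp_table VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO salary_increase VALUES
        (1, '2020-01-01', 100),
        (1, '2021-01-01', 300),
        (2, '2020-06-01', 50);
""")

rows = conn.execute("""
    SELECT e.emp_id,
           e.emp_name,
           COUNT(i.inc_amount) AS increases,
           MAX(i.inc_amount)   AS max_increase,
           -- Date of the largest increase; ties go to the earliest date.
           (SELECT s.inc_date
              FROM salary_increase AS s
             WHERE s.emp_id = e.emp_id
             ORDER BY s.inc_amount DESC, s.inc_date
             LIMIT 1)          AS max_increase_date
      FROM emp_table AS e
      LEFT JOIN salary_increase AS i ON i.emp_id = e.emp_id
     GROUP BY e.emp_id, e.emp_name
     ORDER BY e.emp_id
""").fetchall()
conn.close()
```

Employees without any increase still appear, with a count of 0 and NULL (`None` in Python) for the amount and date.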

Group/Count list of dictionaries based on value

Question: I've got a list of Tokens which looks something like:

    [{ Value: "Blah", StartOffset: 0, EndOffset: 4 }, ... ]

What I want to do is get a count of how many times each value occurs in the list of tokens. In VB.Net I'd do something like:

    Tokens = Tokens.GroupBy(Function(x) x.Value).Select(Function(g) New With { .Value = g.Key, .Count = g.Count })

What's the equivalent in Python?

Answer 1: If I understand correctly, you can use collections.Counter:

    >>> from collections import Counter
    >>> tokens = [{"Value": "Blah",
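The answer excerpt is cut off. Independently of it, a minimal sketch of counting with `collections.Counter` could look like the following. The second and third tokens are made up for illustration, and the `summary` list only mimics the shape of the VB.Net result.

```python
from collections import Counter

# The first token follows the question; the others are hypothetical.
tokens = [
    {"Value": "Blah", "StartOffset": 0, "EndOffset": 4},
    {"Value": "Foo", "StartOffset": 5, "EndOffset": 8},
    {"Value": "Blah", "StartOffset": 9, "EndOffset": 13},
]

# Count how often each "Value" occurs.
counts = Counter(token["Value"] for token in tokens)

# Optional: reshape into a list of {Value, Count} records.
summary = [{"Value": value, "Count": count} for value, count in counts.items()]
```

`Counter` is a `dict` subclass, so `counts["Blah"]` gives the count for a single value and a missing key yields 0 rather than raising an error.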

Pairwise array sum aggregate function?

Question: I have a table with arrays as one column, and I want to sum the array elements together:

    > create table regres(a int[] not null);
    > insert into regres values ('{1,2,3}'), ('{9, 12, 13}');
    > select * from regres;
         a
    -----------
     {1,2,3}
     {9,12,13}

I want the result to be:

    {10, 14, 16}

that is: {1 + 9, 2 + 12, 3 + 13}. Does such a function already exist somewhere? The intagg extension looked like a good candidate, but it does not include such a function. The arrays are expected to be between
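To make the intended result concrete, here is the same element-wise sum sketched in Python rather than as a PostgreSQL aggregate. The function name `pairwise_sum` is hypothetical, and treating missing positions in shorter arrays as 0 is an assumption, since the question's statement about array lengths is cut off.

```python
from itertools import zip_longest


def pairwise_sum(arrays):
    """Sum arrays position by position: [a[0] + b[0] + ..., a[1] + b[1] + ..., ...]."""
    # Assumption: shorter arrays are padded with 0.
    return [sum(column) for column in zip_longest(*arrays, fillvalue=0)]


result = pairwise_sum([[1, 2, 3], [9, 12, 13]])
```

A SQL aggregate would follow the same pattern: keep a running array as state and add each incoming array to it position by position.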

How to use a SQL window function to calculate a percentage of an aggregate

Question: I need to calculate percentages of various dimensions in a table. I'd like to simplify things by using window functions to calculate the denominator; however, I am having an issue because the numerator has to be an aggregate as well. As a simple example, take the following table:

    create temp table test (d1 text, d2 text, v numeric);
    insert into test values ('a','x',5), ('a','y',5), ('a','y',10), ('b','x',20);

If I just want to calculate the share of each individual row out of d1, then
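The question is cut off, so the exact target is an assumption. One plausible reading is "the share of each (d1, d2) total within its d1 total". The Python sketch below computes that on the sample rows to show the numbers such a query would return. The function name `share_within_d1` is hypothetical.

```python
from collections import defaultdict

# Rows from the question's insert statement: (d1, d2, v).
rows = [("a", "x", 5), ("a", "y", 5), ("a", "y", 10), ("b", "x", 20)]


def share_within_d1(rows):
    """Return {(d1, d2): sum of v for (d1, d2) / sum of v for d1}."""
    numerator = defaultdict(int)    # aggregate per (d1, d2)
    denominator = defaultdict(int)  # aggregate per d1, the "window" total
    for d1, d2, v in rows:
        numerator[(d1, d2)] += v
        denominator[d1] += v
    return {key: total / denominator[key[0]] for key, total in numerator.items()}


shares = share_within_d1(rows)
```

The numerator is an aggregate over the finer grouping, and the denominator is the sum of those aggregates over the coarser grouping, which is why both levels have to be tracked.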

Explain R tapply description

Question: I understand what tapply() does in R. However, I cannot parse this description of it from the documentation:

    Apply a Function Over a "Ragged" Array

    Description:
        Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors.

    Usage:
        tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

When I think of tapply, I think of GROUP BY in SQL. You group values in X together by its parallel factor levels in INDEX
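The "group by" reading can be sketched in Python. This is a simplified analogue for a single grouping vector only, not a reimplementation of R's `tapply` (which also accepts several factors and returns an array). The name `tapply` and the sample values are used purely for illustration.

```python
from collections import defaultdict


def tapply(x, index, fun):
    """Group the values of x by the parallel labels in index, then apply fun to each group."""
    groups = defaultdict(list)
    for value, level in zip(x, index):
        groups[level].append(value)
    # The groups may have different sizes, hence "ragged".
    return {level: fun(values) for level, values in groups.items()}


totals = tapply([1, 2, 3, 4, 5], ["a", "b", "a", "b", "a"], sum)
```

Here group "a" holds three values and group "b" holds two. Each group is a "cell" of unequal length, which is what the documentation calls a ragged array.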

How to calculate Mean by Date Grouped as Fiscal Quarters

Question: I have the following table:

    Date       Country  Class  Value
    6/1/2010   USA      A      45
    6/1/2010   Canada   A      23
    6/1/2010   Brazil   B      65
    9/1/2010   USA      B      47
    9/1/2010   Canada   A      98
    9/1/2010   Brazil   B      25
    12/1/2010  USA      B      14
    12/1/2010  Canada   A      79
    12/1/2010  Brazil   A      23
    3/1/2011   USA      A      84
    3/1/2011   Canada   B      77
    3/1/2011   Brazil   A      43
    6/1/2011   USA      A      45
    6/1/2011   Canada   A      23
    6/1/2011   Brazil   B      65
    9/1/2011   USA      B      47
    9/1/2011   Canada   A      98
    9/1/2011   Brazil   B      25
    12/1/2011  USA      B      14
    12/1/2011  Canada   A      79
    12/1/2011  Brazil   A      23
    3/1/2012   USA      A      84
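The excerpt does not say which language the asker uses or when the fiscal year starts, so the following Python sketch rests on several assumptions: dates are month/day/year, the fiscal year's first month is a parameter (`fiscal_start_month`, defaulting to January), and a fiscal year is labelled by the calendar year in which it begins. The function names are hypothetical, and only the first six rows of the table are used.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# First six rows of the table: (date, country, class, value).
rows = [
    ("6/1/2010", "USA", "A", 45),
    ("6/1/2010", "Canada", "A", 23),
    ("6/1/2010", "Brazil", "B", 65),
    ("9/1/2010", "USA", "B", 47),
    ("9/1/2010", "Canada", "A", 98),
    ("9/1/2010", "Brazil", "B", 25),
]


def fiscal_quarter(day, fiscal_start_month=1):
    """Return (fiscal_year, quarter) for a date."""
    # Months elapsed since the start of the fiscal year, 0..11.
    offset = (day.month - fiscal_start_month) % 12
    # Assumption: label the fiscal year by the calendar year in which it starts.
    fiscal_year = day.year if day.month >= fiscal_start_month else day.year - 1
    return fiscal_year, offset // 3 + 1


def mean_by_quarter(rows, fiscal_start_month=1):
    """Average the value column per (fiscal_year, quarter)."""
    buckets = defaultdict(list)
    for date_text, _country, _class, value in rows:
        day = datetime.strptime(date_text, "%m/%d/%Y")
        buckets[fiscal_quarter(day, fiscal_start_month)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


means = mean_by_quarter(rows)
```

Passing a different `fiscal_start_month` (for example 7 for a July start) shifts which quarter each date falls into without changing the rest of the calculation.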

Count matches between multiple columns and words in a nested array

Question: My earlier question was resolved. Now I need to develop a related, but more complex query. I have a table like this:

    id   description  additional_info
    -------------------------------------------
    123  games        XYD
    124  Festivals    sport swim

And I need to count matches to arrays like this:

    array_content varchar[] := {"Festivals,games","sport,swim"}

If either of the columns description and additional_info contains any of the tags separated by a comma, we count that as 1. So each array element (consisting