aggregate-functions

SQL - Subquery in Aggregate Function

魔方 西西 submitted on 2019-12-04 08:46:14
Question: I'm using the Northwind database to refresh my SQL skills by creating some more or less complex queries. Unfortunately, I could not find a solution for my last use case: "Get the sum of the five greatest orders for every category in year 1997." The tables involved are: Orders(OrderId, OrderDate), Order Details(OrderId, ProductId, Quantity, UnitPrice), Products(ProductId, CategoryId), Categories(CategoryId, CategoryName). I have tried the following query: SELECT c.CategoryName, SUM( (SELECT TOP 5
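One common way to approach a "top N per group, then sum" requirement is to rank the per-order totals inside each category with a window function and sum only the first five ranks. The sketch below is not the asker's query (which is cut off above); it runs on SQLite through Python's `sqlite3` module, needs SQLite 3.25 or later for window functions, and uses a tiny made-up data set. The table name `OrderDetails` (without the space) and all sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories(CategoryId INTEGER PRIMARY KEY, CategoryName TEXT);
CREATE TABLE Products(ProductId INTEGER PRIMARY KEY, CategoryId INTEGER);
CREATE TABLE Orders(OrderId INTEGER PRIMARY KEY, OrderDate TEXT);
CREATE TABLE OrderDetails(OrderId INTEGER, ProductId INTEGER,
                          Quantity INTEGER, UnitPrice INTEGER);
INSERT INTO Categories VALUES (1, 'Beverages'), (2, 'Condiments');
INSERT INTO Products VALUES (1, 1), (2, 2);
""")

# Hypothetical data: orders 1..7 fall in 1997, order 8 in 1996.
orders = [(i, "1997-03-%02d" % i) for i in range(1, 8)] + [(8, "1996-12-31")]
conn.executemany("INSERT INTO Orders VALUES (?, ?)", orders)

# Product 1 (Beverages): order i is worth i * 10.
# Product 2 (Condiments): orders 1..3 are worth 5, 10, 15.
details = [(i, 1, i, 10) for i in range(1, 9)]
details += [(i, 2, 1, 5 * i) for i in range(1, 4)]
conn.executemany("INSERT INTO OrderDetails VALUES (?, ?, ?, ?)", details)

rows = conn.execute("""
WITH order_totals AS (
    -- value of each 1997 order within each category
    SELECT p.CategoryId, od.OrderId,
           SUM(od.Quantity * od.UnitPrice) AS total
    FROM OrderDetails od
    JOIN Orders o   ON o.OrderId = od.OrderId
    JOIN Products p ON p.ProductId = od.ProductId
    WHERE o.OrderDate >= '1997-01-01' AND o.OrderDate < '1998-01-01'
    GROUP BY p.CategoryId, od.OrderId
),
ranked AS (
    -- rank orders inside each category, largest first
    SELECT CategoryId, total,
           ROW_NUMBER() OVER (PARTITION BY CategoryId
                              ORDER BY total DESC) AS rn
    FROM order_totals
)
SELECT c.CategoryName, SUM(r.total) AS top5_total
FROM ranked r
JOIN Categories c ON c.CategoryId = r.CategoryId
WHERE r.rn <= 5
GROUP BY c.CategoryName
ORDER BY c.CategoryName
""").fetchall()

print(rows)
```

A category with fewer than five orders simply contributes all of them. `ROW_NUMBER()` breaks ties arbitrarily; `RANK()` would be the choice if tied orders should all count.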

Performance difference: select top 1 order by vs. select min(val)

自作多情 submitted on 2019-12-04 07:30:51
The question is simple. Which query will be faster: SELECT TOP 1 value FROM table ORDER BY value or SELECT TOP 1 MIN(value) FROM table? Assume two cases: Case 1, no index, and Case 2, an index on value. Any insights are appreciated. Thanks!
Answer: In the case where no index exists: MIN(value) should be implemented in O(N) time with a single scan; TOP 1 ... ORDER BY will require O(N log N) time because of the specified sort (unless the DB engine is smart enough to read intent, which I would hate to rely on in production code). When an index does exist: Both should require only O(1)
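Apart from cost, the two forms can return different answers once NULLs are present: MIN ignores NULLs, while an ascending sort places NULLs first in both SQL Server and SQLite. The sketch below shows this on SQLite through Python's `sqlite3` module, with `LIMIT 1` standing in for `TOP 1`; the table `t` and its values are made up for illustration, and nothing here measures performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(7,), (3,), (9,)])

# Without NULLs the two queries agree.
min_agg = conn.execute("SELECT MIN(value) FROM t").fetchone()[0]
min_sorted = conn.execute(
    "SELECT value FROM t ORDER BY value LIMIT 1"
).fetchone()[0]

# With a NULL present, MIN skips it but the ascending sort returns it first.
conn.execute("INSERT INTO t VALUES (NULL)")
min_agg_null = conn.execute("SELECT MIN(value) FROM t").fetchone()[0]
min_sorted_null = conn.execute(
    "SELECT value FROM t ORDER BY value LIMIT 1"
).fetchone()[0]

print(min_agg, min_sorted, min_agg_null, min_sorted_null)
```

If the column is nullable, the sorted form needs a `WHERE value IS NOT NULL` filter to match MIN.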

Custom aggregation on PySpark dataframes

て烟熏妆下的殇ゞ submitted on 2019-12-04 07:22:26
I have a PySpark DataFrame with one column of one-hot encoded vectors. I want to aggregate the different one-hot encoded vectors by vector addition after a groupby, e.g. df[userid, action] with Row1: ["1234", [1,0,0]] and Row2: ["1234", [0,1,0]]. I want the output row to be ["1234", [1,1,0]], so the vector is the sum of all vectors grouped by userid. How can I achieve this? The PySpark sum aggregate operation does not support vector addition.
Answer (Assaf Mendelson): You have several options: Create a user-defined aggregate function. The problem is that you will need to write the user-defined aggregate function in
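To pin down what the aggregation is expected to compute, here is a minimal sketch in plain Python rather than PySpark. It is not one of the options from the answer (which is cut off above); the helper name `sum_vectors_by_key` and the extra `"5678"` row are hypothetical.

```python
def sum_vectors_by_key(rows):
    """Element-wise sum of equal-length vectors, grouped by key."""
    totals = {}
    for key, vec in rows:
        if key in totals:
            # add the new vector to the running total, position by position
            totals[key] = [x + y for x, y in zip(totals[key], vec)]
        else:
            totals[key] = list(vec)
    return totals


rows = [
    ("1234", [1, 0, 0]),
    ("1234", [0, 1, 0]),
    ("5678", [0, 0, 1]),  # hypothetical second user
]
result = sum_vectors_by_key(rows)
print(result)
```

The same pairwise addition is the kind of function one would hand to a reduce-by-key step (for example `reduceByKey` on an RDD of `(userid, vector)` pairs), though whether that route suits a given DataFrame pipeline is a separate question.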

PostgreSQL aggregate or window function to return just the last value

放肆的年华 submitted on 2019-12-04 06:20:51
I'm using an aggregate function with the OVER clause in PostgreSQL 9.1 and I want to return just the last row for each window. The last_value() window function sounds like it might do what I want, but it doesn't: it returns a row for each row in the window, whereas I want just one row per window. A simplified example: SELECT a, some_func_like_last_value(b) OVER (PARTITION BY a ORDER BY b) FROM ( SELECT 1 AS a, 'do not want this' AS b UNION SELECT 1, 'just want this' ) sub. I want this to return one row: 1, 'just want this'
Answer (Erwin Brandstetter): DISTINCT plus window function. Add a DISTINCT clause:
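A minimal sketch of the "DISTINCT plus window function" idea, run on SQLite (3.25 or later) through Python's `sqlite3` module rather than PostgreSQL 9.1. The explicit frame clause is an assumption on my part, since the answer text is cut off: with the default frame, which ends at the current row, `last_value()` just returns each row's own value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute("""
    SELECT DISTINCT a,
           last_value(b) OVER (
               PARTITION BY a
               ORDER BY b
               -- widen the frame so every row sees the whole partition
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_b
    FROM (
        SELECT 1 AS a, 'do not want this' AS b
        UNION
        SELECT 1, 'just want this'
    ) sub
""").fetchall()

print(rows)
```

Every row in the partition now carries the same `last_b`, so `DISTINCT` collapses them to one row per value of `a`.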

How to count 2 different data in one query

一笑奈何 submitted on 2019-12-04 05:57:51
I need to calculate the number of occurrences of some data in two columns in one query. The DB is SQL Server 2005. For example, I have this table: Person: Id, Name, Age. I need to get these results in one query: 1. Count of Persons that have the name 'John' 2. Count of 'John' with age more than 30. I can do that with subqueries in this way (it is only an example): SELECT (SELECT COUNT(Id) FROM Persons WHERE Name = 'John'), (SELECT COUNT (Id) FROM Persons WHERE Name = 'John' AND age > 30) FROM Persons. But this is very slow, and I'm searching for a faster method. I found this solution for MySQL (it almost
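Both counts can usually be obtained in a single pass with conditional aggregation: filter once on the name, then count every row and sum a `CASE` expression for the age condition. The sketch below runs on SQLite through Python's `sqlite3` module; the `CASE` form is standard SQL and is also accepted by SQL Server 2005. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (Id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)"
)
conn.executemany(
    "INSERT INTO Persons (Name, Age) VALUES (?, ?)",
    [("John", 25), ("John", 31), ("John", 45), ("Mary", 50)],  # hypothetical
)

johns, johns_over_30 = conn.execute("""
    SELECT COUNT(*),                                  -- all rows named John
           SUM(CASE WHEN Age > 30 THEN 1 ELSE 0 END)  -- of those, Age > 30
    FROM Persons
    WHERE Name = 'John'
""").fetchone()

print(johns, johns_over_30)
```

Unlike the subquery version with a trailing `FROM Persons`, this returns exactly one row and reads the table once.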

Create array in SELECT

爱⌒轻易说出口 submitted on 2019-12-04 05:39:52
I'm using PostgreSQL 9.1 and I have this data structure:
    A B
    -------
    1 a
    1 a
    1 b
    1 c
    1 c
    1 c
    1 d
    2 e
    2 e
I need a query that produces this result:
    1 4 {{c,3},{a,2},{b,1},{d,1}}
    2 1 {{e,2}}
That is: A=1, 4 rows total with A=1, then the partial counts (3 rows with the c value, 2 rows with the a value, ...). The result contains:
    1. The distinct values of column "A"
    2. The count of all rows related to the "A" value
    3. An array that contains all the elements related to the "A" value, each with its own count
The array must be sorted by the count of each group (3,2,1,1 in the example).
Answer: This should do the trick: SELECT a, sum(ab_ct
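The answer's query is cut off, but its opening (`sum(ab_ct`) suggests a two-level aggregation: count per (a, b) first, then roll up per a. The sketch below follows that idea on SQLite through Python's `sqlite3` module. SQLite has no array type, so the outer roll-up is done in Python, and the table name `tbl` is an assumption. Note that the sample output shows 4 for A=1, which matches the number of distinct B values rather than the 7 rows with A=1, so the sketch reports both figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (1, "c"), (1, "c"), (1, "c"),
     (1, "d"), (2, "e"), (2, "e")],
)

# Inner level: one row per (a, b) with its count, largest counts first.
inner = conn.execute("""
    SELECT a, b, COUNT(*) AS ab_ct
    FROM tbl
    GROUP BY a, b
    ORDER BY a, ab_ct DESC, b
""").fetchall()

# Outer level: roll the (b, count) pairs up per value of a.
result = {}
for a, b, ab_ct in inner:
    entry = result.setdefault(a, {"rows": 0, "distinct": 0, "pairs": []})
    entry["rows"] += ab_ct       # total rows for this a
    entry["distinct"] += 1       # number of distinct b values
    entry["pairs"].append((b, ab_ct))

print(result)
```

Ties in the count are broken alphabetically by `b` here, which is why `b` precedes `d` in the pairs for A=1.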

Select a dynamic set of columns from a table and get the sum for each

十年热恋 submitted on 2019-12-04 04:38:23
Question: Is it possible to do the following in Postgres: SELECT column_name FROM information_schema.columns WHERE table_name = 'somereport' AND data_type = 'integer'; SELECT SUM(column_name[0]), SUM(column_name[1]), SUM(column_name[3]) FROM somereport; In other words, I need to select a group of columns from a table depending on certain criteria, and then sum each of those columns in the table. I know I can do this in a loop, so I can count each column independently, but obviously that requires a query for each
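SQL cannot use query results as column names directly, so the usual route is dynamic SQL: read the catalog, build one statement that sums every matching column, and run it. The sketch below shows the pattern on SQLite through Python's `sqlite3` module, with `PRAGMA table_info` standing in for `information_schema.columns`; the columns of `somereport` and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE somereport "
    "(label TEXT, visits INTEGER, clicks INTEGER, ratio REAL)"
)
conn.executemany(
    "INSERT INTO somereport VALUES (?, ?, ?, ?)",
    [("a", 1, 10, 0.5), ("b", 2, 20, 0.25)],  # hypothetical rows
)

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
int_cols = [
    row[1]
    for row in conn.execute("PRAGMA table_info(somereport)")
    if row[2].upper() == "INTEGER"
]

# Build a single statement with one SUM per integer column.
# Identifiers come from the catalog and are double-quoted, not user input.
select_list = ", ".join('SUM("{0}") AS "{0}"'.format(c) for c in int_cols)
sums = conn.execute("SELECT " + select_list + " FROM somereport").fetchone()

totals = dict(zip(int_cols, sums))
print(totals)
```

This needs two round trips (catalog, then the generated query) regardless of how many columns match, instead of one query per column.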

Fill in missing rows when aggregating over multiple fields in Postgres

戏子无情 submitted on 2019-12-04 03:57:04
Question: I am aggregating sales for a set of products per day using Postgres and need to know not just when sales do happen, but also when they do not, for further processing.
    SELECT sd.date, COUNT(sd.sale_id) AS sales, sd.product
    FROM sales_data sd -- sales per product, per day
    GROUP BY sd.product, sd.date
    ORDER BY sd.product, sd.date
This produces the following:
    date       | sales | product
    -----------+-------+---------
    2017-08-17 |    10 | soap
    2017-08-19 |     2 | soap
    2017-08-20 |     5 | soap
    2017-08
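A common way to surface the days without sales is to generate the full calendar, cross join it with the product list, and left join the sales onto that grid so that empty cells count as 0. The sketch below does this on SQLite through Python's `sqlite3` module, using a recursive CTE to build the dates (in Postgres, `generate_series` would typically play that role). Only the three soap rows visible in the excerpt are loaded; the CTE names `bounds`, `days` and `products` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data "
    "(sale_id INTEGER PRIMARY KEY, date TEXT, product TEXT)"
)

# One row per sale, matching the daily counts shown in the excerpt.
daily = {"2017-08-17": 10, "2017-08-19": 2, "2017-08-20": 5}
conn.executemany(
    "INSERT INTO sales_data (date, product) VALUES (?, 'soap')",
    [(day,) for day, n in daily.items() for _ in range(n)],
)

rows = conn.execute("""
WITH RECURSIVE
bounds(first_day, last_day) AS (
    SELECT MIN(date), MAX(date) FROM sales_data
),
days(day) AS (
    -- every calendar day between the first and last sale
    SELECT first_day FROM bounds
    UNION ALL
    SELECT date(day, '+1 day') FROM days, bounds WHERE day < last_day
),
products(product) AS (
    SELECT DISTINCT product FROM sales_data
)
SELECT d.day, COUNT(sd.sale_id) AS sales, p.product
FROM days d
CROSS JOIN products p
LEFT JOIN sales_data sd
       ON sd.date = d.day AND sd.product = p.product
GROUP BY p.product, d.day
ORDER BY p.product, d.day
""").fetchall()

for row in rows:
    print(row)
```

`COUNT(sd.sale_id)` counts only non-NULL ids, so a day with no matching sale comes out as 0 rather than 1.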

JPA JPQL: SELECT NEW with COUNT, GROUP BY and ORDER BY

徘徊边缘 submitted on 2019-12-04 03:31:45
Question: There are 2 tables / entities: AppleTree and Apples. An apple tree produces 0...n apples. Each apple is an entity / row of the second table and references the apple tree that has produced it (ManyToOne). I want to generate a "high score" report on the most productive apple trees. It should be ordered by the COUNT column in descending order:
    APPLE TREE | COUNT(A)
    ---------------------
    10304      | 1000
    72020      | 952
    31167      | 800
In order to handle these results, I created a non-entity bean that

MySQL aggregate function problem

假装没事ソ submitted on 2019-12-04 03:13:47
In the following example, why does the min() query return results, but the max() query does not?
    mysql> create table t(id int, a int);
    Query OK, 0 rows affected (0.10 sec)
    mysql> insert into t(id, a) values(1, 1);
    Query OK, 1 row affected (0.03 sec)
    mysql> insert into t(id, a) values(1, 2);
    Query OK, 1 row affected (0.02 sec)
    mysql> select * from t
        -> ;
    +------+------+
    | id   | a    |
    +------+------+
    |    1 |    1 |
    |    1 |    2 |
    +------+------+
    2 rows in set (0.00 sec)
    mysql> select * from t where a < 4;
    +------+------+
    | id   | a    |
    +------+------+
    |    1 |    1 |
    |    1 |    2 |
    +------+------+
    2 rows in set (0.00 sec)