aggregate-functions | 易学教程

SQL - Subquery in Aggregate Function

Question: I'm using the Northwind database to refresh my SQL skills by creating some more or less complex queries. Unfortunately, I could not find a solution for my last use case: "Get the sum of the five greatest orders for every category in year 1997." The tables involved are: Orders(OrderId, OrderDate); Order Details(OrderId, ProductId, Quantity, UnitPrice); Products(ProductId, CategoryId); Categories(CategoryId, CategoryName). I have tried the following query: SELECT c.CategoryName, SUM( (SELECT TOP 5

Performance difference: select top 1 order by vs. select min(val)

The question is simple: which query will be faster, SELECT TOP 1 value FROM table ORDER BY value or SELECT TOP 1 MIN(value) FROM table? We can assume two cases: Case 1, no index, and Case 2, an index on value. Any insights are appreciated. Thanks!

In the case where no index exists: MIN(value) should be implemented in O(N) time with a single scan; TOP 1 ... ORDER BY will require O(N log N) time because of the specified sort (unless the DB engine is smart enough to read intent, which I would hate to rely on in production code). When an index does exist: Both should require only O(1)
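As a rough illustration of the two query shapes compared above, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is a stand-in chosen for convenience, not the engine from the question, so `LIMIT 1` replaces `TOP 1`; the table name `t`, the index name `idx_t_value`, and the sample values are assumptions. It only shows that both forms return the same value with and without an index, and how one might inspect the plan; it does not measure speed.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: the question
# targets SQL Server, where TOP 1 is used instead of LIMIT 1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value INTEGER)")
conn.executemany("INSERT INTO t (value) VALUES (?)",
                 [(v,) for v in (42, 7, 19, 3, 88)])

SORT_SQL = "SELECT value FROM t ORDER BY value LIMIT 1"
MIN_SQL = "SELECT MIN(value) FROM t"


def first_value(conn, sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]


def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Case 1: no index on value.
without_index = (first_value(conn, SORT_SQL), first_value(conn, MIN_SQL))

# Case 2: index on value.
conn.execute("CREATE INDEX idx_t_value ON t (value)")
with_index = (first_value(conn, SORT_SQL), first_value(conn, MIN_SQL))

# The plan text is engine- and version-specific, so it is printed, not asserted.
for sql in (SORT_SQL, MIN_SQL):
    print(sql, "->", query_plan(conn, sql))
```

Actual timings depend on the engine and its optimizer, so the plan output of the real target database is the thing to check.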

Custom aggregation on PySpark dataframes

I have a PySpark DataFrame with one column of one-hot encoded vectors. I want to aggregate the different one-hot encoded vectors by vector addition after a groupby, e.g. for df[userid, action]: Row1: ["1234", [1, 0, 0]] Row2: ["1234", [0, 1, 0]] I want the output as the row ["1234", [1, 1, 0]], so the vector is the sum of all vectors grouped by userid. How can I achieve this? The PySpark sum aggregate operation does not support vector addition.

Assaf Mendelson: You have several options. Create a user defined aggregate function. The problem is that you will need to write the user defined aggregate function in
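The aggregation being asked for, element-wise addition of vectors within each userid group, can be sketched in plain Python without Spark. This is not the answer's method (which is cut off above); it only illustrates the per-group logic that any custom aggregate would have to implement. The function names and the second user id "5678" are hypothetical.

```python
def add_vectors(u, v):
    """Element-wise sum of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(u, v)]


def sum_vectors_by_key(rows):
    """Group (key, vector) pairs by key and add the vectors in each group."""
    totals = {}
    for key, vec in rows:
        if key in totals:
            totals[key] = add_vectors(totals[key], vec)
        else:
            totals[key] = list(vec)  # copy so the input is not mutated
    return totals


# Sample rows from the question, plus one hypothetical extra user.
rows = [
    ("1234", [1, 0, 0]),
    ("1234", [0, 1, 0]),
    ("5678", [0, 0, 1]),
]
result = sum_vectors_by_key(rows)
print(result)
```

Because `add_vectors` is associative and commutative, the same two-argument function is the kind of reducer that a pairwise, per-key reduction in a distributed setting would need.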

PostgreSQL aggregate or window function to return just the last value

I'm using an aggregate function with the OVER clause in PostgreSQL 9.1 and I want to return just the last row for each window. The last_value() window function sounds like it might do what I want, but it doesn't. It returns a row for each row in the window, whereas I want just one row per window. A simplified example: SELECT a, some_func_like_last_value(b) OVER (PARTITION BY a ORDER BY b) FROM ( SELECT 1 AS a, 'do not want this' AS b UNION SELECT 1, 'just want this' ) sub I want this to return one row: 1, 'just want this'

Erwin Brandstetter

DISTINCT plus window function

Add a DISTINCT clause:
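One way the "DISTINCT plus window function" idea could look is sketched below with Python's sqlite3 module. This is an illustration under assumptions, not the answer's own query (which is cut off above): SQLite stands in for PostgreSQL and needs version 3.25 or later for window functions, the explicit frame clause is my addition so that last_value() sees the whole partition, and the rows with a = 2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sub (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO sub (a, b) VALUES (?, ?)",
    [
        (1, "do not want this"),
        (1, "just want this"),
        (2, "alpha"),   # hypothetical extra partition
        (2, "omega"),
    ],
)

# The window function still yields one value per input row; DISTINCT then
# collapses the identical (a, last_b) pairs to one row per partition.
# The frame clause widens the default frame to the full partition.
sql = """
SELECT DISTINCT
       a,
       last_value(b) OVER (
           PARTITION BY a
           ORDER BY b
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_b
FROM sub
ORDER BY a
"""
last_rows = conn.execute(sql).fetchall()
print(last_rows)
```

Without the frame clause, the default frame ends at the current row, so last_value() would simply echo each row's own value and DISTINCT would not reduce anything.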

How to count 2 different data in one query

I need to calculate the number of occurrences of some data in two columns in one query. The DB is SQL Server 2005. For example, I have this table: Persons: Id, Name, Age. I need to get these results in one query: 1. Count of persons that have the name 'John'. 2. Count of 'John' with age more than 30. I can do that with subqueries in this way (it is only an example): SELECT (SELECT COUNT(Id) FROM Persons WHERE Name = 'John'), (SELECT COUNT(Id) FROM Persons WHERE Name = 'John' AND Age > 30) FROM Persons But this is very slow, and I'm searching for a faster method. I found this solution for MySQL (it almost
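A common single-pass alternative to the two subqueries is conditional aggregation with SUM(CASE ...). The sketch below uses Python's sqlite3 module as a stand-in for SQL Server 2005 (an assumption made so the example runs anywhere); the sample rows are invented, and whether this matches the MySQL solution mentioned above cannot be told from the truncated text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (Id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Persons (Name, Age) VALUES (?, ?)",
    [("John", 25), ("John", 35), ("John", 40), ("Mary", 50)],  # invented data
)

# One scan over the table: COUNT(*) counts every 'John' row, while the CASE
# expression contributes 1 only for rows that also satisfy Age > 30.
sql = """
SELECT COUNT(*) AS johns,
       SUM(CASE WHEN Age > 30 THEN 1 ELSE 0 END) AS johns_over_30
FROM Persons
WHERE Name = 'John'
"""
johns, johns_over_30 = conn.execute(sql).fetchone()
print(johns, johns_over_30)
```

The CASE expression is standard SQL, so the same query text should carry over to SQL Server largely unchanged, though that has not been verified here.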

Create array in SELECT

I'm using PostgreSQL 9.1 and I have this data structure:

A B
-------
1 a
1 a
1 b
1 c
1 c
1 c
1 d
2 e
2 e

I need a query that produces this result:

1 4 {{c,3},{a,2},{b,1},{d,1}}
2 1 {{e,2}}

A=1, 4 rows total with A=1, the partial counts (3 rows with c value, 2 rows with a value, ...). The columns are:
The distinct values of column "A"
The count of all rows related to the "A" value
An array containing all the elements related to the "A" value, each with its own count

The sort needed for the array is based on the count of each group (like the example 3,2,1,1).

This should do the trick: SELECT a , sum(ab_ct

Select a dynamic set of columns from a table and get the sum for each

Question: Is it possible to do the following in Postgres: SELECT column_name FROM information_schema WHERE table_name = 'somereport' AND data_type = 'integer'; SELECT SUM(column_name[0]), SUM(column_name[1]), SUM(column_name[3]) FROM somereport; In other words, I need to select a group of columns from a table depending on certain criteria, and then sum each of those columns in the table. I know I can do this in a loop, so I can count each column independently, but obviously that requires a query for each
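One possible shape for this is a two-step approach: read the column metadata first, then build a single SUM query from it on the client side. The sketch below is an assumption-laden illustration rather than a Postgres answer: it uses Python's sqlite3 module, where `PRAGMA table_info` plays the role of the information schema, and the column names (`label`, `clicks`, `views`, `ratio`) and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE somereport (label TEXT, clicks INTEGER, views INTEGER, ratio REAL)"
)
conn.executemany(
    "INSERT INTO somereport VALUES (?, ?, ?, ?)",
    [("a", 1, 10, 0.1), ("b", 2, 20, 0.2), ("c", 3, 30, 0.3)],  # invented data
)


def quote_ident(name):
    """Double-quote an identifier so it can be spliced into SQL text."""
    return '"' + name.replace('"', '""') + '"'


def integer_columns(conn, table):
    """Names of columns declared INTEGER.

    PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    """
    rows = conn.execute(f"PRAGMA table_info({quote_ident(table)})")
    return [row[1] for row in rows if row[2].upper() == "INTEGER"]


def sum_integer_columns(conn, table):
    """Sum every INTEGER column of the table in one query."""
    cols = integer_columns(conn, table)
    if not cols:
        return {}
    select_list = ", ".join(f"SUM({quote_ident(c)})" for c in cols)
    row = conn.execute(f"SELECT {select_list} FROM {quote_ident(table)}").fetchone()
    return dict(zip(cols, row))


sums = sum_integer_columns(conn, "somereport")
print(sums)
```

This costs one metadata lookup plus one aggregate query, instead of one query per column as in the loop the question wants to avoid.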

Fill in missing rows when aggregating over multiple fields in Postgres

Question: I am aggregating sales for a set of products per day using Postgres and need to know not just when sales do happen, but also when they do not, for further processing.

SELECT sd.date, COUNT(sd.sale_id) AS sales, sd.product
FROM sales_data sd
-- sales per product, per day
GROUP BY sd.product, sd.date
ORDER BY sd.product, sd.date

This produces the following:

    date    | sales | product
------------+-------+-------------------
 2017-08-17 |    10 | soap
 2017-08-19 |     2 | soap
 2017-08-20 |     5 | soap
 2017-08

JPA JPQL: SELECT NEW with COUNT, GROUP BY and ORDER BY

Question: There are 2 tables / entities: AppleTree and Apples. An apple tree produces 0...n apples. Each apple is an entity / row of the second table and references the apple tree that has produced it (ManyToOne). I want to generate a "high score" report on the most productive apple trees. It should be ordered by the COUNT column in descending order:

APPLE TREE | COUNT(A)
---------------------
10304      | 1000
72020      | 952
31167      | 800

In order to handle these results, I created a non-entity bean that

MySQL aggregate function problem

In the following example, why does the min() query return results, but the max() query does not?

mysql> create table t(id int, a int);
Query OK, 0 rows affected (0.10 sec)

mysql> insert into t(id, a) values(1, 1);
Query OK, 1 row affected (0.03 sec)

mysql> insert into t(id, a) values(1, 2);
Query OK, 1 row affected (0.02 sec)

mysql> select * from t;
+------+------+
| id   | a    |
+------+------+
|    1 |    1 |
|    1 |    2 |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from t where a < 4;
+------+------+
| id   | a    |
+------+------+
|    1 |    1 |
|    1 |    2 |
+------+------+
2 rows in set (0.00 sec)