aggregate-functions

How to find mean of grouped Vector columns in Spark SQL?

为君一笑 posted on 2019-12-17 06:51:45
Question: I have created a RelationalGroupedDataset by calling instances.groupBy(instances.col("property_name")):

    val x = instances.groupBy(instances.col("property_name"))

How do I compose a user-defined aggregate function to perform Statistics.colStats().mean on each group? Thanks!

Answer 1: Spark >= 2.4: you can use Summarizer:

    import org.apache.spark.ml.stat.Summarizer

    val dfNew = df.as[(Int, org.apache.spark.mllib.linalg.Vector)]
      .map { case (group, v) => (group, v.asML) }
      .toDF("group", "features")

Grouped string aggregation / LISTAGG for SQL Server

末鹿安然 posted on 2019-12-17 06:43:47
Question: I'm sure this has been asked but I can't quite find the right search terms. Given a schema like this:

    | CarMakeID | CarMake
    ------------------------
    | 1         | SuperCars
    | 2         | MehCars

    | CarMakeID | CarModelID | CarModel
    -----------------------------------------
    | 1         | 1          | Zoom
    | 2         | 1          | Wow
    | 3         | 1          | Awesome
    | 4         | 2          | Mediocrity
    | 5         | 2          | YoureSettling

I want to produce a dataset like this:

    | CarMakeID | CarMake   | CarModels
    ---------------------------------------------
    | 1         | SuperCars | Zoom, Wow,
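
A minimal sketch of grouped string aggregation for the data above, run through Python's built-in sqlite3 module rather than SQL Server. It uses SQLite's group_concat; on SQL Server 2017 and later the closest counterpart is STRING_AGG. The table names CarMakes and CarModels are made up for the example, and the second table is assumed to hold the model key in its first column and the make key in its second, which is what the desired output implies.

```python
import sqlite3

# Hypothetical in-memory stand-in for the two tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CarMakes (CarMakeID INTEGER PRIMARY KEY, CarMake TEXT);
CREATE TABLE CarModels (CarModelID INTEGER PRIMARY KEY, CarMakeID INTEGER, CarModel TEXT);
INSERT INTO CarMakes VALUES (1, 'SuperCars'), (2, 'MehCars');
INSERT INTO CarModels VALUES
  (1, 1, 'Zoom'), (2, 1, 'Wow'), (3, 1, 'Awesome'),
  (4, 2, 'Mediocrity'), (5, 2, 'YoureSettling');
""")


def models_per_make(connection):
    """Return one row per make, with its model names joined by ', '."""
    return connection.execute("""
        SELECT mk.CarMakeID, mk.CarMake,
               group_concat(md.CarModel, ', ') AS CarModels
        FROM CarMakes AS mk
        JOIN CarModels AS md ON md.CarMakeID = mk.CarMakeID
        GROUP BY mk.CarMakeID, mk.CarMake
        ORDER BY mk.CarMakeID
    """).fetchall()


for row in models_per_make(conn):
    print(row)
```

SQLite does not promise any particular order for the names inside each concatenated string in this form, so code that depends on the order should not rely on it.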

The SQL OVER() clause - when and why is it useful?

▼魔方 西西 posted on 2019-12-17 06:22:13
Question:

    USE AdventureWorks2008R2;
    GO
    SELECT SalesOrderID, ProductID, OrderQty
        ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
        ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
        ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
        ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
        ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
    FROM Sales.SalesOrderDetail
    WHERE SalesOrderID IN(43659,43664);

I read about that clause and I don't understand why I need it. What does the
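
One way to see what OVER adds is to run the same aggregate twice, once with GROUP BY and once as a window function. The sketch below does that with Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer) instead of SQL Server. The rows are invented for illustration and are not taken from AdventureWorks.

```python
import sqlite3

# Made-up detail rows; only the column names follow the query in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER, OrderQty INTEGER);
INSERT INTO SalesOrderDetail VALUES
  (43659, 776, 1), (43659, 777, 3), (43659, 778, 2),
  (43664, 772, 4), (43664, 775, 4);
""")

# GROUP BY collapses each order into a single row.
grouped = conn.execute("""
    SELECT SalesOrderID, SUM(OrderQty) AS Total
    FROM SalesOrderDetail
    GROUP BY SalesOrderID
    ORDER BY SalesOrderID
""").fetchall()

# OVER (PARTITION BY ...) keeps every detail row and attaches the
# per-order aggregates to each of them.
windowed = conn.execute("""
    SELECT SalesOrderID, ProductID, OrderQty,
           SUM(OrderQty)   OVER (PARTITION BY SalesOrderID) AS Total,
           COUNT(OrderQty) OVER (PARTITION BY SalesOrderID) AS Cnt
    FROM SalesOrderDetail
    ORDER BY SalesOrderID, ProductID
""").fetchall()

print(len(grouped), "grouped rows")
print(len(windowed), "windowed rows")
```

The grouped query loses ProductID and OrderQty, while the windowed query returns the detail rows unchanged with the order-level totals alongside them, which is the case the clause is meant for.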

How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?

99封情书 posted on 2019-12-17 04:18:24
Question: I am trying to fetch the first and the last record of a 'grouped' record. More precisely, I am doing a query like this:

    SELECT MIN(low_price), MAX(high_price), open, close
    FROM symbols
    WHERE date BETWEEN(.. ..)
    GROUP BY YEARWEEK(date)

but I'd like to get the first and the last record of the group. It could be done by doing tons of requests, but I have a quite large table. Is there a (low processing time if possible) way to do this with MySQL?

Answer 1: You want to use GROUP_CONCAT and SUBSTRING
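
The answer above is cut off, so the following is not a reconstruction of it. It is a sketch of a different technique, window functions (FIRST_VALUE and LAST_VALUE, which MySQL also supports from version 8.0), run through Python's built-in sqlite3 module. SQLite has no YEARWEEK, so strftime('%Y-%W', date) stands in for it; the sample prices and dates are invented.

```python
import sqlite3

# Made-up price rows, inserted out of date order on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE symbols (date TEXT, open REAL, close REAL, low_price REAL, high_price REAL);
INSERT INTO symbols VALUES
  ('2019-12-04', 12.0, 11.5, 11.0, 13.0),
  ('2019-12-02', 10.0, 11.0,  9.5, 11.5),
  ('2019-12-10', 12.5, 14.0, 12.0, 14.5),
  ('2019-12-03', 11.0, 12.0, 10.5, 12.5),
  ('2019-12-09', 11.5, 12.5, 11.0, 13.0);
""")


def weekly_summary(connection):
    """Per week: lowest low, highest high, first open and last close."""
    return connection.execute("""
        WITH tagged AS (
            SELECT strftime('%Y-%W', date) AS week,
                   date, open, close, low_price, high_price
            FROM symbols
        )
        SELECT DISTINCT week,
               MIN(low_price)    OVER w AS low,
               MAX(high_price)   OVER w AS high,
               FIRST_VALUE(open) OVER w AS first_open,
               LAST_VALUE(close) OVER w AS last_close
        FROM tagged
        WINDOW w AS (PARTITION BY week ORDER BY date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
        ORDER BY week
    """).fetchall()


for row in weekly_summary(conn):
    print(row)
```

The explicit frame (UNBOUNDED PRECEDING to UNBOUNDED FOLLOWING) matters: with the default frame, LAST_VALUE would return the current row's close rather than the last one in the week.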

How to include “zero” / “0” results in COUNT aggregate?

主宰稳场 posted on 2019-12-17 03:34:25
Question: I've just got myself a little bit stuck with some SQL. I don't think I can phrase the question brilliantly, so let me show you. I have two tables, one called person, one called appointment. I'm trying to return the number of appointments a person has (including if they have zero). Appointment contains the person_id and there is a person_id per appointment. So COUNT(person_id) is a sensible approach. The query:

    SELECT person_id, COUNT(person_id) AS "number_of_appointments"
    FROM appointment
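
A query that reads only from appointment can never return a person with no appointments, because that person has no row there to count. A common fix is to start from person and LEFT JOIN appointment, counting a column from the appointment side so that unmatched rows contribute zero. The sketch below shows this with Python's built-in sqlite3 module; the name and appointment_id columns and the sample rows are assumptions, since the question only mentions person_id.

```python
import sqlite3

# Hypothetical minimal schema; only person_id comes from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE appointment (appointment_id INTEGER PRIMARY KEY, person_id INTEGER);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO appointment VALUES (1, 1), (2, 1), (3, 2);
""")


def appointments_per_person(connection):
    """Count appointments for every person, including those with none."""
    return connection.execute("""
        SELECT p.person_id,
               COUNT(a.person_id) AS number_of_appointments
        FROM person AS p
        LEFT JOIN appointment AS a ON a.person_id = p.person_id
        GROUP BY p.person_id
        ORDER BY p.person_id
    """).fetchall()


for row in appointments_per_person(conn):
    print(row)
```

COUNT(a.person_id) skips the NULLs that the LEFT JOIN produces for people without appointments, whereas COUNT(*) would count the unmatched row and report 1 instead of 0.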

Optimal way to concatenate/aggregate strings

做~自己de王妃 posted on 2019-12-16 20:05:28
Question: I'm looking for a way to aggregate strings from different rows into a single row. I want to do this in many different places, so having a function to facilitate this would be nice. I've tried solutions using COALESCE and FOR XML, but they just don't cut it for me. String aggregation would do something like this:

    id | Name
    -- - ----
    1  | Matt
    1  | Rocks
    2  | Stylus

Result:

    id | Names
    -- - -----
    1  | Matt, Rocks
    2  | Stylus

I've taken a look at CLR-defined aggregate functions as a replacement for

Select multiple row values into single row with multi-table clauses

旧巷老猫 posted on 2019-12-14 03:59:51
Question: I've searched the forums and while I see similar posts, they only address pieces of the full query I need to formulate (array_aggr, where exists, joins, etc.). If the question I'm posting has been answered, I will gladly accept references to those threads. I did find this thread, which is very similar to what I need, except it is for MySQL, and I kept running into errors trying to get it into psql syntax. Hoping someone can help me get everything together. Here's the scenario: Attribute

How to use user variable as counter with inner join queries that contains GROUP BY statement?

孤人 posted on 2019-12-14 03:29:57
Question: I have 2 tables, odds and matches:

    matches: has match_id and match_date
    odds: has id, timestamp, result, odd_value, user_id, match_id

I had a query that gets the following information from those tables for each user:

    winnings: the winning bets for each user (when odds.result = 1).
    loses: the lost bets for each user (when odds.result != 1).
    points: the points of each user (the sum of odds.odd_value for each user).
    bonus: for every 5 consecutive winnings I want to add an extra bonus to