aggregate-functions

How can I use SUM for bit columns?

Submitted by 淺唱寂寞╮ on 2019-11-29 16:08:47
Question: How can I use the SUM() function for bit columns in T-SQL? When I try it as below:

    SELECT SUM(bitColumn) FROM MyTable;

I get the error: Operand data type bit is invalid for sum operator.

Answer 1: SELECT SUM(CAST(bitColumn AS INT)) FROM dbo.MyTable. You need to cast to a numeric type. Another solution is: SELECT COUNT(*) FROM dbo.MyTable WHERE bitColumn = 1

Answer 2: You could treat 0 as NULL and simply count the remaining values: SELECT count(nullif(bitColumn, 0)) FROM MyTable;

Answer 3: You can achieve this by using
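A minimal T-SQL sketch of the cast approach from Answer 1 (assuming the dbo.MyTable table and bitColumn column from the question):

    -- SUM rejects the bit type, so convert it to INT first; a CASE expression works the same way
    SELECT SUM(CAST(bitColumn AS INT))                    AS total_set,
           SUM(CASE WHEN bitColumn = 1 THEN 1 ELSE 0 END) AS total_set_case
    FROM dbo.MyTable;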

PySpark Numeric Window Group By

Submitted by 末鹿安然 on 2019-11-29 15:19:43
Question: I'd like to be able to have Spark group by a step size, as opposed to just single values. Is there anything in Spark similar to PySpark 2.x's window function for numeric (non-date) values? Something along the lines of:

    sqlContext = SQLContext(sc)
    df = sqlContext.createDataFrame([10, 11, 12, 13], "integer").toDF("foo")
    res = df.groupBy(window("foo", step=2, start=10)).count()

Answer (hi-zir): You can reuse the timestamp-based window and express the parameters in seconds. Tumbling:

    from pyspark.sql.functions import col, window
    df.withColumn(
        "window",
        window(
            col("foo").cast("timestamp"),
            windowDuration="2 seconds"
        ).cast
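A runnable sketch of that suggestion (assuming Spark 2.x; the values are the question's sample data): cast the integer column to a timestamp so the time-based window applies, then cast the resulting window struct back to numbers.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([(10,), (11,), (12,), (13,)], ["foo"])

    # Treat the integer as seconds since the epoch so window() accepts it;
    # casting the struct to bigints turns the bucket bounds back into numbers.
    res = (df
           .withColumn("bucket",
                       window(col("foo").cast("timestamp"),
                              windowDuration="2 seconds")
                       .cast("struct<start:bigint,end:bigint>"))
           .groupBy("bucket")
           .count())
    res.show(truncate=False)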

Aggregate hstore column in PostgreSQL

Submitted by 跟風遠走 on 2019-11-29 14:37:14
Question: I have a table like this:

    Table "public.statistics"
     id    | integer | not null default nextval('statistics_id_seq'::regclass)
     goals | hstore  |

    items:
    | id    | goals                 |
    | 30059 | "3"=>"123"            |
    | 27333 | "3"=>"200", "5"=>"10" |

What do I need to do to aggregate all the values by key in the hstore? I want to get a result like this:

    select sum(goals) from statistics

returning:

    | goals                 |
    | "3"=>"323", "5"=>"10" |

Answer 1: Building on Laurence's answer, here's a pure SQL way to aggregate the summed key/value pairs into a new hstore
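A sketch of that pure-SQL approach (assuming the statistics table above, the hstore extension, and PostgreSQL 9.3+ for the implicit LATERAL): explode each hstore with each(), sum per key, then rebuild a single hstore.

    -- each() yields (key, value) rows; sum the values per key, then fold the sums
    -- back into one hstore with the array-based hstore(keys, values) constructor.
    SELECT hstore(array_agg(key), array_agg(total::text)) AS goals
    FROM (
        SELECT key, SUM(value::integer) AS total
        FROM statistics, each(goals)
        GROUP BY key
    ) AS sums;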

Do aggregate MySQL functions always return a single row?

Submitted by 强颜欢笑 on 2019-11-29 14:26:08
Question: I'm sorry if this is really basic, but I feel that at some point I didn't have this issue and now I do, so either I was doing something totally different before or my syntax is skipping a step. I have, for example, a query that needs to return all rows with certain data, along with another column holding the total of one of those columns. If things worked as I expected, it would look like:

    SELECT order_id, cost, part_id, SUM(cost) AS total
    FROM orders
    WHERE order_date BETWEEN xxx AND yyy

and I would get all the rows for my orders, with the total tacked onto the end of each one. I know the
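One common way to get that result (a sketch; the date placeholders xxx and yyy are kept from the question) is to join the detail rows to a one-row aggregate:

    -- The subquery returns a single total row; the cross join attaches it to every order row.
    SELECT o.order_id, o.cost, o.part_id, t.total
    FROM orders AS o
    CROSS JOIN (
        SELECT SUM(cost) AS total
        FROM orders
        WHERE order_date BETWEEN xxx AND yyy
    ) AS t
    WHERE o.order_date BETWEEN xxx AND yyy;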

Spark custom aggregation : collect_list+UDF vs UDAF

Submitted by 拟墨画扇 on 2019-11-29 14:03:30
Question: I often need to perform custom aggregations on DataFrames in Spark 2.1, and I use these two approaches:

1. using groupBy/collect_list to get all the values into a single row, then applying a UDF to aggregate the values
2. writing a custom UDAF (user-defined aggregate function)

I generally prefer the first option as it's easier to implement and more readable than the UDAF implementation. But I would assume that the first option is generally slower, because more data is sent around the network
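A minimal sketch of the first approach (the aggregation shown, a median, and the sample data are illustrative, not from the question):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import collect_list, udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([("a", 1.0), ("a", 3.0), ("b", 2.0)], ["key", "value"])

    def median(values):
        # plain Python aggregation over the collected list
        xs = sorted(values)
        n = len(xs)
        return xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2.0

    median_udf = udf(median, DoubleType())

    result = (df.groupBy("key")
                .agg(collect_list("value").alias("values"))   # all values in one row
                .withColumn("median", median_udf("values"))   # UDF does the aggregation
                .drop("values"))
    result.show()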

Annotate (group) dates by month/year in Django

Submitted by 走远了吗. on 2019-11-29 12:49:32
Question: Using the Django DateQuerySet I'm pulling the related years for item objects from a Group query:

    >>> Group.objects.all().dates('item__date', 'year')
    [datetime.date(1990, 1, 1), datetime.date(1991, 1, 1), ...(remaining elements truncated)...]

Now I want to perform a count by distinct year on these dates. I thought this would work:

    >>> Group.objects.all().dates('item__date', 'year').annotate(Count('year'))
    FieldError: Cannot resolve keyword 'year' into field.

But it looks like I'm missing something.
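One common alternative that avoids .dates() (a sketch assuming Django 1.10+, where ExtractYear is available; Group and item are the models from the question):

    from django.db.models import Count
    from django.db.models.functions import ExtractYear

    # Annotate each row with the year of the related item's date, then group on it.
    items_per_year = (
        Group.objects
        .annotate(year=ExtractYear('item__date'))
        .values('year')                     # group by year
        .annotate(item_count=Count('item')) # count related items per year
        .order_by('year')
    )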

MySQL: What happens to non-aggregated fields upon a GROUP BY?

Submitted by 此生再无相见时 on 2019-11-29 11:40:55
Question: I have a very basic question about the following behaviour in MySQL. Suppose we do the following GROUP BY:

    SELECT a, b, SUM(c) FROM table GROUP BY b;

What happens to the field a, which is neither aggregated nor included in the GROUP BY fields? Does MySQL just implicitly apply FIRST(a) to a? If so, is this behaviour consistent, or does it grab a random value out of all the values for a?

Answer: It's the first result value the query processor gets back from the storage medium, dependent on the chosen query strategy. Technically this is undefined, but your table has no indices other than its key,
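A sketch of how to make that pick explicit rather than relying on the undefined behaviour (requires MySQL 5.7+; the table name t is hypothetical):

    -- ANY_VALUE documents that you accept an arbitrary value of a per group,
    -- and it also satisfies the ONLY_FULL_GROUP_BY mode.
    SELECT ANY_VALUE(a) AS a, b, SUM(c)
    FROM t
    GROUP BY b;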

MySQL Group By functionality in different version

Submitted by 十年热恋 on 2019-11-29 11:39:55
Question: The following is a simple SQL query:

    SELECT * FROM *table_name* GROUP BY *column_name*

On my system I have MySQL 5.5 and it works absolutely fine, whereas my friend has MySQL 5.7 on his system and gets the following error:

    ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'testdb.assetentry.entryId' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

It is clearly visible that this is happening because the versions are different. But what I want to know is the
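A sketch for checking and, if desired, relaxing the mode that triggers ERROR 1055 on 5.7 (session scope only; rewriting the query is usually the better fix):

    -- ONLY_FULL_GROUP_BY is on by default from MySQL 5.7.5, which is why the
    -- same query passes on 5.5 but fails on 5.7.
    SELECT @@sql_mode;
    SET SESSION sql_mode = (SELECT REPLACE(@@sql_mode, 'ONLY_FULL_GROUP_BY', ''));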

Aggregate String Concatenation in Oracle 10g [duplicate]

Submitted by 我们两清 on 2019-11-29 11:29:39
This question already has answers here: Concatenate results from a SQL query in Oracle (7 answers)

Question: I'm using Oracle 10g and have a scenario similar to this:

    No  Name
    --  -----
    1   Rony
    1   James
    1   Aby
    2   Sam
    2   Willy
    3   Mike

I need to aggregate and concatenate the strings (with a single space in between) so as to get this result:

    No  Name
    --  -----
    1   Rony James Aby
    2   Sam Willy
    3   Mike

I have to implement this using SQL, not PL/SQL. Is there a way out?

Answer: It is easy on 11g, where you can use the LISTAGG function, but sadly not on 10g. There are some techniques here for earlier versions
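Since LISTAGG is unavailable on 10g, one well-known earlier-version technique is the hierarchical SYS_CONNECT_BY_PATH trick; a sketch follows (the table and column names t(no_, name) are hypothetical, and the alphabetical ordering within each group is a sketch choice):

    -- Number the rows within each group, walk them as a hierarchy, and keep the
    -- longest concatenated path per group.
    SELECT no_,
           LTRIM(MAX(SYS_CONNECT_BY_PATH(name, ' ')), ' ') AS name
    FROM (
        SELECT no_, name,
               ROW_NUMBER() OVER (PARTITION BY no_ ORDER BY name) AS rn
        FROM t
    )
    START WITH rn = 1
    CONNECT BY PRIOR no_ = no_ AND PRIOR rn = rn - 1
    GROUP BY no_
    ORDER BY no_;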

User defined function to be applied to Window in PySpark?

Submitted by 夙愿已清 on 2019-11-29 11:17:44
Question: I am trying to apply a user-defined function to a Window in PySpark. I have read that a UDAF might be the way to go, but I was not able to find anything concrete. To give an example (taken from here: Xinh's Tech Blog and modified for PySpark):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession
    from pyspark.sql.window import Window
    from pyspark.sql.functions import avg

    spark = SparkSession.builder.master("local").config(conf=SparkConf()).getOrCreate()
    a = spark.createDataFrame([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]], ['ind', "state"])
    customers = spark.createDataFrame(
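For context, a minimal runnable sketch (assuming PySpark 2.x; not the asker's full code): a built-in aggregate such as avg() already applies over a window directly, whereas arbitrary Python aggregation over a window generally needs a Pandas UDF in later Spark versions rather than a plain Python UDF.

    from pyspark.sql import SparkSession
    from pyspark.sql.window import Window
    from pyspark.sql.functions import avg

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    a = spark.createDataFrame([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]],
                              ["ind", "state"])

    w = Window.orderBy("ind").rowsBetween(-1, 1)   # sliding 3-row window
    a.withColumn("avg_ind", avg("ind").over(w)).show()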