aggregate-functions

How can I use SUM for bit columns?

Submitted by 淺唱寂寞╮ on 2019-11-29 16:08:47
Question: How can I use the SUM() function for bit columns in T-SQL? When I try it as below:

    SELECT SUM(bitColumn) FROM MyTable;

I get the error: Operand data type bit is invalid for sum operator.

Answer 1: SELECT SUM(CAST(bitColumn AS INT)) FROM dbo.MyTable. You need to cast to a numeric type. Another solution is: SELECT COUNT(*) FROM dbo.MyTable WHERE bitColumn = 1

Answer 2: You could treat 0 as NULL and simply count the remaining values: SELECT count(nullif(bitColumn, 0)) FROM MyTable;

Answer 3: You can achieve this by using
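A minimal T-SQL sketch of the cast approach from Answer 1 (assuming the dbo.MyTable table and bitColumn column from the question):

    -- SUM rejects the bit type, so convert it to INT first; a CASE expression works the same way
    SELECT SUM(CAST(bitColumn AS INT))                    AS total_set,
           SUM(CASE WHEN bitColumn = 1 THEN 1 ELSE 0 END) AS total_set_case
    FROM dbo.MyTable;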

PySpark Numeric Window Group By

Submitted by 末鹿安然 on 2019-11-29 15:19:43
Question: I'd like to be able to have Spark group by a step size, as opposed to just single values. Is there anything in Spark similar to PySpark 2.x's window function for numeric (non-date) values? Something along the lines of:

    sqlContext = SQLContext(sc)
    df = sqlContext.createDataFrame([10, 11, 12, 13], "integer").toDF("foo")
    res = df.groupBy(window("foo", step=2, start=10)).count()

Answer (hi-zir): You can reuse the timestamp-based window and express the parameters in seconds. Tumbling:

    from pyspark.sql.functions import col, window
    df.withColumn(
        "window",
        window(
            col("foo").cast("timestamp"),
            windowDuration="2 seconds"
        ).cast
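A runnable sketch of that suggestion (assuming Spark 2.x; the values are the question's sample data): cast the integer column to a timestamp so the time-based window applies, then cast the resulting window struct back to numbers.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([(10,), (11,), (12,), (13,)], ["foo"])

    # Treat the integer as seconds since the epoch so window() accepts it;
    # casting the struct to bigints turns the bucket bounds back into numbers.
    res = (df
           .withColumn("bucket",
                       window(col("foo").cast("timestamp"),
                              windowDuration="2 seconds")
                       .cast("struct<start:bigint,end:bigint>"))
           .groupBy("bucket")
           .count())
    res.show(truncate=False)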

Aggregate hstore column in PostgreSQL

Submitted by 跟風遠走 on 2019-11-29 14:37:14
Question: I have a table like this:

    Table "public.statistics"
     id    | integer | not null default nextval('statistics_id_seq'::regclass)
     goals | hstore  |

    items:
    | id    | goals                 |
    | 30059 | "3"=>"123"            |
    | 27333 | "3"=>"200", "5"=>"10" |

What do I need to do to aggregate all the values by key in the hstore? I want to get a result like this:

    select sum(goals) from statistics

returning:

    | goals                 |
    | "3"=>"323", "5"=>"10" |

Answer 1: Building on Laurence's answer, here's a pure SQL way to aggregate the summed key/value pairs into a new hstore
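A sketch of that pure-SQL approach (assuming the statistics table above, the hstore extension, and PostgreSQL 9.3+ for the implicit LATERAL): explode each hstore with each(), sum per key, then rebuild a single hstore.

    -- each() yields (key, value) rows; sum the values per key, then fold the sums
    -- back into one hstore with the array-based hstore(keys, values) constructor.
    SELECT hstore(array_agg(key), array_agg(total::text)) AS goals
    FROM (
        SELECT key, SUM(value::integer) AS total
        FROM statistics, each(goals)
        GROUP BY key
    ) AS sums;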

Do aggregate MySQL functions always return a single row?

Submitted by 强颜欢笑 on 2019-11-29 14:26:08
Question: I'm sorry if this is really basic, but I feel that at some point I didn't have this issue and now I do, so either I was doing something totally different before or my syntax is skipping a step. I have, for example, a query that needs to return all rows with certain data, along with another column holding the total of one of those columns. If things worked as I expected, it would look like:

    SELECT order_id, cost, part_id, SUM(cost) AS total
    FROM orders
    WHERE order_date BETWEEN xxx AND yyy

and I would get all the rows for my orders, with the total tacked onto the end of each one. I know the
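One common way to get that result (a sketch; the date placeholders xxx and yyy are kept from the question) is to join the detail rows to a one-row aggregate:

    -- The subquery returns a single total row; the cross join attaches it to every order row.
    SELECT o.order_id, o.cost, o.part_id, t.total
    FROM orders AS o
    CROSS JOIN (
        SELECT SUM(cost) AS total
        FROM orders
        WHERE order_date BETWEEN xxx AND yyy
    ) AS t
    WHERE o.order_date BETWEEN xxx AND yyy;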

Spark custom aggregation : collect_list+UDF vs UDAF

Submitted by 拟墨画扇 on 2019-11-29 14:03:30
Question: I often need to perform custom aggregations on DataFrames in Spark 2.1, and I use these two approaches:

1. using groupBy/collect_list to get all the values into a single row, then applying a UDF to aggregate the values
2. writing a custom UDAF (user-defined aggregate function)

I generally prefer the first option as it's easier to implement and more readable than the UDAF implementation. But I would assume that the first option is generally slower, because more data is sent around the network
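A minimal sketch of the first approach (the aggregation shown, a median, and the sample data are illustrative, not from the question):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import collect_list, udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([("a", 1.0), ("a", 3.0), ("b", 2.0)], ["key", "value"])

    def median(values):
        # plain Python aggregation over the collected list
        xs = sorted(values)
        n = len(xs)
        return xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2.0

    median_udf = udf(median, DoubleType())

    result = (df.groupBy("key")
                .agg(collect_list("value").alias("values"))   # all values in one row
                .withColumn("median", median_udf("values"))   # UDF does the aggregation
                .drop("values"))
    result.show()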

Annotate (group) dates by month/year in Django

Submitted by 走远了吗. on 2019-11-29 12:49:32
Question: Using the Django DateQuerySet I'm pulling the related years for item objects from a Group query:

    >>> Group.objects.all().dates('item__date', 'year')
    [datetime.date(1990, 1, 1), datetime.date(1991, 1, 1), ...(remaining elements truncated)...]

Now I want to perform a count by distinct year on these dates. I thought this would work:

    >>> Group.objects.all().dates('item__date', 'year').annotate(Count('year'))
    FieldError: Cannot resolve keyword 'year' into field.

But it looks like I'm missing something.
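One common alternative that avoids .dates() (a sketch assuming Django 1.10+, where ExtractYear is available; Group and item are the models from the question):

    from django.db.models import Count
    from django.db.models.functions import ExtractYear

    # Annotate each row with the year of the related item's date, then group on it.
    items_per_year = (
        Group.objects
        .annotate(year=ExtractYear('item__date'))
        .values('year')                     # group by year
        .annotate(item_count=Count('item')) # count related items per year
        .order_by('year')
    )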

MySQL: What happens to non-aggregated fields upon a GROUP BY?

Submitted by 此生再无相见时 on 2019-11-29 11:40:55
Question: I have a very basic question about the following behaviour in MySQL. Suppose we do the following GROUP BY:

    SELECT a, b, SUM(c) FROM table GROUP BY b;

What happens to the field a, which is neither aggregated nor included in the GROUP BY fields? Does MySQL just implicitly apply FIRST(a) to a? If so, is this behaviour consistent, or does it grab a random value out of all the values for a?

Answer: It's the first result value the query processor gets back from the storage medium, dependent on the chosen query strategy. Technically this is undefined, but your table has no indices other than its key,
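A sketch of how to make that pick explicit rather than relying on the undefined behaviour (requires MySQL 5.7+; the table name t is hypothetical):

    -- ANY_VALUE documents that you accept an arbitrary value of a per group,
    -- and it also satisfies the ONLY_FULL_GROUP_BY mode.
    SELECT ANY_VALUE(a) AS a, b, SUM(c)
    FROM t
    GROUP BY b;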

MySQL Group By functionality in different version

Submitted by 十年热恋 on 2019-11-29 11:39:55
Question: The following is a simple SQL query:

    SELECT * FROM *table_name* GROUP BY *column_name*

On my system I have MySQL 5.5 and it works absolutely fine, whereas my friend has MySQL 5.7 on his system and gets the following error:

    ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'testdb.assetentry.entryId' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

It is clearly visible that this is happening because the versions are different. But what I want to know is the
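A sketch for checking and, if desired, relaxing the mode that triggers ERROR 1055 on 5.7 (session scope only; rewriting the query is usually the better fix):

    -- ONLY_FULL_GROUP_BY is on by default from MySQL 5.7.5, which is why the
    -- same query passes on 5.5 but fails on 5.7.
    SELECT @@sql_mode;
    SET SESSION sql_mode = (SELECT REPLACE(@@sql_mode, 'ONLY_FULL_GROUP_BY', ''));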

Aggregate String Concatenation in Oracle 10g [duplicate]

Submitted by 我们两清 on 2019-11-29 11:29:39
This question already has answers here: Concatenate results from a SQL query in Oracle (7 answers)

Question: I'm using Oracle 10g and have a scenario similar to this:

    No  Name
    --  -----
    1   Rony
    1   James
    1   Aby
    2   Sam
    2   Willy
    3   Mike

I need to aggregate and concatenate the strings (with a single space in between) so as to get this result:

    No  Name
    --  -----
    1   Rony James Aby
    2   Sam Willy
    3   Mike

I have to implement this using SQL, not PL/SQL. Is there a way out?

Answer: It is easy on 11g, where you can use the LISTAGG function, but sadly not on 10g. There are some techniques here for earlier versions
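Since LISTAGG is unavailable on 10g, one well-known earlier-version technique is the hierarchical SYS_CONNECT_BY_PATH trick; a sketch follows (the table and column names t(no_, name) are hypothetical, and the alphabetical ordering within each group is a sketch choice):

    -- Number the rows within each group, walk them as a hierarchy, and keep the
    -- longest concatenated path per group.
    SELECT no_,
           LTRIM(MAX(SYS_CONNECT_BY_PATH(name, ' ')), ' ') AS name
    FROM (
        SELECT no_, name,
               ROW_NUMBER() OVER (PARTITION BY no_ ORDER BY name) AS rn
        FROM t
    )
    START WITH rn = 1
    CONNECT BY PRIOR no_ = no_ AND PRIOR rn = rn - 1
    GROUP BY no_
    ORDER BY no_;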

User defined function to be applied to Window in PySpark?

Submitted by 夙愿已清 on 2019-11-29 11:17:44
Question: I am trying to apply a user-defined function to a Window in PySpark. I have read that a UDAF might be the way to go, but I was not able to find anything concrete. To give an example (taken from here: Xinh's Tech Blog and modified for PySpark):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession
    from pyspark.sql.window import Window
    from pyspark.sql.functions import avg

    spark = SparkSession.builder.master("local").config(conf=SparkConf()).getOrCreate()
    a = spark.createDataFrame([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]], ['ind', "state"])
    customers = spark.createDataFrame(
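For context, a minimal runnable sketch (assuming PySpark 2.x; not the asker's full code): a built-in aggregate such as avg() already applies over a window directly, whereas arbitrary Python aggregation over a window generally needs a Pandas UDF in later Spark versions rather than a plain Python UDF.

    from pyspark.sql import SparkSession
    from pyspark.sql.window import Window
    from pyspark.sql.functions import avg

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    a = spark.createDataFrame([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]],
                              ["ind", "state"])

    w = Window.orderBy("ind").rowsBetween(-1, 1)   # sliding 3-row window
    a.withColumn("avg_ind", avg("ind").over(w)).show()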