aggregate-functions

Postgres FROM query with one of the column names

Submitted by 萝らか妹 on 2019-12-24 05:57:29
Question: As a follow-up to the previous question, Count matches between multiple columns and words in a nested array, I have the following query:

SELECT row_number() OVER (ORDER BY t.id) AS id
     , t.id AS "RID"
     , count(DISTINCT a.ord) AS "Matches"
FROM tbl t
LEFT JOIN (
    unnest(array_content) WITH ORDINALITY x(elem, ord)
    CROSS JOIN LATERAL unnest(string_to_array(elem, ',')) txt
) a ON t.description ~ a.txt OR t.additional_info ~ a.txt
GROUP BY t.id;

which gives me the matches correctly, but now the value for …
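
Since unnest(array_content) refers to a column of the outer table, the parenthesized join above only works as part of a lateral join. A minimal sketch of the lateral form, assuming array_content is a text-array column of tbl (table and column names come from the linked question):

SELECT row_number() OVER (ORDER BY t.id) AS id
     , t.id AS "RID"
     , count(DISTINCT a.ord) AS "Matches"
FROM tbl t
LEFT JOIN LATERAL (
    -- LATERAL lets the subquery see t.array_content
    SELECT x.ord, txt.txt
    FROM unnest(t.array_content) WITH ORDINALITY x(elem, ord)
    CROSS JOIN LATERAL unnest(string_to_array(x.elem, ',')) txt(txt)
) a ON t.description ~ a.txt OR t.additional_info ~ a.txt
GROUP BY t.id;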

Query to find all timestamps more than a certain interval apart

Submitted by 妖精的绣舞 on 2019-12-24 04:43:07
Question: I'm using Postgres to run some analytics on user activity. I have a table of all requests (pageviews) made by every user along with the timestamp of each request, and I'm trying to find the number of distinct sessions for every user. For the sake of simplicity, I'm considering every set of requests an hour or more apart from the others as a distinct session. The data looks something like this:

id | request_time               | user_id
 1 | 2014-01-12 08:57:16.725533 | 1233
 2 | 2014-01-12 08:57:20.944193 | 1234
 3 | 2014-01-12 09:15 …
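
A common way to count such sessions is to compare each request with the user's previous one via lag() and treat every gap of an hour or more as the start of a new session. A minimal Postgres sketch, assuming the table is named requests (the real name isn't shown in the excerpt):

SELECT user_id,
       count(*) AS sessions            -- one counted row per session start
FROM (
    SELECT user_id,
           request_time,
           lag(request_time) OVER (PARTITION BY user_id
                                   ORDER BY request_time) AS prev_time
    FROM requests
) s
WHERE prev_time IS NULL                                -- first request ever
   OR request_time - prev_time >= interval '1 hour'    -- gap opens a session
GROUP BY user_id;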

MySQL: Update rows in table by iterating and joining with another one

Submitted by 我们两清 on 2019-12-24 02:49:07
Question: I have a table papers:

CREATE TABLE `papers` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(1000) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `my_count` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `title_fulltext` (`title`)
) ENGINE=MyISAM AUTO_INCREMENT=1617432 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

and another table link_table:

CREATE TABLE `auth2paper2loc` (
  `auth_id` int(11) NOT NULL,
  `paper_id` int(11) NOT NULL,
  `loc_id` int(11) DEFAULT NULL
) ENGINE=MyISAM …
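
Rather than iterating, MySQL can usually do this in one multi-table UPDATE against an aggregated derived table. A sketch, assuming my_count is meant to hold the number of link rows per paper (the excerpt cuts off before stating the exact goal):

UPDATE papers p
JOIN (
    SELECT paper_id, COUNT(*) AS cnt   -- link rows per paper
    FROM auth2paper2loc
    GROUP BY paper_id
) c ON c.paper_id = p.id
SET p.my_count = c.cnt;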

Cannot be used in the PIVOT operator because it is not invariant to NULLs

Submitted by 五迷三道 on 2019-12-24 02:07:41
Question: I created an aggregate function for a string column in SQL Server 2008. The C# code looks like this:

using System;
using System.Collections.Generic;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize = 8000)]
public struct strconcat : IBinarySerialize
{
    private List<String> values;

    public void Init()
    {
        this.values = new List<String>();
    }

    public void Accumulate(SqlString value = new SqlString())
    {
        this …
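
The error in the title comes from a declared property of the aggregate: PIVOT only accepts user-defined aggregates that are marked invariant to NULLs. A sketch of the adjusted declaration; the NULL guard in Accumulate is an assumption, since the excerpt cuts off before the method body:

[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize = 8000,
    IsInvariantToNulls = true,        // required for use inside PIVOT
    IsInvariantToDuplicates = false,
    IsInvariantToOrder = false)]
public struct strconcat : IBinarySerialize
{
    private List<String> values;

    public void Init()
    {
        this.values = new List<String>();
    }

    public void Accumulate(SqlString value)
    {
        if (value.IsNull)             // skip NULLs so the result
            return;                   // really is invariant to them
        this.values.Add(value.Value);
    }

    // Merge, Terminate, Read and Write as before.
}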

SQL get ROW_NUMBER and COUNT on every SELECT request

Submitted by 混江龙づ霸主 on 2019-12-23 22:51:53
Question: I'm building a grid mechanism where I need to retrieve from the database the total number of records found, while retrieving just a range of those records with a row number in it. I'm using SQL Server for testing, but I need to support Oracle and MySQL as well. This is what I'm trying, but I can't make it work:

SELECT *
FROM (SELECT ROW_NUMBER() AS RN, COUNT(*) AS TOTALCN, Id, Name, Phone
      FROM MyTable
      WHERE Deleted='F')
WHERE RN > 100 AND RN < 150;

The idea is: MyTable -> number of records: 1000 …
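
For reference, three fixes make the statement valid: ROW_NUMBER() needs an OVER clause, the total can be computed alongside it as a window aggregate, and the derived table needs an alias. A sketch that runs on SQL Server and Oracle 12c+ (and MySQL 8+; older MySQL has no window functions):

SELECT *
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY Id) AS RN,      -- running row number
           COUNT(*) OVER ()               AS TOTALCN,  -- total matching rows
           Id, Name, Phone
    FROM MyTable
    WHERE Deleted = 'F'
) t
WHERE RN > 100 AND RN < 150;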

How to compute the largest value in a column using withColumn?

Submitted by 随声附和 on 2019-12-23 17:19:49
Question: I'm trying to compute the largest value of the following DataFrame in Spark 1.6.1:

val df = sc.parallelize(Seq(1, 2, 3)).toDF("id")

A first approach would be to select the maximum value, and it works as expected:

df.select(max($"id")).show

A second approach could be to use withColumn as follows:

df.withColumn("max", max($"id")).show

But unfortunately it fails with the following error message:

org.apache.spark.sql.AnalysisException: expression 'id' is neither present in the group by, nor is …
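
max is an aggregate, so inside withColumn it needs a window (or a join against the aggregated value). A sketch using an empty window specification, which evaluates the maximum over the whole DataFrame; note that it moves all rows into a single partition:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.max

// An empty partitionBy() makes the window span the entire DataFrame.
df.withColumn("max", max($"id").over(Window.partitionBy())).show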

R - Count numbers of certain values in each column

Submitted by 北城以北 on 2019-12-23 17:14:40
Question: I have found questions similar to mine, but none of them explains how to do this for each column of a data frame. I have a data frame like this:

x1 = seq(12, 200, length=20)
x2 = seq(50, 120, length=20)
x3 = seq(40, 250, length=20)
x4 = seq(100, 130, length=20)
x5 = seq(10, 300, length=20)
df = data.frame(V1=x1, V2=x2, V3=x3, V4=x4, V5=x5)

Now I want to get the number of values greater than 120 in each column. I have tried:

nrow(df[,1] > 120)

That didn't work: it says 0, but that's not true …
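
The comparison df[,1] > 120 yields a logical vector, not a data frame, so nrow() has nothing to count. Summing the logicals does the job (TRUE counts as 1), and colSums() applies that to every column at once:

sum(df[, 1] > 120)   # count for the first column only
colSums(df > 120)    # named vector of counts, one per column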

Correlated query: select where condition not max(condition in inner query)

Submitted by 夙愿已清 on 2019-12-23 16:09:05
Question: I am trying to select all the rows where the userName and groupId are duplicated and the userId is not the max userId for that userName/groupId combination. Here is my code so far:

select *
from userTable u
where exists (select *
              from userTable u1
              where userName <> ''
                and userName is not null
                and u.userName = u1.userName
                and u.groupId = u1.groupId
                and u.userId <> max(u1.userId)
              group by userName, groupId
              having count(*) > 1)
order by userName

However, the line: and u.userId <> u1.max(userId) …
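
An aggregate such as max() cannot appear in a WHERE clause; moving the comparison into HAVING, where aggregates are allowed, gives one working shape of the query:

select *
from userTable u
where u.userName <> ''
  and u.userName is not null
  and exists (select 1
              from userTable u1
              where u1.userName = u.userName
                and u1.groupId = u.groupId
              group by u1.userName, u1.groupId
              having count(*) > 1                   -- duplicated name/group
                 and u.userId <> max(u1.userId))    -- keep all but the max
order by u.userName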

Group by X or Y?

Submitted by 元气小坏坏 on 2019-12-23 12:48:26
Question: I'm trying to figure out how to GROUP BY on multiple columns. I want to group items when the SSN or the address matches. For example, here are three records:

account_number | name       | ssn         | address
23952352340    | SMITH INC  | 123-45-6789 | P.O. BOX 123
3459450340     | JOHN SMITH | 123-45-6789 | 123 EVERGREEN TERRACE
45949459494    | JANE SMITH | 395-23-1924 | 123 EVERGREEN TERRACE

And here's what I'd like to end up with:

names
-------- …
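
Matching on "SSN or address" is not a plain GROUP BY problem: rows that share either value chain together, so the grouping key has to be computed transitively, as connected components. A Postgres-flavored sketch with a recursive CTE, assuming the table is named accounts (the real name isn't shown in the excerpt):

WITH RECURSIVE cc AS (
    SELECT account_number, account_number AS grp
    FROM accounts
  UNION                  -- UNION (not UNION ALL) dedupes, so recursion stops
    SELECT a.account_number, cc.grp
    FROM cc
    JOIN accounts b ON b.account_number = cc.account_number
    JOIN accounts a ON a.ssn = b.ssn OR a.address = b.address
)
SELECT string_agg(DISTINCT a.name, ', ') AS names
FROM accounts a
JOIN (SELECT account_number, MIN(grp) AS grp   -- smallest reachable account
      FROM cc                                  -- labels the whole component
      GROUP BY account_number) g
  ON g.account_number = a.account_number
GROUP BY g.grp;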