aggregate-functions

Postgres FROM query with one of the column names

Submitted by 萝らか妹 on 2019-12-24 05:57:29
Question: As a follow-up to the previous question, Count matches between multiple columns and words in a nested array, I have the following query:

SELECT row_number() OVER (ORDER BY t.id) AS id
     , t.id AS "RID"
     , count(DISTINCT a.ord) AS "Matches"
FROM tbl t
LEFT JOIN (
    unnest(array_content) WITH ORDINALITY x(elem, ord)
    CROSS JOIN LATERAL unnest(string_to_array(elem, ',')) txt
) a ON t.description ~ a.txt OR t.additional_info ~ a.txt
GROUP BY t.id;

which gives me the matches correctly, but now the value for …
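
Since unnest(array_content) refers to a column of the outer table, the parenthesized join above only works as part of a lateral join. A minimal sketch of the lateral form, assuming array_content is a text-array column of tbl (table and column names come from the linked question):

SELECT row_number() OVER (ORDER BY t.id) AS id
     , t.id AS "RID"
     , count(DISTINCT a.ord) AS "Matches"
FROM tbl t
LEFT JOIN LATERAL (
    -- LATERAL lets the subquery see t.array_content
    SELECT x.ord, txt.txt
    FROM unnest(t.array_content) WITH ORDINALITY x(elem, ord)
    CROSS JOIN LATERAL unnest(string_to_array(x.elem, ',')) txt(txt)
) a ON t.description ~ a.txt OR t.additional_info ~ a.txt
GROUP BY t.id;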

Query to find all timestamps more than a certain interval apart

Submitted by 妖精的绣舞 on 2019-12-24 04:43:07
Question: I'm using Postgres to run some analytics on user activity. I have a table of all requests (pageviews) made by every user along with the timestamp of each request, and I'm trying to find the number of distinct sessions for every user. For the sake of simplicity, I'm considering every set of requests an hour or more apart from the others as a distinct session. The data looks something like this:

id | request_time               | user_id
 1 | 2014-01-12 08:57:16.725533 | 1233
 2 | 2014-01-12 08:57:20.944193 | 1234
 3 | 2014-01-12 09:15 …
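
A common way to count such sessions is to compare each request with the user's previous one via lag() and treat every gap of an hour or more as the start of a new session. A minimal Postgres sketch, assuming the table is named requests (the real name isn't shown in the excerpt):

SELECT user_id,
       count(*) AS sessions            -- one counted row per session start
FROM (
    SELECT user_id,
           request_time,
           lag(request_time) OVER (PARTITION BY user_id
                                   ORDER BY request_time) AS prev_time
    FROM requests
) s
WHERE prev_time IS NULL                                -- first request ever
   OR request_time - prev_time >= interval '1 hour'    -- gap opens a session
GROUP BY user_id;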

MySQL: Update rows in table by iterating and joining with another one

Submitted by 我们两清 on 2019-12-24 02:49:07
Question: I have a table papers:

CREATE TABLE `papers` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(1000) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `my_count` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `title_fulltext` (`title`)
) ENGINE=MyISAM AUTO_INCREMENT=1617432 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

and another table link_table:

CREATE TABLE `auth2paper2loc` (
  `auth_id` int(11) NOT NULL,
  `paper_id` int(11) NOT NULL,
  `loc_id` int(11) DEFAULT NULL
) ENGINE=MyISAM …
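
Rather than iterating, MySQL can usually do this in one multi-table UPDATE against an aggregated derived table. A sketch, assuming my_count is meant to hold the number of link rows per paper (the excerpt cuts off before stating the exact goal):

UPDATE papers p
JOIN (
    SELECT paper_id, COUNT(*) AS cnt   -- link rows per paper
    FROM auth2paper2loc
    GROUP BY paper_id
) c ON c.paper_id = p.id
SET p.my_count = c.cnt;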

Cannot be used in the PIVOT operator because it is not invariant to NULLs

Submitted by 五迷三道 on 2019-12-24 02:07:41
Question: I created an aggregate function for a string column in SQL Server 2008. The C# code looks like this:

using System;
using System.Collections.Generic;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize = 8000)]
public struct strconcat : IBinarySerialize
{
    private List<String> values;

    public void Init()
    {
        this.values = new List<String>();
    }

    public void Accumulate(SqlString value = new SqlString())
    {
        this …
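
The error in the title comes from a declared property of the aggregate: PIVOT only accepts user-defined aggregates that are marked invariant to NULLs. A sketch of the adjusted declaration; the NULL guard in Accumulate is an assumption, since the excerpt cuts off before the method body:

[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize = 8000,
    IsInvariantToNulls = true,        // required for use inside PIVOT
    IsInvariantToDuplicates = false,
    IsInvariantToOrder = false)]
public struct strconcat : IBinarySerialize
{
    private List<String> values;

    public void Init()
    {
        this.values = new List<String>();
    }

    public void Accumulate(SqlString value)
    {
        if (value.IsNull)             // skip NULLs so the result
            return;                   // really is invariant to them
        this.values.Add(value.Value);
    }

    // Merge, Terminate, Read and Write as before.
}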

SQL get ROW_NUMBER and COUNT on every SELECT request

Submitted by 混江龙づ霸主 on 2019-12-23 22:51:53
Question: I'm building a grid mechanism where I need to retrieve from the database the total number of records found, while retrieving just a range of those records with a row number in it. I'm using SQL Server for testing, but I need to support Oracle and MySQL as well. This is what I'm trying, but I can't make it work:

SELECT *
FROM (SELECT ROW_NUMBER() AS RN, COUNT(*) AS TOTALCN, Id, Name, Phone
      FROM MyTable
      WHERE Deleted='F')
WHERE RN > 100 AND RN < 150;

The idea is: MyTable -> number of records: 1000 …
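
For reference, three fixes make the statement valid: ROW_NUMBER() needs an OVER clause, the total can be computed alongside it as a window aggregate, and the derived table needs an alias. A sketch that runs on SQL Server and Oracle 12c+ (and MySQL 8+; older MySQL has no window functions):

SELECT *
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY Id) AS RN,      -- running row number
           COUNT(*) OVER ()               AS TOTALCN,  -- total matching rows
           Id, Name, Phone
    FROM MyTable
    WHERE Deleted = 'F'
) t
WHERE RN > 100 AND RN < 150;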

How to compute the largest value in a column using withColumn?

Submitted by 随声附和 on 2019-12-23 17:19:49
Question: I'm trying to compute the largest value of the following DataFrame in Spark 1.6.1:

val df = sc.parallelize(Seq(1, 2, 3)).toDF("id")

A first approach would be to select the maximum value, and it works as expected:

df.select(max($"id")).show

A second approach could be to use withColumn as follows:

df.withColumn("max", max($"id")).show

But unfortunately it fails with the following error message:

org.apache.spark.sql.AnalysisException: expression 'id' is neither present in the group by, nor is …
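
max is an aggregate, so inside withColumn it needs a window (or a join against the aggregated value). A sketch using an empty window specification, which evaluates the maximum over the whole DataFrame; note that it moves all rows into a single partition:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.max

// An empty partitionBy() makes the window span the entire DataFrame.
df.withColumn("max", max($"id").over(Window.partitionBy())).show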

R - Count numbers of certain values in each column

Submitted by 北城以北 on 2019-12-23 17:14:40
Question: I have found questions similar to mine, but none of them explains how to do this for each column of a data frame. I have a data frame like this:

x1 = seq(12, 200, length=20)
x2 = seq(50, 120, length=20)
x3 = seq(40, 250, length=20)
x4 = seq(100, 130, length=20)
x5 = seq(10, 300, length=20)
df = data.frame(V1=x1, V2=x2, V3=x3, V4=x4, V5=x5)

Now I want to get the number of values greater than 120 in each column. I have tried:

nrow(df[,1] > 120)

That didn't work: it says 0, but that's not true …
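
The comparison df[,1] > 120 yields a logical vector, not a data frame, so nrow() has nothing to count. Summing the logicals does the job (TRUE counts as 1), and colSums() applies that to every column at once:

sum(df[, 1] > 120)   # count for the first column only
colSums(df > 120)    # named vector of counts, one per column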

Correlated query: select where condition not max(condition in inner query)

Submitted by 夙愿已清 on 2019-12-23 16:09:05
Question: I am trying to select all the rows where the userName and groupId are duplicated and the userId is not the max userId for that userName/groupId combination. Here is my code so far:

select *
from userTable u
where exists (select *
              from userTable u1
              where userName <> ''
                and userName is not null
                and u.userName = u1.userName
                and u.groupId = u1.groupId
                and u.userId <> max(u1.userId)
              group by userName, groupId
              having count(*) > 1)
order by userName

However, the line: and u.userId <> u1.max(userId) …
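
An aggregate such as max() cannot appear in a WHERE clause; moving the comparison into HAVING, where aggregates are allowed, gives one working shape of the query:

select *
from userTable u
where u.userName <> ''
  and u.userName is not null
  and exists (select 1
              from userTable u1
              where u1.userName = u.userName
                and u1.groupId = u.groupId
              group by u1.userName, u1.groupId
              having count(*) > 1                   -- duplicated name/group
                 and u.userId <> max(u1.userId))    -- keep all but the max
order by u.userName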

Group by X or Y?

Submitted by 元气小坏坏 on 2019-12-23 12:48:26
Question: I'm trying to figure out how to GROUP BY on multiple columns. I want to group items when the SSN or the address matches. For example, here are three records:

account_number | name       | ssn         | address
23952352340    | SMITH INC  | 123-45-6789 | P.O. BOX 123
3459450340     | JOHN SMITH | 123-45-6789 | 123 EVERGREEN TERRACE
45949459494    | JANE SMITH | 395-23-1924 | 123 EVERGREEN TERRACE

And here's what I'd like to end up with:

names
-------- …
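
Matching on "SSN or address" is not a plain GROUP BY problem: rows that share either value chain together, so the grouping key has to be computed transitively, as connected components. A Postgres-flavored sketch with a recursive CTE, assuming the table is named accounts (the real name isn't shown in the excerpt):

WITH RECURSIVE cc AS (
    SELECT account_number, account_number AS grp
    FROM accounts
  UNION                  -- UNION (not UNION ALL) dedupes, so recursion stops
    SELECT a.account_number, cc.grp
    FROM cc
    JOIN accounts b ON b.account_number = cc.account_number
    JOIN accounts a ON a.ssn = b.ssn OR a.address = b.address
)
SELECT string_agg(DISTINCT a.name, ', ') AS names
FROM accounts a
JOIN (SELECT account_number, MIN(grp) AS grp   -- smallest reachable account
      FROM cc                                  -- labels the whole component
      GROUP BY account_number) g
  ON g.account_number = a.account_number
GROUP BY g.grp;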