group-by

Flatten/merge overlapping time intervals

Submitted by 别来无恙 on 2020-01-23 01:18:13
Question: I have a 'Service' table with millions of rows. Each row corresponds to a service provided by a staff member in a given date and time interval (each row has a unique ID). There are cases where a staff member might provide services in overlapping time frames. I need to write a query that merges overlapping time intervals and returns the data in the format shown below. I tried grouping by the StaffID and Date fields and taking the Min of BeginTime and the Max of EndTime, but that does not account for the non…
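
The excerpt is cut off, but the classic fix for min/max-per-group swallowing non-contiguous intervals is a gaps-and-islands pass: sort each staff/date group by start time and open a new merged interval whenever a row begins after the running maximum end time seen so far. A minimal pandas sketch of that idea, using hypothetical column names (StaffID, Date, BeginTime, EndTime) and toy data:

    import pandas as pd

    df = pd.DataFrame({
        'StaffID':   [101, 101, 101, 102],
        'Date':      ['2020-01-10'] * 4,
        'BeginTime': pd.to_datetime(['08:00', '08:30', '10:00', '09:00']),
        'EndTime':   pd.to_datetime(['09:15', '09:30', '11:00', '10:00']),
    })

    df = df.sort_values(['StaffID', 'Date', 'BeginTime'])

    # Running max of EndTime over the previous rows of the same group;
    # a new merged interval ("island") starts when BeginTime exceeds it.
    prev_end = (df.groupby(['StaffID', 'Date'])['EndTime']
                  .transform(lambda s: s.cummax().shift()))
    island = (df['BeginTime'] > prev_end).cumsum().rename('island')

    # Min/max per island now merges only rows that actually overlap:
    # staff 101 collapses to 08:00-09:30 and 10:00-11:00.
    merged = (df.groupby(['StaffID', 'Date', island])
                .agg(BeginTime=('BeginTime', 'min'),
                     EndTime=('EndTime', 'max'))
                .reset_index())
    print(merged)

The same shape works in SQL with window functions: flag island starts by comparing BeginTime against MAX(EndTime) OVER the preceding rows of the partition, then use a running SUM of the flags as the extra grouping key.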

Select most occurring value in MySQL

Submitted by 爷,独闯天下 on 2020-01-22 23:07:39
Question: I'm looking for a way to select the most frequently occurring value, e.g. the person who posted the most in each thread:

    SELECT MOST_OCCURRING(user_id)
    FROM thread_posts
    GROUP BY thread_id

Is there a good way to do this?

Answer 1: If you want a count on a per-thread basis, I think you can use a nested query, grouping by thread first and then by user:

    SELECT thread_id AS tid,
           (SELECT user_id
            FROM thread_posts
            WHERE thread_id = tid
            GROUP BY user_id
            ORDER BY COUNT(*) DESC
            LIMIT 0,1) AS topUser
    FROM thread_posts …
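
The same per-group mode ("most frequent value per group") is easy to sanity-check outside the database; a small pandas sketch with made-up thread_posts data:

    import pandas as pd

    posts = pd.DataFrame({
        'thread_id': [1, 1, 1, 2, 2, 2, 2],
        'user_id':   [10, 10, 11, 12, 13, 13, 13],
    })

    # Count posts per (thread, user), then keep the top user per thread.
    counts = (posts.groupby(['thread_id', 'user_id'])
                   .size()
                   .rename('n_posts')
                   .reset_index())
    top_user = (counts.sort_values('n_posts', ascending=False)
                      .drop_duplicates('thread_id')
                      .set_index('thread_id')['user_id'])
    print(top_user)  # thread 1 -> user 10, thread 2 -> user 13

As with the LIMIT 1 subquery above, ties are broken arbitrarily; add a secondary sort key if deterministic tie-breaking matters.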

How to group words whose Levenshtein similarity ratio is more than 80 percent in Python

Submitted by 雨燕双飞 on 2020-01-22 05:07:33
Question: Suppose I have a list:

    person_name = ['zakesh', 'oldman LLC', 'bikash', 'goldman LLC', 'zikash', 'rakesh']

I am trying to group the list in such a way that the Levenshtein similarity between two strings in a group is at its maximum. To find the ratio between two words, I am using the Python package fuzzywuzzy. Examples:

    >>> from fuzzywuzzy import fuzz
    >>> combined_list = ['rakesh', 'zakesh', 'bikash', 'zikash', 'goldman LLC', 'oldman LLC']
    >>> fuzz.ratio('goldman LLC', 'oldman LLC')
    95
    >>> fuzz.ratio(…
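
Taking the title's 80-percent threshold, one simple approach is greedy clustering: compare each name against a representative of every existing group and open a new group when nothing scores above 80. A sketch (group membership on ties depends on input order):

    from fuzzywuzzy import fuzz

    person_name = ['zakesh', 'oldman LLC', 'bikash', 'goldman LLC', 'zikash', 'rakesh']

    THRESHOLD = 80
    groups = []
    for name in person_name:
        for group in groups:
            # Compare against the group's first member as its representative.
            if fuzz.ratio(name, group[0]) > THRESHOLD:
                group.append(name)
                break
        else:
            groups.append([name])

    print(groups)
    # [['zakesh', 'rakesh'], ['oldman LLC', 'goldman LLC'], ['bikash', 'zikash']]

Greedy single-representative clustering is O(n·k) and order-sensitive; for stricter guarantees you would compute the full pairwise matrix and run a proper clustering step.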

INSERT a SELECT GROUP BY: more target columns than expressions error [duplicate]

Submitted by 眉间皱痕 on 2020-01-22 00:37:13
Question: This question already has an answer here: PostgreSQL, SQL state: 42601 (1 answer). Closed 4 months ago. I have a query I want to build, an INSERT from a SELECT GROUP BY, but I get the error:

    ERROR: INSERT has more target columns than expressions
    LINE 15: INSERT INTO "KPI_MEASURE" (id, created_at, kpi_project_id, k...
                                        ^
    HINT: The insertion source is a row expression containing the same number of columns expected by the INSERT. Did you…
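
SQLSTATE 42601 with this message means the INSERT's target-column list is longer than the SELECT's expression list; given the hint, a frequent culprit is wrapping the SELECT list in parentheses, which turns it into a single row expression. A minimal sqlite3 sketch of the corrected shape, with hypothetical table and column names (the real KPI_MEASURE schema isn't shown in the excerpt):

    import sqlite3

    con = sqlite3.connect(':memory:')
    con.executescript("""
        CREATE TABLE measurement (project_id INTEGER, value REAL);
        CREATE TABLE kpi_measure (project_id INTEGER, avg_value REAL);
        INSERT INTO measurement VALUES (1, 2.0), (1, 4.0), (2, 10.0);
    """)

    # Two target columns, two (unparenthesized) SELECT expressions.
    con.execute("""
        INSERT INTO kpi_measure (project_id, avg_value)
        SELECT project_id, AVG(value)
        FROM measurement
        GROUP BY project_id
    """)
    print(con.execute('SELECT * FROM kpi_measure').fetchall())
    # [(1, 3.0), (2, 10.0)]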

pandas groupby and rank within groups, with ranks starting at 1 for each group

Submitted by 不想你离开。 on 2020-01-21 12:47:24
Question: I have a dataframe:

    import pandas as pd
    df = pd.DataFrame([[1, 'a'], [1, 'a'], [1, 'b'], [1, 'a'],
                       [2, 'a'], [2, 'b'], [2, 'a'], [2, 'b'],
                       [3, 'b'], [3, 'a'], [3, 'b']],
                      columns=['session', 'issue'])

I would like to rank issues within sessions. I tried:

    df.groupby(['session', 'issue']).size().rank(ascending=False, method='dense')
    session  issue
    1        a        1.0
             b        3.0
    2        a        2.0
             b        2.0
    3        a        3.0
             b        2.0
    dtype: float64

What I need is a result like this one: for group session=1, there are three 'a' issues…
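
The attempt above ranks the counts globally across all sessions; grouping the counts by session before ranking makes the ranking restart at 1 inside every session. A sketch of that one-line change:

    import pandas as pd

    df = pd.DataFrame([[1, 'a'], [1, 'a'], [1, 'b'], [1, 'a'],
                       [2, 'a'], [2, 'b'], [2, 'a'], [2, 'b'],
                       [3, 'b'], [3, 'a'], [3, 'b']],
                      columns=['session', 'issue'])

    # Count per (session, issue), then rank within each session only.
    counts = df.groupby(['session', 'issue']).size()
    ranks = (counts.groupby(level='session')
                   .rank(ascending=False, method='dense'))
    print(ranks)
    # session 1: a -> 1.0, b -> 2.0; session 2: both 1.0; session 3: b -> 1.0, a -> 2.0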

pandas GroupBy and cumulative mean of previous rows in group

Submitted by ℡╲_俬逩灬. on 2020-01-21 11:51:46
Question: I have a dataframe which looks like this:

    pd.DataFrame({'category': [1,1,1,2,2,2,3,3,3,4],
                  'order_start': [1,2,3,1,2,3,1,2,3,1],
                  'time': [1, 4, 3, 6, 8, 17, 14, 12, 13, 16]})
    Out[40]:
       category  order_start  time
    0         1            1     1
    1         1            2     4
    2         1            3     3
    3         2            1     6
    4         2            2     8
    5         2            3    17
    6         3            1    14
    7         3            2    12
    8         3            3    13
    9         4            1    16

I would like to create a new column which contains the mean of the previous times of the same category. How can I create it? The new column should look like this:

    pd.DataFrame({'category': [1,1,1,2,2,2…
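
The excerpt stops before the expected output, but "mean of the previous times of the same category" is an expanding mean shifted down one row within each category group. A sketch:

    import pandas as pd

    df = pd.DataFrame({'category': [1,1,1,2,2,2,3,3,3,4],
                       'order_start': [1,2,3,1,2,3,1,2,3,1],
                       'time': [1, 4, 3, 6, 8, 17, 14, 12, 13, 16]})

    # Expanding mean of `time` within each category, shifted by one so
    # each row sees only the rows above it in its own category.
    df['mean_time'] = (df.groupby('category')['time']
                         .transform(lambda s: s.expanding().mean().shift()))
    print(df)

For category 1 this yields NaN, 1.0, 2.5: the first row has no history, the second averages {1}, and the third averages {1, 4}.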

Grouping records and getting standard deviation intervals for grouped records in BigQuery, getting wrong value

Submitted by 拟墨画扇 on 2020-01-19 18:05:22
Question: I have the SQL below, which gets the interval average of the timestamp column grouped by icao_address, flight_number, flight_date. I'm trying to do the same for the standard deviation, and although I get a figure, it is wrong: I get back 14.06 (see the image below) while it should be around 1.8. This is what I'm using for the stddev calculation:

    STDDEV_POP(UNIX_SECONDS(timestamp)) AS standard_deviation

Below is my SQL:

    #standardSQL
    select DATE(timestamp) as …
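
A plausible cause of the gap between 14.06 and ~1.8: STDDEV_POP over UNIX_SECONDS(timestamp) measures the spread of the raw clock values, not of the intervals between consecutive reports. Computing the deltas first (LAG(timestamp) in BigQuery) and then taking the stddev of those gives the interval deviation. A pandas sketch of the distinction, with made-up timestamps:

    import pandas as pd

    # Five position reports, roughly ten seconds apart.
    ts = pd.to_datetime(['12:00:00', '12:00:10', '12:00:19',
                         '12:00:31', '12:00:40'])
    secs = pd.Series((ts - ts[0]).total_seconds())

    # Spread of the raw timestamps themselves: a large number (~14.3)
    # that says nothing about the reporting interval.
    print(secs.std(ddof=0))

    # Spread of the gaps between consecutive reports (~1.22): this is
    # the LAG-then-STDDEV_POP pattern the question actually needs.
    gaps = secs.diff().dropna()
    print(gaps.std(ddof=0))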
