group-by

Pandas: Group Data by column A, Filter A by existing values of column B

痴心易碎 submitted on 2019-12-24 10:46:32

Question: I'm new to pandas and want to create a new dataset with grouped and filtered data. Right now, my dataset contains two columns (the first with A, B, or C; the second with a value):

A 1
A 2
A 3
A 4
B 1
B 2
B 3
C 4

Now I want to group by the keys of the first column (A, B, C) and keep only the keys for which the values 1 AND 2 both exist, so that my new dataset looks like:

A 1
A 2
B 1
B 2

Until now, I'm only able to print everything, but I don't know how to filter: for name, group in …
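The question is cut off before any answer, but the described filtering can be sketched in pandas with a per-group `transform`: mark the groups that contain both required values, then keep only the rows holding those values. The column names `key` and `value` are assumptions; the data mirrors the example above.

```python
import pandas as pd

# sample data from the question; column names are hypothetical
df = pd.DataFrame({"key": ["A", "A", "A", "A", "B", "B", "B", "C"],
                   "value": [1, 2, 3, 4, 1, 2, 3, 4]})

wanted = {1, 2}
# True for every row whose group contains all wanted values
mask = df.groupby("key")["value"].transform(lambda s: wanted.issubset(set(s)))
# keep only the wanted values inside the qualifying groups
result = df[mask & df["value"].isin(wanted)].reset_index(drop=True)
print(result)
```

`GroupBy.filter` would also work for the group test, but `transform` returns a row-aligned boolean mask, which combines naturally with the value filter in one step.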

Hive - multiple (average) count distincts over layered groups

穿精又带淫゛_ submitted on 2019-12-24 10:45:59

Question: Given the following source data (say the table name is user_activity):

+---------+-----------+------------+
| user_id | user_type | some_date  |
+---------+-----------+------------+
| 1       | a         | 2018-01-01 |
| 1       | a         | 2018-01-02 |
| 2       | a         | 2018-01-01 |
| 3       | a         | 2018-01-01 |
| 4       | b         | 2018-01-01 |
| 4       | b         | 2018-01-02 |
| 5       | b         | 2018-01-02 |
+---------+-----------+------------+

I'd like to get the following result:

+-----------+------------+---------------------+
| user_type | user_count | …
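The desired result table is cut off after `user_count`, but the title ("multiple (average) count distincts over layered groups") suggests a distinct user count per type plus an average of per-user distinct dates. A pandas sketch of that two-layer aggregation, using the sample rows above (the third output column is an assumption):

```python
import pandas as pd

ua = pd.DataFrame({
    "user_id":   [1, 1, 2, 3, 4, 4, 5],
    "user_type": ["a", "a", "a", "a", "b", "b", "b"],
    "some_date": ["2018-01-01", "2018-01-02", "2018-01-01", "2018-01-01",
                  "2018-01-01", "2018-01-02", "2018-01-02"],
})

# inner layer: distinct active dates per (user_type, user_id)
days_per_user = ua.groupby(["user_type", "user_id"])["some_date"].nunique()
# outer layer: distinct users and the average of the inner counts per type
summary = days_per_user.groupby("user_type").agg(["count", "mean"])
summary.columns = ["user_count", "avg_active_days"]
print(summary)
```

In Hive the same layering is typically done with a subquery: `COUNT(DISTINCT some_date)` grouped by type and user inside, then `COUNT(*)` and `AVG(...)` grouped by type outside.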

MySQL SELECT SUM based on another table

纵然是瞬间 submitted on 2019-12-24 10:38:19

Question: I have a table in MySQL:

tb_currencies
currencyid | currency_name
CU0001     | IDR
CU0002     | SGD
CU0003     | USD

tb_currency_converters
currencyconverterid | currency_month | from_currencyid_fk | to_currencyid_fk | amount
CC0001              | 2018-03-01     | CU0001             | CU0002           | 0.00009
CC0002              | 2018-03-01     | CU0002             | CU0001           | 10425
CC0003              | 2018-03-01     | CU0003             | CU0002           | 1.31964

tb_budgets
budgetid       | budget_month | departmentid_fk | currencyid_fk
BU201803000001 | 2018-03-01   | DP0003          | CU0002
BU201803000002 | 2018-03 …
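The question is truncated before the desired SUM is described, so the exact target is unknown. One plausible building block is attaching the month-specific conversion rate to each budget row, which in SQL would be a JOIN on month and source currency; a pandas sketch of that lookup (only the fully visible budget row is used, and the goal column is an assumption):

```python
import pandas as pd

rates = pd.DataFrame({
    "currency_month":     ["2018-03-01", "2018-03-01", "2018-03-01"],
    "from_currencyid_fk": ["CU0001", "CU0002", "CU0003"],
    "to_currencyid_fk":   ["CU0002", "CU0001", "CU0002"],
    "amount":             [0.00009, 10425, 1.31964],
})
budgets = pd.DataFrame({
    "budgetid":      ["BU201803000001"],
    "budget_month":  ["2018-03-01"],
    "currencyid_fk": ["CU0002"],
})

# equivalent of: LEFT JOIN tb_currency_converters
#   ON budget_month = currency_month AND currencyid_fk = from_currencyid_fk
merged = budgets.merge(
    rates,
    left_on=["budget_month", "currencyid_fk"],
    right_on=["currency_month", "from_currencyid_fk"],
    how="left",
)
print(merged[["budgetid", "amount"]])
```

With the rate attached per row, a grouped SUM over any amount column from tb_budgets would follow the same join.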

SQL how to group in time periods

↘锁芯ラ submitted on 2019-12-24 09:48:04

Question: I'm trying to group data into time periods. Each period is 5 minutes, and I'd like to see what was happening every 5 minutes from 08:00 to 18:00. I have created a table that holds all the time periods in that range, e.g.:

StartTime        EndTime          IsBusinessHours
08:40:00.0000000 08:45:00.0000000 1
08:45:00.0000000 08:50:00.0000000 1
08:50:00.0000000 08:55:00.0000000 1
08:55:00.0000000 09:00:00.0000000 1

etc.

Select TimeDimension.[StartTime], TimeDimension.[EndTime], activity.[Description], activity. …
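The SQL in the question (cut off above) is presumably joining activity rows to the period table on a time range. The underlying bucketing idea can be illustrated in pandas without a dimension table by flooring each timestamp to its 5-minute period start; the data here is hypothetical.

```python
import pandas as pd

# hypothetical activity rows
events = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 08:41:10",
                          "2024-01-01 08:43:59",
                          "2024-01-01 08:47:02"]),
    "description": ["login", "query", "logout"],
})

# assign each event to the 5-minute bucket it falls in
events["period_start"] = events["ts"].dt.floor("5min")
counts = events.groupby("period_start").size()
print(counts)
```

The SQL equivalent joins on `activity.Time >= StartTime AND activity.Time < EndTime` (a LEFT JOIN from the period table keeps empty periods visible).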

Creating slices of dataframe groupby groups

故事扮演 submitted on 2019-12-24 09:22:46

Question: I have a DataFrame with 3 columns: location_id, customers, cluster. Previously, I clustered my data into 5 clusters, so the cluster column contains the values [0, 1, 2, 3, 4]. I would like to separate each cluster into 2 slices for my next stage of testing, e.g. a 50-50 split, a 30-70 split, or a 20-80 split. Question: how do I apply a function that adds a column to data.groupby('cluster')?

Ideal result:

   location_id  customers  cluster  slice
0       149213     132817        1      1
1       578371      76655        1      0
2        91703      74048        2 …
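One way to add such a per-group slice column without `apply` is to combine `cumcount` (position within the cluster) with a transformed group size: rows before the cutoff get slice 0, the rest slice 1. The data and the 50-50 fraction below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "location_id": range(10),
    "customers": [c * 100 for c in range(10)],
    "cluster": [0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
})

frac = 0.5  # hypothetical 50-50 split; use 0.3 for a 30-70 split, etc.
pos = df.groupby("cluster").cumcount()                       # row position in cluster
size = df.groupby("cluster")["location_id"].transform("size")  # cluster size per row
# first floor(size * frac) rows of each cluster -> slice 0, remainder -> slice 1
df["slice"] = (pos >= (size * frac).astype(int)).astype(int)
print(df)
```

For a randomized rather than positional split, shuffle first with `df.sample(frac=1)` and apply the same logic.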

Map one value to all values with a common relation Scala

☆樱花仙子☆ submitted on 2019-12-24 08:39:14

Question: Given a set of data:

{sentenceA1}{\t}{sentenceB1}
{sentenceA1}{\t}{sentenceB2}
{sentenceA2}{\t}{sentenceB1}
{sentenceA3}{\t}{sentenceB1}
{sentenceA4}{\t}{sentenceB2}

I want to map each sentenceA to all the sentences that share a common sentenceB, in Scala, so the result will be something like:

{sentenceA1}->{sentenceA2,sentenceA3,sentenceA4} or {sentenceA2}->{sentenceA1,sentenceA3}

Answer 1:

val lines = List(
  "sentenceA1\tsentenceB1",
  "sentenceA1\tsentenceB2",
  "sentenceA2\tsentenceB1", …
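The Scala answer is cut off above. The underlying algorithm (invert the relation to group A-sentences by their shared B-sentence, then union each sentence's co-members across groups) translates directly; a Python sketch of the same logic, not the original answer:

```python
from collections import defaultdict

lines = ["sentenceA1\tsentenceB1", "sentenceA1\tsentenceB2",
         "sentenceA2\tsentenceB1", "sentenceA3\tsentenceB1",
         "sentenceA4\tsentenceB2"]

# invert: each B-sentence -> set of A-sentences related to it
b_to_a = defaultdict(set)
for line in lines:
    a, b = line.split("\t")
    b_to_a[b].add(a)

# union: each A-sentence -> all co-members from every group it appears in
related = defaultdict(set)
for group in b_to_a.values():
    for a in group:
        related[a] |= group - {a}
```

In Scala the same shape is a `groupBy` on the B column followed by a fold that merges each group's members into per-sentence sets.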

CakePHP Won't Apply Group By Condition

怎甘沉沦 submitted on 2019-12-24 07:58:52

Question: I am trying to do a find on one of my tables, and I'm dynamically adding conditions to the hasMany relationship from one model to the other. Everything works fine, except that Cake will not apply the group condition to the query. If I copy the generated query, run it in MySQL, and add my GROUP BY condition, it works beautifully. I have tried a number of things, to no avail, and my current setup follows the Find page in the Cake docs for setting up a group by. You …

Query with GROUP BY and ORDER BY not working when multiple columns in SELECT are chosen

烈酒焚心 submitted on 2019-12-24 07:58:30

Question: I'm updating an old website and one of the queries isn't working anymore:

SELECT * FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2

I noticed that if I drop the GROUP BY it works, but the result set doesn't match the original:

SELECT * FROM tbl WHERE col1 IS NULL ORDER BY col2

So I tried reading up on GROUP BY in the docs to see what the issue might be, and it seemed to suggest not using * to select all the fields but explicitly naming the columns, so I tried it with just the column …
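A likely cause (not confirmed by the truncated question) is the ONLY_FULL_GROUP_BY SQL mode, enabled by default since MySQL 5.7.5: `SELECT *` with `GROUP BY col2` references columns that are neither grouped nor aggregated, which the mode rejects. The fix is to aggregate or group every selected column explicitly; a runnable illustration using Python's built-in sqlite3 with hypothetical data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (col1 TEXT, col2 TEXT, col3 TEXT)")
con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (None, "a", "x"),
    (None, "a", "y"),
    (None, "b", "z"),
    ("v",  "b", "w"),  # filtered out by col1 IS NULL
])

# every selected column is either grouped (col2) or aggregated (MIN(col3)),
# so the query is valid under ONLY_FULL_GROUP_BY semantics
rows = con.execute(
    "SELECT col2, MIN(col3) FROM tbl WHERE col1 IS NULL "
    "GROUP BY col2 ORDER BY col2").fetchall()
print(rows)
```

MySQL also offers `ANY_VALUE(col)` when any representative value per group is acceptable.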

Used groupby to select most recent data, want to append a column that returns the date of the data

倾然丶 夕夏残阳落幕 submitted on 2019-12-24 07:49:31

Question: I originally had a DataFrame that looked like this:

country        date        industry   population  % of rural land
Australia      2017-01-01  NaN        NaN         NaN
               2016-01-01  24.327571  18.898304   12
               2015-01-01  25.396251  18.835267   12
               2014-01-01  27.277007  18.834835   13
United States  2017-01-01  NaN        NaN         NaN
               2016-01-01  NaN        19.028231   NaN
               2015-01-01  20.027274  19.212860   NaN
               2014-01-01  20.867359  19.379071   NaN

I applied the following code, which pulled the most recent data for each of the columns for each of the countries and resulted …
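The asker's code is cut off, but the request (keep the most recent non-null value per country and also record which date it came from) can be sketched per column by dropping NaNs, sorting by date, and taking the last row. Shown here for the `industry` column only, with the data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "country": ["Australia"] * 4 + ["United States"] * 4,
    "date": pd.to_datetime(
        ["2017-01-01", "2016-01-01", "2015-01-01", "2014-01-01"] * 2),
    "industry": [np.nan, 24.327571, 25.396251, 27.277007,
                 np.nan, np.nan, 20.027274, 20.867359],
})

def latest(g):
    # most recent row that actually has an industry value
    g = g.dropna(subset=["industry"]).sort_values("date")
    last = g.iloc[-1]
    return pd.Series({"industry": last["industry"],
                      "industry_date": last["date"]})

out = df.groupby("country").apply(latest)
print(out)
```

Each value column would need its own date column this way, since the most recent non-null entry can fall on different dates per column (as it does for the United States).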

How to optimize a LINQ query for grouping dates by price without merging results

蓝咒 submitted on 2019-12-24 07:35:58

Question: I have this LINQ query, as stated below. The problem is that when the data is grouped by price, it groups dates by price without considering the case where the same price can occur on nonconsecutive dates.

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    public static void Main()
    {
        //Console.WriteLine("Hello World");
        List<Prices> list = new List<Prices>();
        list.Add(new Prices() { Date = DateTime.Parse("2017-06-17"), Price = Double.Parse("50") });
        list.Add(new …
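The fix for this class of problem is to group consecutive *runs* rather than all equal keys: a price that reappears after a gap must start a new group. Python's `itertools.groupby` has exactly this run-based semantics (unlike LINQ's `GroupBy`, which merges all equal keys); a sketch with hypothetical data:

```python
from datetime import date
from itertools import groupby

prices = [(date(2017, 6, 17), 50.0), (date(2017, 6, 18), 50.0),
          (date(2017, 6, 19), 60.0), (date(2017, 6, 20), 50.0)]

# itertools.groupby only groups adjacent equal keys, so the second
# occurrence of 50.0 forms its own run instead of merging with the first
runs = []
for price, run in groupby(prices, key=lambda p: p[1]):
    run = list(run)
    runs.append((run[0][0], run[-1][0], price))  # (start, end, price)
print(runs)
```

In LINQ the equivalent is a manual fold (e.g. `Aggregate`) that starts a new group whenever the price differs from the previous element's, since `GroupBy` alone cannot preserve run boundaries.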