group-by

How can I count the number of grouped pairs in which one row's column value is greater than another?

大兔子大兔子 submitted on 2020-03-04 20:01:12
Question: I have a dataset (df1) with a number of paired values. One row of the pair is for one year (e.g., 2014), the other for a different year (e.g., 2013). Each pair has a value in column G. I need a count of the number of pairs in which the G value for the later year is less than the G value for the earlier year. Here is my dput for the dataset df1: structure(list(Name = c("A.J. Ellis", "A.J. Ellis", "A.J. Pierzynski", "A.J. Pierzynski", "Aaron Boone", "Adam Kennedy", "Adam Melhuse", …
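A runnable sketch of the grouped comparison, written in pandas rather than R since the dput above is truncated; the Year column name and all values below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical two-rows-per-player frame mirroring the question.
df = pd.DataFrame({
    "Name": ["A.J. Ellis", "A.J. Ellis", "A.J. Pierzynski", "A.J. Pierzynski"],
    "Year": [2013, 2014, 2013, 2014],
    "G":    [115, 93, 134, 102],
})

# Sort each pair chronologically, take G(later year) - G(earlier year),
# and count the pairs where the value dropped.
df = df.sort_values(["Name", "Year"])
delta = df.groupby("Name")["G"].agg(lambda s: s.iloc[-1] - s.iloc[0])
print((delta < 0).sum())  # 2 pairs dropped in this toy data
```

The same shape works in dplyr with group_by(Name) followed by a summarise over the year-sorted G values.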

Get aggregated average values by joining three tables and display them next to each value in the first table

橙三吉。 submitted on 2020-03-04 06:28:37
Question: I have three tables, which you can also find in the SQL fiddle: CREATE TABLE Sales ( Product_ID VARCHAR(255), Sales_Value VARCHAR(255), Sales_Quantity VARCHAR(255) ); INSERT INTO Sales (Product_ID, Sales_Value, Sales_Quantity) VALUES ("P001", "500", "200"), ("P002", "600", "100"), ("P003", "300", "250"), ("P004", "900", "400"), ("P005", "800", "600"), ("P006", "200", "150"), ("P007", "700", "550"); CREATE TABLE Products ( Product_ID VARCHAR(255), Product_Name VARCHAR(255), Category_ID VARCHAR…
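The excerpt cuts off before Products (and any Categories table) is fully defined, so here is a sketch under assumptions: Products carries a Category_ID, and the goal is each product's sales value next to its category average. Python's sqlite3 serves as a runnable stand-in (the window AVG needs SQLite 3.25 or newer), and the Products rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (Product_ID TEXT, Sales_Value REAL, Sales_Quantity REAL);
INSERT INTO Sales VALUES
  ('P001', 500, 200), ('P002', 600, 100), ('P003', 300, 250),
  ('P004', 900, 400), ('P005', 800, 600), ('P006', 200, 150),
  ('P007', 700, 550);
-- Products rows are invented: the excerpt cuts off mid-definition.
CREATE TABLE Products (Product_ID TEXT, Product_Name TEXT, Category_ID TEXT);
INSERT INTO Products VALUES
  ('P001', 'Apple',  'C1'), ('P002', 'Banana', 'C1'),
  ('P003', 'Chair',  'C2'), ('P004', 'Table',  'C2'),
  ('P005', 'Desk',   'C2'), ('P006', 'Pen',    'C3'),
  ('P007', 'Ink',    'C3');
""")

# A window AVG partitioned by category puts the category average on
# every product row without a second GROUP BY query.
rows = conn.execute("""
    SELECT s.Product_ID,
           s.Sales_Value,
           AVG(s.Sales_Value) OVER (PARTITION BY p.Category_ID) AS category_avg
    FROM Sales s
    JOIN Products p ON p.Product_ID = s.Product_ID
    ORDER BY s.Product_ID
""").fetchall()
for row in rows:
    print(row)  # e.g. ('P001', 500.0, 550.0): C1 averages 500 and 600
```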

Filtering an aggregated chart with another aggregation field

泪湿孤枕 submitted on 2020-03-03 09:45:57
Question: I'm trying to produce something similar to the K-top example, except that instead of filtering and displaying the same aggregated field, I want to display one type of aggregated data (the max of daily temps) and filter on another aggregated field (the mean of daily temps). I've created an observable notebook here to build my test case, and this is how far I got: { "$schema": "https://vega.github.io/schema/vega-lite/v4.json", "data": {"url": "data/seattle-weather.csv"}, "transform": …
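One way to get this shape in Vega-Lite is a joinaggregate transform: attach the group mean to every row, filter on it, and aggregate the displayed field in the encoding. A sketch using Altair, the Python API for Vega-Lite, assuming monthly grouping and an arbitrary 15-degree threshold:

```python
import altair as alt
from vega_datasets import data

weather = data.seattle_weather()

chart = (
    alt.Chart(weather)
    # Bucket days into months, then attach each month's mean high
    # temperature to every row with a join-aggregate.
    .transform_timeunit(month="yearmonth(date)")
    .transform_joinaggregate(mean_temp="mean(temp_max)", groupby=["month"])
    # Filter on the monthly MEAN...
    .transform_filter(alt.datum.mean_temp > 15)
    .mark_bar()
    # ...while displaying the monthly MAX.
    .encode(x="month:T", y="max(temp_max):Q")
)
chart.save("warm_months.html")
```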

How to properly GROUP BY in MySQL?

雨燕双飞 submitted on 2020-02-29 03:19:10
Question: I have the following (intentionally denormalized for demonstration purposes) sample CARS table:

| CAR_ID | OWNER_ID | OWNER_NAME | COLOR |
|--------|----------|------------|-------|
| 1      | 1        | John       | White |
| 2      | 1        | John       | Black |
| 3      | 2        | Mike       | White |
| 4      | 2        | Mike       | Black |
| 5      | 2        | Mike       | Brown |
| 6      | 3        | Tony       | White |

If I wanted to count the number of cars per owner and return this:

| OWNER_ID | OWNER_NAME | TOTAL |
|----------|------------|-------|
| 1        | John       | 2     |
| 2        | Mike       | …
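The usual answer: with MySQL's ONLY_FULL_GROUP_BY mode enabled, group by every selected non-aggregated column (MySQL 5.7+ also accepts grouping by the key alone when the other columns are functionally dependent on it). A runnable sketch using Python's sqlite3 as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cars (car_id INTEGER, owner_id INTEGER, owner_name TEXT, color TEXT);
INSERT INTO cars VALUES
  (1, 1, 'John', 'White'), (2, 1, 'John', 'Black'),
  (3, 2, 'Mike', 'White'), (4, 2, 'Mike', 'Black'),
  (5, 2, 'Mike', 'Brown'), (6, 3, 'Tony', 'White');
""")

# Grouping by BOTH owner_id and owner_name keeps the query legal under
# ONLY_FULL_GROUP_BY: every selected, non-aggregated column appears in
# the GROUP BY clause.
rows = conn.execute("""
    SELECT owner_id, owner_name, COUNT(*) AS total
    FROM cars
    GROUP BY owner_id, owner_name
""").fetchall()
print(rows)  # [(1, 'John', 2), (2, 'Mike', 3), (3, 'Tony', 1)]
```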

Select multiple aggregates of a joined table on Postgres

ε祈祈猫儿з submitted on 2020-02-25 13:13:50
Question: Given the tables projects:

id         | bigint                         | not null default nextval('projects_id_seq'::regclass)
name       | character varying              |
created_at | timestamp(6) without time zone | not null
updated_at | timestamp(6) without time zone | not null

and tasks:

id         | bigint                         | not null default nextval('tasks_id_seq'::regclass)
name       | character varying              |
project_id | bigint                         | not null
created_at | timestamp(6) without time zone | not null
updated_at | timestamp(6) without time zone | not null
status     | task…
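The excerpt ends mid-column (status is presumably an enum), so assume the goal is several per-project counts, such as tasks by status. Conditional aggregation returns multiple aggregates from a single join without inflating row counts; Postgres would idiomatically use COUNT(*) FILTER (WHERE ...), and the portable CASE form below runs under Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tasks (
  id INTEGER PRIMARY KEY, name TEXT,
  project_id INTEGER NOT NULL, status TEXT
);
INSERT INTO projects (name) VALUES ('alpha'), ('beta');
INSERT INTO tasks (name, project_id, status) VALUES
  ('t1', 1, 'done'), ('t2', 1, 'open'), ('t3', 1, 'done'),
  ('t4', 2, 'open');
""")

# One aggregate per status from a single LEFT JOIN + GROUP BY:
# each CASE only counts rows matching its status.
rows = conn.execute("""
    SELECT p.id, p.name,
           COUNT(t.id) AS total_tasks,
           SUM(CASE WHEN t.status = 'done' THEN 1 ELSE 0 END) AS done_tasks,
           SUM(CASE WHEN t.status = 'open' THEN 1 ELSE 0 END) AS open_tasks
    FROM projects p
    LEFT JOIN tasks t ON t.project_id = p.id
    GROUP BY p.id, p.name
""").fetchall()
print(rows)  # [(1, 'alpha', 3, 2, 1), (2, 'beta', 1, 0, 1)]
```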

How to sum negative and positive values separately when using groupby in pandas?

谁说我不能喝 submitted on 2020-02-23 11:32:10
Question: How can I sum positive and negative values separately in pandas and put them in, say, positive and negative columns? I have a dataframe like the one below: df = pandas.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'C' : np.random.randn(8), 'D' : np.random.randn(8)}) The output is as below:

df
     A      B         C         D
0  foo    one  0.374156  0.319699
1  bar    one -0.356339 -0.629649
2  foo    two -0.390243 -1.387909
3  bar  three -0…
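One idiomatic option is clip(), which zeroes out one sign before summing, so a single groupby produces both columns. A minimal sketch, seeded for repeatability since the question uses random data:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": np.random.randn(8),
    "D": np.random.randn(8),
})

# clip(lower=0) keeps only positive values (negatives become 0), and
# clip(upper=0) keeps only negative ones, so each sum sees one sign.
result = df.groupby(["A", "B"])["C"].agg(
    positive=lambda s: s.clip(lower=0).sum(),
    negative=lambda s: s.clip(upper=0).sum(),
)
print(result)
```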

Pandas monthly rolling window

时间秒杀一切 submitted on 2020-02-23 03:42:31
Question: I am looking to do a 'monthly' rolling window on daily data grouped by a category. The code below does not work as is; it leads to the following error: ValueError: <DateOffset: months=1> is a non-fixed frequency. I know I could use a '30D' offset, but the window boundary would drift relative to the calendar over time. I'm looking for the sum of a window that spans from the x-th day of a month to that same x-th day of the J-th month. E.g. with J=1: 4th of July to 4th of August, 5th of July to 5th of August, 6th of…
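rolling() indeed accepts only fixed frequencies, but a calendar-aware window can be built by hand: offset every start date by DateOffset(months=J) and locate the window edges with searchsorted. A sketch on a toy single-category series, treating the window as half-open (use side="right" in searchsorted for an inclusive end, and apply the function per group via groupby().apply for the grouped case):

```python
import numpy as np
import pandas as pd

# Toy single-category daily series (hypothetical; the question's
# frame is not shown in full).
idx = pd.date_range("2020-07-01", "2020-09-30", freq="D")
s = pd.Series(np.ones(len(idx)), index=idx)

def monthly_window_sum(s: pd.Series, months: int = 1) -> pd.Series:
    """Sum over the half-open window [day, day + months months),
    calendar-aware, which a fixed '30D' window cannot express."""
    ends = s.index + pd.DateOffset(months=months)
    stops = s.index.searchsorted(ends)   # exclusive right edges
    starts = np.arange(len(s))
    # Prefix sums make each window sum a single subtraction.
    csum = np.concatenate([[0.0], s.to_numpy().cumsum()])
    return pd.Series(csum[stops] - csum[starts], index=s.index)

out = monthly_window_sum(s)
print(out.loc["2020-07-04"])  # 31.0: the days in [Jul 4, Aug 4)
```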

Using dplyr to group_by and conditionally mutate a dataframe by group

不想你离开。 submitted on 2020-02-22 06:19:22
Question: I'd like to use dplyr functions to group_by and conditionally mutate a df. Given this sample data:

A B C D
1 1 1 0.25
1 1 2 0
1 2 1 0.5
1 2 2 0
1 3 1 0.75
1 3 2 0.25
2 1 1 0
2 1 2 0.5
2 2 1 0
2 2 2 0
2 3 1 0
2 3 2 0
3 1 1 0.5
3 1 2 0
3 2 1 0.25
3 2 2 1
3 3 1 0
3 3 2 0.75

I want to use a new column E to categorize A by whether B == 1, C == 2, and D > 0. For each unique value of A for which all of these conditions hold true, E = 1; otherwise E = 0. So the output should look like this: A B C D E …
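In dplyr this would be something like group_by(A) followed by mutate(E = as.integer(any(B == 1 & C == 2 & D > 0))); the same logic in pandas, using the question's own data, is a grouped transform:

```python
import pandas as pd

# Data from the question (columns A, B, C, D).
df = pd.DataFrame({
    "A": [1] * 6 + [2] * 6 + [3] * 6,
    "B": [1, 1, 2, 2, 3, 3] * 3,
    "C": [1, 2] * 9,
    "D": [0.25, 0, 0.5, 0, 0.75, 0.25,
          0, 0.5, 0, 0, 0, 0,
          0.5, 0, 0.25, 1, 0, 0.75],
})

# Row-level condition, then broadcast "does any row in this A group
# satisfy it?" back onto every row of the group.
cond = df["B"].eq(1) & df["C"].eq(2) & df["D"].gt(0)
df["E"] = cond.groupby(df["A"]).transform("any").astype(int)

print(df[df["E"] == 1]["A"].unique())  # [2]: only group 2 qualifies
```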