pandas-groupby

python: cumulative concatenate in pandas dataframe

Submitted by 只谈情不闲聊 on 2021-02-11 15:45:07
Question: How can I do a cumulative concatenation in a pandas DataFrame? I found a number of solutions in R, but can't find one in Python. Here is the problem: suppose we have a DataFrame with the columns date and name:

    import pandas as pd
    d = {'date': [1,1,2,2,3,3,3,4,4,4], 'name':['A','B','A','C','A','B','B','A','B','C']}
    df = pd.DataFrame(data=d)

I want to get CUM_CONCAT, which is a cumulative concatenation grouped by date:

       date name CUM_CONCAT
    0     1    A        [A]
    1     1    B      [A,B]
    2     2    A        [A]
    3     2    C      [A,C]
    4     3    A        [A]
    5     3
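A minimal sketch of one way to build CUM_CONCAT, assuming the rows are already in the desired order within each date and that repeated names should be kept. It walks the rows with a plain Python loop rather than a built-in groupby method; the helper name cumulative_concat is hypothetical.

```python
import pandas as pd

d = {'date': [1, 1, 2, 2, 3, 3, 3, 4, 4, 4],
     'name': ['A', 'B', 'A', 'C', 'A', 'B', 'B', 'A', 'B', 'C']}
df = pd.DataFrame(data=d)


def cumulative_concat(frame, key, col):
    """Hypothetical helper: running list of `col` values within each `key` group."""
    seen = {}  # group key -> list of values seen so far in that group
    result = []
    for k, v in zip(frame[key], frame[col]):
        # Build a new list on every row so earlier rows keep their own snapshot.
        seen[k] = seen.get(k, []) + [v]
        result.append(seen[k])
    # Wrap in a Series so that each cell holds one Python list.
    return pd.Series(result, index=frame.index)


df['CUM_CONCAT'] = cumulative_concat(df, 'date', 'name')
print(df)
```

Rows 0 to 4 match the expected output shown in the question; row 6 becomes [A, B, B] because duplicates are kept, which is an assumption about what the asker wanted.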

What is the difference between bins when using groupby apply vs resample apply?

Submitted by 孤街浪徒 on 2021-02-11 15:37:54
Question: This is somewhat of a broad topic, but I will try to pare it down to some specific questions. I have noticed a difference between resample and groupby that I am curious to learn about. Here is some hourly time series data:

    In[]:
    import pandas as pd
    dr = pd.date_range('01-01-2020 8:00', periods=10, freq='H')
    df = pd.DataFrame({'A':range(10), 'B':range(10,20), 'C':range(20,30)}, index=dr)
    df

    Out[]:
                         A   B   C
    2020-01-01 08:00:00  0  10  20
    2020-01-01 09:00:00  1  11  21
    2020-01-01 10:00:00  2  12  22
    2020-01-01
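A small sketch for probing one difference between the two, reusing the question's data. Resampler.apply aggregates column by column, so the applied function receives one Series per bin and column, whereas GroupBy.apply with a time-based pd.Grouper receives the whole sub-DataFrame for each bin. The 3-hour bin width is an assumption chosen for illustration, and pd.offsets.Hour() stands in for the 'H' alias, which newer pandas versions deprecate in favour of 'h'.

```python
import pandas as pd

# Same hourly data as the question.
dr = pd.date_range('2020-01-01 08:00', periods=10, freq=pd.offsets.Hour())
df = pd.DataFrame({'A': range(10), 'B': range(10, 20), 'C': range(20, 30)}, index=dr)

three_hours = pd.offsets.Hour(3)  # bin width chosen only for illustration

# Resampler.apply aggregates per column: the function sees a Series (ndim == 1).
via_resample = df.resample(three_hours).apply(lambda x: x.ndim)

# GroupBy.apply with a time Grouper sees each bin's sub-DataFrame (ndim == 2).
via_groupby = df.groupby(pd.Grouper(freq=three_hours)).apply(lambda g: g.ndim)

# Both use the same bin labels; by default bins are anchored at midnight,
# so the first bin is labelled 06:00 even though the data starts at 08:00.
print(via_resample)
print(via_groupby)
```

Returning ndim makes the difference visible without depending on the data values; swapping in a real aggregation (for example a sum) would behave differently between the two for the same reason.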

multiple merge operations on two dataframes using pandas

Submitted by 倖福魔咒の on 2021-02-11 13:59:43
Question: I have two dataframes on which multiple operations need to be performed, for example:

    old_DF
    id col1 col2 col3
    -------------------------
    1 aaa
    2 bbb 123

    new_DF
    id col1 col2 col3
    -------------------------
    1 xxx 999
    2 xxx kkk

The following operations need to be performed on these dataframes:
- Merging the two dataframes
- Replacing only the blank (NA) cells in old_DF with the corresponding values from new_DF
- Cells from both dataframes where the values contradict each other should be reported in a
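A sketch of the NA-filling step plus a conflict check, under several assumptions: both frames share the same id values and column names, blank cells are NaN, and the positions of the blanks below are guesses because the flattened tables do not show them. It uses DataFrame.combine_first, which keeps old_df's values and fills only its NaN cells from new_df; the report layout is hypothetical, since the question is cut off before describing it.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: where the blanks sit is an assumption.
old_df = pd.DataFrame({'id': [1, 2],
                       'col1': ['aaa', 'bbb'],
                       'col2': [np.nan, np.nan],
                       'col3': [np.nan, '123']}).set_index('id')
new_df = pd.DataFrame({'id': [1, 2],
                       'col1': ['xxx', 'xxx'],
                       'col2': ['999', 'kkk'],
                       'col3': [np.nan, np.nan]}).set_index('id')

# Keep old_df's values and fill only its NaN cells from new_df
# (cells are matched on the id index and the column labels).
filled = old_df.combine_first(new_df)

# A cell "contradicts" when both frames have a value there and the values differ.
both_present = old_df.notna() & new_df.notna()
conflicts = both_present & (old_df != new_df)

# Hypothetical report layout: one row per contradicting cell.
rows = [(idx, col, old_df.at[idx, col], new_df.at[idx, col])
        for idx in old_df.index
        for col in old_df.columns
        if conflicts.at[idx, col]]
report = pd.DataFrame(rows, columns=['id', 'column', 'old', 'new'])

print(filled)
print(report)
```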

Aggregate DataFrame based on list values

Submitted by 最后都变了- on 2021-02-11 13:55:08
Question: I have the following problem. I have a list with string values:

    a = ['word1', 'word2', 'word3', 'word4', ..., 'wordN']

And I have a DataFrame with values:

    +----------+-------------+----------+
    | keywords | impressions | clicks   |
    +----------+-------------+----------+
    | word1    | 1245523     | 12321231 |
    +----------+-------------+----------+
    | word2    | 4212321     | 12312312 |
    +----------+-------------+----------+
    ........................................

Please advise me on how to create a specific,
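The question is cut off before it says which aggregation is wanted. Assuming the goal is to total impressions and clicks over the keywords that appear in the list a, a minimal sketch filters with Series.isin and then sums, either overall or per keyword with groupby. Only the two rows shown in the question are used; the shortened list keeps 'word3', which has no row, to show that unmatched entries are simply ignored.

```python
import pandas as pd

# Shortened version of the question's list; 'word3' has no row below.
a = ['word1', 'word2', 'word3']
df = pd.DataFrame({'keywords': ['word1', 'word2'],
                   'impressions': [1245523, 4212321],
                   'clicks': [12321231, 12312312]})

# Keep only rows whose keyword appears in the list.
mask = df['keywords'].isin(a)

# Totals across all matching keywords...
totals = df.loc[mask, ['impressions', 'clicks']].sum()

# ...or one row per matching keyword.
per_keyword = df[mask].groupby('keywords')[['impressions', 'clicks']].sum()

print(totals)       # impressions 5457844, clicks 24633543
print(per_keyword)
```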

Calculate percentage on DataFrame

Submitted by 只谈情不闲聊 on 2021-02-11 06:07:37
Question: I'm trying to calculate the percentage of each crime in the following DataFrame:

          Violent   Murder  Larceny_Theft  Vehicle_Theft
    Year
    1960   288460  3095700        1855400         328200
    1961   289390  3198600        1913000         336000
    1962   301510  3450700        2089600         366800
    1963   316970  3792500        2297800         408300
    1964   364220  4200400        2514400         472800

So I first need to calculate the total number of crimes per year and then use it to calculate the percentage of each crime. I was trying the following:

    perc = (crime *100) / crime.sum(axis=1)

Any
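The attempted expression divides a DataFrame by a Series. By default pandas aligns the Series' index (here the years) with the DataFrame's columns, so no labels match and every cell of the result is NaN. A sketch of the usual fix, with the data rebuilt from the table above, passes axis=0 to DataFrame.div so that each yearly total is matched to its row.

```python
import pandas as pd

crime = pd.DataFrame(
    {'Violent': [288460, 289390, 301510, 316970, 364220],
     'Murder': [3095700, 3198600, 3450700, 3792500, 4200400],
     'Larceny_Theft': [1855400, 1913000, 2089600, 2297800, 2514400],
     'Vehicle_Theft': [328200, 336000, 366800, 408300, 472800]},
    index=pd.Index([1960, 1961, 1962, 1963, 1964], name='Year'),
)

totals = crime.sum(axis=1)  # one total per year, indexed by Year

# The original attempt: years are aligned against column names, so all NaN.
naive = (crime * 100) / totals

# Align the totals with the rows instead.
perc = crime.div(totals, axis=0) * 100
print(perc.round(2))
```

Each row of perc sums to 100; for 1960, Violent is 288460 / 5567760 * 100, about 5.18.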
