pandas-groupby

Comma-separated values from pandas GroupBy

…衆ロ難τιáo~ posted on 2019-12-11 14:43:35

Question: I'm trying to find out if there is a way to remove duplicates in my data frame while concatenating the values. Example:

    df
       key   v1  v2
    0    1  n/a   a
    1    2  n/a   b
    2    3  n/a   c
    3    2  n/a   d
    4    3  n/a   e

The output should look like:

    df_out
       key   v1   v2
    0    1  n/a    a
    1    2  n/a  b,d
    2    3  n/a  c,e

I tried df.drop_duplicates() and some loops to save the v2 column values, but nothing has worked yet. I'm trying to do it cleanly, without loops, using pandas. Does anyone know a way pandas can do this?

Answer 1: This should be easy, assuming you have
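The answer above is cut off, but it most likely points at GroupBy.agg. A minimal sketch of the comma-joining approach, using the column names from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 2, 3],
    'v1': ['n/a'] * 5,
    'v2': ['a', 'b', 'c', 'd', 'e'],
})

# Group on 'key'; keep the first v1 per group and join the v2 strings
# with commas instead of dropping the duplicate keys.
df_out = (df.groupby('key', as_index=False)
            .agg({'v1': 'first', 'v2': ','.join}))
print(df_out)
```

Because `','.join` is just a function of the group's values, any other combiner (e.g. `'; '.join`, `list`) can be dropped in the same way.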

Average values in last n days pandas

眉间皱痕 posted on 2019-12-11 14:38:35

Question: I've got a dataframe of golfers and their golf rounds in various tournaments (see the dictionary of the df head posted below). For each round a player plays, I need a fast way of computing his average 'strokes gained' (SG) over the previous n days, where n is any value I choose. I know how I could do this by converting the dataframe into a list of lists and iterating through, but that would be very slow. Ideally I want an extra column in the pandas df titled 'Player's average SG in last 100 days'.
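The question's actual columns aren't shown, so the names `player`, `date`, and `SG` below are assumptions. The idea is a time-based rolling mean per player; `closed='left'` keeps the current round out of its own trailing average:

```python
import pandas as pd

# Hypothetical data; the real column names are not shown in the question.
df = pd.DataFrame({
    'player': ['X', 'X', 'X', 'Y', 'Y'],
    'date': pd.to_datetime(['2019-01-01', '2019-02-01', '2019-06-01',
                            '2019-01-10', '2019-03-10']),
    'SG': [1.0, 2.0, 3.0, 0.5, 1.5],
})

# Sort so the groupby/rolling result aligns positionally with df.
df = df.sort_values(['player', 'date']).reset_index(drop=True)

# A 100-day window ending just before each round: closed='left' makes
# the window [t - 100 days, t), excluding the round itself.
rolled = (df.set_index('date')
            .groupby('player')['SG']
            .rolling('100D', closed='left')
            .mean())
df['SG_avg_100d'] = rolled.values
print(df)
```

Rounds with no prior round inside the window come out as NaN, which is usually the right signal for "no history yet".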

pandas groupby plot values

淺唱寂寞╮ posted on 2019-12-11 14:19:05

Question: I have a pandas dataframe that looks like this:

    real  I  SI  weights
       0  1   3      0.3
       0  2   4      0.2
       0  1   3      0.5
       0  1   5      0.5
       1  2   5      0.3
       1  2   4      0.2
       1  1   3      0.5

I need to split it by "real", and then, for each value of I, consider each value of SI and add up the total weight. At the end, for each realization, I should have something like this:

    real = 0:
        I = 1
            SI = 3  weight = 0.8
            SI = 5  weight = 0.5
        I = 2
            SI = 4  weight = 0.2
    real = 1:
        I = 1
            SI = 3  weight = 0.5
        I = 2
            SI = 5  weight = 0.3
            SI = 4
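The nested totals in the question are exactly what a multi-key groupby sum produces; a minimal sketch on the posted data:

```python
import pandas as pd

df = pd.DataFrame({
    'real':    [0, 0, 0, 0, 1, 1, 1],
    'I':       [1, 2, 1, 1, 2, 2, 1],
    'SI':      [3, 4, 3, 5, 5, 4, 3],
    'weights': [0.3, 0.2, 0.5, 0.5, 0.3, 0.2, 0.5],
})

# One pass: total weight for every (real, I, SI) combination; the
# MultiIndex result mirrors the nested layout in the question.
totals = df.groupby(['real', 'I', 'SI'])['weights'].sum()
print(totals)
```

Individual entries can then be read off with tuple indexing, e.g. `totals.loc[(0, 1, 3)]`.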

Series of if statements applied to data frame

♀尐吖头ヾ posted on 2019-12-11 13:26:22

Question: I have a question on how to do this task. I want to group a series of numbers in my data frame, from the column 'PD', which ranges from .001 to 1. I want to map values with .9 < 'PD' < .91 to .91 (i.e. return a value of .91), .91 <= 'PD' < .92 to .92, ..., and .99 <= 'PD' <= 1 to 1, in a column named 'Grouping'. What I have been doing is writing each if statement manually and then merging the result with the base data frame. Can anyone please help me with a more efficient way of
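One way to replace the chain of if statements is pd.cut with explicit 0.01-wide bins, which also makes the boundary handling explicit. A sketch (only the top of the range is shown; the same bins cover the whole .001 to 1 span):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'PD': [0.905, 0.91, 0.95, 0.99, 1.0]})

# Edges 0.00, 0.01, ..., 1.00; with right=False each interval is
# [left, right) and is labelled with its upper edge, so 0.91 lands in
# [0.91, 0.92) and maps to 0.92, matching the question's boundaries.
edges = np.round(np.arange(0.0, 1.01, 0.01), 2)
df['Grouping'] = pd.cut(df['PD'], bins=edges, right=False,
                        labels=edges[1:]).astype(float)
# PD == 1.0 falls outside the half-open bins; map it to 1 explicitly.
df.loc[df['PD'] >= 1.0, 'Grouping'] = 1.0
print(df)
```

The np.round on the edges avoids float noise from np.arange leaking into the bin labels.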

Conditional Running Count in Pandas for All Previous Rows Only

懵懂的女人 posted on 2019-12-11 12:18:41

Question: Suppose I have the following DataFrame:

    df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
                       'Date': ['2019-01-01', '2019-02-01', '2019-03-01', '2019-03-01',
                                '2019-02-15', '2019-03-15', '2019-04-05', '2019-04-05',
                                '2019-04-15', '2019-06-10'],
                       'Sale': [100, 200, 150, 200, 150, 100, 300, 250, 500, 400]})
    df['Date'] = pd.to_datetime(df['Date'])

    df
      Event       Date
          A 2019-01-01
          B 2019-02-01
          A 2019-03-01
          A 2019-03-01
          B 2019-02-15
          C 2019-03-15
          B 2019-04-05
          B 2019-04-05
          A 2019-04-15
          C
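The question is cut off, but given the title, one straightforward way (O(n²), fine for small frames) to count, for each row, how many earlier rows had the same Event:

```python
import pandas as pd

df = pd.DataFrame({
    'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
    'Date': pd.to_datetime(['2019-01-01', '2019-02-01', '2019-03-01',
                            '2019-03-01', '2019-02-15', '2019-03-15',
                            '2019-04-05', '2019-04-05', '2019-04-15',
                            '2019-06-10']),
    'Sale': [100, 200, 150, 200, 150, 100, 300, 250, 500, 400],
})

# For each row, count rows with the same Event and a strictly earlier
# Date, so same-day occurrences are not counted as "previous".
df['PriorCount'] = [
    ((df['Event'] == ev) & (df['Date'] < dt)).sum()
    for ev, dt in zip(df['Event'], df['Date'])
]
print(df)
```

Whether same-date rows should count as "previous" depends on the cut-off requirement; flipping `<` to `<=` (and subtracting the row itself) gives the other convention.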

Compare column values in two or three dataframe and merge

瘦欲@ posted on 2019-12-11 12:14:05

Question: I have already checked a few earlier questions, and my problem is somewhat unique. I have three Excel files, and I load them into three different dataframes. Basically, I have to add the contents of excel_1 and excel_2 and compare the result against excel_3. Example data (excel_1, sales Territory #1):

    Name  Year    Item    sales_Amount1
    A1    1.2019  Badam   2
    A1    1.2019  Badam   10
    A1    1.2019  carrot  8
    A1    1.2019  carrot  10
    A2    1.2019  Badam   10
    A2    1.2019  Badam   20
    A3    2.2019  soap    3
    A1    2.2019  soap    1

Example data: (excel_2
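The excel_2 and excel_3 samples are cut off in the question, so the frames below (and the column names 'sales_Amount2' and 'target_Amount') are invented placeholders; only excel_1 comes from the post. The general shape of the solution is: aggregate each file on the shared keys, merge, then compare:

```python
import pandas as pd

excel_1 = pd.DataFrame({
    'Name': ['A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A3', 'A1'],
    'Year': ['1.2019'] * 6 + ['2.2019', '2.2019'],
    'Item': ['Badam', 'Badam', 'carrot', 'carrot', 'Badam', 'Badam',
             'soap', 'soap'],
    'sales_Amount1': [2, 10, 8, 10, 10, 20, 3, 1],
})
# Placeholder stand-ins for the truncated excel_2 / excel_3 samples.
excel_2 = pd.DataFrame({'Name': ['A1', 'A2'], 'Year': ['1.2019'] * 2,
                        'Item': ['Badam'] * 2, 'sales_Amount2': [5, 7]})
excel_3 = pd.DataFrame({'Name': ['A1', 'A2'], 'Year': ['1.2019'] * 2,
                        'Item': ['Badam'] * 2, 'target_Amount': [15, 40]})

keys = ['Name', 'Year', 'Item']
s1 = excel_1.groupby(keys, as_index=False)['sales_Amount1'].sum()
s2 = excel_2.groupby(keys, as_index=False)['sales_Amount2'].sum()

# Add the two territories, then pull in excel_3 for the comparison.
merged = s1.merge(s2, on=keys, how='outer').fillna(0)
merged['total'] = merged['sales_Amount1'] + merged['sales_Amount2']
merged = merged.merge(excel_3, on=keys, how='left')
print(merged)
```

The outer merge keeps key combinations that appear in only one file; fillna(0) treats a missing file as zero sales.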

Using groupby and loc to set up a new dataframe

╄→гoц情女王★ posted on 2019-12-11 12:05:14

Question: Hi, I have a data frame as follows:

    df = pd.DataFrame()
    df['Team1'] = ['A','B','C','D','E','F','A','B','C','D','E','F']
    df['Score1'] = [1,2,3,1,2,4,1,2,3,1,2,4]
    df['Team2'] = ['U','V','W','X','Y','Z','U','V','W','X','Y','Z']
    df['Score2'] = [2,1,2,2,3,3,2,1,2,2,3,3]
    df['Match'] = df['Team1'] + ' Vs ' + df['Team2']
    df['Match_no'] = [1,2,3,4,5,6,1,2,3,4,5,6]
    df['model'] = ['ELO','ELO','ELO','ELO','ELO','ELO','xG','xG','xG','xG','xG','xG']
    winner = df.Score1 > df.Score2
    df['winner'] = np.where(winner
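The np.where call above is cut off, so the completion below (the winning team's name) is a guess; the pivot at the end shows one natural "new dataframe" for this data, with one row per match and one column per model:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Team1': list('ABCDEF') * 2,
    'Score1': [1, 2, 3, 1, 2, 4] * 2,
    'Team2': list('UVWXYZ') * 2,
    'Score2': [2, 1, 2, 2, 3, 3] * 2,
})
df['Match'] = df['Team1'] + ' Vs ' + df['Team2']
df['Match_no'] = [1, 2, 3, 4, 5, 6] * 2
df['model'] = ['ELO'] * 6 + ['xG'] * 6

# Assumed completion of the truncated np.where: pick the winning team's
# name (this sample has no draws, so two branches suffice).
df['winner'] = np.where(df['Score1'] > df['Score2'], df['Team1'], df['Team2'])

# Reshape to one row per match with a winner column per model.
wide = df.pivot(index='Match_no', columns='model', values='winner')
print(wide)
```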

Use pandas to group by column and then create a new column based on a condition

╄→гoц情女王★ posted on 2019-12-11 11:54:35

Question: I need to reproduce with pandas what SQL does so easily:

    select del_month
         , sum(case when off0_on1 = 1 then 1 else 0 end) as on1
         , sum(case when off0_on1 = 0 then 1 else 0 end) as off0
    from a1
    group by del_month
    order by del_month

Here is a sample, illustrative pandas dataframe to work with:

    a1 = pd.DataFrame({'del_month': [1,1,1,1,2,2,2,2],
                       'off0_on1': [0,0,1,1,0,1,1,1]})

Here are my attempts to reproduce the above SQL with pandas. The first line works; the second line gives an error:

    a1['on1']
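The SQL's conditional sums translate directly to a groupby aggregation; a minimal sketch on the posted frame:

```python
import pandas as pd

a1 = pd.DataFrame({'del_month': [1, 1, 1, 1, 2, 2, 2, 2],
                   'off0_on1':  [0, 0, 1, 1, 0, 1, 1, 1]})

# Equivalent of sum(case when ...): with 0/1 values, summing the column
# counts the 1s, and counting the zeros handles the other branch.
out = (a1.groupby('del_month')['off0_on1']
         .agg(on1='sum', off0=lambda s: (s == 0).sum())
         .reset_index())
print(out)
```

The named-aggregation keywords (`on1=...`, `off0=...`) require pandas 0.25 or later; on older versions the same result needs two separate aggregations merged together.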

Python - Pandas - Combining rows of multiple columns into single row in dataframe based on categorical value

大城市里の小女人 posted on 2019-12-11 11:27:19

Question: I'm working on a problem involving pandas in Python 3.4. I'm stuck on one small subsection, which involves reorganizing my data frames; let me be more specific. I have a table called "model" in the format shown in the "Model Input" screenshot, and I wish to get output equivalent to the "Desired Output" screenshot. I have looked into "Convert a python dataframe with multiple rows into one row using python pandas?" and "How to combine multiple rows into a single row with pandas."
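The question's screenshots are lost, so the data and column names below are made up; the generic pattern for "combine rows of multiple columns into a single row per categorical value" is a groupby with a joining aggregation applied to every column:

```python
import pandas as pd

# Hypothetical table standing in for the missing "Model Input".
model = pd.DataFrame({
    'category': ['x', 'x', 'y'],
    'col_a': ['1', '2', '3'],
    'col_b': ['p', 'q', 'r'],
})

# Collapse all rows sharing a category into one row, joining each
# column's values as comma-separated strings.
combined = (model.groupby('category', as_index=False)
                 .agg(lambda s: ','.join(s)))
print(combined)
```

If the columns hold non-strings, map them to str first (e.g. `lambda s: ','.join(s.astype(str))`).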

grouping by weekly days pandas

馋奶兔 posted on 2019-12-11 11:06:24

Question: I have a dataframe, df, containing:

    Index  Date & Time  eventName  eventCount
    0      2017-08-09   ABC        24
    1      2017-08-09   CDE        140
    2      2017-08-10   CDE        150
    3      2017-08-11   DEF        200
    4      2017-08-11   ABC        20
    5      2017-08-16   CDE        10
    6      2017-08-16   ABC        15
    7      2017-08-17   CDE        10
    8      2017-08-17   DEF        50
    9      2017-08-18   DEF        80
    ...

I want to sum the eventCount for each weekday occurrence and plot the total events for each weekday (from MON to SUN). For example: summation of the eventCount values of 2017-08-09 and 2017-08-16 (Mondays
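Grouping on `dt.day_name()` collapses the same weekday across different weeks (in this sample, 2017-08-09 and 2017-08-16 fall on the same weekday, a week apart); a sketch on the posted rows:

```python
import pandas as pd

df = pd.DataFrame({
    'Date & Time': pd.to_datetime(['2017-08-09', '2017-08-09', '2017-08-10',
                                   '2017-08-11', '2017-08-11', '2017-08-16',
                                   '2017-08-16', '2017-08-17', '2017-08-17',
                                   '2017-08-18']),
    'eventName': ['ABC', 'CDE', 'CDE', 'DEF', 'ABC',
                  'CDE', 'ABC', 'CDE', 'DEF', 'DEF'],
    'eventCount': [24, 140, 150, 200, 20, 10, 15, 10, 50, 80],
})

# Sum eventCount per day of week, then reindex to force MON..SUN order
# (groupby alone would return the weekdays alphabetically).
order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
         'Friday', 'Saturday', 'Sunday']
weekday_totals = (df.groupby(df['Date & Time'].dt.day_name())['eventCount']
                    .sum()
                    .reindex(order, fill_value=0))
print(weekday_totals)
# weekday_totals.plot(kind='bar')  # bar chart of the totals (needs matplotlib)
```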