pandas

Rank by groupby column aggregate

被刻印的时光 ゝ submitted on 2021-02-07 17:33:46
Question: I want to create a column manager_rank that ranks each manager by the sum of returns. I have come up with one solution, posted below, but I am hoping someone has something more elegant.

    import pandas as pd

    df = pd.DataFrame([['2012', 'A', 1], ['2012', 'B', 4], ['2011', 'A', 5], ['2011', 'B', 4]],
                      columns=['year', 'manager', 'return'])

Desired result:

       year manager  return  manager_rank
    0  2012       A       1             2
    1  2011       A       5             2
    2  2012       B       4             1
    3  2011       B       4             1

Answer 1:

    df['ranking'] = df.groupby('manager')['return'] ...
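The answer is cut off in this excerpt, so the following is only a sketch of one way to get the desired ranking, not necessarily what the answerer wrote. It assumes that "rank by the sum of returns" means the manager with the largest total gets rank 1, and that tied totals should share a rank (hence `method='dense'`).

```python
import pandas as pd

df = pd.DataFrame(
    [['2012', 'A', 1], ['2012', 'B', 4], ['2011', 'A', 5], ['2011', 'B', 4]],
    columns=['year', 'manager', 'return'],
)

# Broadcast each manager's total return back onto every row of that manager.
manager_total = df.groupby('manager')['return'].transform('sum')

# Rank the totals: largest total gets rank 1; equal totals share a rank.
df['manager_rank'] = manager_total.rank(method='dense', ascending=False).astype(int)

print(df)
```

Here manager A totals 6 and manager B totals 8, so every B row gets rank 1 and every A row gets rank 2, which matches the desired result in the question.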

Bizarre behaviour of pandas Series.value_counts()

雨燕双飞 submitted on 2021-02-07 17:27:38
Question: I have a pandas Series with numerical data and I want to find its unique values together with their frequency of occurrence. I use the standard procedure:

    # Given that my_data is a column of the pd.DataFrame df
    unique = df[my_data].value_counts()
    print(unique)

And here are the results that I get:

    # -------------------OUTPUT
    -0.010000    46483
    -0.010000    16895
    -0.027497    12215
    -0.294492    11915
     0.027497    11397

What I don't get is why the "same value" (-0.01) occurs twice. Is that an internal ...
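The question is truncated here and no answer is shown. One common cause of this symptom, offered as an assumption rather than a diagnosis of the asker's data, is that the two entries are different floats that only look identical because pandas prints six decimal places by default. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up data: the third value differs from -0.01 only in the ninth decimal place.
s = pd.Series([-0.01, -0.01, -0.01 + 1e-9, 0.027497])

# value_counts compares exact float values, so the near-duplicate gets its own row,
# even though both rows are displayed as -0.010000.
raw_counts = s.value_counts()
print(raw_counts)

# Rounding to the precision you care about before counting merges them.
rounded_counts = s.round(6).value_counts()
print(rounded_counts)
```

Printing the index with more digits, for example `raw_counts.index.tolist()`, is a quick way to check whether the displayed duplicates are really distinct values.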

How to populate columns of a dataframe using a subset of another dataframe?

邮差的信 submitted on 2021-02-07 15:35:45
Question: I have two dataframes like this:

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame({
        'key': list('AAABBCCAAC'),
        'prop1': list('xyzuuyxzzz'),
        'prop2': list('mnbnbbnnnn')
    })

    df2 = pd.DataFrame({
        'key': list('ABBCAA'),
        'prop1': [np.nan] * 6,
        'prop2': [np.nan] * 6,
        'keep_me': ['stuff'] * 6
    })

df1:

      key prop1 prop2
    0   A     x     m
    1   A     y     n
    2   A     z     b
    3   B     u     n
    4   B     u     b
    5   C     y     b
    6   C     x     n
    7   A     z     n
    8   A     z     n
    9   C     z     n

df2:

      key  prop1  prop2 keep_me
    0   A    NaN    NaN   stuff
    1   B    NaN    NaN   stuff
    2   B    NaN    NaN   stuff
    3   C    NaN    NaN   stuff
    4   A    NaN ...

Apply function to pandas dataframe row using values in other rows

送分小仙女□ submitted on 2021-02-07 14:53:55
Question: I have a dataframe row to perform calculations with, and I need to use values in the following (and potentially preceding) rows to do these calculations (essentially a perfect forecast based on the real data set). I get each row from an earlier df.apply call, so I could pass the whole df along to the downstream objects, but that seems less than ideal given the complexity of the objects in my analysis. I found one closely related question and answer [1], but the problem is ...
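The question is cut off before the details, so this is only a generic sketch of one way to make neighbouring rows' values available on each row without passing the whole dataframe around: shift the column and keep the shifted copy alongside the original. The column names `value`, `next_value`, `prev_value`, and `change_to_next` are hypothetical.

```python
import pandas as pd

# Hypothetical data; "value" stands in for whatever the real calculation needs.
df = pd.DataFrame({"value": [10, 12, 9, 15]})

# shift(-1) pulls the following row's value up onto the current row;
# shift(1) pushes the preceding row's value down onto the current row.
df["next_value"] = df["value"].shift(-1)
df["prev_value"] = df["value"].shift(1)

# Any row-wise calculation can now use the neighbours as ordinary columns.
df["change_to_next"] = df["next_value"] - df["value"]

print(df)
```

The last row has no following row, so its `next_value` and `change_to_next` are NaN; whether that should be dropped or filled depends on the analysis.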

Could not convert string to float error from the Titanic competition

我怕爱的太早我们不能终老 submitted on 2021-02-07 14:52:49
Question: I'm trying to solve the Titanic survival problem from Kaggle. It's my first step in actually learning machine learning. I have a problem where the gender column causes an error. The stack trace says: could not convert string to float: 'female'. How did you approach this issue? I don't want solutions. I just want a practical approach to this problem, because I do need the gender column to build my model. This is my code:

    import pandas as pd
    from sklearn.tree import DecisionTreeRegressor
    ...
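The rest of the asker's code is not shown, so the following is a sketch of the general approach rather than a fix for that code: the model needs numbers, so a string column has to be encoded before fitting. The column name `Sex` and the tiny dataframe are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the training data; the real column name may differ.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22, 38, 26, 35],
})

# Option 1: map each category to an integer code.
df["Sex_code"] = df["Sex"].map({"male": 0, "female": 1})

# Option 2: one-hot encode, producing one indicator column per category.
dummies = pd.get_dummies(df["Sex"], prefix="Sex")

print(df)
print(dummies)
```

Either the `Sex_code` column or the indicator columns can then be passed to the model in place of the original strings.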

Creating a Pandas dataframe from elements of a dictionary

主宰稳场 submitted on 2021-02-07 14:48:40
Question: I'm trying to create a pandas dataframe from a dictionary. The dictionary is set up as:

    nvalues = {"y1": [1, 2, 3, 4], "y2": [5, 6, 7, 8], "y3": ["a", "b", "c", "d"]}

I would like the dataframe to include only "y1" and "y2". So far I can accomplish this using:

    df = pd.DataFrame.from_dict(nvalues)
    df.drop("y3", axis=1, inplace=True)

I would like to know if it is possible to accomplish this without calling df.drop().

Answer 1: You can specify columns in the DataFrame constructor:

    pd.DataFrame(nvalues, columns ...
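The answer's code is truncated in this excerpt. A sketch of what specifying columns in the constructor can look like, assuming the `y3` values are strings: when the data is a dict, the `columns` argument selects which keys become columns.

```python
import pandas as pd

# The "y3" values are assumed to be strings for this sketch.
nvalues = {"y1": [1, 2, 3, 4], "y2": [5, 6, 7, 8], "y3": ["a", "b", "c", "d"]}

# Only the listed keys are turned into columns; "y3" is never added.
df = pd.DataFrame(nvalues, columns=["y1", "y2"])

print(df)
```

This avoids building the unwanted column and then dropping it.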
