pandas

'Could not interpret input' error with Seaborn when plotting groupbys

对着背影说爱祢 submitted on 2021-02-07 05:19:19
Question: Say I have this dataframe:

    from pandas import DataFrame

    d = {
        'Path': ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
        'Detail': ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
        'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
        'Value': [30, 20, 10, 40, 40, 50],
        'Field': [50, 70, 10, 20, 30, 30],
    }
    df = DataFrame(d)
    df.set_index(['Path', 'Detail'], inplace=True)
    df

                 Field Program  Value
    Path Detail
    abc  foo        50   prog1     30
         bar        70   prog1     20
    ghi  bar        10   prog1     10
         foo        20   prog2     40
    jkl  foo        30   prog3     40
         foo        30   prog3     50

I can aggregate it
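The excerpt is cut off before the plotting code, so what follows is only a sketch under an assumption: that the error appears because, after `set_index` and a groupby, `Path` lives in the index rather than in an ordinary column. The aggregation by `Path` and the names `agg` and `flat` are illustrative and do not come from the question.

```python
import pandas as pd

d = {
    'Path': ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
    'Detail': ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value': [30, 20, 10, 40, 40, 50],
    'Field': [50, 70, 10, 20, 30, 30],
}
df = pd.DataFrame(d).set_index(['Path', 'Detail'])

# Assumed aggregation: sum the numeric columns per Path.
# The grouping key ends up in the index of the result.
agg = df.groupby(level='Path')[['Value', 'Field']].sum()

# Move the index back into a regular column so that a plotting call
# can refer to 'Path' by name.
flat = agg.reset_index()
print(flat)
```

With `flat` in this shape, a call such as `sns.barplot(data=flat, x='Path', y='Value')` has real columns to look up.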

How does pandas calculate skew

試著忘記壹切 submitted on 2021-02-07 05:10:48
Question: I'm calculating a coskew matrix and wanted to double-check my calculation against pandas' built-in skew method. I could not reconcile how pandas performs the calculation. I define my series as:

    import pandas as pd

    series = pd.Series(
        {0: -0.051917457635120283,
         1: -0.070071606515280632,
         2: -0.11204865874074735,
         3: -0.14679988245503134,
         4: -0.088062467095565145,
         5: 0.17579741198527793,
         6: -0.10765856028420773,
         7: -0.11971470229167547,
         8: -0.15169210769159247,
         9: -0.038616800990881606,
         10: 0
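A likely source of the mismatch is that `Series.skew()` returns the bias-corrected (adjusted Fisher-Pearson) sample skewness rather than the plain ratio of central moments. The sketch below compares the two; it uses only the ten complete values visible in the excerpt, since the rest of the series is cut off, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Only the ten complete values from the truncated excerpt.
series = pd.Series([
    -0.051917457635120283, -0.070071606515280632, -0.11204865874074735,
    -0.14679988245503134, -0.088062467095565145, 0.17579741198527793,
    -0.10765856028420773, -0.11971470229167547, -0.15169210769159247,
    -0.038616800990881606,
])

n = series.count()
dev = series - series.mean()

# Biased central moments (divide by n).
m2 = (dev ** 2).mean()
m3 = (dev ** 3).mean()

# Plain moment-ratio skewness.
g1 = m3 / m2 ** 1.5

# Bias-corrected skewness: scale g1 by sqrt(n (n - 1)) / (n - 2).
G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1

print(g1, G1, series.skew())
```

A hand calculation based on `g1` will differ from pandas by the factor `sqrt(n * (n - 1)) / (n - 2)`, which is noticeable for short series.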

Pandas: conditional shift

断了今生、忘了曾经 submitted on 2021-02-07 04:59:43
Question: Is there a way to shift a dataframe column depending on a condition on two other columns? Something like:

    df["cumulated_closed_value"] = df.groupby("user")['close_cumsum'].shiftWhile(df['close_time'] > df['open_time'])

I have figured out a way to do this, but it's inefficient:

1) Load data and create the column to shift

    df = pd.read_csv('data.csv')
    df.sort_values(['user', 'close_time'], inplace=True)
    df['close_cumsum'] = df.groupby('user')['value'].cumsum()
    df.sort_values(['user','open_time']
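The rest of the question is cut off, so the intent has to be guessed. Assuming the goal is, for each row, the cumulative `value` of the same user's rows that closed strictly before this row's `open_time`, one possible sketch uses `pd.merge_asof`. The sample data, the `closed_at` column and the fill value of 0 are hypothetical; `shiftWhile` in the question is pseudocode, not a pandas method.

```python
import pandas as pd

# Hypothetical data with the column names used in the question.
df = pd.DataFrame({
    'user': ['a', 'a', 'a', 'b', 'b'],
    'open_time': [1, 2, 5, 1, 4],
    'close_time': [3, 4, 6, 2, 7],
    'value': [10, 20, 30, 5, 15],
})

# Running total of closed value per user, ordered by close time.
closes = df.sort_values('close_time')[['user', 'close_time', 'value']].copy()
closes['cumulated_closed_value'] = closes.groupby('user')['value'].cumsum()
closes = closes.rename(columns={'close_time': 'closed_at'}).drop(columns='value')

# For each row, pick the latest close of the same user that happened
# strictly before the row's open_time.
opens = df.sort_values('open_time')
result = pd.merge_asof(
    opens, closes,
    left_on='open_time', right_on='closed_at', by='user',
    direction='backward', allow_exact_matches=False,
)

# Rows with no earlier close get 0 (an assumption about the desired result).
result['cumulated_closed_value'] = result['cumulated_closed_value'].fillna(0)
print(result)
```

`merge_asof` needs both frames sorted by their time key, which is why the frame is sorted twice; the matching itself is done in a single vectorised pass instead of a Python loop.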

pandas dataframe from a nested dictionary (elasticsearch result)

谁说我不能喝 submitted on 2021-02-07 04:20:19
Question: I am having a hard time translating results from Elasticsearch aggregations to pandas. I am trying to write an abstract function which would take a nested dictionary (with an arbitrary number of levels) and flatten it into a pandas dataframe. Here is what a typical result looks like (edit: I added the parent key as well):

    x1 = {u'xColor': {u'buckets': [{u'doc_count': 4, u'key': u'red',
          u'xMake': {u'buckets': [{u'doc_count': 3, u'key': u'honda',
          u'xCity': {u'buckets': [{u'doc_count': 2, u'key': u'ROME'},
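A recursive walk is one way to approach this. The sketch below assumes every aggregation level has the `{'name': {'buckets': [{'key': ..., 'doc_count': ..., ...}]}}` shape visible in the excerpt. The function name `flatten_buckets` is made up, and the sample dictionary is a small hypothetical completion of the truncated data (the `Paris` bucket is invented).

```python
import pandas as pd

def flatten_buckets(agg, parents=None):
    """Return one row per innermost bucket, carrying the keys of its parents."""
    parents = parents or {}
    rows = []
    for name, body in agg.items():
        if not isinstance(body, dict) or 'buckets' not in body:
            continue  # not an aggregation level
        for bucket in body['buckets']:
            keys = {**parents, name: bucket['key']}
            # Nested aggregations are the dict values that have their own buckets.
            children = {k: v for k, v in bucket.items()
                        if isinstance(v, dict) and 'buckets' in v}
            if children:
                rows.extend(flatten_buckets(children, keys))
            else:
                rows.append({**keys, 'doc_count': bucket['doc_count']})
    return rows

# Hypothetical completion of the truncated example.
x1 = {'xColor': {'buckets': [
    {'doc_count': 4, 'key': 'red',
     'xMake': {'buckets': [
         {'doc_count': 3, 'key': 'honda',
          'xCity': {'buckets': [
              {'doc_count': 2, 'key': 'ROME'},
              {'doc_count': 1, 'key': 'Paris'},
          ]}},
     ]}},
]}}

flat = pd.DataFrame(flatten_buckets(x1))
print(flat)
```

Only leaf buckets produce rows here, so the intermediate `doc_count` values are dropped; keeping them would need an extra column per level.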

Annotated heatmap with multiple color schemes

时光怂恿深爱的人放手 submitted on 2021-02-07 04:18:46
Question: I have the following dataframe and would like to differentiate the minor decimal differences in each "step" with a different color scheme in a heatmap. Sample data:

    Sample  Step 2  Step 3  Step 4  Step 5  Step 6  Step 7  Step 8
    A       64.847  54.821  20.897  39.733  23.257  74.942  75.945
    B       64.885  54.767  20.828  39.613  23.093  74.963  75.928
    C       65.036  54.772  20.939  39.835  23.283  74.944  75.871
    D       64.869  54.740  21.039  39.889  23.322  74.925  75.894
    E       64.911  54.730  20.858  39.608  23.101  74.956  75.930
    F       64.838  54.749  20
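One simpler alternative to a separate color scheme per step is to rescale each column to the range 0 to 1 before coloring, so that small within-step differences use the full color range while the annotations still show the original values. This is a different technique from the one asked about, shown here only as a sketch. It uses rows A to E from the excerpt (row F is cut off), and `scaled` is an illustrative name.

```python
import pandas as pd

steps = ['Step 2', 'Step 3', 'Step 4', 'Step 5', 'Step 6', 'Step 7', 'Step 8']
df = pd.DataFrame(
    [[64.847, 54.821, 20.897, 39.733, 23.257, 74.942, 75.945],
     [64.885, 54.767, 20.828, 39.613, 23.093, 74.963, 75.928],
     [65.036, 54.772, 20.939, 39.835, 23.283, 74.944, 75.871],
     [64.869, 54.740, 21.039, 39.889, 23.322, 74.925, 75.894],
     [64.911, 54.730, 20.858, 39.608, 23.101, 74.956, 75.930]],
    index=pd.Index(list('ABCDE'), name='Sample'),
    columns=steps,
)

# Min-max scale every column independently: the smallest value in a step
# maps to 0 and the largest to 1.
scaled = (df - df.min()) / (df.max() - df.min())
print(scaled.round(2))
```

The scaled frame can drive the colors while the raw frame supplies the labels, for example `sns.heatmap(scaled, annot=df, fmt='.3f')`, since `annot` accepts an array of the same shape as the data.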

statsmodels: printing summary of more than one regression models together

怎甘沉沦 submitted on 2021-02-07 04:16:24
Question: In the Python library Statsmodels, you can print out the regression results with print(results.summary()). How can I print the summaries of more than one regression in one table, for better comparison? A linear regression, with code taken from the statsmodels documentation:

    import numpy as np
    import statsmodels.api as sm

    nsample = 100
    x = np.linspace(0, 10, 100)
    X = np.column_stack((x, x**2))
    beta = np.array([0.1, 10])
    e = np.random.normal(size=nsample)
    y = np.dot(X, beta) + e
    model = sm.OLS(y, X)
    results_noconstant = model.fit()

Then I add a

Python Multiindex Dataframe remove maximum

牧云@^-^@ submitted on 2021-02-07 04:08:23
Question: I am struggling with a MultiIndex DataFrame in pandas. Suppose I have a df like this:

                    count        day
    group name
    A     Anna         10     Monday
          Beatrice     15    Tuesday
    B     Beatrice     15  Wednesday
          Cecilia      20   Thursday

What I need is to find the maximum in name for each group and remove it from the dataframe. The final df would look like:

                    count        day
    group name
    A     Anna         10     Monday
    B     Beatrice     15  Wednesday

Does anyone have an idea how to do this? I am running out of ideas... Thanks in advance!

EDIT: What if the original
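The expected output is consistent with dropping, per group, the row whose `count` is largest, so the sketch below assumes that reading (it is also consistent with dropping the alphabetically last name, which gives the same result on this data). The names `is_max` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'name': ['Anna', 'Beatrice', 'Beatrice', 'Cecilia'],
    'count': [10, 15, 15, 20],
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday'],
}).set_index(['group', 'name'])

# Broadcast each group's maximum count back onto its rows,
# then flag the rows that hold that maximum.
is_max = df['count'] == df.groupby(level='group')['count'].transform('max')

# Keep everything except the per-group maximum.
result = df[~is_max]
print(result)
```

If several rows tie for the maximum within a group, this removes all of them; removing only one would need `idxmax` or a sort followed by dropping a single row per group.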