Question
I have a dataframe like this. For each Id, I have (x1,x2), (y1,y2). I want to supply these to polyfit(), get the slope and the x-intercept and add them as new columns.
Id x y
1 0.79978 0.018255
1 1.19983 0.020963
2 2.39998 0.029006
2 2.79995 0.033004
3 1.79965 0.021489
3 2.19969 0.024194
4 1.19981 0.019338
4 1.59981 0.022200
5 1.79971 0.025629
5 2.19974 0.028681
I really need help with grouping the correct rows and supplying them to polyfit. I have been struggling with this. Any help would be most welcome.
Answer 1:
You can use groupby and apply the fit within each group. First, set Id as the index so the per-group results align with the original rows and you can avoid a merge later.
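import pandas as pd
import numpy as np

# Index by Id so the per-group results align with the original rows
df = df.set_index('Id')
# np.polyfit(x, y, 1) returns [slope, intercept] for each Id
df['fit'] = df.groupby('Id').apply(lambda g: np.polyfit(g['x'], g['y'], 1))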
df is now:
x y fit
Id
1 0.79978 0.018255 [0.0067691538557680215, 0.01284116612923385]
1 1.19983 0.020963 [0.0067691538557680215, 0.01284116612923385]
2 2.39998 0.029006 [0.00999574968122608, 0.005016400680051043]
2 2.79995 0.033004 [0.00999574968122608, 0.005016400680051043]
3 1.79965 0.021489 [0.006761823817618233, 0.009320083766623343]
3 2.19969 0.024194 [0.006761823817618233, 0.009320083766623343]
...
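Another option, sketched here rather than taken from the answer above: have each group return a labelled pd.Series, so apply already yields one row per Id with slope and intercept columns, and then join that back onto the rows by Id. The snippet rebuilds the sample data from the question so it runs on its own; fit_line is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "x": [0.79978, 1.19983, 2.39998, 2.79995, 1.79965,
          2.19969, 1.19981, 1.59981, 1.79971, 2.19974],
    "y": [0.018255, 0.020963, 0.029006, 0.033004, 0.021489,
          0.024194, 0.019338, 0.022200, 0.025629, 0.028681],
})

def fit_line(g):
    # np.polyfit with deg=1 returns [slope, intercept], highest power first
    slope, intercept = np.polyfit(g["x"], g["y"], 1)
    return pd.Series({"slope": slope, "intercept": intercept})

# One row per Id, with columns slope and intercept
fits = df.groupby("Id")[["x", "y"]].apply(fit_line)

# Left join on the Id column: every row gets its group's fit, order preserved
out = df.join(fits, on="Id")
print(out)
```

Selecting [["x", "y"]] before apply keeps the grouping column out of the function's input, and join(on="Id") does the alignment that set_index does in the answer.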
If you want the slope and intercept in separate columns, apply pd.Series to the fit column, then drop fit and turn Id back into a column:
# Expand each [slope, intercept] array into two columns
df[['slope', 'intercept']] = df['fit'].apply(pd.Series)
df = df.drop(columns='fit').reset_index()
df is now:
Id x y slope intercept
0 1 0.79978 0.018255 0.006769 0.012841
1 1 1.19983 0.020963 0.006769 0.012841
2 2 2.39998 0.029006 0.009996 0.005016
3 2 2.79995 0.033004 0.009996 0.005016
4 3 1.79965 0.021489 0.006762 0.009320
5 3 2.19969 0.024194 0.006762 0.009320
6 4 1.19981 0.019338 0.007155 0.010753
7 4 1.59981 0.022200 0.007155 0.010753
8 5 1.79971 0.025629 0.007629 0.011898
9 5 2.19974 0.028681 0.007629 0.011898
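Note that the intercept returned by np.polyfit is the y-intercept (the value of y at x = 0), while the question asks for the x-intercept. Setting slope * x + intercept = 0 gives x = -intercept / slope. The sketch below adds that column. It also swaps np.polyfit for the two-point closed form (rise over run), which assumes every Id has exactly two rows, as in the sample; with more points per group, keep np.polyfit and derive x_intercept from its output the same way. The column name x_intercept is my own choice.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "x": [0.79978, 1.19983, 2.39998, 2.79995, 1.79965,
          2.19969, 1.19981, 1.59981, 1.79971, 2.19974],
    "y": [0.018255, 0.020963, 0.029006, 0.033004, 0.021489,
          0.024194, 0.019338, 0.022200, 0.025629, 0.028681],
})

# Broadcast each group's first and last point onto all of its rows
g = df.groupby("Id")
x1, x2 = g["x"].transform("first"), g["x"].transform("last")
y1, y2 = g["y"].transform("first"), g["y"].transform("last")

# Assumes exactly two points per Id: the fitted line passes through both
df["slope"] = (y2 - y1) / (x2 - x1)
df["intercept"] = y1 - df["slope"] * x1             # y-intercept (y at x = 0)
df["x_intercept"] = -df["intercept"] / df["slope"]  # x where the line crosses y = 0
print(df)
```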
Source: https://stackoverflow.com/questions/51140302/using-polyfit-on-pandas-dataframe-and-then-adding-the-results-to-new-columns