Question
I'm working with pandas and used groupby:
group = df_crimes_query.groupby(["CrimeDateTime", "WeaponFactor"]).size()
group.head(20)
CrimeDateTime WeaponFactor
2016-01-01 FIREARM 11
HANDS 26
KNIFE 3
OTHER 11
UNDEFINED 102
2016-01-02 FIREARM 10
HANDS 21
KNIFE 8
OTHER 6
UNDEFINED 68
2016-01-03 FIREARM 12
HANDS 13
KNIFE 6
OTHER 5
UNDEFINED 73
2016-01-04 FIREARM 11
HANDS 10
KNIFE 1
OTHER 3
UNDEFINED 84
dtype: int64
The result is a Series:
type(group)
pandas.core.series.Series
I would like a DataFrame like this:
CrimeDateTime FIREARM HANDS KNIFE OTHER UNDEFINED
2016-01-01 11 26 3 11 102
2016-01-02 10 21 8 6 68
2016-01-03 12 13 6 5 73
2016-01-04 11 10 1 3 84
I want to use this DataFrame to plot five time series afterwards, one for each type (FIREARM, HANDS, etc.). I have tried and searched the web, but without success.
The code is in my GitHub (in section called Testing): https://github.com/rmmariano/CAP386_intro_data_science/blob/master/projeto/crimes_baltimore/crimes_baltimore.ipynb
I had other test code, but I removed it for clarity.
Does anyone have any idea?
Answer 1:
Option 1
Simple and slow
pd.crosstab(df.CrimeDateTime, df.WeaponFactor)
WeaponFactor FIREARM HANDS KNIFE OTHER UNDEFINED
CrimeDateTime
2016-01-01 11 26 3 11 102
2016-01-02 10 21 8 6 68
2016-01-03 12 13 6 5 73
2016-01-04 11 10 1 3 84
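Since the goal is one time series per weapon type, here is a minimal sketch of how the crosstab result could be prepared for that. The sample rows are made up for illustration (they are not the Baltimore data), and converting the index with pd.to_datetime is my own addition, assuming the dates are stored as strings.

```python
import pandas as pd

# Hypothetical sample rows, not the real crime data
df = pd.DataFrame({
    "CrimeDateTime": ["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02", "2016-01-02"],
    "WeaponFactor": ["FIREARM", "HANDS", "HANDS", "HANDS", "KNIFE"],
})

# Rows are dates, columns are weapon types, cells are counts (0 when a pair never occurs)
table = pd.crosstab(df.CrimeDateTime, df.WeaponFactor)

# Assumption: dates are strings, so convert the index to real datetimes for time-series use
table.index = pd.to_datetime(table.index)

# Each column is now its own Series indexed by date
firearm = table["FIREARM"]
print(table)
```

Each column can then be plotted separately, or the whole table at once with table.plot() if matplotlib is installed.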
Option 2
Faster and Cool!
pd.get_dummies(df.CrimeDateTime).T.dot(pd.get_dummies(df.WeaponFactor))
FIREARM HANDS KNIFE OTHER UNDEFINED
2016-01-01 11 26 3 11 102
2016-01-02 10 21 8 6 68
2016-01-03 12 13 6 5 73
2016-01-04 11 10 1 3 84
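One caveat with this option, shown as a small sketch on made-up rows: in recent pandas releases (2.0 and later, to my knowledge) get_dummies returns boolean columns by default, and a dot product of boolean matrices gives True/False instead of counts. Passing dtype=int, which is my addition to the one-liner above, keeps the result numeric.

```python
import pandas as pd

# Hypothetical sample rows, not the real crime data
df = pd.DataFrame({
    "CrimeDateTime": ["2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02"],
    "WeaponFactor": ["HANDS", "HANDS", "KNIFE", "HANDS"],
})

# One indicator column per date / per weapon, forced to integers
rows = pd.get_dummies(df.CrimeDateTime, dtype=int)
cols = pd.get_dummies(df.WeaponFactor, dtype=int)

# (dates x records) dot (records x weapons) sums the co-occurrences per cell
counts = rows.T.dot(cols)
print(counts)
```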
Option 3
Next Level Kung Fu Panda!
i, r = pd.factorize(df.CrimeDateTime.values)
j, c = pd.factorize(df.WeaponFactor.values)
n, m = r.size, c.size
b = np.bincount(j + i * m, minlength=n * m).reshape(n, m)
pd.DataFrame(b, r, c)
FIREARM HANDS KNIFE OTHER UNDEFINED
2016-01-01 11 26 3 11 102
2016-01-02 10 21 8 6 68
2016-01-03 12 13 6 5 73
2016-01-04 11 10 1 3 84
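To make the bincount trick easier to follow, here is the same idea spelled out on a tiny made-up input (the arrays below are hypothetical). factorize turns each label into an integer code, row * m + column maps every (date, weapon) pair to a single flat cell number, and bincount counts how often each cell number occurs.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the real crime data
dates = np.array(["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02", "2016-01-02"], dtype=object)
weapons = np.array(["FIREARM", "HANDS", "HANDS", "HANDS", "KNIFE"], dtype=object)

i, r = pd.factorize(dates)     # i: row code per record, r: unique dates
j, c = pd.factorize(weapons)   # j: column code per record, c: unique weapons
n, m = r.size, c.size

# Flat cell index for each record: row * m + column
flat = j + i * m

# Count records per cell; minlength pads cells that never occur with 0
b = np.bincount(flat, minlength=n * m).reshape(n, m)

result = pd.DataFrame(b, index=r, columns=c)
print(result)
```

Note that rows and columns come out in order of first appearance rather than sorted; pd.factorize accepts sort=True if sorted labels are needed.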
Answer 2:
You will get the desired result using
df_crimes_query.groupby(["CrimeDateTime", "WeaponFactor"]).size().unstack().reset_index()
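A minimal sketch of this on a small made-up frame (the rows are hypothetical stand-ins for df_crimes_query). The fill_value=0 argument is my addition: without it, a date/weapon pair that never occurs ends up as NaN and the columns become floats.

```python
import pandas as pd

# Hypothetical sample rows standing in for df_crimes_query
df_crimes_query = pd.DataFrame({
    "CrimeDateTime": ["2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02"],
    "WeaponFactor": ["FIREARM", "HANDS", "HANDS", "FIREARM", "KNIFE"],
})

# Series with a two-level index: (CrimeDateTime, WeaponFactor) -> count
counts = df_crimes_query.groupby(["CrimeDateTime", "WeaponFactor"]).size()

# unstack() moves the inner level (WeaponFactor) into columns;
# reset_index() turns CrimeDateTime back into an ordinary column
wide = counts.unstack(fill_value=0).reset_index()
print(wide)
```

If the table is meant for plotting against dates, it may be more convenient to skip reset_index() and keep CrimeDateTime as the index.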
Answer 3:
Instead of groupby, you can use a pivot table, i.e.:
df.pivot_table(index='CrimeDateTime',columns='WeaponFactor',values='count')
Based on the code in your notebook, if you have a DataFrame like this:
   CrimeDateTime WeaponFactor  count
0     2016-01-01      FIREARM     11
1     2016-01-01        HANDS     26
2     2016-01-01        KNIFE      3
3     2016-01-01        OTHER     11
4     2016-01-01    UNDEFINED    102
5     2016-01-02      FIREARM     10
6     2016-01-02        HANDS     21
7     2016-01-02        KNIFE      8
8     2016-01-02        OTHER      6
9     2016-01-02    UNDEFINED     68
10    2016-01-03      FIREARM     12
11    2016-01-03        HANDS     13
12    2016-01-03        KNIFE      6
13    2016-01-03        OTHER      5
14    2016-01-03    UNDEFINED     73
15    2016-01-04      FIREARM     11
16    2016-01-04        HANDS     10
17    2016-01-04        KNIFE      1
18    2016-01-04        OTHER      3
19    2016-01-04    UNDEFINED     84
Output:
df.pivot_table(index='CrimeDateTime',columns='WeaponFactor',values='count')
WeaponFactor   FIREARM  HANDS  KNIFE  OTHER  UNDEFINED
CrimeDateTime
2016-01-01          11     26      3     11        102
2016-01-02          10     21      8      6         68
2016-01-03          12     13      6      5         73
2016-01-04          11     10      1      3         84
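For completeness, a self-contained sketch of the pivot_table route, starting from raw rows (made up here) and building the long-format count column first. pivot_table averages by default, which is harmless when there is exactly one row per date/weapon pair; the aggfunc="sum" and fill_value=0 arguments are my additions so that duplicates are added up and missing pairs show as 0.

```python
import pandas as pd

# Hypothetical raw rows, not the real crime data
raw = pd.DataFrame({
    "CrimeDateTime": ["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02", "2016-01-02"],
    "WeaponFactor": ["FIREARM", "HANDS", "HANDS", "HANDS", "KNIFE"],
})

# Long format: one row per (date, weapon) pair with its count
df = raw.groupby(["CrimeDateTime", "WeaponFactor"]).size().reset_index(name="count")

# Wide format: dates as rows, weapon types as columns
wide = df.pivot_table(index="CrimeDateTime", columns="WeaponFactor",
                      values="count", aggfunc="sum", fill_value=0)
print(wide)
```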
Source: https://stackoverflow.com/questions/46010122/transform-a-series-in-a-dataframe-of-pandas-python-where-the-columns-are-the-l