Transform a Series in a dataframe (of pandas/Python) where the columns are the levels of the Series

问题

I'm working with pandas and I used the groupby:

group = df_crimes_query.groupby(["CrimeDateTime", "WeaponFactor"]).size()
group.head(20)


CrimeDateTime  WeaponFactor
2016-01-01     FIREARM          11
               HANDS            26
               KNIFE             3
               OTHER            11
               UNDEFINED       102
2016-01-02     FIREARM          10
               HANDS            21
               KNIFE             8
               OTHER             6
               UNDEFINED        68
2016-01-03     FIREARM          12
               HANDS            13
               KNIFE             6
               OTHER             5
               UNDEFINED        73
2016-01-04     FIREARM          11
               HANDS            10
               KNIFE             1
               OTHER             3
               UNDEFINED        84
dtype: int64

The type of it is a Series:

type(group)

pandas.core.series.Series

I would like a dataframe about like this:

CrimeDateTime   FIREARM     HANDS   KNIFE   OTHER   UNDEFINED
2016-01-01      11          26      3       11      102
2016-01-02      10          21      8       6       68
2016-01-03      12          13      6       5       73
2016-01-04      11          10      1       3       84

I would like to use this dataframe for I plot five time series after, one for each type (FIREARM, HANDS and etc.). I had tried, searched on web, however without success.

The code is in my GitHub (in section called Testing): https://github.com/rmmariano/CAP386_intro_data_science/blob/master/projeto/crimes_baltimore/crimes_baltimore.ipynb

I had others testing codes, but I had removed to be clearest.

Someone has any idea?

回答1:

Option 1
Simple and slow

pd.crosstab(df.CrimeDateTime, df.WeaponFactor)

WeaponFactor   FIREARM  HANDS  KNIFE  OTHER  UNDEFINED
CrimeDateTime                                         
2016-01-01          11     26      3     11        102
2016-01-02          10     21      8      6         68
2016-01-03          12     13      6      5         73
2016-01-04          11     10      1      3         84

Option 2
Faster and Cool!

pd.get_dummies(df.CrimeDateTime).T.dot(pd.get_dummies(df.WeaponFactor))

            FIREARM  HANDS  KNIFE  OTHER  UNDEFINED
2016-01-01       11     26      3     11        102
2016-01-02       10     21      8      6         68
2016-01-03       12     13      6      5         73
2016-01-04       11     10      1      3         84

Option 3
Next Level Kung Fu Panda!

i, r = pd.factorize(df.CrimeDateTime.values)
j, c = pd.factorize(df.WeaponFactor.values)
n, m = r.size, c.size
b = np.bincount(j + i * m, minlength=n * m).reshape(n, m)

pd.DataFrame(b, r, c)

            FIREARM  HANDS  KNIFE  OTHER  UNDEFINED
2016-01-01       11     26      3     11        102
2016-01-02       10     21      8      6         68
2016-01-03       12     13      6      5         73
2016-01-04       11     10      1      3         84

回答2:

You will get the desired result using

df_crimes_query.groupby(["CrimeDateTime", "WeaponFactor"]).size().unstack().reset_index()

回答3:

Instead of groupby you can use pivot table i.e

 df.pivot_table(index='CrimeDateTime',columns='WeaponFactor',values='count')

Based on your code in the notebook if you have a dataframe like this

  CrimeDateTime WeaponFactor  count
0     2016-01-01      FIREARM     11
1     2016-01-01        HANDS     26
2     2016-01-01        KNIFE      3
3     2016-01-01        OTHER     11
4     2016-01-01    UNDEFINED    102
5     2016-01-02      FIREARM     10
6     2016-01-02        HANDS     21
7     2016-01-02        KNIFE      8
8     2016-01-02        OTHER      6
9     2016-01-02    UNDEFINED     68
10    2016-01-03      FIREARM     12
11    2016-01-03        HANDS     13
12    2016-01-03        KNIFE      6
13    2016-01-03        OTHER      5
14    2016-01-03    UNDEFINED     73
15    2016-01-04      FIREARM     11
16    2016-01-04        HANDS     10
17    2016-01-04        KNIFE      1
18    2016-01-04        OTHER      3
19    2016-01-04    UNDEFINED     84

Output:

df.pivot_table(index='CrimeDateTime',columns='WeaponFactor',values='count')

WeaponFactor   FIREARM  HANDS  KNIFE  OTHER  UNDEFINED
CrimeDateTime                                         
2016-01-01          11     26      3     11        102
2016-01-02          10     21      8      6         68
2016-01-03          12     13      6      5         73
2016-01-04          11     10      1      3         84
In [595]:

来源：https://stackoverflow.com/questions/46010122/transform-a-series-in-a-dataframe-of-pandas-python-where-the-columns-are-the-l

标签

python

pandas

dataframe

series

data-conversion