How to explode a list inside a Dataframe cell into separate rows

天命终不由人 2020-11-22 10:20

I'm looking to turn a pandas cell containing a list into rows for each of those values.

So, take this:

I'd like to unpack and stack the values in that list so that each one gets its own row.

11 Answers
  •  不知归路
    2020-11-22 11:03

    In the code below, I first reset the index to make the row iteration easier.

    I build a list of lists in which each inner list is one row of the target DataFrame and each of its elements is a column value. This nested list is then passed to the DataFrame constructor.

    I use a lambda function together with a list comprehension to create a row for each element of nearest_neighbors, paired with the relevant name and opponent.

    Finally, I create a new DataFrame from this list (using the original column names and setting the index back to name and opponent).

    import pandas as pd

    df = (pd.DataFrame({'name': ['A.J. Price'] * 3, 
                        'opponent': ['76ers', 'blazers', 'bobcats'], 
                        'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3})
          .set_index(['name', 'opponent']))
    
    >>> df
                                                        nearest_neighbors
    name       opponent                                                  
    A.J. Price 76ers     [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
               blazers   [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
               bobcats   [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
    
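    # Work on a flattened copy so the original MultiIndexed df stays intact.
    flat = df.reset_index()
    rows = []
    # Append one [name, opponent, neighbor] row per element of each list.
    _ = flat.apply(lambda row: [rows.append([row['name'], row['opponent'], nn]) 
                                for nn in row.nearest_neighbors], axis=1)
    df_new = pd.DataFrame(rows, columns=flat.columns).set_index(['name', 'opponent'])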
    
    >>> df_new
                        nearest_neighbors
    name       opponent                  
    A.J. Price 76ers          Zach LaVine
               76ers           Jeremy Lin
               76ers        Nate Robinson
               76ers                Isaia
               blazers        Zach LaVine
               blazers         Jeremy Lin
               blazers      Nate Robinson
               blazers              Isaia
               bobcats        Zach LaVine
               bobcats         Jeremy Lin
               bobcats      Nate Robinson
               bobcats              Isaia
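
The steps described above (one output row per list element, paired with its name and opponent) can also be sketched without relying on side effects inside apply. This is a minimal illustration, not the answerer's own code; the variable name df_rows is an assumption made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['A.J. Price'] * 3,
    'opponent': ['76ers', 'blazers', 'bobcats'],
    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3,
}).set_index(['name', 'opponent'])

# Build one (name, opponent, neighbor) tuple per list element.
# Series.items() yields (index_label, value); with a two-level MultiIndex
# the label is a (name, opponent) tuple, and the value is the list.
rows = [
    (name, opponent, nn)
    for (name, opponent), neighbors in df['nearest_neighbors'].items()
    for nn in neighbors
]

# df_rows is a hypothetical name used only in this sketch.
df_rows = (pd.DataFrame(rows, columns=['name', 'opponent', 'nearest_neighbors'])
           .set_index(['name', 'opponent']))
```

Iterating over the Series directly avoids resetting the index first, since the index labels arrive together with each list.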
    

    EDIT JUNE 2017

    An alternative method is as follows:

    >>> (pd.melt(df.nearest_neighbors.apply(pd.Series).reset_index(), 
                 id_vars=['name', 'opponent'],
                 value_name='nearest_neighbors')
         .set_index(['name', 'opponent'])
         .drop('variable', axis=1)
         .dropna()
         .sort_index()
         )
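
For readers on pandas 0.25 or newer, the built-in DataFrame.explode method covers the same case in a single call. It is not part of the original answer; the sketch below assumes the same example frame and uses the name exploded purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['A.J. Price'] * 3,
    'opponent': ['76ers', 'blazers', 'bobcats'],
    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3,
}).set_index(['name', 'opponent'])

# Assumes pandas >= 0.25, where DataFrame.explode was introduced.
# Each list element becomes its own row and the (name, opponent)
# index labels are repeated for every element of the list.
exploded = df.explode('nearest_neighbors')
```

Note that explode turns an empty list into a single NaN row, so a dropna() afterwards may be wanted if the column can contain empty lists.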
    
