Pandas Series of lists to one series

怎甘沉沦 提交于 2019-11-30 11:01:05

You are basically just trying to flatten a nested list here.

You should just be able to iterate over the elements of the series:

slist =[]
for x in series:
    slist.extend(x)

or a slicker (but harder to understand) list comprehension:

slist = [st for row in s for st in row]

Here's a simple method using only pandas functions:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then

s.apply(pd.Series).stack().reset_index(drop=True)

gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g.

0  0         slim
   1        waist
   2          man
1  0         slim
   1    waistline
2  0        santa

If this is what you want, just omit .reset_index(drop=True) from the chain.

series_name.sum()

does exactly what you need. Do make sure it's a series of lists otherwise your values will be concatenated (if string) or added (if int)

You can try using itertools.chain to simply flatten the lists:

In [70]: from itertools import chain
In [71]: import pandas as pnd
In [72]: s = pnd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
In [73]: s
Out[73]: 
0    [slim, waist, man]
1     [slim, waistline]
2               [santa]
dtype: object
In [74]: new_s = pnd.Series(list(chain(*s.values)))
In [75]: new_s
Out[75]: 
0         slim
1        waist
2          man
3         slim
4    waistline
5        santa
dtype: object

You can use the list concatenation operator like below -

lst1 = ['hello','world']
lst2 = ['bye','world']
newlst = lst1 + lst2
print(newlst)
>> ['hello','world','bye','world']

Or you can use list.extend() function as below -

lst1 = ['hello','world']
lst2 = ['bye','world']
lst1.extend(lst2)
print(lst1)
>> ['hello', 'world', 'bye', 'world']

Benefits of using extend function is that it can work on multiple types, where as concatenation operator will only work if both LHS and RHS are lists.

Other examples of extend function -

lst1.extend(('Bye','Bye'))
>> ['hello', 'world', 'Bye', 'Bye']

Flattening and unflattening can be done using this function

def flatten(df, col):
    col_flat = pd.DataFrame([[i, x] for i, y in df[col].apply(list).iteritems() for x in y], columns=['I', col])
    col_flat = col_flat.set_index('I')
    df = df.drop(col, 1)
    df = df.merge(col_flat, left_index=True, right_index=True)

    return df

Unflattening:

def unflatten(flat_df, col):
    flat_df.groupby(level=0).agg({**{c:'first' for c in flat_df.columns}, col: list})

After unflattening we get the same dataframe except column order:

(df.sort_index(axis=1) == unflatten(flatten(df)).sort_index(axis=1)).all().all()
>> True

In pandas version 0.25.0 appeared a new method 'explode' for series and dataframes. Older versions do not have such method.

It helps to build the result you need.

For example you have such series:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then you can use

s.explode()

To get such result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa

In case of dataframe:

df = pd.DataFrame({
  's': pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']
   ]),
   'a': 1
})

You will have such DataFrame:

                    s  a
0  [slim, waist, man]  1
1   [slim, waistline]  1
2             [santa]  1

Applying explode on s column:

df.explode('s')

Will give you such result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!