Question
I have a Pandas Series of lists of strings:
0 [slim, waist, man]
1 [slim, waistline]
2 [santa]
As you can see, the lists vary by length. I want an efficient way to collapse this into one series
0 slim
1 waist
2 man
3 slim
4 waistline
5 santa
I know I can break up the lists using
series_name.split(' ')
But I am having a hard time putting those strings back into one list.
Thanks!
Answer 1:
You are basically just trying to flatten a nested list here.
You should just be able to iterate over the elements of the series:
slist = []
for x in series:
    slist.extend(x)
or a slicker (but harder to understand) list comprehension:
slist = [st for row in series for st in row]
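Since the question asks for a Series rather than a plain list, the flattened list can be passed back to the Series constructor. A minimal sketch using the sample data from the question; the names s and flat are only for illustration:

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# Flatten row by row, then build a new Series with a fresh 0..n-1 index.
flat = pd.Series([word for row in s for word in row])
print(flat.tolist())
```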
Answer 2:
Here's a simple method using only pandas functions:
import pandas as pd
s = pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa']])
Then
s.apply(pd.Series).stack().reset_index(drop=True)
gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g.
0 0 slim
1 waist
2 man
1 0 slim
1 waistline
2 0 santa
If this is what you want, just omit .reset_index(drop=True) from the chain.
Answer 3:
series_name.sum()
does exactly what you need. Make sure it really is a series of lists; otherwise the values will be concatenated (if they are strings) or added (if they are ints).
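A short sketch of that behaviour, using the sample data from the question (the variable names are illustrative). Because every + on lists builds a new list, this approach is likely to slow down on long Series, so treat it as a convenience for small data:

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# On a Series of lists, sum() applies + to the lists, which concatenates them.
flat_list = s.sum()
flat = pd.Series(flat_list)

# On a numeric Series the same call adds the values instead.
total = pd.Series([1, 2, 3]).sum()
```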
Answer 4:
You can try using itertools.chain to simply flatten the lists:
In [70]: from itertools import chain
In [71]: import pandas as pnd
In [72]: s = pnd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
In [73]: s
Out[73]:
0 [slim, waist, man]
1 [slim, waistline]
2 [santa]
dtype: object
In [74]: new_s = pnd.Series(list(chain(*s.values)))
In [75]: new_s
Out[75]:
0 slim
1 waist
2 man
3 slim
4 waistline
5 santa
dtype: object
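A small variation on the same idea: chain.from_iterable takes the rows as a single iterable, so the Series does not have to be unpacked into separate arguments with *. A sketch on the same sample data:

```python
from itertools import chain

import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# from_iterable walks the rows lazily instead of unpacking them as arguments.
new_s = pd.Series(list(chain.from_iterable(s)))
print(new_s.tolist())
```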
Answer 5:
pandas 0.25.0 introduced a new method, explode, for both Series and DataFrames. Older versions do not have it.
It builds exactly the result you need.
For example, given this Series:
import pandas as pd
s = pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa']])
Then you can use
s.explode()
to get this result:
0 slim
0 waist
0 man
1 slim
1 waistline
2 santa
For a DataFrame:
df = pd.DataFrame({
's': pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa']
]),
'a': 1
})
This gives the following DataFrame:
s a
0 [slim, waist, man] 1
1 [slim, waistline] 1
2 [santa] 1
Applying explode to the s column:
df.explode('s')
gives this result:
s a
0 slim 1
0 waist 1
0 man 1
1 slim 1
1 waistline 1
2 santa 1
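Note that explode repeats the original index label for every element of a row. To get the 0 to 5 index shown in the question, the index can be reset afterwards. A minimal sketch, assuming pandas 0.25.0 or later:

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# explode() keeps the original labels (0, 0, 0, 1, 1, 2);
# reset_index(drop=True) replaces them with a fresh 0..n-1 index.
flat = s.explode().reset_index(drop=True)
print(flat.tolist())
```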
Answer 6:
You can use the list concatenation operator, as shown below:
lst1 = ['hello','world']
lst2 = ['bye','world']
newlst = lst1 + lst2
print(newlst)
>> ['hello', 'world', 'bye', 'world']
Or you can use the list.extend() function, as shown below:
lst1 = ['hello','world']
lst2 = ['bye','world']
lst1.extend(lst2)
print(lst1)
>> ['hello', 'world', 'bye', 'world']
The benefit of extend is that it accepts any iterable, whereas the concatenation operator only works if both the left-hand and right-hand sides are lists.
Another example of extend, this time with a tuple:
lst1 = ['hello','world']
lst1.extend(('Bye','Bye'))
print(lst1)
>> ['hello', 'world', 'Bye', 'Bye']
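To make the difference concrete, here is a small sketch that tries both operations with a tuple on the right-hand side. The names words, extra and combined are made up for the example:

```python
words = ['hello', 'world']
extra = ('bye', 'world')  # a tuple, not a list

# list + tuple is not defined, so concatenation raises TypeError.
try:
    combined = words + extra
except TypeError as err:
    combined = None
    print('concatenation failed:', err)

# extend() accepts any iterable and modifies the list in place.
words.extend(extra)
print(words)
```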
Answer 7:
Flattening and unflattening can be done using this function
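def flatten(df, col):
    # Build one (original index, element) row per list item.
    col_flat = pd.DataFrame([[i, x] for i, y in df[col].apply(list).items() for x in y], columns=['I', col])
    col_flat = col_flat.set_index('I')
    # Replace the list column with its flattened version, joined on the index.
    df = df.drop(columns=col)
    df = df.merge(col_flat, left_index=True, right_index=True)
    return df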
Unflattening:
def unflatten(flat_df, col):
    # Regroup by original index: collect col into lists, keep the first value elsewhere.
    return flat_df.groupby(level=0).agg({**{c: 'first' for c in flat_df.columns}, col: list})
After unflattening we get the same DataFrame back, apart from column order (here col is the name of the list column):
(df.sort_index(axis=1) == unflatten(flatten(df, col), col).sort_index(axis=1)).all().all()
>> True
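On pandas 0.25.0 or later, a similar round trip can be sketched with the built-in explode plus a groupby. This is a different technique from the helper functions above, shown only for comparison; the sample DataFrame and the names flat and restored are assumptions for the example. One caveat: explode turns an empty list into NaN, so rows with empty lists would not round-trip cleanly.

```python
import pandas as pd

df = pd.DataFrame({
    's': [['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']],
    'a': 1,
})

# Flatten: one row per list element, original index labels repeated.
flat = df.explode('s')

# Unflatten: regroup by the original index, collecting 's' back into lists.
restored = flat.groupby(level=0).agg({'s': list, 'a': 'first'})

print(restored['s'].tolist() == df['s'].tolist())
```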
Answer 8:
You may also try:
combined = []
for i in s.index:
    # s.index holds labels, so look each row up with .loc
    combined = combined + s.loc[i]
print(combined)
s = pd.Series(combined)
print(s)
output:
['slim', 'waist', 'man', 'slim', 'waistline', 'santa']
0 slim
1 waist
2 man
3 slim
4 waistline
5 santa
dtype: object
Source: https://stackoverflow.com/questions/30885005/pandas-series-of-lists-to-one-series