Pandas Series of lists to one series

野性不改 2020-12-28 13:18

I have a Pandas Series of lists of strings:

0                           [slim, waist, man]
1                                [slim, waistline]
2                       


        
9 Answers
  • 2020-12-28 13:42
    series_name.sum()
    

    concatenates the lists and returns a single flat list, which is what you need. Make sure it really is a Series of lists; otherwise the values will be concatenated (if they are strings) or added (if they are ints).
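
    As a minimal sketch of the point above, the snippet below applies sum() to sample data (the three lists are assumed from the outputs shown elsewhere in this thread) and compares it with itertools.chain.from_iterable, a standard-library alternative that avoids building an intermediate list at every step. The variable names are illustrative only.

```python
import itertools

import pandas as pd

# Sample data assumed from the outputs shown in this thread
s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# sum() joins the lists with +, so the result is a plain Python list
flat_list = s.sum()

# chain.from_iterable walks the lists lazily instead of re-copying them
flat_chain = list(itertools.chain.from_iterable(s))

# Wrap the flat list in a Series if a Series is what you need
flat_series = pd.Series(flat_list)
```

    Both approaches give the same elements in the same order; the chain version is likely the better choice for long Series, since repeated list concatenation copies the growing list each time.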

  • 2020-12-28 13:46

    You may also try:

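    import pandas as pd
    
    # Sample data matching the output below
    s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
    
    # Iterate over the values directly, so this works with any index
    combined = []
    for items in s:
        combined.extend(items)
    
    print(combined)
    
    s = pd.Series(combined)
    print(s)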
    

    output:

    ['slim', 'waist', 'man', 'slim', 'waistline', 'santa']
    
    0         slim
    1        waist
    2          man
    3         slim
    4    waistline
    5        santa
    dtype: object
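
    Iterating over the values directly keeps the flattening independent of the index labels. A minimal sketch, using a hypothetical string index that is not part of the original question:

```python
import pandas as pd

# Hypothetical string index, to show that nothing depends on 0..n-1 labels
s = pd.Series(
    [['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']],
    index=['a', 'b', 'c'],
)

# Nested comprehension: outer loop over the lists, inner loop over their items
combined = [item for items in s for item in items]

# The new Series gets a fresh default integer index
flat = pd.Series(combined)
```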
    
  • 2020-12-28 13:57

    pandas 0.25.0 introduced a new method, 'explode', for Series and DataFrames. Older versions do not have it.

    It builds exactly the result you need.

    For example, given this Series:

    import pandas as pd
    
    s = pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa']])
    

    Then you can call:

    s.explode()
    

    To get this result:

    0         slim
    0        waist
    0          man
    1         slim
    1    waistline
    2        santa
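
    Note that explode repeats the original index labels (0, 0, 0, 1, 1, 2). If a fresh 0..n-1 index is wanted, one option is to chain reset_index(drop=True), as in this minimal sketch:

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

# explode keeps the source row label for every element
exploded = s.explode()

# drop=True discards the repeated labels instead of turning them into a column
flat = exploded.reset_index(drop=True)
```

    Newer pandas versions (1.1.0 and later, as far as I know) also accept s.explode(ignore_index=True), which should have the same effect.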
    

    For a DataFrame:

    df = pd.DataFrame({
        's': pd.Series([
            ['slim', 'waist', 'man'],
            ['slim', 'waistline'],
            ['santa']
        ]),
        'a': 1
    })
    

    This gives the following DataFrame:

                        s  a
    0  [slim, waist, man]  1
    1   [slim, waistline]  1
    2             [santa]  1
    

    Applying explode to the s column:

    df.explode('s')
    

    This gives the following result:

               s  a
    0       slim  1
    0      waist  1
    0        man  1
    1       slim  1
    1  waistline  1
    2      santa  1
    

    If your Series contains empty lists:

    import pandas as pd
    
    s = pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa'],
        []
    ])
    

    Then running explode will introduce NaN values for empty lists, like this:

    0         slim
    0        waist
    0          man
    1         slim
    1    waistline
    2        santa
    3          NaN
    

    If this is not desired, you can chain a dropna call:

    s.explode().dropna()
    

    To get this result:

    0         slim
    0        waist
    0          man
    1         slim
    1    waistline
    2        santa
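
    One caveat: dropna also removes missing values that were already inside the lists, not only the NaN rows created for empty lists. If that distinction matters, a possible alternative (a sketch, not the only way) is to filter out the empty lists before calling explode:

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    []
])

# Boolean mask: True where the list has at least one element
has_items = s.map(len) > 0

# Explode only the non-empty rows, so no NaN placeholder is created
result = s[has_items].explode()
```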
    

    DataFrames also have a dropna method:

    df = pd.DataFrame({
        's': pd.Series([
            ['slim', 'waist', 'man'],
            ['slim', 'waistline'],
            ['santa'],
            []
        ]),
        'a': 1
    })
    

    Running explode without dropna:

    df.explode('s')
    

    This results in:

               s  a
    0       slim  1
    0      waist  1
    0        man  1
    1       slim  1
    1  waistline  1
    2      santa  1
    3        NaN  1
    

    With dropna:

    df.explode('s').dropna(subset=['s'])
    

    Result:

               s  a
    0       slim  1
    0      waist  1
    0        man  1
    1       slim  1
    1  waistline  1
    2      santa  1
    