Iterate over pandas dataframe columns containing nested arrays

情话喂你 2021-01-21 01:46

I hope you can help me with this issue.

I have the data below (the column names don't matter):

data=([['file0090',
    ([[ 84,  55, 189],
   [248, 100,  18],
         


        
4 Answers
  • 2021-01-21 02:07

    You can try this:

    data_f = [[i[0]]+j for i in data for j in i[1]]
    df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])
    

    Output:

    col0      col1  col2  col3
    file0090    84    55   189
    file0090   248   100    18
    file0090    68   115    88
    file6565    86    58   189
    file6565    24    10   118
    file6565    68    11     8
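
The question's data is truncated, and its spacing hints that the inner blocks might be NumPy arrays rather than plain lists. If that is the case, `[i[0]] + j` would attempt elementwise addition instead of list concatenation. A minimal sketch that unpacks each row instead, so it works for both lists and arrays; the sample values are reconstructed from the outputs shown in the answers, and the use of `np.array` is an assumption:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the answers' outputs (assumption: inner blocks are arrays).
data = [
    ["file0090", np.array([[84, 55, 189], [248, 100, 18], [68, 115, 88]])],
    ["file6565", np.array([[86, 58, 189], [24, 10, 118], [68, 11, 8]])],
]

# Unpack each inner row after the file name; no list + array addition involved.
rows = [[name, *row] for name, block in data for row in block]
df = pd.DataFrame(rows, columns=["col0", "col1", "col2", "col3"])
print(df)
```

The same comprehension also works unchanged when the inner blocks are ordinary nested lists.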
    
  • 2021-01-21 02:12

    We can explode the nested column into rows first, then expand each resulting list into columns:

    s = pd.DataFrame(data).set_index(0)[1].explode()
    df = pd.DataFrame(s.tolist(), index = s.index.values)
    
    df
    Out[396]: 
                0    1    2
    file0090   84   55  189
    file0090  248  100   18
    file0090   68  115   88
    file6565   86   58  189
    file6565   24   10  118
    file6565   68   11    8
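
To see what each of the two steps does, here is a small sketch that inspects the intermediate Series. The sample data is reconstructed from the output above and assumes the inner rows are plain lists:

```python
import pandas as pd

# Sample data rebuilt from the output above (assumption: plain nested lists).
data = [
    ["file0090", [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ["file6565", [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

# Step 1: one row per inner list; the file name is repeated in the index.
s = pd.DataFrame(data).set_index(0)[1].explode()
print(s)

# Step 2: turn each remaining 3-element list into three columns.
df = pd.DataFrame(s.tolist(), index=s.index.values)
print(df)
```

After step 1, `s` holds six entries, each still a list of three numbers; step 2 is what spreads them across columns.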
    
  • 2021-01-21 02:14

    You can create a custom function to output the correct form of data.

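    from itertools import chain
    import pandas as pd

    def transform(d):
        for l in d:
            # x collects the leading fields (the file name), y is the nested list of rows
            *x, y = l
            # prepend the leading fields to every inner row
            yield list(map(lambda s: x+s, y))
    
    # flatten the per-file groups of rows into one sequence of rows
    df = pd.DataFrame(chain(*transform(data)))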
    df
              0    1    2    3
    0  file0090   84   55  189
    1  file0090  248  100   18
    2  file0090   68  115   88
    3  file6565   86   58  189
    4  file6565   24   10  118
    5  file6565   68   11    8
    

    Timeit results of all the solutions:

    # YOBEN_S's answer
    In [275]: %%timeit
         ...: s = pd.DataFrame(data).set_index(0)[1].explode()
         ...: df = pd.DataFrame(s.tolist(), index = s.index.values)
         ...:
         ...:
    1.52 ms ± 59.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    #Anky's answer
    In [276]: %%timeit
         ...: df = pd.DataFrame(data).add_prefix('col')
         ...: out = df.explode('col1').reset_index(drop=True)
         ...: out = out.join(pd.DataFrame(out.pop('col1').tolist()).add_prefix('col_'))
         ...:
         ...:
    3.71 ms ± 606 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    #Dhaval's answer
    In [277]: %%timeit
         ...: data_f = []
         ...: for i in data:
         ...:     for j in i[1]:
         ...:         data_f.append([i[0]]+j)
         ...: df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])
         ...:
         ...:
    712 µs ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    #My answer
    In [280]: %%timeit
         ...: pd.DataFrame(chain(*transform(data)))
         ...:
         ...:
    489 µs ± 8.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    #Using List comp of Dhaval's answer
    
    In [306]: %%timeit
         ...: data_f = [[i[0]]+j for i in data for j in i[1]]
         ...: df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])
         ...:
         ...:
    586 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    #Anky's 2nd solution
    
    In [308]: %%timeit
         ...: l = [*chain.from_iterable(data)]
         ...: pd.DataFrame(np.vstack(l[1::2]),index = np.repeat(l[::2],len(l[1])))
         ...:
         ...:
    221 µs ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
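
The numbers above come from IPython's `%%timeit`. Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The sample data and the function names `list_comprehension` and `explode_rows` are illustrative assumptions, and absolute timings will differ by machine and pandas version:

```python
import timeit
import pandas as pd

# Sample data rebuilt from the outputs shown in the answers (assumption).
data = [
    ["file0090", [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ["file6565", [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

def list_comprehension():
    rows = [[i[0]] + j for i in data for j in i[1]]
    return pd.DataFrame(rows, columns=["col0", "col1", "col2", "col3"])

def explode_rows():
    s = pd.DataFrame(data).set_index(0)[1].explode()
    return pd.DataFrame(s.tolist(), index=s.index.values)

results = {}
for func in (list_comprehension, explode_rows):
    # Best of 3 repeats of 100 calls each, converted to seconds per call.
    best = min(timeit.repeat(func, number=100, repeat=3))
    results[func.__name__] = best / 100
    print(f"{func.__name__}: {results[func.__name__] * 1e6:.0f} µs per call")
```

With only six rows, most of the measured time is fixed DataFrame construction overhead, so the ranking may change on larger inputs.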
    
  • 2021-01-21 02:20

    You can use explode, then join the result with another DataFrame created from the series of lists:

    df = pd.DataFrame(data).add_prefix('col')
    
    out = df.explode('col1').reset_index(drop=True)
    out = out.join(pd.DataFrame(out.pop('col1').tolist()).add_prefix('col_'))
    

    Another solution, which works when every entry has the same number of inner rows:

    import itertools
    import numpy as np
    l = [*itertools.chain.from_iterable(data)]
    pd.DataFrame(np.vstack(l[1::2]),index = np.repeat(l[::2],len(l[1])))
    

    Output of the first solution (`out`):

          col0  col_0  col_1  col_2
    0  file0090     84     55    189
    1  file0090    248    100     18
    2  file0090     68    115     88
    3  file6565     86     58    189
    4  file6565     24     10    118
    5  file6565     68     11      8
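
The `np.vstack` / `np.repeat` solution repeats every file name `len(l[1])` times, so it relies on all entries having the same number of inner rows. A minimal sketch of one way to relax that, passing a per-entry count to `np.repeat`; the shortened second entry is hypothetical and only there to illustrate unequal lengths:

```python
import numpy as np
import pandas as pd

data = [
    ["file0090", [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ["file6565", [[86, 58, 189], [24, 10, 118]]],  # hypothetical: only two rows
]

names = [name for name, _ in data]
blocks = [np.asarray(block) for _, block in data]
counts = [len(block) for block in blocks]

# Repeat each name once per row of its own block, then stack all blocks.
df = pd.DataFrame(np.vstack(blocks), index=np.repeat(names, counts))
print(df)
```

This still assumes every inner row has the same number of columns, since `np.vstack` needs matching widths.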
    