Pandas apply multiple columns per row instead of list

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-13 02:57:40

Question


I am having trouble getting pandas to return multiple columns when using apply.

Example:

import pandas as pd
import numpy as np
np.random.seed(1)

df = pd.DataFrame(index=range(2), columns=['a', 'b'])
df.loc[0] = [np.array((1,2,3))], 1
df.loc[1] = [np.array((4,5,6))], 1
df

             a  b
0  [[1, 2, 3]]  1
1  [[4, 5, 6]]  1

df2 = np.random.randint(1,9, size=(3,2))
df2

array([[4, 6],
       [8, 1],
       [1, 2]])

def example(x):
    return np.transpose(df2) @ x[0]

df3 = df['a'].apply(example)
df3

0    [23, 14]
1    [62, 41]

I want df3 to have two columns, with one element per column per row, rather than one column holding both elements in each row.

So I want something like

df3Wanted
         col1  col2
    0    23    14
    1    62    41

Does anybody know how to fix this?


Answer 1:


A couple of changes are required to achieve this.

First, update the function so that it wraps its result in a list:

def example(x):
    return [np.transpose(df2) @ x[0]]

Then recompute df3 with the updated function and perform the following operation on it:

wantedDF3 = pd.concat(df3.apply(pd.DataFrame, columns=['col1','col2']).tolist())

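print(wantedDF3) gives output in the desired two-column form (the values below do not match the question's df2, so they presumably come from a different random draw):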

   col1  col2
0    40    12
0    97    33
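
Both rows in the output above carry the index 0, because pd.concat keeps the index of each one-row frame it is given. The following is a minimal sketch, not part of the original answer, that uses the question's expected values as stand-in data to show how ignore_index=True renumbers the rows:

```python
import pandas as pd

# Stand-in one-row frames, filled with the values the question expects
pieces = [
    pd.DataFrame([[23, 14]], columns=['col1', 'col2']),
    pd.DataFrame([[62, 41]], columns=['col1', 'col2']),
]

kept = pd.concat(pieces)                           # index stays 0, 0
renumbered = pd.concat(pieces, ignore_index=True)  # index becomes 0, 1

print(kept)
print(renumbered)
```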

Edit: Another way to do the same thing, to avoid memory errors: keep your example function and df3 exactly as they are in the question, then use the code below to generate wantedDF3.

col1df = pd.DataFrame(df3.apply(lambda x: x[0]).values, columns=['col1'])
col2df = pd.DataFrame(df3.apply(lambda x: x[1]).values,  columns=['col2'])
wantedDF3 = col1df.join(col2df)
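
The two per-column lookups can probably be collapsed into a single step by handing the per-row arrays to the DataFrame constructor, which is the same conversion the second answer applies chunk by chunk. The sketch below is an assumption-laden illustration rather than the answerer's code: df2 is written out literally with the values printed in the question, and the input Series `a` is a simplified stand-in for df['a'] that drops the extra list wrapper (so example uses x instead of x[0]).

```python
import numpy as np
import pandas as pd

# The question's df2, written out literally instead of drawn at random
df2 = np.array([[4, 6], [8, 1], [1, 2]])

# Simplified stand-in for df['a']: one array per row, no extra list wrapper
a = pd.Series([np.array([1, 2, 3]), np.array([4, 5, 6])])

def example(x):
    return np.transpose(df2) @ x

df3 = a.apply(example)  # a Series holding one length-2 array per row

# Turn the per-row arrays into the rows of a two-column frame
wantedDF3 = pd.DataFrame(df3.tolist(), index=df3.index,
                         columns=['col1', 'col2'])
print(wantedDF3)
```

Passing index=df3.index keeps the original row labels, so the result lines up with df if the two are joined later.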



Answer 2:


This answer responds to the comments on the first answer and concerns the memory error. The following example uses data that causes a memory error on my computer with every method suggested so far (the first answer and the comments on it), but it works with the code below:

import pandas as pd
import numpy as np
import time
np.random.seed(1)

nRows = 25000
nCols = 10000
numberOfChunks = 5

df = pd.DataFrame(index=range(nRows ), columns=range(1))

df2 = df.apply(lambda row: np.random.rand(nCols), axis=1)

chunkSize = int(round(nRows / float(numberOfChunks)))

chunks = []
for start in range(0, nRows, chunkSize):
    df2tmp = df2.iloc[start:start + chunkSize]
    # Expand this chunk's arrays into columns and downcast to float16
    chunks.append(pd.DataFrame(df2tmp.tolist(), index=df2tmp.index).astype('float16'))

# Stitch the converted chunks back together in row order
df3 = pd.concat(chunks)
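
To see the chunking idea on data small enough to inspect, here is a scaled-down sketch. The helper name expand_in_chunks and its parameters are hypothetical, introduced only for illustration; it assumes every row holds an array of the same length, and it makes no claim about how much memory is saved on any particular machine.

```python
import numpy as np
import pandas as pd

def expand_in_chunks(series, n_chunks, dtype='float16'):
    """Expand a Series of equal-length arrays into a DataFrame, slice by slice."""
    # Ceiling division, with a floor of 1 so an empty Series does not break range()
    chunk_size = max(1, -(-len(series) // n_chunks))
    parts = []
    for start in range(0, len(series), chunk_size):
        part = series.iloc[start:start + chunk_size]
        # Convert one slice at a time and downcast it right away
        parts.append(pd.DataFrame(part.tolist(), index=part.index).astype(dtype))
    return pd.concat(parts)

rng = np.random.default_rng(1)
small = pd.Series([rng.random(4) for _ in range(10)])  # 10 rows, 4 values each

wide = expand_in_chunks(small, n_chunks=3)
print(wide.shape)
```

Only one slice exists as a full-precision frame at any moment, which is the point of converting and downcasting inside the loop instead of after it.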


Source: https://stackoverflow.com/questions/58392974/pandas-apply-multiple-columns-per-row-instead-of-list
