Dataframe Apply method to return multiple elements (series)

Anonymous (unverified), submitted 2019-12-03 02:33:02

Question:

import pandas as pd 

Let's say I have a dataframe like so:

df = pd.DataFrame({"a":range(4),"b":range(1,5)}) 

It looks like this:

   a  b
0  0  1
1  1  2
2  2  3
3  3  4

and a function that multiplies X by Y:

def XtimesY(x,y):
    return x*y

If I want to add a new pandas series to df I can do:

df["c"] = df.apply(lambda x: XtimesY(x["a"], 2), axis=1)

It works!

Now I want to add multiple series:

I have this function:

def divideAndMultiply(x,y):
    return x/y, x*y

Something like this?

df["e"], df["f"] = df.apply(lambda x: divideAndMultiply(x["a"], 2), axis=1)

It doesn't work!

I want the 'e' column to receive the divisions and the 'f' column the multiplications.

Note: This is not the code I'm using but I'm expecting the same behavior.

Answer 1:

Redefine your function like this:

def divideAndMultiply(x,y):
    return [x/y, x*y]

Then do this:

df[['e','f']] = df.apply( lambda x: divideAndMultiply(x["a"],2) , axis =1) 

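You should get the desired result:

In [118]: df
Out[118]:
   a  b  e  f
0  0  1  0  0
1  1  2  0  2
2  2  3  1  4
3  3  4  1  6

(The integer values in column e come from Python 2 integer division; under Python 3, x/y gives 0.0, 0.5, 1.0, 1.5.)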


Answer 2:

Almost there. Use zip(*...) to unpack the function's results. Try this:

def divideAndMultiply(x,y):
    return x/y, x*y

df["e"], df["f"] = zip(*df.a.apply(lambda val: divideAndMultiply(val,2)))
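To see what zip(*...) is doing here, a minimal sketch follows, assuming Python 3 and a reasonably recent pandas. Series.apply returns one (quotient, product) tuple per row, and zip(*...) regroups those row-wise pairs into one tuple per output column. The names pairs, quotients and products are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": range(4), "b": range(1, 5)})

def divideAndMultiply(x, y):
    return x / y, x * y

# One (quotient, product) tuple per row of column "a".
pairs = df["a"].apply(lambda val: divideAndMultiply(val, 2))

# zip(*pairs) transposes the row-wise pairs into two column-wise tuples.
quotients, products = zip(*pairs)

df["e"] = list(quotients)
df["f"] = list(products)
print(df)
```

Splitting the assignment into two steps is equivalent to the one-liner above; it just makes the intermediate Series of tuples visible.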


Answer 3:

The list-returning approach (assigning the result of apply to df[['e','f']]) doesn't work when you do it twice:

df = pd.DataFrame({"a":range(4),"b":range(1,5)})

print(df)

def foo(x,y):
    return [x/y, x*y]

df[['e','f']] = df.apply( lambda x: foo(x["a"],2) , axis =1)
print(df)
df[['g','h']] = df.apply( lambda x: foo(x["a"],2) , axis =1)
print(df)

yields:

   a  b
0  0  1
1  1  2
2  2  3
3  3  4
   a  b    e    f
0  0  1  0.0  0.0
1  1  2  0.5  2.0
2  2  3  1.0  4.0
3  3  4  1.5  6.0

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-65-edf8718c90ec> in <module>()
      8 df[['e','f']] = df.apply( lambda x: foo(x["a"],2) , axis =1)
      9 print(df)
---> 10 df[['g','h']] = df.apply( lambda x: foo(x["a"],2) , axis =1)
     11 print(df)
     12

E:\dev\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   2324
   2325         if isinstance(key, (Series, np.ndarray, list, Index)):
-> 2326             self._setitem_array(key, value)
   2327         elif isinstance(key, DataFrame):
   2328             self._setitem_frame(key, value)

E:\dev\Anaconda3\lib\site-packages\pandas\core\frame.py in _setitem_array(self, key, value)
   2352                     self[k1] = value[k2]
   2353             else:
-> 2354                 indexer = self.loc._convert_to_indexer(key, axis=1)
   2355                 self._check_setitem_copy()
   2356                 self.loc._setitem_with_indexer((slice(None), indexer), value)

E:\dev\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
   1229                 mask = check == -1
   1230                 if mask.any():
-> 1231                     raise KeyError('%s not in index' % objarr[mask])
   1232
   1233                 return _values_from_object(indexer)

KeyError: "['g' 'h'] not in index"
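One possible way around this, sketched under the assumption of pandas 0.23 or later (where DataFrame.apply accepts a result_type argument), is to pass result_type="expand". The list returned for each row is then spread across columns, so the right-hand side is a two-column DataFrame both times. This is an alternative to the code above, not what the original answers used.

```python
import pandas as pd

df = pd.DataFrame({"a": range(4), "b": range(1, 5)})

def foo(x, y):
    return [x / y, x * y]

# result_type="expand" turns each returned list into columns,
# so apply returns a DataFrame with one column per list element.
df[["e", "f"]] = df.apply(lambda x: foo(x["a"], 2), axis=1, result_type="expand")

# Repeating the same pattern with new column names also works.
df[["g", "h"]] = df.apply(lambda x: foo(x["a"], 2), axis=1, result_type="expand")
print(df)
```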


Answer 4:

df["e"], df["f"] = zip(*df.apply( lambda x: divideAndMultiply(x["a"],2) , axis =1)) 

Should do the trick.

(I show this example because a row-wise apply gives the function access to every column of the row, so you can use multiple columns as the input to create multiple new columns.)
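As an illustration of the multiple-input case, here is a minimal sketch in which both arguments come from the row. It assumes pandas 0.23 or later, where a tuple returned from a row-wise apply yields a Series of tuples. Using column "b" as the divisor is an assumption made only for this example; it is safe here because "b" never contains zero.

```python
import pandas as pd

df = pd.DataFrame({"a": range(4), "b": range(1, 5)})

def divideAndMultiply(x, y):
    return x / y, x * y

# Each row supplies both arguments: x from column "a", y from column "b".
df["e"], df["f"] = zip(
    *df.apply(lambda row: divideAndMultiply(row["a"], row["b"]), axis=1)
)
print(df)
```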


