Return multiple columns from pandas apply()

后端未结

关注

 9  2123

I have a pandas DataFrame, df_test. It contains a column \'size\' which represents size in bytes. I\'ve calculated KB, MB, and GB using the following code:

相关标签:

9条回答

忘掉有多难

2020-11-30 19:35
Generally, to return multiple values, this is what I do
```
def gimmeMultiple(group):
    x1 = 1
    x2 = 2
    return array([[1, 2]])
def gimmeMultipleDf(group):
    x1 = 1
    x2 = 2
    return pd.DataFrame(array([[1,2]]), columns=['x1', 'x2'])
df['size'].astype(int).apply(gimmeMultiple)
df['size'].astype(int).apply(gimmeMultipleDf)
```
Returning a dataframe definitively has its perks, but sometimes not required. You can look at what the apply() returns and play a bit with the functions ;)
0 讨论(0)
发布评论:

提交评论
- 加载中...

心在旅途

2020-11-30 19:36

I believe the 1.1 version breaks the behavior suggested in the top answer here.

import pandas as pd
def test_func(row):
    row['c'] = str(row['a']) + str(row['b'])
    row['d'] = row['a'] + 1
    return row

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['i', 'j', 'k']})
df.apply(test_func, axis=1)

The above code ran on pandas 1.1.0 returns:

   a  b   c  d
0  1  i  1i  2
1  1  i  1i  2
2  1  i  1i  2

While in pandas 1.0.5 it returned:

   a   b    c  d
0  1   i   1i  2
1  2   j   2j  3
2  3   k   3k  4

Which I think is what you'd expect.

Not sure how the release notes explain this behavior, however as explained here avoiding mutation of the original rows by copying them resurrects the old behavior. i.e.:

def test_func(row):
    row = row.copy()   #  <---- Avoid mutating the original reference
    row['c'] = str(row['a']) + str(row['b'])
    row['d'] = row['a'] + 1
    return row

0 讨论(0)

小鲜肉

2020-11-30 19:37
This is an old question, but for completeness, you can return a Series from the applied function that contains the new data, preventing the need to iterate three times. Passing axis=1 to the apply function applies the function sizes to each row of the dataframe, returning a series to add to a new dataframe. This series, s, contains the new values, as well as the original data.
```
def sizes(s):
    s['size_kb'] = locale.format("%.1f", s['size'] / 1024.0, grouping=True) + ' KB'
    s['size_mb'] = locale.format("%.1f", s['size'] / 1024.0 ** 2, grouping=True) + ' MB'
    s['size_gb'] = locale.format("%.1f", s['size'] / 1024.0 ** 3, grouping=True) + ' GB'
    return s

df_test = df_test.append(rows_list)
df_test = df_test.apply(sizes, axis=1)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

上一页 1 2