variable column name in dask assign() or apply()

白昼怎懂夜的黑 posted on 2019-12-23 15:44:44

Question


I have code that works in pandas, but I'm having trouble converting it to use dask. There is a partial solution here, but it does not allow me to use a variable as the name of the column I am creating/assigning to.

Here's the working pandas code:

percent_cols = ['num_unique_words', 'num_words_over_6']

def find_fraction(row, col):
    return row[col] / row['num_words']

for c in percent_cols:
    df[c] = df.apply(find_fraction, col=c, axis=1)

Here's the dask code that doesn't do what I want:

data = dd.from_pandas(df, npartitions=8)

for c in percent_cols:
    data = data.assign(c = data[c] / data.num_words)

This assigns the result to a new column called c rather than modifying the value of data[c] (what I want). Creating a new column would be fine if I could have the column name be a variable. E.g., if this worked:

for c in percent_cols:
    name = c + "new"
    data = data.assign(name = data[c] / data.num_words)

This fails because Python treats whatever appears to the left of the = in a call as the literal keyword name. The keyword here is always name, and the value stored in the variable name is ignored.

How can I use a variable for the name of the column I am assigning to? The loop iterates far more times than I'm willing to copy/paste.


Answer 1:


This can be interpreted as a Python language question:

Question: How do I use a variable's value as the name in a keyword argument?

Answer: Use a dictionary and ** unpacking

def f(**kwargs):  # stand-in function that returns the keyword arguments it receives
    return kwargs

c = 'name'
f(c=5)       # 'c' is used as the keyword argument name, not what we want
f(**{c: 5})  # 'name' is used as the keyword argument name, this is great
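Applied to the loop in the question, the same unpacking lets the keyword name be computed on each iteration. The following is a minimal sketch in plain Python; show_kwargs is a hypothetical stand-in for a method such as assign() and only returns the keyword arguments it receives, and the "_new" suffix is an illustrative choice.

```python
def show_kwargs(**kwargs):
    """Hypothetical stand-in for a method like assign(): returns its keyword arguments."""
    return kwargs

percent_cols = ["num_unique_words", "num_words_over_6"]

received = {}
for c in percent_cols:
    name = c + "_new"  # computed keyword name (the suffix is an assumption)
    # Writing name=c would create a keyword literally called "name";
    # **{name: c} uses the string stored in the variable instead.
    received.update(show_kwargs(**{name: c}))

print(sorted(received))
```

Each pass through the loop produces a differently named keyword, which is what the name = ... form in the question could not do.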

Dask.dataframe solution

For your particular question I recommend the following:

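# df is the dask dataframe here (called data in the question)
# Map each column name to its new values, then unpack the dict into keyword arguments
d = {col: df[col] / df['num_words'] for col in percent_cols}
df = df.assign(**d)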

Consider doing this with Pandas as well

The .assign method is available in Pandas as well and may be faster than using .apply.
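As a rough illustration of the Pandas variant, the sketch below runs both the row-wise apply from the question and the vectorized assign on a small made-up frame and checks that they agree. The sample values are assumptions chosen for the example; the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "num_words": [10, 20, 40],
    "num_unique_words": [5, 10, 10],
    "num_words_over_6": [2, 5, 10],
})
percent_cols = ["num_unique_words", "num_words_over_6"]

# Row-wise version from the question, kept for comparison.
def find_fraction(row, col):
    return row[col] / row["num_words"]

expected = df.copy()
for c in percent_cols:
    expected[c] = df.apply(find_fraction, col=c, axis=1)

# Vectorized version: build {column name: new values} and unpack it into assign().
new_cols = {c: df[c] / df["num_words"] for c in percent_cols}
result = df.assign(**new_cols)

print(result.equals(expected))
```

assign returns a new DataFrame and leaves df unchanged, so the result has to be bound to a name, as in the loop-free form above.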



Source: https://stackoverflow.com/questions/33557022/variable-column-name-in-dask-assign-or-apply
