dfply: Mutating string column: TypeError

[亡魂溺海] submitted on 2019-12-23 12:37:25

Question


My pandas DataFrame contains a column "file" whose values are strings holding file paths. I am trying to use dfply to mutate this column like this:

resultstatsDF.reset_index() >> mutate(dirfile = os.path.join(os.path.basename(os.path.dirname(X.file)),os.path.basename(X.file)))

but I get the error

TypeError: __index__ returned non-int (type Call)

What did I do wrong? How do I do it right?


Answer 1:


Since my question was up-voted, I guess it is still of interest to some people. Having learned quite a bit of Python since then, let me answer it; maybe it will be helpful to other users.

First, let us import the required packages

import pandas as pd
from dfply import *
from os.path import basename, dirname, join

and make the required pandas DataFrame

resultstatsDF = pd.DataFrame({'file': ['/home/user/this/file1.png', '/home/user/that/file2.png']})

which is

                        file
0  /home/user/this/file1.png
1  /home/user/that/file2.png

Running the original expression still raises an error (the message has changed as dfply has evolved):

resultstatsDF.reset_index() >> \
mutate(dirfile = join(basename(dirname(X.file)), basename(X.file)))

TypeError: __index__ returned non-int (type Intention)

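The reason is that mutate works on whole Series, whereas basename, dirname, and join each expect a single path string. Moreover, at the moment these functions are called, X.file is not yet data but a dfply placeholder (the Intention named in the error message). We therefore need a function that works on individual elements. The pandas method pandas.Series.apply does exactly that: it calls a given function on each element of a Series. We also need a custom function to apply to each element of the Series file. Putting everything together, we end up with the code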

def extract_last_dir_plus_filename(series_element):
    return join(basename(dirname(series_element)), basename(series_element))

resultstatsDF.reset_index() >> \
mutate(dirfile = X.file.apply(extract_last_dir_plus_filename))

which outputs

   index                       file         dirfile
0      0  /home/user/this/file1.png  this/file1.png
1      1  /home/user/that/file2.png  that/file2.png
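The element-wise point can also be seen in isolation, without dfply. The following is a minimal sketch, assuming only pandas and the standard library; the names files, direct_call_failed, and names are illustrative and not part of the original answer. It shows that basename rejects a whole Series, while Series.apply feeds it one string at a time.

```python
import pandas as pd
from os.path import basename

# Illustrative sample data, mirroring the paths used above.
files = pd.Series(['/home/user/this/file1.png', '/home/user/that/file2.png'])

# basename expects a single path-like value, so passing a whole Series fails.
try:
    basename(files)
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# Series.apply calls basename once per element and collects the results.
names = files.apply(basename)

print(direct_call_failed)
print(names.tolist())
```

The exception text here differs from the dfply one, because a plain Series is passed instead of a dfply placeholder, but the remedy is the same: apply the function per element.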

Without dfply's mutate, we could alternatively write

resultstatsDF['dirfile'] = resultstatsDF.file.apply(extract_last_dir_plus_filename)
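If a custom function is not wanted, the same result can probably be reached with pandas string methods alone. This is only a sketch and not part of the original answer: it assumes every path uses '/' as the separator and has at least one directory component, and the DataFrame name df is illustrative.

```python
import pandas as pd

# Illustrative sample data, mirroring the paths used above.
df = pd.DataFrame({'file': ['/home/user/this/file1.png',
                            '/home/user/that/file2.png']})

# Split each path on '/', keep the last two parts (directory and file name),
# and join them back together with '/'.
df['dirfile'] = df['file'].str.split('/').str[-2:].str.join('/')

print(df['dirfile'].tolist())
```

Unlike os.path.join, this always joins with '/', so on Windows the two approaches may produce different separators.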


Source: https://stackoverflow.com/questions/42671168/dfply-mutating-string-column-typeerror
