I have a dataframe with multiple string columns. I want to use a string method that is valid for a series on multiple columns of the dataframe. Something like this is what I w
You can use a dictionary comprehension and feed it to the pd.DataFrame constructor:
res = pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})
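As a self-contained sketch of the line above: the sample frame, its column names and its row labels are made up for illustration, since the question's real data is not shown. It assumes every cell is a string (a NaN would raise AttributeError). Building the frame from plain lists gives a default integer index, so this version also passes index=df.index to keep the original row labels.

```python
import pandas as pd

# Hypothetical sample data; not taken from the question.
df = pd.DataFrame(
    {'A': ['abcf', 'deff'], 'B': ['xyz', 'wf']},
    index=['r1', 'r2'],
)

# Strip trailing 'f' characters from every column with a list comprehension.
# index=df.index keeps the original row labels instead of a fresh 0..n-1 index.
res = pd.DataFrame(
    {col: [x.rstrip('f') for x in df[col]] for col in df},
    index=df.index,
)

print(res)
```

If the index does not matter, the index argument can simply be dropped, which gives the one-liner shown above.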
Currently, the pandas str methods are relatively slow, and a plain list comprehension is often faster. Regex is slower still, but easier to extend. As always, you should test with your own data.
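One thing to check before swapping approaches is that they do the same job. A small sketch on made-up data (the frame below is hypothetical): the pattern f$ removes only a single trailing 'f', whereas rstrip('f') removes the whole run, so f+$ is the closer equivalent. The last line shows the kind of extension that is easy with a regex.

```python
import pandas as pd

# Hypothetical sample data; 'deff' ends in two 'f' characters on purpose.
df = pd.DataFrame({'A': ['abcf', 'deff'], 'B': ['xyz', 'wf']})

# r'f$' removes one trailing 'f' only.
one = df.replace(r'f$', '', regex=True)

# r'f+$' removes the whole trailing run, like str.rstrip('f').
run = df.replace(r'f+$', '', regex=True)
via_str = df.apply(lambda s: s.str.rstrip('f'))

# Extending the pattern: strip a trailing run of 'f' or 'z' characters.
either = df.replace(r'[fz]+$', '', regex=True)

print(one, run, via_str, either, sep='\n\n')
```

On data where no value ends in more than one 'f', the two patterns give the same result.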
# Benchmarking on Python 3.6.0, Pandas 0.19.2
def jez1(df):
    return df.apply(lambda x: x.str.rstrip('f'))

def jez2(df):
    return df.applymap(lambda x: x.rstrip('f'))

def jpp(df):
    return pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})

def user3483203(df):
    return df.replace(r'f$', '', regex=True)
df = pd.concat([df]*10000)
%timeit jez1(df) # 33.1 ms per loop
%timeit jez2(df) # 29.9 ms per loop
%timeit jpp(df) # 13.2 ms per loop
%timeit user3483203(df) # 42.9 ms per loop
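%timeit is an IPython magic, so the lines above will not run in a plain script. To repeat the comparison on your own data, here is a rough sketch using the standard-library timeit module. The input frame is hypothetical and the numbers it prints will differ from those above. Since applymap was deprecated in pandas 2.1 in favour of DataFrame.map, jez2 here picks whichever one is available.

```python
import timeit
import pandas as pd

def jez1(df):
    return df.apply(lambda x: x.str.rstrip('f'))

def jez2(df):
    # applymap was deprecated in pandas 2.1 in favour of DataFrame.map.
    elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
    return elementwise(lambda x: x.rstrip('f'))

def jpp(df):
    return pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})

def user3483203(df):
    return df.replace(r'f$', '', regex=True)

# Hypothetical input: no value ends in more than one 'f', so all four agree.
small = pd.DataFrame({'A': ['abcf', 'def', 'ghi'], 'B': ['xyzf', 'uvw', 'rstf']})
big = pd.concat([small] * 1000, ignore_index=True)

funcs = [jez1, jez2, jpp, user3483203]
results = {f.__name__: f(big) for f in funcs}

# Best of 3 repeats, 5 calls each, reported as seconds per call.
timings = {
    f.__name__: min(timeit.repeat(lambda: f(big), number=5, repeat=3)) / 5
    for f in funcs
}

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1e3:.2f} ms per call')
```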