How to remove parentheses and all data within using Pandas/Python?

时光毁灭记忆、已成空白 submitted on 2019-11-26 10:46:14

Question


I have a DataFrame from which I want to remove all parentheses and everything inside them.

I checked out the question "How can I remove text within parentheses with a regex?", where the answer for removing the data was:

re.sub(r'\([^)]*\)', '', filename)

I tried this as well as

re.sub(r'\(.*?\)', '', filename)

However, I got an error: expected a string or buffer
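A minimal sketch of where that error comes from, using a hypothetical two-row Series: re.sub works on one string at a time, so handing it a whole pandas Series raises a TypeError. In Python 2 the message read "expected string or buffer"; Python 3 says "expected string or bytes-like object".

```python
import re

import pandas as pd

# Hypothetical sample data, for illustration only
names = pd.Series(["Acme (USA)", "Globex (old name)"])

# re.sub works on a single string: the parenthesised part is removed,
# leaving "Acme " (the space before the parentheses survives)
print(repr(re.sub(r"\([^)]*\)", "", names[0])))

# Passing the whole Series is what triggers the TypeError
error_message = None
try:
    re.sub(r"\([^)]*\)", "", names)
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)
```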

When I tried passing the column df['Column Name'] instead, I got the error: no item named 'Column Name'.

I checked the DataFrame using df.head(), and it showed up as a clean table with the column names I expected. However, when I use the regex to remove the (stuff), it doesn't recognize the column name that I have.

I normally use

df['name'].str.replace(" ()", "", regex=False)

However, I want to remove the parentheses and everything inside them. How can I do this using either regex or pandas?

Thanks!

Here is the solution I used. Thanks for the help!

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*\)", "", regex=True)
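For reference, a sketch of the same fix on a small hypothetical DataFrame standing in for All. One point worth flagging: since pandas 2.0, Series.str.replace defaults to regex=False, so the pattern is only treated as a regular expression when regex=True is passed explicitly.

```python
import pandas as pd

# Hypothetical frame standing in for the asker's "All" DataFrame
All = pd.DataFrame(
    {"Manufacturer Standard Name": ["Acme (USA)", "Globex Corp (old name)", "Initech"]}
)

col = "Manufacturer Standard Name"
# regex=True is needed on pandas >= 2.0, where the default became regex=False
All[col] = All[col].str.replace(r"\(.*\)", "", regex=True)
print(All[col].tolist())
```

The greedy .* runs from the first ( to the last ) in each value and leaves the space before the parentheses in place; Answer 2 deals with both points.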

Answer 1:


df['name'].str.replace(r"\(.*\)", "", regex=True)

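You can't run re functions directly on pandas objects: re.sub expects a single string, which is why you get "expected a string or buffer" when you pass it anything else. You have to apply the function to each element of the object. Series.str.replace(r"\(.*\)", "", regex=True) does this for you and behaves much like Series.apply(lambda x: re.sub(r"\(.*\)", "", x)), except that it passes missing values (NaN) through instead of raising.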
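A quick sketch checking that claim on hypothetical data: the vectorised .str.replace and an explicit element-wise re.sub give the same result for plain strings. The isinstance guard in the apply version is an addition of this sketch, mimicking how .str methods pass non-string values through instead of raising.

```python
import re

import pandas as pd

# Hypothetical values; the second one has two parenthesised groups
s = pd.Series(["Acme (USA)", "Globex (old) Corp (new)", "Initech"])

# Vectorised form (regex=True needed on pandas >= 2.0)
vectorised = s.str.replace(r"\(.*\)", "", regex=True)

# Explicit element-wise form; non-strings are returned unchanged
looped = s.apply(lambda x: re.sub(r"\(.*\)", "", x) if isinstance(x, str) else x)

print(vectorised.tolist())
print(looped.tolist())
```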




Answer 2:


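If you have multiple (...) substrings in the data, note that the greedy .* in \(.*\) matches everything from the first ( to the last ) in a value, so you should consider using either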

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*?\)", "", regex=True)

or

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\([^()]*\)", "", regex=True)

The difference is that .*? is usually slower and does not match line breaks, while [^()]* matches any character except ( and ), is quite efficient, and does match line breaks. With nested parentheses, the first pattern matches (...(...) (from the outer ( up to the first )), while the second matches only the innermost (...).
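A sketch comparing the greedy pattern from the question with the two patterns above, on hypothetical strings chosen to show multiple groups, a nested group, and a line break:

```python
import pandas as pd

# Hypothetical test strings: two groups, a nested group, and a group spanning a line break
s = pd.Series(["x (1) y (2)", "a (b (c) d) e", "p (line1\nline2) q"])

patterns = {
    "greedy": r"\(.*\)",          # first ( to last ) on the same line
    "lazy": r"\(.*?\)",           # first ( to the nearest ), same line only
    "negated class": r"\([^()]*\)",  # innermost (...) only, may span lines
}

results = {
    name: s.str.replace(pat, "", regex=True).tolist()
    for name, pat in patterns.items()
}
for name, out in results.items():
    print(f"{name:>13}: {out}")
```

With these inputs the greedy pattern strips everything between the first ( and the last ), the lazy pattern stops at the first ) (leaving a stray "d)" in the nested case), and the negated class removes only the innermost group but is the only one that handles the line break.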

If you also want to tidy up the whitespace left behind after removing these substrings, you may consider

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()

The \s*\([^()]*\) regex matches zero or more whitespace characters followed by a parenthesized substring, and str.strip() then removes any leading or trailing whitespace that remains.
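A short sketch of that combined cleanup on hypothetical values, including one where the parenthesized text comes first, which is why stripping both ends matters:

```python
import pandas as pd

# Hypothetical values: trailing group, trailing space, leading group, two groups mid-string
s = pd.Series(["Acme (USA)", "Globex Corp (old name) ", "(legacy) Initech", "Acme (x) (y) Corp"])

# Remove each "(...)" together with the whitespace before it, then trim both ends
cleaned = s.str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()
print(cleaned.tolist())
```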



Source: https://stackoverflow.com/questions/20894525/how-to-remove-parentheses-and-all-data-within-using-pandas-python
