pandas dataframe column name: remove special character

孤者浪人 提交于 2020-06-25 09:23:59

问题


Some joker made a Lotus database/applet thingy for tracking engineering issues in our company. The joke is that the key piece of information was named with a special character... a number sign (hash tag, pound sign, \u0023).

abbreviated sample:

KA#         Issue Date      Current Position
27144       1/9/2014        Accounting
27194       12/20/2012      Engineering
32474       4/21/2008       Engineering
32623-HOLD  4/25/2016       Engineering
32745       11/13/2012      SEPE
32812       10/30/2013      Engineering
32817       12/7/2012       Purchasing
32839       1/8/2013        SEPE

I output this table (4K rows, 15 columns) to a csv file and process in python3 as a pandas dataframe.

I generate various outputs. If I use something like:

df.iloc[:,[0,3,1,8,9,10]]

I get appropriate output and the key column shows up as "KA#". (When I say "key column", I mean "most important"... NOT "index". I keep a serial index)

Unfortunately, people sometimes mess with the column order in Lotus between my exports to csv so I can not guarantee that "KA#" will be any particular column number. I would like to use column names:

df.loc[:,["KA#","Issue Date","Current Position"]]

But the "KA#" column is filled with NaN's.

Thanks for any help you can offer.

Finally, if I try to rename "KA#" to simply "KA":

df['KA#'].name = 'KA'

throws a KeyError and

df = df.rename(columns={"KA#": "ka"})

is completely ignored. The column shows up as "KA#".

Can anyone think of a way to get rid of or handle that symbol? I'd even settle for a regex at this point.


回答1:


use str.replace:
df.columns=df.columns.str.replace('#','')

You can check this in the documentation.



来源:https://stackoverflow.com/questions/37952797/pandas-dataframe-column-name-remove-special-character

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!