How to delete a duplicate column read from Excel in pandas

Data in Excel:

a   b   a   d
1   2   3   4
2   3   4   5
3   4   5   6
4   5   6   7

Code:

import pandas as pd
df = pd.read_excel(r"sample.xlsx", sheet_name="Sheet1")
df
   a  b  a.1  d
0  1  2    3  4
1  2  3    4  5
2  3  4    5  6
3  4  5    6  7

How do I delete the column a.1?

When pandas reads the data from Excel, it automatically renames the second a column to a.1.
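The renaming is easy to reproduce without a spreadsheet. This minimal sketch assumes `pd.read_csv` on an in-memory `io.StringIO` buffer as a stand-in for `read_excel`; the question's own output shows `read_excel` producing the same `a.1` label.

```python
import io

import pandas as pd

# Assumption: read_csv stands in for read_excel so this runs without an .xlsx file.
raw = io.StringIO(
    "a,b,a,d\n"
    "1,2,3,4\n"
    "2,3,4,5\n"
    "3,4,5,6\n"
    "4,5,6,7\n"
)
df = pd.read_csv(raw)

# The second "a" header comes back with a ".1" suffix.
print(df.columns.tolist())  # ['a', 'b', 'a.1', 'd']
```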

I tried df.drop("a.1", index=1), but this does not work.

I have a huge Excel file with duplicate column names, and I am interested in only a few of the columns.

If you know the name of the column you want to drop:

df = df[[col for col in df.columns if col != 'a.1']]
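A runnable version of the comprehension, rebuilding the question's frame by hand (the literal `DataFrame` is an assumption standing in for the Excel read). It also shows an equivalent boolean mask over the column labels with `.loc`, which avoids building an intermediate list.

```python
import pandas as pd

# Rebuild the frame from the question, with the label pandas already renamed.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [2, 3, 4, 5], "a.1": [3, 4, 5, 6], "d": [4, 5, 6, 7]}
)

# Keep every column whose label is not "a.1".
kept = df[[col for col in df.columns if col != "a.1"]]

# Same selection expressed as a boolean mask on the column axis.
masked = df.loc[:, df.columns != "a.1"]

print(kept.columns.tolist())  # ['a', 'b', 'd']
```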

And if you have several columns you want to drop:

columns_to_drop = ['a.1', 'b.1', ... ]
df = df[[col for col in df.columns if col not in columns_to_drop]]
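A sketch with a concrete, hypothetical drop list (`"b.1"` is deliberately absent from the frame). It contrasts the comprehension, which silently skips missing labels, with `DataFrame.drop(columns=..., errors="ignore")`, which would otherwise raise `KeyError` for them. Converting the list to a set keeps each membership test cheap on very wide sheets.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [2, 3, 4, 5], "a.1": [3, 4, 5, 6], "d": [4, 5, 6, 7]}
)

# Hypothetical drop list; "b.1" does not exist in this frame.
columns_to_drop = ["a.1", "b.1"]

# A set makes each "not in" check O(1) instead of scanning the list.
drop_set = set(columns_to_drop)
via_comprehension = df[[col for col in df.columns if col not in drop_set]]

# drop raises KeyError for absent labels unless errors="ignore" is passed.
via_drop = df.drop(columns=columns_to_drop, errors="ignore")

print(via_comprehension.columns.tolist())  # ['a', 'b', 'd']
```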

By default drop removes rows (axis=0), so you need to pass axis=1 to drop a column:

In [100]:
df.drop('a.1', axis=1)

Out[100]:
   a  b  d
0  1  2  4
1  2  3  5
2  3  4  6
3  4  5  7
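The sketch below, again assuming a hand-built copy of the question's frame, shows why a plain `df.drop("a.1")` fails, the `columns=` keyword that is equivalent to `axis=1` in pandas 0.21 and later, and the fact that `drop` returns a new DataFrame rather than modifying `df`.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [2, 3, 4, 5], "a.1": [3, 4, 5, 6], "d": [4, 5, 6, 7]}
)

# With the default axis=0, drop looks for "a.1" among the row labels and fails.
try:
    df.drop("a.1")
except KeyError:
    print("'a.1' is not a row label")

# axis=1, or the columns= keyword, targets the column labels instead.
by_axis = df.drop("a.1", axis=1)
by_keyword = df.drop(columns="a.1")

# drop returns a new DataFrame; df keeps "a.1" until the result is assigned back.
print("a.1" in df.columns)  # True
df = df.drop(columns="a.1")
print(df.columns.tolist())  # ['a', 'b', 'd']
```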

Or just select the columns of interest by passing a list of their names:

In [102]:
cols = ['a','b','d']
df[cols]

Out[102]:
   a  b  d
0  1  2  4
1  2  3  5
2  3  4  6
3  4  5  7
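Two properties of list selection worth knowing, shown on a hand-built copy of the frame: the result follows the order of the list, so the same syntax reorders columns, and in current pandas every listed label must exist (the name `"e"` below is a hypothetical typo).

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [2, 3, 4, 5], "a.1": [3, 4, 5, 6], "d": [4, 5, 6, 7]}
)

cols = ["a", "b", "d"]
subset = df[cols]

# Columns come back in the order given, so this also reorders them.
reordered = df[["d", "a"]]
print(reordered.columns.tolist())  # ['d', 'a']

# An unknown label in the list raises KeyError instead of being skipped.
try:
    df[["a", "e"]]
except KeyError as exc:
    print("missing label:", exc)
```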

This also works with label-based indexing through .loc (.ix, which older answers use, was removed in pandas 1.0):

In [103]:
df.loc[:, cols]

Out[103]:
   a  b  d
0  1  2  4
1  2  3  5
2  3  4  6
3  4  5  7
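For a huge sheet with many repeated headers, listing each renamed column by hand gets tedious. One possible approach, a sketch rather than anything from the original answers, is to build a `.loc` mask from a regular expression. It assumes that no genuine header ends in `.<digits>`, so every such label is a duplicate pandas renamed on read; the frame below, with a third `a` column, is hypothetical.

```python
import pandas as pd

# Hypothetical frame in which "a" appeared three times in the sheet header.
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "b": [2, 3, 4, 5],
        "a.1": [3, 4, 5, 6],
        "d": [4, 5, 6, 7],
        "a.2": [5, 6, 7, 8],
    }
)

# Assumption: only pandas-renamed duplicates end in ".<digits>".
is_renamed_dup = df.columns.str.contains(r"\.\d+$")

# Invert the mask so only the first occurrence of each name is kept.
deduped = df.loc[:, ~is_renamed_dup]
print(deduped.columns.tolist())  # ['a', 'b', 'd']
```

If a real header such as `version.2` could occur, this mask would drop it too, so an explicit list is safer in that case.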

Source: https://stackoverflow.com/questions/30528840/how-to-delete-a-duplicate-column-read-from-excel-in-pandas
