Accessing a column in pandas in different ways

拜拜、爱过 submitted on 2019-12-13 22:38:41

Question


I had a data set that looked like this:

    Id  Economics      English    History  Literature  
0  56          1            1          2        1                     
1  11          1            0          0        1                    
2   6          0            1          1        0                     
3  43          2            0          1        1                     
4  14          0            1          1        0   

I created this dataset by reading a CSV file, and I could easily access the columns with df['Economics'], for example. Then I saved it to a file with:

df.to_csv(file_path, sep='\t')

But when I reopened the dataset in another function, for other purposes, and tried to access the columns in the same way, i.e.

df = pd.read_csv(file_path, sep='\t')
print(df['Economics'])

I got:

KeyError: Economics

I tried multiple encodings while reading, and also checked that it was not a multi-index DataFrame, but everything was fine with the encoding and the index. I found out that there is another method, df.get('Economics'), which in this case worked without an error. But when I then iterated over the column names looking for 'Economics', I got a KeyError again.

So my question is: why does this happen? Why can I sometimes access a column directly with df['column_name'], while at other times I need to use df.get('column_name')? And how should I deal with the column names when the first method doesn't work?


Answer 1:


It looks like there is some unwanted character in the column name. Maybe it is something like 'Economics ' (with a trailing space) or something similar.

In that case df.get('Economics') would not raise a KeyError; it would simply return None, so the call only appears to work.
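The difference between the two access styles can be seen in a minimal sketch. The small frame below is hypothetical (it is not the asker's data) and assumes the column name carries a trailing space:

```python
import pandas as pd

# Hypothetical frame: the second column name ends with a space.
df = pd.DataFrame({"Id": [56, 11], "Economics ": [1, 1]})

# .get() does not raise for a missing key; it returns its default (None).
missing = df.get("Economics")
print(missing is None)

# Bracket access raises KeyError for the same missing key.
try:
    df["Economics"]
except KeyError as exc:
    print("KeyError:", exc)

# An explicit default makes the "not found" case easier to spot.
fallback = df.get("Economics", default="not found")
print(fallback)
```

So a silent df.get() is not evidence that the column exists; checking the returned value for None is what tells the two cases apart.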

Try checking the output of df.columns and the length of the column name with len(df.columns[1]).
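One possible way to run that check and clean up the names is sketched below. The frame and its padded column names are assumptions made for illustration; printing repr() of each name makes stray whitespace visible, and the .str.strip() accessor on the column index removes it:

```python
import pandas as pd

# Hypothetical frame with whitespace hidden in two column names.
df = pd.DataFrame({"Id": [56, 11], "Economics ": [1, 1], " English": [1, 0]})

# repr() shows the quotes around each name, so padding is easy to see.
for name in df.columns:
    print(repr(name), len(name))

# Strip leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()
cleaned = df.columns.tolist()
print(cleaned)

# Bracket access now works as expected.
print(df["Economics"].tolist())
```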




Answer 2:


I guess you either have trailing spaces in some or all of your column names, or you have just one column, as in my test example below:

Test data:

Id  Economics     English   History   Literature  
56  1   1   2   1
11  1   0   0   1
6   1   1   0   0
43  2   0   1   1
14  1   1   1   0

Test code:

import pandas as pd

df = pd.read_csv('test.csv', sep='\t')
print(df)
print(df.columns.tolist())

Output:

  Id  Economics     English   History   Literature
0                                  56  1   1   2   1
1                                  11  1   0   0   1
2                                  6   1   1   0   0
3                                  43  2   0   1   1
4                                  14  1   1   1   0
['Id  Economics     English   History   Literature  ']

The DataFrame has only one column, named after the entire header line: 'Id  Economics     English   History   Literature  '

Let's change sep='\t' to sep=r'\s+' (one or more whitespace characters) in pd.read_csv() and run the same test code against the same data set:

   Id  Economics  English  History  Literature
0  56          1        1        2           1
1  11          1        0        0           1
2   6          1        1        0           0
3  43          2        0        1           1
4  14          1        1        1           0
['Id', 'Economics', 'English', 'History', 'Literature']
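The comparison can also be reproduced without a file on disk. The sketch below assumes a shortened, space-aligned sample with no tab characters (it is not the asker's actual file) and reads it from an in-memory buffer with both separators:

```python
import io

import pandas as pd

# Assumed sample: columns separated by spaces only, no tab characters.
raw = (
    "Id  Economics     English   History   Literature\n"
    "56  1   1   2   1\n"
    "11  1   0   0   1\n"
)

# With sep='\t' no separator is found, so each line becomes one field.
as_tabs = pd.read_csv(io.StringIO(raw), sep="\t")
print(as_tabs.shape)

# With a whitespace regex the header and rows split into five columns.
as_space = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(as_space.columns.tolist())
print(as_space["Economics"].tolist())
```

If the first shape reports a single column, the separator passed to read_csv does not match the one actually present in the file.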


Source: https://stackoverflow.com/questions/35764172/accessing-the-column-in-pandas-in-different-way
