pandas read_csv parses the header as strings, but I want integers

折月煮酒 submitted on 2021-02-07 08:37:57

Question


For example, the CSV file looks like this, where the first row (1,2,3) is the header:

1,2,3
0,0,0

I read the CSV file with pd.read_csv and print one column:

import pandas as pd
df = pd.read_csv('./test.csv')
print(df[1])

This raises KeyError: 1.

It seems that read_csv parses the header as strings.

Is there any way to get integer column labels in the DataFrame?
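
The behaviour described above can be reproduced without a file on disk. The following is a minimal sketch, assuming an in-memory buffer (io.StringIO) as a stand-in for test.csv; the variable names are illustrative only.

```python
import io
import pandas as pd

# In-memory stand-in for test.csv (assumption: same two lines as in the question).
csv_text = "1,2,3\n0,0,0\n"

df = pd.read_csv(io.StringIO(csv_text))

# Header cells are kept as strings, so the labels are '1', '2', '3'.
labels = list(df.columns)
print(labels)

# The integer 1 is not among the labels, which is why df[1] raises KeyError.
print(1 in df.columns)

# Selecting with the string label works.
first = df["1"]
print(first.tolist())
```

So the data is read correctly; only the type of the column labels differs from what the lookup expects.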


Answer 1:


I think the more general approach is to cast the column names to integers with astype:

df = pd.read_csv('./test.csv')
df.columns = df.columns.astype(int)
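
As a quick check that the cast makes integer lookups work, here is a small sketch that uses an in-memory buffer instead of ./test.csv (that substitution is an assumption made to keep the example self-contained):

```python
import io
import pandas as pd

# In-memory stand-in for ./test.csv.
csv_text = "1,2,3\n0,0,0\n"

df = pd.read_csv(io.StringIO(csv_text))

# Cast the string labels '1', '2', '3' to integers.
df.columns = df.columns.astype(int)

# Integer lookup no longer raises KeyError.
col = df[1]
print(col.tolist())
```

Note that astype(int) raises ValueError if any header cell is not a valid integer literal.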

Another way is to read only the header row first and pass it to read_csv through the names parameter:

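import csv
import numpy as np
import pandas as pd

# read just the header row and convert its cells to integers
with open("file.csv", "r", newline="") as f:
    reader = csv.reader(f)
    i = np.array(next(reader)).astype(int)

# another way
# i = pd.read_csv("file.csv", nrows=0).columns.astype(int)
print(i)
[1 2 3]

# skip the original header row and use the integer names instead
df = pd.read_csv("file.csv", names=i, skiprows=1)
print(df.columns)
Int64Index([1, 2, 3], dtype='int64')
# pandas 2.0 and later print Index([1, 2, 3], dtype='int64') instead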
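
The commented-out nrows=0 variant can be sketched end to end as follows. This assumes an in-memory buffer in place of file.csv, and it converts the header to a plain list before passing it to names; both choices are made for the illustration and are not part of the original answer.

```python
import io
import pandas as pd

# In-memory stand-in for file.csv.
csv_text = "1,2,3\n0,0,0\n"

# First pass: nrows=0 reads only the header, giving an empty DataFrame.
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns.astype(int)

# Second pass: skip the header row and supply the integer names explicitly.
df = pd.read_csv(io.StringIO(csv_text), names=header.tolist(), skiprows=1)

print(list(df.columns))
print(df.shape)
```

The file is opened twice, but the first pass parses only one row, so the extra cost is small.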



Answer 2:


Skip the header row using skiprows=1 and header=None. This loads a DataFrame whose column labels are integers starting from 0.

df = pd.read_csv('test.csv', skiprows=1, header=None).rename(columns=lambda x: x + 1)

df    
   1  2  3
0  0  0  0

The rename call is optional, but if you want your headers to start from 1, you may keep it in.
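
One thing to keep in mind is that this approach labels columns by position and ignores what the header row actually contains. The sketch below makes the difference visible with a hypothetical file whose header is 10,20,30 (not the file from the question):

```python
import io
import pandas as pd

# Hypothetical file whose header values are not 1..n.
csv_text = "10,20,30\n0,0,0\n"

# Positional labels: the header row is skipped, so its values are ignored.
positional = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=None)
positional = positional.rename(columns=lambda x: x + 1)

# Labels taken from the header row itself, then converted with int().
from_header = pd.read_csv(io.StringIO(csv_text)).rename(columns=int)

print(list(positional.columns))
print(list(from_header.columns))
```

For the file in the question both give 1, 2, 3, so the positional shortcut is fine there.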


If you have a MultiIndex, use set_levels to convert just the 0th level to integer (levels[0] holds the unique labels of that level):

df.columns = df.columns.set_levels(
     df.columns.levels[0].astype(int), level=0
)
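
A runnable sketch of the MultiIndex case, assuming a two-row header read with header=[0, 1] from an in-memory buffer (the sample data here is illustrative):

```python
import io
import pandas as pd

# Two header rows followed by two data rows.
csv_text = "1,1,2,2\na,b,a,b\n1,3,5,7\n0,2,4,6\n"

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# levels[0] holds the unique labels of the outer level ('1' and '2').
outer = df.columns.levels[0].astype(int)
df.columns = df.columns.set_levels(outer, level=0)

# The outer level can now be addressed with integers.
values = df[(1, "a")].tolist()
print(values)
```

Only the outer level is converted; the inner level keeps its string labels 'a' and 'b'.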



Answer 3:


You can use set_axis in conjunction with a lambda and pd.Index.map

Consider a csv that looks like:

1,1,2,2
a,b,a,b
1,3,5,7
0,2,4,6

Read it like:

df = pd.read_csv('test.csv', header=[0, 1])
df

   1     2   
   a  b  a  b
0  1  3  5  7
1  0  2  4  6

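You can pipeline the column setting with integers in the first level like this (set_axis returns a new DataFrame by default; the inplace argument was removed in pandas 2.0, so it is not passed here):

df.set_axis(df.columns.map(lambda i: (int(i[0]), i[1])), axis=1)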

   1     2   
   a  b  a  b
0  1  3  5  7
1  0  2  4  6
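
To show what "pipelining" can look like as a single method chain, here is a sketch that reads the sample data from an in-memory buffer and relabels it in one expression. The use of pipe is a choice made for this illustration, and the code assumes pandas 1.0 or later, where set_axis returns a new object by default.

```python
import io
import pandas as pd

# Same sample data as the csv shown earlier in this answer.
csv_text = "1,1,2,2\na,b,a,b\n1,3,5,7\n0,2,4,6\n"

def first_level_to_int(frame):
    # Rebuild each (outer, inner) label with the outer part converted to int.
    new_labels = frame.columns.map(lambda t: (int(t[0]), t[1]))
    return frame.set_axis(new_labels, axis=1)

result = (
    pd.read_csv(io.StringIO(csv_text), header=[0, 1])
    .pipe(first_level_to_int)
)

print(result[(2, "b")].tolist())
```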



Answer 4:


Regarding "is there any way using integer type in dataframe column?":

I find this quite elegant:

df = pd.read_csv('test.csv').rename(columns=int)

Note that int here is the built-in function int().
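
Because rename calls the function on every label, int() raises ValueError as soon as one header is not numeric. If the file mixes numeric and text headers, a tolerant converter can be passed instead. The helper int_if_possible below is hypothetical (it is not part of pandas), and the sample data is made up for the illustration.

```python
import io
import pandas as pd

def int_if_possible(label):
    """Hypothetical helper: return int(label) when it parses, else the label unchanged."""
    try:
        return int(label)
    except (TypeError, ValueError):
        return label

# Made-up sample with two numeric headers and one text header.
csv_text = "1,2,name\n0,0,x\n"

df = pd.read_csv(io.StringIO(csv_text)).rename(columns=int_if_possible)

print(list(df.columns))
```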



Source: https://stackoverflow.com/questions/49229415/pandas-read-csv-parse-header-as-string-type-but-i-want-integer
