How to read index data as string with pandas.read_csv()?


离开以前 2020-12-19 06:37

I'm trying to read a CSV file as a DataFrame with pandas, and I want to read the index row as a string. However, since the row for the index doesn't have any characters, pandas handles

2 answers

梦毁少年i (OP)

2020-12-19 07:01

Pass the dtype param to specify the dtype:

In [159]:
import pandas as pd
import io
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
df = pd.read_csv(io.StringIO(t), dtype={'uid':str})
df.set_index('uid', inplace=True)
df.index

Out[159]:
Index(['01', '02', '03'], dtype='object', name='uid')

So in your case the following should work:

df = pd.read_csv('sample.csv', dtype={'uid':str})
df.set_index('uid', inplace=True)

The one-line equivalent doesn't work, due to a pandas bug (still outstanding when this was written) where the dtype param is ignored on cols that are to be treated as the index:

df = pd.read_csv('sample.csv', dtype={'uid':str}, index_col='uid')
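
Whether the one-liner actually loses the string index may depend on the pandas version you have installed, so it can be worth checking directly. A minimal sketch, reusing the sample data from above (the names two_step, one_step and one_liner_ok are only illustrative):

```python
import io
import pandas as pd

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# Two-step version: read 'uid' as str, then promote it to the index.
two_step = pd.read_csv(io.StringIO(t), dtype={'uid': str}).set_index('uid')

# One-line version: the result depends on whether this pandas
# honours dtype for the column named in index_col.
one_step = pd.read_csv(io.StringIO(t), dtype={'uid': str}, index_col='uid')

print(pd.__version__)
print(two_step.index.tolist())
print(one_step.index.tolist())

# True only if the one-liner kept the leading zeros as well.
one_liner_ok = bool(one_step.index.equals(two_step.index))
print(one_liner_ok)
```

If one_liner_ok prints False on your install, stick with the two-step version.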

You can do this dynamically if we assume the first column is the index column:

In [171]:
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str
df = pd.read_csv(io.StringIO(t), dtype=dtypes)
df.set_index(index_col_name, inplace=True)
df.info()


<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 01 to 03
Data columns (total 3 columns):
f1    3 non-null float64
f2    3 non-null float64
f3    3 non-null float64
dtypes: float64(3)
memory usage: 96.0+ bytes

In [172]:
df.index

Out[172]:
Index(['01', '02', '03'], dtype='object', name='uid')

Here we read just the header and the first data row to get the column names:

cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()

We then generate a dict of the column names with the desired dtypes:

index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str

We take the index name, assuming it is the first entry, build a dict that maps the remaining columns to float as the desired dtype, and then add the index column with str as its dtype. You can then pass this dict as the dtype param to read_csv.
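
The steps above could be wrapped in a small helper. This is only a sketch: read_csv_str_index and its value_dtype parameter are hypothetical names, not part of pandas, and it assumes the first column is the index and that every other column can be parsed as the same dtype. It uses nrows=0 so that only the header line is parsed on the first pass.

```python
import io
import pandas as pd

def read_csv_str_index(path_or_buffer, value_dtype=float):
    """Hypothetical helper: read a CSV with its first column as a str index."""
    # nrows=0 parses only the header, which is enough to get the column names.
    cols = pd.read_csv(path_or_buffer, nrows=0).columns.tolist()
    if hasattr(path_or_buffer, "seek"):
        path_or_buffer.seek(0)  # rewind in-memory buffers before the second read
    index_col_name = cols[0]
    # All non-index columns get value_dtype; the index column is forced to str.
    dtypes = {name: value_dtype for name in cols[1:]}
    dtypes[index_col_name] = str
    df = pd.read_csv(path_or_buffer, dtype=dtypes)
    return df.set_index(index_col_name)

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

df = read_csv_str_index(io.StringIO(t))
print(df.index.tolist())
print(df.dtypes)
```

With a file on disk you would pass the path instead, e.g. read_csv_str_index('sample.csv').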
