How to read index data as string with pandas.read_csv()?

离开以前 2020-12-19 06:37

I'm trying to read a CSV file as a DataFrame with pandas, and I want to read the index row as string. However, since the row for index doesn't have any characters, pandas handles

2 Answers
  • 2020-12-19 07:01

    Pass the dtype param to read_csv to specify the dtype of the column:

    In [159]:
    import pandas as pd
    import io
    t="""uid,f1,f2,f3
    01,0.1,1,10
    02,0.2,2,20
    03,0.3,3,30"""
    df = pd.read_csv(io.StringIO(t), dtype={'uid':str})
    df.set_index('uid', inplace=True)
    df.index
    
    Out[159]:
    Index(['01', '02', '03'], dtype='object', name='uid')
    

    So in your case the following should work:

    df = pd.read_csv('sample.csv', dtype={'uid':str})
    df.set_index('uid', inplace=True)
    

    The one-line equivalent doesn't work, due to a pandas bug (still outstanding at the time of writing) where the dtype param is ignored on cols that are to be treated as the index:

    df = pd.read_csv('sample.csv', dtype={'uid':str}, index_col='uid')
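
Whether the one-liner misbehaves may depend on the installed pandas version, so it can be worth checking locally before relying on either form. The following is a minimal sketch, assuming the same sample data as above; the helper name index_keeps_leading_zeros is hypothetical and only used for illustration.

```python
import io

import pandas as pd

# Same sample data as in the answer above.
csv_text = "uid,f1,f2,f3\n01,0.1,1,10\n02,0.2,2,20\n03,0.3,3,30\n"


def index_keeps_leading_zeros(text):
    """Hypothetical helper: True if the one-line form keeps '01', '02', '03'."""
    df = pd.read_csv(io.StringIO(text), dtype={"uid": str}, index_col="uid")
    return list(df.index) == ["01", "02", "03"]


# The two-step form from the answer, used as the reference result.
two_step = pd.read_csv(io.StringIO(csv_text), dtype={"uid": str}).set_index("uid")

print("pandas version:", pd.__version__)
print("two-step index:", list(two_step.index))
print("one-liner keeps leading zeros:", index_keeps_leading_zeros(csv_text))
```

If the last line prints False, the dtype was ignored for the index column and the two-step form is the one to use.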
    

    You can do this dynamically, assuming the first column is the index column:

    In [171]:
    t="""uid,f1,f2,f3
    01,0.1,1,10
    02,0.2,2,20
    03,0.3,3,30"""
    cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
    index_col_name = cols[0]
    dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
    dtypes[index_col_name] = str
    df = pd.read_csv(io.StringIO(t), dtype=dtypes)
    df.set_index(index_col_name, inplace=True)
    df.info()
    
    <class 'pandas.core.frame.DataFrame'>
    Index: 3 entries, 01 to 03
    Data columns (total 3 columns):
    f1    3 non-null float64
    f2    3 non-null float64
    f3    3 non-null float64
    dtypes: float64(3)
    memory usage: 96.0+ bytes
    
    In [172]:
    df.index
    
    Out[172]:
    Index(['01', '02', '03'], dtype='object', name='uid')
    

    Here we read just the header and the first data row (nrows=1) to get the column names:

    cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
    

    We then generate a dict that maps the column names to the desired dtypes:

    index_col_name = cols[0]
    dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
    dtypes[index_col_name] = str
    

    We take the index name, assuming it's the first entry, then create a dict from the rest of the cols with float as the desired dtype, and add the index col with its type set to str. You can then pass this dict as the dtype param to read_csv.
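
The steps above can be collected into one small function. This is only a sketch under stated assumptions: the function name read_csv_with_str_index and the value_dtype parameter are hypothetical, the first column is assumed to be the index, all other columns are assumed to share one dtype, and the input is taken as a string so it can be parsed twice. It uses nrows=0 so that only the header is read in the first pass.

```python
import io

import pandas as pd


def read_csv_with_str_index(text, value_dtype=float):
    """Parse CSV text, keeping the first column as a string index.

    Hypothetical helper: assumes the first column is the index and that
    every other column should be read as value_dtype.
    """
    # First pass: read only the header to learn the column names.
    header = pd.read_csv(io.StringIO(text), nrows=0).columns.tolist()
    index_col_name, value_cols = header[0], header[1:]

    # Build the dtype mapping: str for the index, value_dtype for the rest.
    dtypes = {col: value_dtype for col in value_cols}
    dtypes[index_col_name] = str

    # Second pass: read the data, then promote the first column to the index.
    df = pd.read_csv(io.StringIO(text), dtype=dtypes)
    return df.set_index(index_col_name)


sample = "uid,f1,f2,f3\n01,0.1,1,10\n02,0.2,2,20\n03,0.3,3,30\n"
df = read_csv_with_str_index(sample)
print(df.index.tolist())
print(df.dtypes)
```

For a file on disk, the same two passes would take the path instead of io.StringIO(text).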

  • 2020-12-19 07:05

    If the result is not a string, you have to convert it to one. Try:

    result = [str(i) for i in result]
    

    or in this case:

    print([str(i) for i in df.index.values])
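
One caveat with converting after the read: if pandas has already parsed the index as integers, the leading zeros are gone and str() cannot bring them back. The sketch below illustrates this on the sample data from the first answer. The zfill(2) step assumes every id was originally two characters wide, which is an assumption about the data and not something pandas can infer.

```python
import io

import pandas as pd

csv_text = "uid,f1,f2,f3\n01,0.1,1,10\n02,0.2,2,20\n03,0.3,3,30\n"

# No dtype given, so the uid column is parsed as integers.
df = pd.read_csv(io.StringIO(csv_text), index_col="uid")

# Converting afterwards yields strings, but without the leading zeros.
converted = [str(i) for i in df.index.values]
print(converted)

# Assumption: ids are fixed-width (2 characters), so padding restores them.
padded = [s.zfill(2) for s in converted]
print(padded)
```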
    