pandas read_csv and filter columns with usecols

小鲜肉 2020-11-28 03:00

I have a csv file which isn't coming in correctly with pandas.read_csv when I filter the columns with usecols and use multiple indexes.

5 answers
  •  谎友^ (original poster)
    2020-11-28 03:17

    The answer by @chip completely misses the point of the two keyword arguments, names and usecols.

    • names is only necessary when there is no header and you want to specify other arguments using column names rather than integer indices.
    • usecols is supposed to provide a filter before reading the whole DataFrame into memory; if used properly, there should never be a need to delete columns after reading.
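
To illustrate the first point, here is a minimal sketch assuming a headerless version of the same data. The raw string and the labels passed to names are made up for this example. With header=None, names supplies the labels that usecols, index_col and parse_dates then refer to:

```python
import pandas as pd
from io import StringIO

# Hypothetical headerless data, invented for illustration only.
raw = """bar,20090101,a,1
bar,20090102,a,3
bar,20090101,b,1"""

df = pd.read_csv(
    StringIO(raw),
    header=None,                          # the file has no header row
    names=["dummy", "date", "loc", "x"],  # so we label the columns ourselves
    usecols=["date", "loc", "x"],         # "dummy" is skipped while parsing
    index_col=["date", "loc"],
    parse_dates=["date"],
)
print(df)
```

When the file does have a header row, names can be left out and the header labels used directly.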

    This solution corrects those oddities:

    import pandas as pd
    from io import StringIO
    
    csv = r"""dummy,date,loc,x
    bar,20090101,a,1
    bar,20090102,a,3
    bar,20090103,a,5
    bar,20090101,b,1
    bar,20090102,b,3
    bar,20090103,b,5"""
    
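    df = pd.read_csv(
        StringIO(csv),
        header=0,                      # first row holds the column names
        usecols=["date", "loc", "x"],  # skip "dummy" while parsing
        index_col=["date", "loc"],     # build a two-level index
        parse_dates=["date"],          # read 20090101 as 2009-01-01
    )
    print(df)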

    Which gives us:

                    x
    date       loc
    2009-01-01 a    1
    2009-01-02 a    3
    2009-01-03 a    5
    2009-01-01 b    1
    2009-01-02 b    3
    2009-01-03 b    5
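
If the unwanted columns are easier to name than the wanted ones, usecols also accepts a callable that is evaluated against each column name. A minimal sketch, reusing the same sample data (the variable names here are only illustrative):

```python
import pandas as pd
from io import StringIO

sample = """dummy,date,loc,x
bar,20090101,a,1
bar,20090102,a,3
bar,20090103,a,5
bar,20090101,b,1
bar,20090102,b,3
bar,20090103,b,5"""

df = pd.read_csv(
    StringIO(sample),
    header=0,
    usecols=lambda name: name != "dummy",  # keep every column except "dummy"
    index_col=["date", "loc"],
    parse_dates=["date"],
)
print(df)
```

Either form filters at parse time, so no column has to be deleted afterwards.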
    
