pandas read_csv and filter columns with usecols

小鲜肉 2020-11-28 03:00

I have a CSV file that pandas.read_csv does not read correctly when I filter the columns with usecols and use multiple index columns.

5 answers
  •  旧时难觅i
    2020-11-28 03:11

    This code achieves what you want, although the behaviour is weird and almost certainly a bug:

    I observed that it works when:

    a) you specify index_col relative to the columns you actually use, so three columns in this example, not four (drop dummy and start counting from there)

    b) the same goes for parse_dates

    c) but not for usecols, for obvious reasons: it has to refer to positions in the original file

    d) here I adapted the names to mirror this behaviour

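    import pandas as pd
    from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`
    
    csv = """dummy,date,loc,x
    bar,20090101,a,1
    bar,20090102,a,3
    bar,20090103,a,5
    bar,20090101,b,1
    bar,20090102,b,3
    bar,20090103,b,5
    """
    
    df = pd.read_csv(StringIO(csv),
            index_col=[0,1],   # see (a): counted among the columns actually used
            usecols=[1,2,3],   # see (c): counted in the original file, skipping dummy
            parse_dates=[0],   # see (b): counted among the columns actually used
            header=0,
            names=["date", "loc", "", "x"])  # see (d)
    
    print(df)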
    

    which prints

                    x
    date       loc   
    2009-01-01 a    1
    2009-01-02 a    3
    2009-01-03 a    5
    2009-01-01 b    1
    2009-01-02 b    3
    2009-01-03 b    5
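
    Counting positions like this is easy to get wrong. As a minimal sketch, assuming a reasonably recent pandas release (an assumption on my part, not something tested against the version used above), you can refer to the columns by their header names instead, so that usecols, index_col and parse_dates all use the same labels and no position has to be counted:

```python
import pandas as pd
from io import StringIO

# Same sample data as above; "dummy" is the column we want to skip.
csv = """dummy,date,loc,x
bar,20090101,a,1
bar,20090102,a,3
bar,20090103,a,5
bar,20090101,b,1
bar,20090102,b,3
bar,20090103,b,5
"""

df = pd.read_csv(
    StringIO(csv),
    header=0,                      # first line holds the column names
    usecols=["date", "loc", "x"],  # keep these columns, drop "dummy"
    index_col=["date", "loc"],     # build a two-level index by name
    parse_dates=["date"],          # parse the date level as datetimes
)

print(df)
```

    Because every argument refers to a label from the header row, the call does not depend on whether positions are counted before or after the column filter.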
    
