Pandas read_csv expects wrong number of columns, with ragged csv file

佛祖请我去吃肉 2020-12-08 14:34

I have a CSV file with a few hundred rows and 26 columns, but the last few columns only have a value in a few rows, and those rows are towards the middle or end of the file. Wh

4 Answers
  •  盖世英雄少女心
    2020-12-08 15:21

    Suppose you have a file like this:

    a,b,c
    1,2,3
    1,2,3,4
    

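    You could use csv.reader to clean the file first:

    import csv
    with open('file.csv', newline='') as f:
        lines = list(csv.reader(f))
    header, values = lines[0], lines[1:]
    # zip(*values) transposes rows into columns and stops at the shortest row
    data = {h: v for h, v in zip(header, zip(*values))}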
    

    and get:

    {'a' : ('1','1'), 'b': ('2','2'), 'c': ('3', '3')}
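
The dictionary above no longer contains the `4` from the third line, because `zip` stops at the shortest row. If those extra values matter, one possible variation (a sketch, not part of the original answer) pads short rows with `itertools.zip_longest` instead. The `extra_N` column names and the in-memory `sample` string are assumptions made for illustration only.

```python
import csv
import io
from itertools import zip_longest

# In-memory stand-in for 'file.csv', using the same sample as above.
sample = "a,b,c\n1,2,3\n1,2,3,4\n"

rows = list(csv.reader(io.StringIO(sample)))
header, values = rows[0], rows[1:]

# The widest row decides how many columns are needed.
width = max(len(row) for row in rows)

# Hypothetical naming scheme for columns the header does not cover.
names = header + [f"extra_{i}" for i in range(len(header), width)]

# zip_longest transposes rows into columns and fills short rows with None.
columns = zip_longest(*values, fillvalue=None)
data = {name: col for name, col in zip(names, columns)}

print(data)
```

With this sample, the fourth column is kept under the made-up name `extra_3`, and the short row gets `None` in that position.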
    

    If you don't have a header, build values from every line (values = lines) and use:

    data = {h: v for h, v in zip(map(str, range(number_of_columns)), zip(*values))}
    

    and then you can convert the dictionary to a DataFrame with:

    import pandas as pd
    df = pd.DataFrame.from_dict(data)
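
A related approach that stays inside pandas may also be worth trying: when `read_csv` is given an explicit `names` list at least as wide as the widest row, short rows are padded with NaN instead of raising an error. The sketch below assumes the maximum width is known in advance (`n_cols` is a placeholder; the file in the question would presumably use 26) and reads the sample from an in-memory string rather than from disk.

```python
import io

import pandas as pd

# In-memory stand-in for the ragged file from the example above.
sample = "a,b,c\n1,2,3\n1,2,3,4\n"

# Assumed maximum number of fields in any row.
n_cols = 4

# header=None keeps the first line as data; names fixes the column count,
# so rows with fewer fields are padded with NaN.
df = pd.read_csv(io.StringIO(sample), header=None, names=range(n_cols))

print(df)
```

Because the header line is read as an ordinary row here, the columns are labelled 0 to 3 and the original header values sit in the first row; they can be promoted to column names afterwards if needed.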
    
