read_csv with missing/incomplete header or irregular number of columns

小鲜肉 2021-01-18 08:12

I have a file.csv with ~15k rows that looks like this:

SAMPLE_TIME,          POS,        OFF,  HISTOGRAM
2015-07-15 16:41:56,  0-0-0-0-3,   1,           


        
4 answers
  •  傲寒 (OP)
    2021-01-18 08:47

    So how about this. I made a csv from your sample data.

    When I import lines:

    import csv

    # Python 3: open in text mode with newline='' ('rb' only works on Python 2)
    with open('test.csv', newline='') as f:
        lines = list(csv.reader(f))
    headers, values = lines[0], lines[1:]
    

    To generate usable header names, use this line:

    headers = [i or ind for ind, i in enumerate(headers)]
    

    Because of how (I assume) csv.reader parses the header line, headers should contain an empty string for every column that has no name. Empty strings evaluate to False, so this comprehension keeps each existing name and substitutes the column's position for each missing one.
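To see the comprehension in isolation, here is a minimal sketch on a made-up header row. The `sample_headers` list is hypothetical; it only mimics what `csv.reader` would return for a header line that ends in trailing commas.

```python
# Hypothetical header row: four named columns followed by three unnamed ones
sample_headers = ['SAMPLE_TIME', 'POS', 'OFF', 'HISTOGRAM', '', '', '']

# '' is falsy, so `name or position` falls back to the column position
filled = [name or position for position, name in enumerate(sample_headers)]
print(filled)
# ['SAMPLE_TIME', 'POS', 'OFF', 'HISTOGRAM', 4, 5, 6]
```

Note that a header consisting only of spaces is still truthy, so it would be kept as-is unless it is stripped first.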

    Then just make a df:

    import pandas as pd
    df = pd.DataFrame(values, columns=headers)
    

    which looks like:

            SAMPLE_TIME           POS         OFF   HISTOGRAM  4  5   6  7  8  9  \
    0  15/07/2015 16:41     0-0-0-0-3           1           2  0  5  59  0  0  0   
    1  15/07/2015 16:42     0-0-0-0-3           1           0  0  5   9  0  0  0   
    2  15/07/2015 16:43     0-0-0-0-3           1           0  0  5   5  0  0  0   
    3  15/07/2015 16:44     0-0-0-0-3           1           2  0  5   0  0  0  0   
    
      ... 12 13 14 15  16 17 18 19 20 21  
    0 ...  2  0  0  0   0  0  0  0  0  0  
    1 ...  2  0  0  0  50  0              
    2 ...  2  0  0  0   0  4  0  0  0     
    3 ...  2  0  0  0   6  0  0  0  0     
    
    [4 rows x 22 columns]
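For anyone who wants to try the steps above without a file on disk, the following is a rough end-to-end sketch. The `raw` string is invented sample data (only the first row follows the question), `io.StringIO` stands in for `test.csv`, and the `.strip()` call is an extra assumption added here so that headers padded with spaces are handled too.

```python
import csv
import io

import pandas as pd

# Invented sample: the header names only 4 of the 7 columns
raw = (
    "SAMPLE_TIME,POS,OFF,HISTOGRAM,,,\n"
    "2015-07-15 16:41:56,0-0-0-0-3,1,2,0,5,59\n"
    "2015-07-15 16:42:56,0-0-0-0-3,1,0,0,5,9\n"
)

# io.StringIO stands in for the opened file
lines = list(csv.reader(io.StringIO(raw)))
headers, values = lines[0], lines[1:]

# Strip padding (an addition to the answer's code), then number the blanks
headers = [name.strip() or position for position, name in enumerate(headers)]

df = pd.DataFrame(values, columns=headers)
print(df.columns.tolist())
# ['SAMPLE_TIME', 'POS', 'OFF', 'HISTOGRAM', 4, 5, 6]
print(df.shape)
# (2, 7)
```

Since `csv.reader` does no type conversion, every cell in `df` is a string; numeric columns would still need converting afterwards, for example with `pd.to_numeric`.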
    
