read_csv with missing/incomplete header or irregular number of columns

小鲜肉 2021-01-18 08:12

I have a file.csv with ~15k rows that looks like this:

SAMPLE_TIME,          POS,        OFF,  HISTOGRAM
2015-07-15 16:41:56,  0-0-0-0-3,   1,           


        
4 answers
  •  天涯浪人
    2021-01-18 08:47

    You can split the HISTOGRAM column into a new DataFrame and concat it to the original.

    print(df)
             SAMPLE_TIME,        POS, OFF,  \
    0 2015-07-15 16:41:56  0-0-0-0-3,   1,   
    1 2015-07-15 16:42:55  0-0-0-0-3,   1,   
    2 2015-07-15 16:43:55  0-0-0-0-3,   1,   
    3 2015-07-15 16:44:56  0-0-0-0-3,   1,   
    
                                     HISTOGRAM  
    0  2,0,5,59,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,  
    1          0,0,5,9,0,0,0,0,0,2,0,0,0,50,0,  
    2     0,0,5,5,0,0,0,0,0,2,0,0,0,0,4,0,0,0,  
    3      2,0,5,0,0,0,0,0,0,2,0,0,0,6,0,0,0,0  
    
    # create a new DataFrame from the HISTOGRAM column
    h = pd.DataFrame([x.split(',') for x in df['HISTOGRAM'].tolist()])
    print(h)
      0  1  2   3  4  5  6  7  8  9  10 11 12  13 14 15    16    17    18    19
    0  2  0  5  59  0  0  0  0  0  2  0  0  0   0  0  0     0     0     0      
    1  0  0  5   9  0  0  0  0  0  2  0  0  0  50  0     None  None  None  None
    2  0  0  5   5  0  0  0  0  0  2  0  0  0   0  4  0     0     0        None
    3  2  0  5   0  0  0  0  0  0  2  0  0  0   6  0  0     0     0  None  None
    
    # append to the original and rename column 0
    # (this leaves two columns named HISTOGRAM, as the output shows)
    df = pd.concat([df, h], axis=1).rename(columns={0: 'HISTOGRAM'})
    print(df)
                                     HISTOGRAM HISTOGRAM  1  2   3  4  5  ...  10  \
    0  2,0,5,59,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,         2  0  5  59  0  0  ...   0   
    1          0,0,5,9,0,0,0,0,0,2,0,0,0,50,0,         0  0  5   9  0  0  ...   0   
    2     0,0,5,5,0,0,0,0,0,2,0,0,0,0,4,0,0,0,         0  0  5   5  0  0  ...   0   
    3      2,0,5,0,0,0,0,0,0,2,0,0,0,6,0,0,0,0         2  0  5   0  0  0  ...   0   
    
      11 12  13 14 15    16    17    18    19  
    0  0  0   0  0  0     0     0     0        
    1  0  0  50  0     None  None  None  None  
    2  0  0   0  4  0     0     0        None  
    3  0  0   6  0  0     0     0  None  None  
    
    [4 rows x 24 columns]
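
A self-contained variation on the same idea, as a minimal sketch rather than the one right way to do it. It assumes the raw lines look like the first two rows printed above (the inline `raw` text is reconstructed from that output, not taken from the real file), reads each line with at most three splits so the histogram stays one string, and then uses `str.split(..., expand=True)` in place of the list comprehension. The `H` column prefix and the names `raw`, `rows`, and `out` are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for file.csv, rebuilt from the rows printed above (an assumption).
raw = io.StringIO(
    "SAMPLE_TIME,          POS,        OFF,  HISTOGRAM\n"
    "2015-07-15 16:41:56,  0-0-0-0-3,   1,    2,0,5,59,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,\n"
    "2015-07-15 16:42:55,  0-0-0-0-3,   1,    0,0,5,9,0,0,0,0,0,2,0,0,0,50,0,\n"
)

# Split each line at most 3 times, so everything after the third comma
# (the variable-length histogram) stays together in one field.
header = [c.strip() for c in raw.readline().split(",")]
rows = [[c.strip() for c in line.split(",", 3)] for line in raw if line.strip()]
df = pd.DataFrame(rows, columns=header)

# Drop the trailing comma, then expand the histogram into one column per bin.
h = df["HISTOGRAM"].str.rstrip(",").str.split(",", expand=True)

# Convert to numbers; bins missing from shorter rows become NaN.
h = h.apply(pd.to_numeric).add_prefix("H")

# Replace the raw string column with the expanded bins.
out = pd.concat([df.drop(columns="HISTOGRAM"), h], axis=1)
print(out)
```

Stripping the trailing comma first avoids the empty last column seen in the output above, and dropping the original string column avoids ending up with two columns called HISTOGRAM.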
    
