Reading csv containing a list in Pandas

星月不相逢 2020-12-14 11:09

I'm trying to read this csv into pandas

HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']"
HK,"[u'5328.1', u'5329.3', '2013-12-27 13:5


        
4 Answers
  • 2020-12-14 11:42

    Use .strip() in Python.

    import csv

    with open(csvfile, 'r') as infile:
        reader = csv.reader(infile)
        for row in reader:
            col1 = row[0]
            # row[1] is the quoted list string; strip the surrounding brackets
            col2 = row[1].strip("[]")
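
The strip-based idea can be sketched end to end as follows. This is only an illustration under some assumptions: the sample row is taken from the question and read from an in-memory string rather than a file, and `SAMPLE`, `parse_rows` and `parsed` are hypothetical names. It also assumes no list element contains a comma.

```python
import csv
import io

# One row from the question, held in memory instead of a file (assumption).
SAMPLE = "HK,\"[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']\"\n"


def parse_rows(text):
    """Return (key, list_of_strings) pairs using only csv and str methods."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        key = row[0]
        # row[1] is the whole bracketed string; drop the brackets first.
        inner = row[1].strip("[]")
        # Split on commas, then trim spaces, the u prefix and the quotes.
        items = [part.strip().lstrip("u").strip("'") for part in inner.split(",")]
        rows.append((key, items))
    return rows


parsed = parse_rows(SAMPLE)
print(parsed)
```

The elements come back as plain strings, so numeric values would still need `float()` applied afterwards.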
    
  • Based on alko's answer, you can use the df.apply() function for the first part, parsing the list string into the actual data:

     >>> df = pd.read_clipboard(header=None,sep=',')
     >>> df
         0                                                  1
      0  HK  [u'5328.1', u'5329.3', '2013-12-27 13:58:57.97...
      1  HK  [u'5328.1', u'5329.3', '2013-12-27 13:58:59.23...
      2  HK  [u'5328.1', u'5329.3', '2013-12-27 13:59:00.34...
     >>> df[1] = df[1].apply(eval)
     >>> df
         0                                             1
      0  HK  [5328.1, 5329.3, 2013-12-27 13:58:57.973614]
      1  HK  [5328.1, 5329.3, 2013-12-27 13:58:59.237387]
      2  HK  [5328.1, 5329.3, 2013-12-27 13:59:00.346325]
    
  • 2020-12-14 11:47
    df['new_column'] = df['column'].apply(lambda x: ast.literal_eval(x))
    

    Just run the above code on the column that contains the lists as strings.
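
To see what `ast.literal_eval` does to a single cell, here is a small sketch using one value from the question. The names `cell`, `values`, `first`, `second` and `is_literal` are hypothetical. Note that the parsed elements are still strings, so the numbers need an explicit `float()` call.

```python
import ast

# One cell value from the question's second column.
cell = "[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']"

# literal_eval parses Python literals only; the u prefix is accepted in Python 3.
values = ast.literal_eval(cell)

# The elements are str objects, so convert the numeric ones explicitly.
first, second = float(values[0]), float(values[1])


def is_literal(text):
    """Return True if text is a plain Python literal that literal_eval accepts."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        return False


print(values, first, second)
```

Unlike `eval`, `literal_eval` refuses anything that is not a literal, such as a function call, which makes it the safer choice for data read from a file.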

  • 2020-12-14 11:55

    One option is to use ast.literal_eval as a converter:

    >>> import ast
    >>> df = pd.read_clipboard(header=None, quotechar='"', sep=',', 
    ...                   converters={1:ast.literal_eval})
    >>> df
        0                                             1
    0  HK  [5328.1, 5329.3, 2013-12-27 13:58:57.973614]
    1  HK  [5328.1, 5329.3, 2013-12-27 13:58:59.237387]
    2  HK  [5328.1, 5329.3, 2013-12-27 13:59:00.346325]
    

    And convert those lists to a DataFrame if needed, for example with:

    >>> df = pd.DataFrame.from_records(df[1].tolist(), index=df[0],
    ...                           columns=list('ABC')).reset_index()
    >>> df['C'] = pd.to_datetime(df['C'])
    >>> df
        0       A       B                          C
    0  HK  5328.1  5329.3 2013-12-27 13:58:57.973614
    1  HK  5328.1  5329.3 2013-12-27 13:58:59.237387
    2  HK  5328.1  5329.3 2013-12-27 13:59:00.346325
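
For readers without the data on the clipboard, the same steps can be sketched with `pd.read_csv` reading from an in-memory string. This is an illustration under assumptions, not the answer's exact code: `CSV_TEXT` holds two rows from the thread, `raw`, `wide` and the column name `market` are hypothetical, and columns A and B are converted to float explicitly because `literal_eval` leaves them as strings.

```python
import ast
import io

import pandas as pd

# Two rows from the thread, held in memory instead of a file (assumption).
CSV_TEXT = (
    "HK,\"[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']\"\n"
    "HK,\"[u'5328.1', u'5329.3', '2013-12-27 13:58:59.237387']\"\n"
)

# Parse column 1 into real Python lists while reading.
raw = pd.read_csv(io.StringIO(CSV_TEXT), header=None, quotechar='"',
                  converters={1: ast.literal_eval})

# Spread each list over three columns, then restore the first column.
wide = pd.DataFrame(raw[1].tolist(), columns=["A", "B", "C"])
wide.insert(0, "market", raw[0])  # "market" is a hypothetical column name

# The list elements are strings, so convert them to proper dtypes.
wide[["A", "B"]] = wide[["A", "B"]].astype(float)
wide["C"] = pd.to_datetime(wide["C"])

print(wide.dtypes)
```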
    