Failed to read inch symbol in pandas read_csv

て烟熏妆下的殇ゞ submitted on 2020-06-29 12:53:09

Question


I have a CSV file with the following contents:

Name,Desc,Year,Location

Jhon,12" Main Third ,2012,GR

Lew,"291" Line (12,596,3)",2012,GR

,All, 1992,FR

...

It is a very long file; I have shown only the problematic lines. I am unsure how to read it into a pandas DataFrame. I tried the following parameters of pandas read_csv:

  • quotechar,

  • quoting,

  • sep

but still had no success.

I have no control over how the CSV is produced.


Answer 1:


You can parse the lines yourself with a regular expression, something like this. See whether it works for you:

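import pandas as pd
import re

# Name, then a greedy Desc that may contain commas and quotes,
# then Year and Location as the last two fields
pattern = re.compile(r"^(\w*),(.*),\s?(\d+),(\w+)")

rows = []
with open('/home/yusuf/Desktop/c1') as f:
    headers = f.readline().strip('\n').split(',')
    for line in f:
        m = pattern.match(line)
        if m:
            # one list per row: [Name, Desc, Year, Location]
            rows.append(list(m.groups()))

df = pd.DataFrame(data=rows, columns=headers)
df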

Regex Demo: https://regex101.com/r/AU2WcO/1
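
To make the behaviour of the pattern concrete, here is a minimal sketch that applies the same regular expression to the three sample lines from the question. The in-memory list `sample_lines` is a stand-in for the real file and is an assumption for illustration only; it also assumes, as the pattern does, that Name and Location contain only word characters and that Year is numeric.

```python
import re

# Same pattern as in the answer, written as a raw string.
ROW = re.compile(r"^(\w*),(.*),\s?(\d+),(\w+)")

# Hypothetical stand-in for the file: the sample lines from the question.
sample_lines = [
    'Jhon,12" Main Third ,2012,GR',
    'Lew,"291" Line (12,596,3)",2012,GR',
    ',All, 1992,FR',
]

rows = []
for line in sample_lines:
    m = ROW.match(line)
    if m:
        # groups are (Name, Desc, Year, Location)
        rows.append(list(m.groups()))

for row in rows:
    print(row)
```

Because `(.*)` is greedy, the description absorbs every comma except the last two, so the commas and quotes inside `"291" Line (12,596,3)"` stay in the Desc field. The resulting `rows` list can be passed to `pd.DataFrame(rows, columns=headers)` exactly as in the answer.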




Answer 2:


You can't have the separator character inside a field unless that field is properly quoted. That is not the case in

Lew,"291" Line (12,596,3)",2012,GR

where the inch mark after 291 is an unescaped quote character that closes the quoting early. Pandas will then assume you have 6 fields because the line has 5 commas, even though two of them are meant to be part of the description. You would need to do some pre-processing of the text file to get rid of this issue, or ask for a different separator character (@ or | seem to work well in my experience).

Pandas has no problems reading the other lines:

import pandas as pd
print(pd.read_csv('untitled.txt'))

   Name             Desc  Year Location
0  Jhon  12" Main Third   2012       GR
1   NaN              All  1992       FR
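
The pre-processing mentioned above could look roughly like the sketch below. It is not the answerer's code: it assumes that Name is always the first field, that Year and Location are always the last two, and that none of those three ever contains a comma, so everything in between is Desc. The helper name `split_row` and the in-memory `raw` buffer are hypothetical stand-ins for the real file.

```python
import csv
import io

def split_row(line):
    """Split one raw line into [Name, Desc, Year, Location].

    Assumes only Desc may contain commas or quote characters.
    """
    name, rest = line.rstrip("\n").split(",", 1)   # first comma ends Name
    desc, year, location = rest.rsplit(",", 2)     # last two commas delimit Year, Location
    return [name, desc, year.strip(), location]

# Hypothetical stand-in for the original file.
raw = io.StringIO(
    'Name,Desc,Year,Location\n'
    'Jhon,12" Main Third ,2012,GR\n'
    'Lew,"291" Line (12,596,3)",2012,GR\n'
    ',All, 1992,FR\n'
)

# Rewrite every line as well-formed CSV; csv.writer quotes Desc and
# doubles any embedded quote characters.
cleaned = io.StringIO()
writer = csv.writer(cleaned)
for line in raw:
    if line.strip():
        writer.writerow(split_row(line))

# Read the cleaned text back to check that each row now has four fields.
cleaned.seek(0)
parsed = list(csv.reader(cleaned))
```

After another `cleaned.seek(0)`, the buffer can be handed to `pd.read_csv(cleaned)`, since `read_csv` accepts file-like objects. For a real file, the cleaned rows would be written to a new file instead of a `StringIO`.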


Source: https://stackoverflow.com/questions/41058534/failed-to-read-inch-symbol-in-pandas-read-csv