Read csv with pandas with commented header

可紊 posted on 2021-02-07 13:33:23

Question


I have CSV files that have # in the header line:

s = '#one two three\n1 2 3'

If I use pd.read_csv the # sign gets into the first header:

import pandas as pd
from io import StringIO
pd.read_csv(StringIO(s), delim_whitespace=True)
     #one  two  three
0     1    2      3

If I set the argument comment='#', then pandas ignores the line completely.

Is there an easy way to handle this case?

A second, related issue is how to handle quoting in this case. It works when there is no #:

s = '"one one" two three\n1 2 3'
print(pd.read_csv(StringIO(s), delim_whitespace=True))
   one one  two  three
0        1    2      3

But it does not work with the #:

s = '#"one one" two three\n1 2 3'
print(pd.read_csv(StringIO(s), delim_whitespace=True))
   #"one  one"  two  three
0      1     2    3    NaN

Thanks!

++++++++++ Update

Here is a test for the second example:

s = '#"one one" two three\n1 2 3'
# here I am cheating by slicing the string
wanted_result = pd.read_csv(StringIO(s[1:]), delim_whitespace=True)
# is there a way to achieve the same result by configuring read_csv somehow?
assert wanted_result.equals(pd.read_csv(StringIO(s), delim_whitespace=True))

Answer 1:


You can rename the first header of the read_csv() output this way:

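import pandas as pd
from io import StringIO

s = '#one two three\n1 2 3'
df = pd.read_csv(StringIO(s), delim_whitespace=True)
# strip the leading "#" from the first column name
new_name = df.columns[0].lstrip("#")
# rename() returns a new DataFrame, so assign the result back
df = df.rename(columns={df.columns[0]: new_name})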
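
Note that renaming only repairs the simple header. With the quoted header from the update, the columns are already split incorrectly before the rename happens. If the column names are known in advance, one possible alternative (a sketch, not part of the original answer) is to skip the commented line and pass the names explicitly. The `names` list below is an assumption typed in by hand, and `sep=r"\s+"` stands in for `delim_whitespace=True`, which newer pandas releases deprecate.

```python
from io import StringIO

import pandas as pd

s = '#"one one" two three\n1 2 3'

# Assumption: the column names are known ahead of time.
names = ["one one", "two", "three"]

# Skip the commented header line and supply the names ourselves.
df = pd.read_csv(StringIO(s), sep=r"\s+", skiprows=1, header=None, names=names)

# Reference result from the question: slice off the leading "#".
wanted_result = pd.read_csv(StringIO(s[1:]), sep=r"\s+")
print(df.equals(wanted_result))
```

This avoids touching the file contents at all, at the cost of hard-coding the header.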



Answer 2:


You can remove the leading # from the file contents this way:

s = u'#"one one" two three\n1 2 3'

import pandas as pd
from io import StringIO

# split only on the first "#", so any later "#" in the data is kept
wholefile = StringIO(s).read().split("#", 1)[1]

pd.read_csv(StringIO(wholefile), delim_whitespace=True)

   one one  two  three
0        1    2      3

The drawback is that the whole file has to be loaded into memory, but it works.
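
If loading everything as one string is a concern, a possible variation is to consume only the header line yourself and hand the rest of the open handle to pandas. The sketch below is not from the original answers: `read_commented_header` is a hypothetical helper name, it assumes the first line is the commented header, and it uses `sep=r"\s+"` as the whitespace separator.

```python
from io import StringIO

import pandas as pd


def read_commented_header(handle, comment="#"):
    """Hypothetical helper: parse a '#'-prefixed header, then read the rest."""
    # Consume only the first line and drop the leading comment marker.
    header_line = handle.readline().lstrip(comment)
    # Let pandas parse the header so quoting rules match the data rows.
    names = pd.read_csv(StringIO(header_line), sep=r"\s+", nrows=0).columns.tolist()
    # The handle now points at the first data row.
    return pd.read_csv(handle, sep=r"\s+", header=None, names=names)


s = '#"one one" two three\n1 2 3'
df = read_commented_header(StringIO(s))
print(df)
```

For a file on disk, the same helper should work with a handle from `open(path)`, since pandas continues reading from the handle's current position.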



Source: https://stackoverflow.com/questions/30311776/read-csv-with-pandas-with-commented-header
