read_csv shifting column headers

Posted by 我的梦境 on 2021-01-29 05:35:14

Question


I am trying to read a comma-separated text file into Python with pandas' read_csv. However, the header row ends up shifted one column to the right.

Data file example with fewer columns than I actually have (example file with more data: https://www.dropbox.com/s/5glujwqux6d0msh/test.txt?dl=0):

DAY,TIME,GENVEG,LATI,LONGI,AREA,CHEM
 226,  1200,     2,   -0.5548999786D+01,    0.3167600060D+02,    0.1000000000D+07, NaN
 226,  1115,     2,   -0.1823500061D+02,    0.3668500137D+02,    0.1000000000D+07, NaN

If I try the following (where infile_fire is the above txt file):

df_fires = pd.read_csv(infile_fire,sep="\,",skipinitialspace=True,engine='python')

The resulting DataFrame has its headers shifted: DAY ends up above what should be the TIME column. (The value in the AREA column comes from the larger dataset, which isn't shown in the sample subset above.)

I also tried df_fires = pd.read_csv(infile_fire).reset_index(), and though it does create a new index (as I'd like it to do), it also moves the 226 column over and names it index instead of DAY as it should.

I've also tried the following, but still got the same result (shifted headers):

df = pd.read_csv(infile_fire)

df = pd.read_csv(infile_fire,index_col=None)

df = pd.read_csv(infile_fire,index_col=0)

How can I fix this? I just want to read in the text file and have Python set up a new index and keep the headers as is.


Answer 1:


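Setting index_col to False solves this issue. When the data rows contain one more field than the header row, pandas by default uses the first column as the index, which shifts the headers. index_col=False forces it not to, for example when a malformed file has a delimiter at the end of each line.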

df = pd.read_csv(infile_fire,index_col=False)
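A minimal sketch of why this helps, assuming the real file has a trailing comma on each data row. The sample in the question does not show one, so this is a guess at the cause, and the three-column data below is made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: 3 header fields, but 4 fields per data row
# because of the trailing comma.
data = (
    "DAY,TIME,GENVEG\n"
    "226,1200,2,\n"
    "226,1115,2,\n"
)

# Default behaviour: the first column silently becomes the index,
# so every header sits one column too far to the left of its data.
shifted = pd.read_csv(StringIO(data))

# index_col=False: no implicit index, and the empty trailing field is dropped.
fixed = pd.read_csv(StringIO(data), index_col=False)

print(shifted)
print(fixed)
```

In the first frame the 226 values land in the index and DAY holds the TIME values. In the second, DAY, TIME and GENVEG line up with their own data under a fresh 0..n-1 index.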




Answer 2:


Without fiddling with the options, pandas does the right thing with this sample. See the sep parameter in the read_csv documentation, and csv.Sniffer.

from io import StringIO

import pandas as pd

data = """
DAY,TIME,GENVEG,LATI,LONGI,AREA
 226,  1200,     2,   -0.5548999786D+01,    0.3167600060D+02,    0.1000000000D+07
 226,  1115,     2,   -0.1823500061D+02,    0.3668500137D+02,    0.1000000000D+07
"""

df = pd.read_csv(StringIO(data))
df
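For reference, here is a small sketch of the delimiter detection that the sep documentation refers to. It assumes a simplified three-column sample, made up for illustration; passing sep=None together with engine="python" asks pandas to detect the separator with csv.Sniffer.

```python
import csv
from io import StringIO

import pandas as pd

# Simplified sample (an assumption, not the asker's real file).
sample = "DAY,TIME,GENVEG\n226,1200,2\n226,1115,2\n"

# Ask the standard-library sniffer directly which delimiter it detects.
dialect = csv.Sniffer().sniff(sample)
print(repr(dialect.delimiter))

# Let pandas detect the separator itself; this requires the Python engine.
sniffed_df = pd.read_csv(StringIO(sample), sep=None, engine="python")
print(sniffed_df)
```

This only helps when the separator is unknown. It does not change how pandas treats rows that have more fields than the header.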




Answer 3:


Suppose file.txt, the file you want to read, contains:

DAY,TIME,GENVEG,LATI,LONGI,AREA
 226,  1200,     2,   -0.5548999786D+01,    0.3167600060D+02,    0.1000000000D+07
 226,  1115,     2,   -0.1823500061D+02,    0.3668500137D+02,    0.1000000000D+07

Using:

import pandas as pd

Read the file:

df = pd.read_csv('file.txt')

If you take a look at your df.AREA[0], it will be something like this:

'    0.1000000000D+07'

Use a regular expression to strip leading and trailing whitespace:

df.replace(r'(^\s+|\s+$)', '', regex=True, inplace=True)

If you display df now, the result will be:

   DAY  TIME  GENVEG               LATI             LONGI              AREA
0  226  1200       2  -0.5548999786D+01  0.3167600060D+02  0.1000000000D+07
1  226  1115       2  -0.1823500061D+02  0.3668500137D+02  0.1000000000D+07

So df.AREA[0] will now be something like this:

'0.1000000000D+07'

The same holds for the other columns, for example df.LATI[0]:

'-0.5548999786D+01'
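Note that the cleaned values are still strings. The sketch below is one possible follow-up, not part of the original answer: it lets read_csv drop the padding with skipinitialspace=True and then converts the Fortran-style D exponent to floats. The column list and the assumption that every value in those columns uses D notation are mine.

```python
from io import StringIO

import pandas as pd

data = """DAY,TIME,GENVEG,LATI,LONGI,AREA
 226,  1200,     2,   -0.5548999786D+01,    0.3167600060D+02,    0.1000000000D+07
 226,  1115,     2,   -0.1823500061D+02,    0.3668500137D+02,    0.1000000000D+07
"""

# skipinitialspace=True removes the spaces that follow each delimiter
# (and the leading space at the start of each row).
df = pd.read_csv(StringIO(data), skipinitialspace=True)

# pandas does not treat "D" as an exponent marker, so these columns stay
# as strings; swap "D" for "E" and cast to float.
for col in ["LATI", "LONGI", "AREA"]:
    df[col] = df[col].str.replace("D", "E", regex=False).astype(float)

print(df.dtypes)
print(df)
```

After the conversion, LATI, LONGI and AREA are float64 columns, so they can be used in arithmetic directly.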


Source: https://stackoverflow.com/questions/54876912/read-csv-shifting-column-headers
