问题
I am trying to read in a comma separated text file into Python with read_csv
. However, Python is taking the header and shifting it over to the right by one.
Data file example with less columns than I actually have: (example file with more data: https://www.dropbox.com/s/5glujwqux6d0msh/test.txt?dl=0)
DAY,TIME,GENVEG,LATI,LONGI,AREA,CHEM
226, 1200, 2, -0.5548999786D+01, 0.3167600060D+02, 0.1000000000D+07, NaN
226, 1115, 2, -0.1823500061D+02, 0.3668500137D+02, 0.1000000000D+07, NaN
If I try the following (where infile_fire
is the above txt file):
df_fires = pd.read_csv(infile_fire,sep="\,",skipinitialspace=True,engine='python')
I get this below. As you can see, DAY
is actually above what should be the TIME
column.
(Note that the value in the AREA
column comes from data I have in the larger dataset which isn't shown in the sample subset above)
I also tried df_fires = pd.read_csv(infile_fire).reset_index()
, and though it does create a new index (as I'd like it to do), it also moves the 226
column over and names it index
instead of DAY
as it should.
I've also tried the following, but still got the same result (shifted headers)
df = pd.read_csv(infile_fire)
df = pd.read_csv(infile_fire,index_col=None)
df = pd.read_csv(infile_fire,index_col=0)
How can I fix this? I just want to read in the text file and have Python set up a new index and keep the headers as is.
回答1:
Setting index to False solves this issue.
df = pd.read_csv(infile_fire,index_col=False)
回答2:
without fiddling with the options, like pandas just does the right thing, see the sep
in the doc of read_csv and csv.Sniffer.
from io import StringIO
import pandas as pd
data = """
DAY,TIME,GENVEG,LATI,LONGI,AREA
226, 1200, 2, -0.5548999786D+01, 0.3167600060D+02, 0.1000000000D+07
226, 1115, 2, -0.1823500061D+02, 0.3668500137D+02, 0.1000000000D+07
"""
df = pd.read_csv(StringIO(data))
df
回答3:
As file.txt beeing your file that you want to read.
file.txt = """
DAY,TIME,GENVEG,LATI,LONGI,AREA
226, 1200, 2, -0.5548999786D+01, 0.3167600060D+02, 0.1000000000D+07
226, 1115, 2, -0.1823500061D+02, 0.3668500137D+02, 0.1000000000D+07
"""
Using:
import pandas as pd
Read the file:
df = pd.read_csv('file.txt')
If you take a look at your df.AREA[0]
, it will be something like this:
' 0.1000000000D+07'
Use regular expressions to remove blank spaces:
df.replace('(^\s+|\s+$)', '', regex=True, inplace=True)
If you try to call your df
now, the result will be:
DAY TIME GENVEG LATI LONGI AREA
0 226 1200 2 -0.5548999786D+01 0.3167600060D+02 0.1000000000D+07
1 226 1115 2 -0.1823500061D+02 0.3668500137D+02 0.1000000000D+07
So, your df.AREA[0]
will be somthing like this:
'0.1000000000D+07'
Just like the others, for example: df.LATI[0]
'-0.5548999786D+01'
来源:https://stackoverflow.com/questions/54876912/read-csv-shifting-column-headers