Pandas parsing csv error - expected 1 fields found 9

Question

I'm trying to parse from a .csv file:

planets = pd.read_csv("planets.csv", sep=',')

But I always end up with this error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 9

This is what the first few lines of my csv file look like:

# This file was produced by the test
# Tue Apr  3 06:03:27 2018
#
# COLUMN pl_hostname:    Host Name
# COLUMN pl_discmethod:  Discovery Method
# COLUMN pl_pnum:        Number of Planets in System
# COLUMN pl_orbper:      Orbital Period [days]
# COLUMN pl_orbsmax:     Orbit Semi-Major Axis [AU])
# COLUMN st_dist:        Distance [pc]
# COLUMN st_teff:        Effective Temperature [K]
# COLUMN st_mass:        Stellar Mass [Solar mass] 
#
loc_rowid,pl_hostname,pl_discmethod,pl_pnum,pl_orbper,pl_orbsmax,st_dist,st_teff,st_mass
1,11 Com,Radial Velocity,1,326.03000000,1.290000,110.62,4742.00,2.70
2,11 UMi,Radial Velocity,1,516.22000000,1.540000,119.47,4340.00,1.80
3,14 And,Radial Velocity,1,185.84000000,0.830000,76.39,4813.00,2.20
4,14 Her,Radial Velocity,1,1773.40000000,2.770000,18.15,5311.00,0.90
5,16 Cyg B,Radial Velocity,1,798.50000000,1.681000,21.41,5674.00,0.99
6,18 Del,Radial Velocity,1,993.30000000,2.600000,73.10,4979.00,2.30
7,1RXS J160929.1-210524,Imaging,1,,330.000000,145.00,4060.00,0.85

Edit: this is line 13:

loc_rowid,pl_hostname,pl_discmethod,pl_pnum,pl_orbper,pl_orbsmax,st_dist,st_teff,st_mass

Edit: Thanks to @Rakesh, skipping the first 12 lines solved the problem:

planets = pd.read_csv("planets.csv", sep=',', skiprows=12)

Answer 1:

By default, pandas.read_csv() infers the number of columns and their names from the first line of the file. It does not assume that the leading lines might be comments.

Here pandas reads the first line, splits it on the separator, and finds only one column, instead of doing that split on line 13, which is the first line that is not a comment. Line 13 then has 9 fields where 1 was expected, hence the error. To solve this, use the comment argument:

planets = pd.read_csv("planets.csv", comment='#')

Compared to using skiprows, this lets the same code load planets.csv even if the number of comment lines varies.
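
To see the difference concretely, here is a minimal sketch that uses an in-memory string in place of planets.csv. The abbreviated sample text and the variable names are illustrative assumptions, not the real file. It first reproduces the error, then reads the same text with comment='#':

```python
import io

import pandas as pd

# Abbreviated stand-in for planets.csv (assumed sample, not the real file).
SAMPLE = """\
# This file was produced by the test
# COLUMN pl_hostname:    Host Name
#
loc_rowid,pl_hostname,pl_discmethod
1,11 Com,Radial Velocity
2,11 UMi,Radial Velocity
"""

# Without comment=..., the first line is taken as a one-column header,
# so the first comma-separated line triggers a ParserError.
try:
    pd.read_csv(io.StringIO(SAMPLE))
    raised = False
except pd.errors.ParserError:
    raised = True

# With comment='#', lines that start with '#' are ignored, so the header
# is inferred from the first non-comment line.
planets = pd.read_csv(io.StringIO(SAMPLE), comment="#")
print(planets.columns.tolist())
```

One thing to keep in mind: comment='#' also cuts off the rest of any data line at the first '#', so it is only safe when '#' does not appear inside the data itself.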

Answer 2:

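I've gotten this to work when I couldn't figure out the exact cause of the error:

planets = pd.read_csv('planets.csv', sep=',', error_bad_lines=False)

Note that error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions the equivalent is on_bad_lines='skip'. Either way, this drops every line that has more fields than expected, so for the file in the question it would likely discard the column-name line and all data rows rather than fix the underlying problem.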

Answer 3:

It looks like you need skiprows, which lets you skip all the comment lines at the top of the file.

Example:

planets = pd.read_csv("planets.csv", sep=',', skiprows=12)
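
If you would rather not hard-code 12, one option is to count the leading comment lines first and pass that number to skiprows. This is only a sketch: count_leading_comment_lines is a hypothetical helper (not a pandas function), the sample text is an abbreviated stand-in for the real file, and it assumes the comments form one contiguous block at the top:

```python
import io

import pandas as pd


def count_leading_comment_lines(lines, prefix="#"):
    """Return how many consecutive lines at the start begin with `prefix`."""
    count = 0
    for line in lines:
        if not line.startswith(prefix):
            break
        count += 1
    return count


# Abbreviated stand-in for planets.csv (assumed sample, not the real file).
SAMPLE = """\
# This file was produced by the test
# COLUMN pl_hostname:    Host Name
#
loc_rowid,pl_hostname,pl_discmethod
1,11 Com,Radial Velocity
2,11 UMi,Radial Velocity
"""

n_skip = count_leading_comment_lines(SAMPLE.splitlines())
planets = pd.read_csv(io.StringIO(SAMPLE), skiprows=n_skip)
print(n_skip, planets.shape)

# For a file on disk, the same helper accepts an open file object:
#     with open("planets.csv") as f:
#         n_skip = count_leading_comment_lines(f)
```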

Answer 4:

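In addition to the above answer, if the problem is only with line 13, you can keep it from being used as the header. Skipping the first 12 lines and passing header=None makes pandas read line 13 (the column names) as an ordinary first row; use skiprows=13 instead if you want to drop that line as well.

pd.read_csv("planets.csv", skiprows=12, header=None)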

Answer 5:

I ran the following code on the csv data you provided and it worked without issues:

import pandas as pd
planets = pd.read_csv("planets.csv", sep=',')
print(planets)

That said, there are a few things you could try.

First, you could set sep=None to let pandas sniff the delimiter (this requires the Python parsing engine). You could also set header=None. It would look like this:

pd.read_csv("planets.csv", sep=None, header=None, engine="python")

There could also be an encoding issue. You could try setting encoding to some of the values listed here to see if the error persists: https://docs.python.org/3/library/codecs.html#standard-encodings
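
As a rough sketch of that idea, the snippet below tries a few encodings in turn. read_csv_try_encodings is a hypothetical helper (not part of pandas), and the list of candidate encodings and the small demo file are assumptions chosen for illustration. An encoding problem would typically show up as a UnicodeDecodeError rather than the ParserError in the question, so this only helps if that is what you are actually seeing:

```python
import os
import tempfile

import pandas as pd


def read_csv_try_encodings(path, encodings=("utf-8", "cp1252", "latin-1"), **kwargs):
    """Try each encoding in order; return (DataFrame, encoding that worked)."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    raise last_error


# Demo: a tiny file whose bytes are not valid UTF-8 (0xE9 is 'é' in cp1252).
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"name\ncaf\xe9\n")
    demo_path = tmp.name

try:
    df, used_encoding = read_csv_try_encodings(demo_path)
finally:
    os.remove(demo_path)

print(used_encoding, df.loc[0, "name"])
```

Keep in mind that latin-1 can decode any byte sequence, so it never fails; a successful read with it does not prove the encoding is right.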

Source: https://stackoverflow.com/questions/49632641/pandas-parsing-csv-error-expected-1-fields-found-9

Tags

python

python-3.x

pandas

csv

data-analysis