Question
I'm reading a file with numerical values.
data = pd.read_csv('data.dat', sep=' ', header=None)
In the text file, each row ends with a space, so pandas expects a value that is not there and adds a NaN at the end of each row. For example:
2.343 4.234
is read as: [2.343, 4.234, nan]
I can avoid it with usecols=[0, 1], but I would prefer a more general solution.
Answer 1:
You can use a regular expression in the sep argument.
Instead of specifying the separator as a single space, you can have pandas treat any run of whitespace as one separator, so the trailing space before the line break no longer produces an empty field. You can do this with the regular expression \s+:
data = pd.read_csv('data.dat', sep=r'\s+', header=None)
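To see the difference without touching a real file, here is a minimal sketch that reads the same text twice from an in-memory buffer. The sample string `raw` is a made-up stand-in for data.dat (an assumption, not the asker's actual file), with a trailing space on every row as described in the question.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the question: every row ends with a trailing space.
raw = "2.343 4.234 \n1.000 2.000 \n"

# Single-space separator: the trailing space is followed by an empty field,
# so pandas creates a third column filled with NaN.
single = pd.read_csv(io.StringIO(raw), sep=" ", header=None)

# Whitespace regex: runs of whitespace count as one separator and the
# trailing space is not treated as the start of another field.
regex = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

print(single.shape)
print(regex.shape)
```

The raw-string prefix (r"...") keeps Python from interpreting \s as a string escape, which newer Python versions warn about.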
Answer 2:
Specifying which columns to read with usecols is a cleaner approach. Alternatively, you can drop the column once you have read the data, but this comes with the overhead of reading data that you don't need. The generic approach will require you to create a regex parser, which will be more time-consuming and messier.
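The "drop the column after reading" option could look roughly like the sketch below. It assumes the unwanted column is entirely NaN, and the sample string `raw` is a hypothetical stand-in for data.dat. Note that this would also remove a genuine data column that happens to be completely empty.

```python
import io

import pandas as pd

# Hypothetical sample: rows end with a trailing space, as in the question.
raw = "2.343 4.234 \n1.000 2.000 \n"

# Reading with a single-space separator yields an extra all-NaN column.
data = pd.read_csv(io.StringIO(raw), sep=" ", header=None)

# Drop every column whose values are all NaN, without hard-coding its index.
cleaned = data.dropna(axis=1, how="all")

print(cleaned)
```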
Answer 3:
Can you change the separator in the CSV file to something other than a space? The space separator might be the reason each row ends with a NaN. If you use, for example:
data = pd.read_csv('data.dat', sep=',', header=None)
this problem might be solved without having to clean the data.
Source: https://stackoverflow.com/questions/59327525/pandas-read-csv-how-to-ignore-delimiter-before-line-break