Question
I am using pandas.read_csv to read a whitespace-delimited file. Every line of the file starts with a variable number of whitespace characters (the numbers are right-aligned). When I read this file, I get an extra first column filled with NaN. Why does this happen, and what is the best way to prevent it?
Example:
Text file:
  9.0   3.3   4.0
 32.3  44.3   5.1
  7.2   1.1   0.9
Command:
import pandas as pd
pd.read_csv("test.txt", delim_whitespace=True, header=None)
Output:
    0     1     2    3
0 NaN   9.0   3.3  4.0
1 NaN  32.3  44.3  5.1
2 NaN   7.2   1.1  0.9
Answer 1:
FWIW, I tend to use sep=r"\s+" instead of delim_whitespace=True, and it doesn't suffer from the same problem:
>>> pd.read_csv("wspace.csv", header=None, delim_whitespace=True)
    0     1     2    3
0 NaN   9.0   3.3  4.0
1 NaN  32.3  44.3  5.1
2 NaN   7.2   1.1  0.9
>>> pd.read_csv("wspace.csv", header=None, sep=r"\s+")
      0     1    2
0   9.0   3.3  4.0
1  32.3  44.3  5.1
2   7.2   1.1  0.9
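For anyone who wants to try this without creating a file, here is a minimal sketch of the same read. It assumes the data looks like the question's file: the contents are inlined through io.StringIO, and the exact number of leading spaces is my guess, not taken from the original file. The name raw is only for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file in the question: right-aligned numbers,
# so every line begins with one or more spaces (exact widths are assumed).
raw = (
    "  9.0   3.3   4.0\n"
    " 32.3  44.3   5.1\n"
    "  7.2   1.1   0.9\n"
)

# sep=r"\s+" treats any run of whitespace as a single delimiter.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

print(df)
print(df.shape)               # number of rows and columns actually parsed
print(df.isna().any().any())  # True if any NaN slipped in
```

As far as I know, delim_whitespace is deprecated in newer pandas releases (2.2 and later) in favor of sep=r"\s+", so the regex form is probably the safer habit either way.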
Source: https://stackoverflow.com/questions/16022094/using-pandas-to-read-text-file-with-leading-whitespace-gives-a-nan-column