pandas read_csv. How to ignore delimiter before line break

Question

I'm reading a file with numerical values.

data = pd.read_csv('data.dat', sep=' ', header=None)

In the text file, each row ends with a space, so pandas expects a value that is not there and adds a "nan" at the end of each row. For example:

2.343 4.234

is read as: [2.343, 4.234, nan]

I can avoid it using usecols=[0, 1], but I would prefer a more general solution.

Answer 1:

You can use regular expressions in your sep argument.

Instead of specifying a single space as the separator, you can tell pandas to treat any run of whitespace as the separator, up to the next value. You can do this with the regular expression \s+ (written as a raw string so that Python does not interpret the backslash):

data = pd.read_csv('data.dat', sep=r'\s+', header=None)
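As a quick check of this approach, here is a minimal sketch that uses an in-memory string in place of data.dat. The sample rows and variable names are made up for illustration. Under these assumptions, the single-space separator should produce an extra all-NaN column, while the whitespace regex should not.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.dat: every row ends with a trailing space.
raw = "2.343 4.234 \n1.500 2.500 \n"

# A single-space separator treats the trailing space as one more (empty) field.
with_space = pd.read_csv(io.StringIO(raw), sep=' ', header=None)

# A whitespace regex consumes runs of whitespace, including the trailing one.
with_regex = pd.read_csv(io.StringIO(raw), sep=r'\s+', header=None)

print(with_space.shape)  # number of columns includes the empty trailing field
print(with_regex.shape)  # only the columns that hold real values
```

Comparing the two shapes on a few lines of your own file is an easy way to confirm that the trailing delimiter is what causes the extra column.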

Answer 2:

Specifying which columns to read with usecols is a cleaner approach. Alternatively, you can drop the extra column once you have read the data, but this comes with the overhead of reading data that you don't need. The generic approach requires you to create a regex parser, which will be more time-consuming and messier.
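The two options mentioned here can be sketched as follows, again with an in-memory string standing in for data.dat (the sample data and variable names are assumptions for illustration). The second option uses dropna to remove columns that are entirely NaN, which is one possible way to drop the extra column without hard-coding its index.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.dat: every row ends with a trailing space.
raw = "2.343 4.234 \n1.500 2.500 \n"

# Option 1: read only the columns that hold real values.
selected = pd.read_csv(io.StringIO(raw), sep=' ', header=None, usecols=[0, 1])

# Option 2: read everything, then drop any column that is entirely NaN.
full = pd.read_csv(io.StringIO(raw), sep=' ', header=None)
trimmed = full.dropna(axis=1, how='all')

print(selected.shape)
print(trimmed.shape)
```

Note that dropna(axis=1, how='all') would also remove a genuine data column if every value in it happened to be missing, so usecols is the safer choice when the column layout is known.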

Answer 3:

Can you change the separator in the CSV file to something other than a space? The space separator might be the reason why each row ends with a nan. For example, if the file used commas instead, you could read it with:

    data = pd.read_csv('data.dat', sep=',', header=None)

This way, the problem might be solved without having to clean the data.

Source: https://stackoverflow.com/questions/59327525/pandas-read-csv-how-to-ignore-delimiter-before-line-break

Tags

python

pandas

file
