Removing newline from a csv file

你说的曾经没有我的故事 submitted on 2019-12-20 10:57:25

Question


I am trying to process a CSV file in Python that has a ^M character in the middle of each row, and Python treats that ^M as a newline. I can't open the file in any mode other than 'rU'.

If I open the file in 'rU' mode, each ^M is read as a line break, so every row is split in two and I end up with twice the number of rows.

I want to remove the newline altogether. How?


Answer 1:


Note that, as the docs say:

csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable.
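A small sketch of what that sentence allows (the sample rows are made up; in Python 3 the method is spelled __next__()): csv.reader happily takes a plain list of strings, or a generator expression that rewrites each string before the parser sees it, which is the hook a line filter relies on.

```python
import csv

# Made-up sample lines; a list of strings is a valid csv.reader source.
lines = ['id,name\n', '1,Ann\n']
rows_from_list = list(csv.reader(lines))

# A generator expression works too, so each line can be rewritten
# (here upper-cased, purely for illustration) before csv parses it.
rows_from_generator = list(csv.reader(line.upper() for line in lines))

print(rows_from_list)       # [['id', 'name'], ['1', 'Ann']]
print(rows_from_generator)  # [['ID', 'NAME'], ['1', 'ANN']]
```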

So, you can always stick a filter on the file before handing it to your reader or DictReader. Instead of this:

import csv
with open('myfile.csv') as myfile:  # Python 3's default text mode is the old 'rU'
    for row in csv.reader(myfile):
        print(row)  # each stray ^M splits a row in two

Do this:

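import csv
# newline='\n' splits only on \n and keeps each ^M for replace(); 'rU' would turn it into a line break.
with open('myfile.csv', newline='\n') as myfile:
    filtered = (line.replace('\r', '') for line in myfile)
    for row in csv.reader(filtered):
        print(row)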

That '\r' is the Python (and C) way of spelling ^M. So, this just strips all ^M characters out, no matter where they appear, by replacing each one with an empty string.
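To check the behaviour without a real file, here is a self-contained sketch using io.StringIO and a made-up three-row sample in which each row contains one stray \r. Passing newline=None to StringIO mimics universal-newline ('rU') reading, while newline='\n' splits lines only on \n; the helper names rows_universal and rows_filtered are hypothetical.

```python
import csv
import io

# Made-up sample: every row has a stray carriage return (^M) in the middle.
RAW = 'id,na\rme\n1,An\rn\n2,Bo\rb\n'


def rows_universal(raw):
    """Universal newlines (like 'rU'): each carriage return becomes a line break."""
    return list(csv.reader(io.StringIO(raw, newline=None)))


def rows_filtered(raw):
    """Split only on line feeds, then drop every carriage return before parsing."""
    lines = io.StringIO(raw, newline='\n')
    return list(csv.reader(line.replace('\r', '') for line in lines))


print(len(rows_universal(RAW)))  # 6: each row is split in two
print(rows_filtered(RAW))        # [['id', 'name'], ['1', 'Ann'], ['2', 'Bob']]
```

If you keep newline='\n' but drop the replace(), csv.reader typically stops with csv.Error ("new-line character seen in unquoted field"), which is presumably why 'rU' looked like the only mode that worked.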


Follow-up: "I guess I want to modify the file permanently as opposed to filtering it."

First, if you want to modify the file before running your Python script on it, why not do that from outside Python? sed, tr, and many text editors can all do it for you. Here's a GNU sed example (gsed is the name Homebrew gives GNU sed on macOS; on most Linux systems the command is just sed):

gsed -i'' 's/\r//g' myfile.csv

But if you want to do it in Python, it's not much more verbose, and you might find it more readable.

Keep in mind that you can't really modify a file in place if you want to insert or delete from the middle. The usual solution is to write a new file and then either move it over the old one (Unix only, with os.rename) or rename the old one out of the way and delete it afterwards (cross-platform).

The cross-platform version:

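import os

os.rename('myfile.csv', 'myfile.csv.bak')  # keep the original until the new copy is written
# Read splitting only on \n so each ^M survives to replace(); newline='' writes \n untranslated.
with open('myfile.csv.bak', newline='\n') as infile, open('myfile.csv', 'w', newline='') as outfile:
    for line in infile:
        outfile.write(line.replace('\r', ''))
os.remove('myfile.csv.bak')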

The less-clunky, but Unix-only, version:

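import os
import tempfile

# Create the temp file next to myfile.csv (dir='.'): os.rename cannot move across filesystems.
temp = tempfile.NamedTemporaryFile(mode='w', newline='', dir='.', delete=False)
# newline='\n' keeps each ^M inside its line so replace() can remove it.
with open('myfile.csv', newline='\n') as myfile, temp:
    for line in myfile:
        temp.write(line.replace('\r', ''))
os.rename(temp.name, 'myfile.csv')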
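If you need this in more than one place, a reusable variant can combine both ideas: write a temporary file in the same directory (so the final rename never crosses filesystems) and then call os.replace(), which, unlike os.rename(), also overwrites an existing file on Windows. This is a sketch: the function name strip_cr_in_place is hypothetical, and it works in binary mode, where iteration splits lines only at b'\n', so no newline argument or text encoding is involved.

```python
import os
import shutil
import tempfile


def strip_cr_in_place(path):
    """Delete every carriage return (^M) from the file at path.

    Hypothetical helper: writes a temporary file next to the original,
    then swaps it in with os.replace(), which works on Unix and Windows.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        # Binary mode: iteration splits only on b'\n', so every b'\r'
        # stays inside its line until replace() removes it.
        with open(fd, 'wb') as outfile, open(path, 'rb') as infile:
            for line in infile:
                outfile.write(line.replace(b'\r', b''))
        shutil.copymode(path, temp_path)  # mkstemp creates the file as 0600
        os.replace(temp_path, path)
    except BaseException:
        os.remove(temp_path)
        raise
```

Usage: strip_cr_in_place('myfile.csv'). Like the code above, it removes every \r, including any inside quoted fields and the \r of \r\n line endings, so the file ends up with plain \n line endings.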


Source: https://stackoverflow.com/questions/14390123/removing-newline-from-a-csv-file
