Can python's csv reader leave the quotes in?

Submitted by 孤人 on 2020-11-29 04:39:53

Question


I want to use the python CSV reader but I want to leave the quotes in. That is I want:

>>> import csv
>>> s = '"simple|split"|test'
>>> reader = csv.reader([s], delimiter='|', skipinitialspace=True)
>>> next(reader)
['"simple|split"', 'test']

But I actually get:

['simple|split', 'test']

In my case I want the quoted string to be passed on still quoted.

I know the CSV reader is working as intended and my use case is an abuse of it, but is there some way to bend it to my will? Or do I have to write my own string parser?


Answer 1:


You're going to have to write your own parser, because the quote handling lives on the C side of the module, specifically in parse_process_char in Modules/_csv.c:

    else if (c == dialect->quotechar &&
             dialect->quoting != QUOTE_NONE) {
        if (dialect->doublequote) {
            /* doublequote; " represented by "" */
            self->state = QUOTE_IN_QUOTED_FIELD;
        }
        else {
            /* end of quote part of field */
            self->state = IN_FIELD;
        }
    }
    else {
        /* normal character - save in field */
        if (parse_add_char(self, c) < 0)
            return -1;
    }

That "end of quote part of field" section is what's chomping your double quote. On the other hand, you might be able to patch that else branch and rebuild Python from source. However, that's not all that maintainable, to be honest.

Edit: Sorry, I meant adding the parse_add_char bit from the last else before self->state = IN_FIELD, so that the quote gets added to the field.
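
For what it's worth, a hand-rolled parser for this case can be quite small. The following is only a minimal sketch in Python 3, not the csv module's algorithm: the name split_keep_quotes and its parameters are made up for illustration, and it assumes balanced quotes, no escape character, and no newlines inside fields.

```python
def split_keep_quotes(line, delimiter="|", quotechar='"'):
    """Split line on delimiter, ignoring delimiters inside quotes.

    Quote characters are kept in the returned fields.
    Hypothetical helper; assumes balanced quotes and no escape character.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in line:
        if ch == quotechar:
            # Toggle the quoted state, but keep the quote as data.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == delimiter and not in_quotes:
            # An unquoted delimiter ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


print(split_keep_quotes('"simple|split"|test'))
# ['"simple|split"', 'test']
```

Because every quote just toggles a flag, a doubled quote ("") inside a quoted field passes through unchanged as two quote characters.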




Answer 2:


I'm not sure you have a clear view of what you are trying to obtain.
You say "I know (...) my use case is an abuse".
But abuse implies that some use is possible.
In your case, however, there is no possible use: what you describe is impossible, because what is passed to a CSV parser must be valid CSV, and yours isn't.

In a valid CSV string, most characters are information, and some characters are meta-information needed to interpret the string and extract that information.
What you describe is wanting the " characters to belong to the information category and the meta-information category at the same time. It's like trying to grab your left hand with your own left hand.

This problem occurs with your string because it doesn't come from reading a CSV file; it's a string written by hand.
A string like this can't be obtained by reading a CSV file, because it couldn't have been written that way in the CSV file.
Written to a CSV file, the fields you want from '"simple|split"|test' could be written as

  • """simple|split"""|test
    with doublequote set to True, the default

  • or #"simple#|split#"|test
    with doublequote = False, escapechar = '#'
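
To check this, here is a small sketch, assuming Python 3 (io.StringIO stands in for a file; the variable names are only illustrative). It writes the two wanted fields with the default dialect settings plus delimiter='|', then reads both encodings back with csv.reader:

```python
import csv
import io

# The two fields the question wants back, quotes included.
wanted = ['"simple|split"', 'test']

# Writing with the default doublequote=True doubles each embedded quote
# and wraps the field in quotes.
buf = io.StringIO()
csv.writer(buf, delimiter='|').writerow(wanted)
line_doubled = buf.getvalue().rstrip('\r\n')
print(line_doubled)  # """simple|split"""|test

# Reading that line back restores the quotes as data.
row_doubled = next(csv.reader([line_doubled], delimiter='|'))

# The escapechar form, read with doublequote=False and escapechar='#'.
line_escaped = '#"simple#|split#"|test'
row_escaped = next(csv.reader([line_escaped], delimiter='|',
                              doublequote=False, escapechar='#'))

print(row_doubled == wanted, row_escaped == wanted)  # True True
```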


If you want to extract the information the way you described, you don't need to write a parser; an existing tool is enough:

import re

# A field is either a quoted run (quotes kept) or a run of non-delimiter characters.
reg = re.compile(r'".*?"|[^|]+')

print(reg.findall('yoo|"simple|split"|test|end"pos|hu'))

Result:

['yoo', '"simple|split"', 'test', 'end"pos', 'hu']
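
One caveat worth knowing before relying on this pattern: since [^|]+ needs at least one character, empty fields disappear from the result. A short sketch in Python 3 (FIELD_RE and split_with_regex are illustrative names, not part of any library):

```python
import re

# Same pattern as above: a quoted run, or a run of non-delimiter characters.
FIELD_RE = re.compile(r'".*?"|[^|]+')


def split_with_regex(line):
    """Return the fields of line, keeping quotes (hypothetical helper)."""
    return FIELD_RE.findall(line)


print(split_with_regex('"simple|split"|test'))  # ['"simple|split"', 'test']
print(split_with_regex('a||b'))                 # ['a', 'b']  (empty field lost)
```

If empty fields matter for your data, the regex is not enough on its own and a character-by-character splitter is the safer choice.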


Source: https://stackoverflow.com/questions/15294863/can-pythons-csv-reader-leave-the-quotes-in
