Regular Expressions - how to replace a character within quotes

执念已碎 2020-12-16 03:16

Hello regular expression experts,

There has never been a string manipulation problem I couldn't resolve with regular expressions until now, at least in an elegant m

3 answers
  • 2020-12-16 03:30

    I'll help you, but you have to promise to stop using the word "elegant". It's been working too hard lately, and deserves a rest. :P

    (?m),(?=[^"]*"(?:[^"\r\n]*"[^"]*")*[^"\r\n]*$)
    

    This matches a comma if, between the comma and the end of the record, there's an odd number of quotation marks. I'm assuming a standard CSV format, in which a record ends at the next line separator that isn't enclosed in quotes. Line separators are legal inside quoted fields, as are quotes if they're escaped with another quote.

    Depending on which regex flavor you're using, you may have to use \r?$ instead of just $. In .NET, for example, only the linefeed (\n) is considered a line separator. In Java, by contrast, $ matches before the \r in \r\n but not between the \r and the \n (unless you set UNIX_LINES mode).
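
For readers who want to try the pattern, here is a minimal Python sketch that applies it with `re.sub`. The pattern is the one from this answer; the function name `underscore_quoted_commas` and the sample data are made up for illustration. The sample uses single-line records only, so quoted fields that span lines are not exercised here.

```python
import re

# The pattern from the answer, unchanged. (?m) makes $ match at each line end.
QUOTED_COMMA = re.compile(r'(?m),(?=[^"]*"(?:[^"\r\n]*"[^"]*")*[^"\r\n]*$)')

def underscore_quoted_commas(text):
    """Replace each comma that the lookahead judges to be inside quotes."""
    return QUOTED_COMMA.sub('_', text)

# Hypothetical sample: two single-line records.
sample = '1,"a,b",c\n2,"x,,y",z\n'
print(underscore_quoted_commas(sample))
# 1,"a_b",c
# 2,"x__y",z
```

Python's `$` in multiline mode matches just before `\n`, so input with `\r\n` line endings would probably need the `\r?$` variant mentioned above.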

  • 2020-12-16 03:32

    Regular expressions are not particularly good at matching balanced text (i.e. starting and ending quotes).

    A naïve approach would be to repeatedly apply something like this (until it no longer matched):

    s/(^[^"]*(?:"[^"]*"[^"]*)*?)"([^",]*),([^"]*)"/$1"$2_$3"/
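
In Python, the "apply until it no longer matches" loop might look roughly like the sketch below. The regex is the one from the substitution above; the names `STEP` and `underscore_quoted_commas_naive` are hypothetical. Each pass rescans from the start of the string, so this is only reasonable for small inputs.

```python
import re

# Same pattern as the s/// above: skip whole quoted fields lazily, then
# capture one quoted field that still contains a comma.
STEP = re.compile(r'(^[^"]*(?:"[^"]*"[^"]*)*?)"([^",]*),([^"]*)"')

def underscore_quoted_commas_naive(text):
    """Replace one quoted comma per pass until no pass changes anything."""
    while True:
        text, count = STEP.subn(r'\g<1>"\g<2>_\g<3>"', text, count=1)
        if count == 0:
            return text

print(underscore_quoted_commas_naive('a,"b,c,d",e'))
# a,"b_c_d",e
```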
    

    But that wouldn't work with escaped quotes. The best (i.e. simplest, most readable, and most maintainable) solution is to use a CSV file parser, go through all the field values one by one (replacing commas with underscores as you go), then write it back out to the file.
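
The answer doesn't name a language or library, so as one possible illustration, here is a sketch of the parser approach using Python's standard `csv` module. The function name `underscore_commas_in_fields` is made up, and the sketch works on in-memory strings rather than files to stay self-contained.

```python
import csv
import io

def underscore_commas_in_fields(csv_text):
    """Parse CSV text, replace commas inside every field, and re-serialize."""
    reader = csv.reader(io.StringIO(csv_text, newline=''))
    out = io.StringIO(newline='')
    writer = csv.writer(out, lineterminator='\n')
    for row in reader:
        # Fields arrive already unquoted, so any comma here was inside quotes.
        writer.writerow([field.replace(',', '_') for field in row])
    return out.getvalue()

print(underscore_commas_in_fields('1,"a,b",c\n2,"say ""hi, there""",z\n'))
# 1,a_b,c
# 2,"say ""hi_ there""",z
```

Note that the writer only quotes fields that still need it, so the quoting in the output can differ from the input (the first record loses its quotes once the comma is gone). Passing `quoting=csv.QUOTE_ALL` to `csv.writer` would quote every field instead.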

  • 2020-12-16 03:39

    Excuse me if you're not using Python, which is what the following code is written in. I didn't see any indication of which language you use. In any case, I think the code is easy to follow.

    import re
    
    ch = '''0,"section1","(7) Delivery of 'certificate' outside the United States prohibited.
    Since both section 339 of the 1940 statute, 68/ and section 341 of the present law are explicit
    in their statement that the certificate shall be furnished the citizen, only if such individual
    is at the time within the United States, it is clear that the document could not and cannot be
    delivered outside the United States.",http://www.google.com/
    
    1,"section2",,http://www.google.com/
    
    2,"section3",",,",http://www.google.com/
    '''
    
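    # Match each non-empty double-quoted field (escaped "" quotes are not handled).
    poto = re.compile(r'("[^"]+")')
    
    def comma_replacement(match):
        # Swap every comma inside the matched quoted field for an underscore.
        return match.group().replace(',', '_')
    
    print(poto.sub(comma_replacement, ch))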
    

    This method keeps the 2 adjacent commas in the line

    1,"section2",,http://www.google.com/

    unchanged. Is that what you want?
