regex to remove comma between double quotes notepad++

刺人心 2020-12-05 05:50

I am trying to remove commas inside double quotes from a CSV file in Notepad++. This is what I have:

1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"


        
4 Answers
  • 2020-12-05 06:19

    For a line with multiple instances of a comma within double quotes, I can think of the following Perl script. You need a header line (the script recognizes it by a leading #) that contains no such instance, so that you know how many comma-separated fields there should be.

    #! /usr/bin/perl -w
    
    use strict;
    
    my $n_fields = 0; # set from the header line
    while (<>) {
        s/\s+$//;
        if (/^\#/) { # header line
            my @t = split(/,/);
            $n_fields = scalar(@t); # total number of fields
        } else { # actual data
            my $n_commas = tr/,//; # total number of commas (tr counts without modifying $_)
            foreach my $i (0 .. $n_commas - $n_fields) { # iterate ($n_commas - $n_fields + 1) times
                s/(\"[^",]+),([^"]+\")/$1\\x2c$2/g; # single replacement per previous answers
            }
            s/\"//g; # removal of double quotes (if you want)
        }
        print "$_\n";
    }
    
  • 2020-12-05 06:20

    Try the following:

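    import re
    
    string = '1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"'  # sample line from the question
    # Remove each comma that is followed by an odd number of double quotes
    print(re.sub(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)', "", string))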
    

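    This removes every comma between double quotes: the lookahead succeeds only when an odd number of double quotes remains before the end of the string, which means the comma is inside a quoted field.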

  • 2020-12-05 06:31

    Just an update to @zx81's brilliant solution. Let's say you have two commas between the quotes.

    Then the regex search has to be modified as follows:

    ("[^",]+),([^",]+),([^"]+")
    

    The replacement needs to be modified to:

    $1$2$3
    

    And so on: modify it depending on the number of commas.

    I tried to see whether a recursive regex could handle this, but that does not seem to be possible as of now.
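
For readers who can process the file outside Notepad++, a different technique avoids writing one pattern per comma count: match each whole quoted field and strip the commas in a callback. This is only a minimal sketch in Python, not the Notepad++ approach above. The function name `remove_commas_in_quotes` is made up for illustration, and the sketch assumes every double quote in the line is balanced.

```python
import re

def remove_commas_in_quotes(line: str) -> str:
    """Delete every comma inside each double-quoted field, however many there are."""
    # "[^"]*" matches one whole quoted field; the callback rewrites only that field.
    return re.sub(r'"[^"]*"', lambda m: m.group(0).replace(",", ""), line)

print(remove_commas_in_quotes('A,"1,234,567.89","x, y, z"'))
# prints: A,"1234567.89","x y z"
```

Because the callback sees the full quoted field, the number of commas inside it no longer matters.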

  • 2020-12-05 06:32

    mrki, this will do what you want (tested in N++):

    Search: ("[^",]+),([^"]+")

    Replace: $1$2 or \1\2

    How does this work? The first parentheses capture the beginning of the string up to (but not including) the comma into Group 1. The second parentheses capture the end of the string after the comma into Group 2. The replacement substitutes the string with a concatenation of Group 1 and Group 2.

    In more detail: in the first parentheses, we match the opening double quote, then one or more characters that are neither a double quote nor a comma. That is the meaning of [^",]+. In the second parentheses, we match one or more characters that are not a double quote with [^"]+, then the closing double quote.
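
The same search and replace can be tried outside Notepad++ as a quick check. The sketch below uses Python's `re` module with the pattern from this answer. Since one pass removes a single comma per quoted field, it repeats the substitution until nothing changes. The helper name `strip_quoted_commas` is hypothetical, and the sketch assumes well-formed input where quotes only wrap whole fields.

```python
import re

# Same pattern as the Notepad++ search above: one comma per quoted field per pass.
PATTERN = re.compile(r'("[^",]+),([^"]+")')

def strip_quoted_commas(line: str) -> str:
    """Repeat the single-comma replacement until no quoted comma is left."""
    while True:
        new_line, count = PATTERN.subn(r"\1\2", line)
        if count == 0:
            return new_line
        line = new_line

sample = '1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"'
print(strip_quoted_commas(sample))
# prints: 1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"
```

In Notepad++ itself, the equivalent is to press Replace All repeatedly until it reports no more matches.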
