How to remove unbalanced/unpartnered double quotes (in Java)

别等时光非礼了梦想. Submitted on 2019-12-05 13:20:33

You could use something like (Perl notation):

s/("(?=\S)[^"]*(?<=\S)")|"/$1/g;

Which in Java would be:

str = str.replaceAll("(\"(?=\\S)[^\"]*(?<=\\S)\")|\"", "$1");
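
As a minimal sketch of how that one-liner might be exercised: the class name QuoteCleaner and the method stripUnpairedQuotes are made up for illustration and are not part of the original answer. Java strings are immutable, so the result of replaceAll has to be captured. When only the lone-quote branch matches, group 1 is unset and Java substitutes an empty string for $1.

```java
class QuoteCleaner {
    // Hypothetical helper: keeps "..." pairs whose content starts and ends
    // with a non-whitespace character, and drops every other double quote.
    static String stripUnpairedQuotes(String s) {
        // Branch 1 captures a well-formed pair in group 1.
        // Branch 2 matches any leftover quote; group 1 is then unset,
        // so "$1" expands to nothing and the quote disappears.
        return s.replaceAll("(\"(?=\\S)[^\"]*(?<=\\S)\")|\"", "$1");
    }

    public static void main(String[] args) {
        String in = "successor \"alter ego\" single employer \"a\" free\" "
                  + "proceeding \"citation assets\"";
        // prints: successor "alter ego" single employer "a" free proceeding "citation assets"
        System.out.println(stripUnpairedQuotes(in));
    }
}
```

Note that [^"]* in this pattern can cross line breaks, so a pair may span several lines of a multi-line string.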

This can probably be done in a single regex, provided the quotes are not nested.
What counts as a delimiter is only loosely defined, so it is possible to 'bias'
the rules to get a better outcome.
It all depends on which rules you set. This regex tries three possible
scenarios, in order:

  1. Valid Pair
  2. Invalid Pair (with bias)
  3. Invalid Single

It also doesn't pair "" quotes across the end of a line, although it does
accept several lines combined into a single string. To let a pair span lines,
remove \n wherever you see it in the regex.


Raw find regex for a global search (shortened form):

(?:("[a-zA-Z0-9\p{Punct}][^"\n]*(?<=[a-zA-Z0-9\p{Punct}])")|(?<![a-zA-Z0-9\p{Punct}])"([^"\n]*)"(?![a-zA-Z0-9\p{Punct}])|")

replacement grouping

$1$2 or \1\2

Expanded raw regex:

(?:                            // Grouping
                                  // Try to line up a valid pair
   (                                 // Capt grp (1) start 
     "                               // "
      [a-zA-Z0-9\p{Punct}]              // 1 of [a-zA-Z0-9\p{Punct}]
      [^"\n]*                           // 0 or more non- [^"\n] characters
      (?<=[a-zA-Z0-9\p{Punct}])         // 1 of [a-zA-Z0-9\p{Punct}] behind us
     "                               // "
   )                                 // End capt grp (1)

  |                               // OR, try to line up an invalid pair
       (?<![a-zA-Z0-9\p{Punct}])     // Bias, not 1 of [a-zA-Z0-9\p{Punct}] behind us
     "                               // "
   (  [^"\n]*  )                        // Capt grp (2) - 0 or more non- [^"\n] characters
     "                               // "
       (?![a-zA-Z0-9\p{Punct}])      // Bias, not 1 of [a-zA-Z0-9\p{Punct}] ahead of us

  |                               // OR, this single " is considered invalid
     "                               // "
)                               // End Grouping
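
To see which of the three branches fires for each quote, the capture groups can be inspected after every match. The following is a rough Java sketch, not the answerer's code: the class QuoteBranches, the method classify, and the label strings are illustrative names. It assumes that Java's \p{Punct} (ASCII punctuation by default) is close enough to Perl's \p{Punct} for the inputs at hand; the two may not cover exactly the same characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteBranches {
    // Characters allowed to touch the inside of a valid pair.
    private static final String C = "[a-zA-Z0-9\\p{Punct}]";

    static final Pattern QUOTES = Pattern.compile(
          "(\"" + C + "[^\"\\n]*(?<=" + C + ")\")"           // 1. valid pair
        + "|(?<!" + C + ")\"([^\"\\n]*)\"(?!" + C + ")"      // 2. invalid pair (with bias)
        + "|\"");                                            // 3. invalid single

    // Returns one label per match, in the order the matches occur.
    static List<String> classify(String s) {
        List<String> labels = new ArrayList<>();
        Matcher m = QUOTES.matcher(s);
        while (m.find()) {
            if (m.group(1) != null) {
                labels.add("valid pair");
            } else if (m.group(2) != null) {   // may be "", but never null here
                labels.add("invalid pair");
            } else {
                labels.add("invalid single");
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // prints: [valid pair, invalid pair, invalid single]
        System.out.println(classify("\"ab\" x \" y \" z\""));
    }
}
```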

Perl testcase (don't have Java)

$str = '
string1=injunct! alter ego."
string2=successor "alter ego" single employer "a" free" proceeding "citation assets"
';

print "\n'$str'\n";

$str =~ s
/
  (?:
     (
       "[a-zA-Z0-9\p{Punct}]
        [^"\n]*
        (?<=[a-zA-Z0-9\p{Punct}])
       "
     )
   |
       (?<![a-zA-Z0-9\p{Punct}])
       " 
     (  [^"\n]*  )
       " (?![a-zA-Z0-9\p{Punct}])
   |
       "
  )
/$1$2/xg;

print "\n'$str'\n";

Output

'
string1=injunct! alter ego."
string2=successor "alter ego" single employer "a" free" proceeding "citation assets"
'

'
string1=injunct! alter ego.
string2=successor "alter ego" single employer "a" free proceeding "citation assets"
'
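
For readers who want to try this in Java, here is a tentative port of the Perl testcase. It is a sketch under a few assumptions: the class name QuoteRepair and the method repair are invented for illustration, and Java's default \p{Punct} is the ASCII punctuation set, which may differ slightly from what Perl matches. Unset groups in the replacement expand to an empty string, so "$1$2" behaves like the Perl substitution.

```java
import java.util.regex.Pattern;

class QuoteRepair {
    private static final String C = "[a-zA-Z0-9\\p{Punct}]";

    private static final Pattern QUOTES = Pattern.compile(
          "(\"" + C + "[^\"\\n]*(?<=" + C + ")\")"           // valid pair, kept whole via $1
        + "|(?<!" + C + ")\"([^\"\\n]*)\"(?!" + C + ")"      // invalid pair, only its content kept via $2
        + "|\"");                                            // stray quote, dropped

    static String repair(String s) {
        return QUOTES.matcher(s).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String str = "string1=injunct! alter ego.\"\n"
                   + "string2=successor \"alter ego\" single employer \"a\" free\" "
                   + "proceeding \"citation assets\"\n";
        System.out.print(repair(str));
        // prints:
        // string1=injunct! alter ego.
        // string2=successor "alter ego" single employer "a" free proceeding "citation assets"
    }
}
```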