Regex for matching literal strings

Posted by 安稳与你 on 2019-12-24 09:58:50

Question


I'm trying to write a regular expression which will match a string. For simplicity, I'm only concerned with double quote (") strings for the moment.

So far I have this: "\"[^\"]*\""

This works for most strings but fails when there is an escaped double quote such as this:

"a string \" with an escaped quote"

In this case, it only matches up to the escaped quote.

I've tried several things to allow an escaped quote, but so far I've been unsuccessful. Can anyone give me a hand?
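
The failure described above can be reproduced with a short sketch. Python is an assumption here (the question does not name a language), and the name NAIVE is only illustrative. The pattern is the question's, with the string-literal escaping removed:

```python
import re

# The question's pattern: a quote, any run of non-quote characters, a quote.
NAIVE = re.compile(r'"[^"]*"')

text = r'"a string \" with an escaped quote"'
match = NAIVE.search(text)

# The match stops at the escaped quote instead of the closing one.
print(match.group(0))  # "a string \"
```

The character class `[^"]` treats the backslash as an ordinary character, so the first quote it meets, escaped or not, ends the match.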


Answer 1:


I've managed to solve it myself:

"\"(\\.|[^\"\\])*\""

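As a rough check of the pattern above, here is a minimal Python sketch. Python and the name STRING_LITERAL are assumptions for illustration; the pattern is written as a raw string, so the string-literal escaping of the outer quotes is dropped:

```python
import re

# Either a backslash followed by any character, or any single character
# that is neither a quote nor a backslash, repeated between two quotes.
STRING_LITERAL = re.compile(r'"(\\.|[^"\\])*"')

text = r'say "a string \" with an escaped quote" and "plain"'

# finditer is used because findall would return only the capture group.
literals = [m.group(0) for m in STRING_LITERAL.finditer(text)]
for literal in literals:
    print(literal)
```

This prints the two quoted strings in full, including the one that contains the escaped quote.
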


Answer 2:


Try this:

"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"

If you want a multi-line escaped string you can use:

"[^"\\]*(?:\\.[^"\\]*)*"

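The difference between the two patterns can be sketched in Python (an assumption; the names SINGLE_LINE and MULTI_LINE are illustrative only). The first pattern refuses a raw line break inside the quotes, the second accepts it:

```python
import re

# First pattern: CR and LF are excluded from the string body.
SINGLE_LINE = re.compile(r'"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"')
# Second pattern: line breaks are allowed inside the string body.
MULTI_LINE = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')

one_line = r'"a string \" with an escaped quote"'
two_lines = '"first line\nsecond line"'

single_one = SINGLE_LINE.fullmatch(one_line) is not None
single_two = SINGLE_LINE.fullmatch(two_lines) is not None
multi_two = MULTI_LINE.fullmatch(two_lines) is not None

print(single_one, single_two, multi_two)  # True False True
```

One Python-specific caveat: `.` does not match a newline unless `re.DOTALL` is set, so a backslash followed directly by a line break is only accepted by `\\.` with that flag.
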


Answer 3:


You need a negative lookbehind. Check if this works?

"\"[^\"]*(?<!\\)"

(?<!\\)" is supposed to match a " that's not preceded by a \.
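
One caveat with a lookbehind, sketched below in Python (an assumption): it only inspects the single character before the quote, so it cannot tell an escaped quote from a quote that follows an escaped backslash. The LOOKBEHIND pattern is a hypothetical variant written for this illustration, not the exact pattern above. CONSUMING is the escape-consuming pattern from the other answers, included for comparison:

```python
import re

# Hypothetical lookbehind variant: scan lazily to the first quote
# that is not preceded by a backslash.
LOOKBEHIND = re.compile(r'".*?(?<!\\)"')
# Escape-consuming pattern: every backslash swallows the next character.
CONSUMING = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')

# Two literals; the first one ends with an escaped backslash.
text = r'"dir\\" + "x"'

lookbehind_hits = LOOKBEHIND.findall(text)
consuming_hits = CONSUMING.findall(text)

print(lookbehind_hits)  # one over-long match that runs into the second literal
print(consuming_hits)   # the two literals, separately
```

Because the closing quote of the first literal is preceded by a backslash, the lookbehind rejects it and the match runs on to the next quote.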




Answer 4:


Try:

"((\\")|[^"(\\")])+"

From Regular Expression Library.




Answer 5:


Usually you want to accept any escaped character, not just an escaped quote.

" [^"\\]* (?: \\. [^"\\]* )* " would be the fastest (written with whitespace for readability).

"[^"\\]*(?:\\.[^"\\]*)*" is the same pattern, compressed.

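The spaced-out form can be used directly in engines with a free-spacing mode. A minimal sketch in Python (an assumption; the names SPACED, COMPRESSED and samples are illustrative), where `re.VERBOSE` ignores whitespace outside character classes:

```python
import re

# Free-spacing form: the blanks are ignored under re.VERBOSE.
SPACED = re.compile(r'" [^"\\]* (?: \\. [^"\\]* )* "', re.VERBOSE)
# Compressed form of the same pattern.
COMPRESSED = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')

samples = [
    r'"plain"',
    r'"a \" b"',
    r'"ends with backslash \\"',
    r'"unterminated \"',
]

spaced_results = [SPACED.fullmatch(s) is not None for s in samples]
compressed_results = [COMPRESSED.fullmatch(s) is not None for s in samples]

print(spaced_results)      # [True, True, True, False]
print(compressed_results)  # [True, True, True, False]
```

Both forms accept the same inputs. The last sample is rejected because its final quote is consumed as part of an escape, leaving no closing quote.
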



Answer 6:


As far as I know, POSIX does not support lookaround, and without it I don't think there is a way to do this with regular expressions alone. However, according to a POSIX emulator I have (I have no access to a native environment or library), this might get you close in certain cases:

"[^\"]*"|"[^\]*\\|\\[^\"]*[\"]

It will capture the part before and the part after the escaped quote. Take this source string (ignore the line breaks and imagine it is all one string):

I want to match "this text" and "This text, where there is an escaped 
slash (\\), and an \"escaped quote\" (\")", but I also want to handle\\ escaped
back-slashes, as in "this text, with a \\ backslash: \\" -- with a little
text behind it!

It will capture these groups:

"this text"                                          -- simple, quoted string
"This text, where there is an escaped slash (\       -- part 1 of quoted string
\), and an \                                         -- part 2
"escaped quote\                                      -- part 3
" (\                                                 -- part 4
")"                                                  -- part 5, and ends with a quote
\\                                                   -- not part of a quoted string
"this text, with a \                                 -- part 1 of quoted string
\ backslash: \                                       -- part 2
\"                                                   -- part 3, and ends with a quote

With further analysis you can combine them, as appropriate:

  • If the group starts and ends with a ", then it is fine on its own
  • If the group starts with a ", and ends with a \, then it needs to be IMMEDIATELY followed by another match group that either ends with a quote character itself, or recursively continues to be IMMEDIATELY followed by another match group
  • If the group does not immediately follow another match, it is not part of a quoted string

I think that's all the analysis you need, but make sure to test it!

Let me know if this idea helps!

EDIT: An additional note, just to be clear: for this to work, every quote in the source string that is not meant as a delimiter must be escaped, and backslashes must be escaped everywhere as well.



Source: https://stackoverflow.com/questions/7796904/regex-for-matching-literal-strings
