I have a string like:
"super example of string key : text I want to keep - end of my string"
I want to just keep the string which is between "key :" and "-".
When a question is stated in terms of a single example, ambiguities are inevitably present. This question is no exception.
For the example given in the question, the desired string is clear:
super example of string key : text I want to keep - end of my string
                              ^^^^^^^^^^^^^^^^^^^
However, this is but one example of a string and boundary strings between which certain substrings are to be identified. I will consider a generic string with generic boundary strings, represented as follows:
abc FF def PP ghi,PP jkl,FF mno PP pqr FF,stu FF vwx,PP yza
             ^^^^^^^^^^^^         ^^^^^
PP is the preceding string, FF is the following string and the party hats indicate which substrings are to be matched. (In the example given in the question key : is the preceding string and - is the following string.) I have assumed that PP and FF are preceded and followed by word boundaries (so that PPA and FF8 are not matched).
My assumptions, as reflected by the party hats, are as follows:
- PP may be preceded by one (or more) FF substrings, which, if present, are disregarded;
- if PP is followed by one or more PPs before FF is encountered, the following PPs are part of the substring between the preceding and following strings;
- if PP is followed by one or more FFs before a PP is encountered, the first FF following PP is considered to be the following string.

Note that many of the answers here deal only with strings of the form
abc PP def FF ghi
      ^^^^^
or
abc PP def FF ghi PP jkl FF mno
      ^^^^^         ^^^^^
One may use a regular expression, code constructs, or a combination of the two to identify the substrings of interest. I make no judgement as to which approach is best. I will only present the following regular expression that will match the substrings of interest.
(?<=\bPP\b)(?:(?!\bFF\b).)*(?=\bFF\b)
I tested this with the PCRE (PHP) regex engine, but as the regex is not at all exotic, I am sure it will work with the .NET regex engine (which is very robust).
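To see the matches concretely, here is a minimal sketch using Python's re module, chosen only for illustration (the answer itself was tested with PCRE). Python also requires a fixed-width lookbehind, which \bPP\b satisfies because \b is zero-width. The variable names are illustrative; the sketch also runs the two simpler string forms shown earlier and the PPA/FF8 cases.

```python
import re

# The answer's regex: PP is the preceding string, FF the following string.
PATTERN = re.compile(r"(?<=\bPP\b)(?:(?!\bFF\b).)*(?=\bFF\b)")

generic = "abc FF def PP ghi,PP jkl,FF mno PP pqr FF,stu FF vwx,PP yza"
simple = "abc PP def FF ghi"
repeated = "abc PP def FF ghi PP jkl FF mno"

# finditer scans left to right and returns non-overlapping matches,
# so the PP inside ' ghi,PP jkl,' does not start a second match.
for m in PATTERN.finditer(generic):
    print(m.start(), repr(m.group()))
# 13 ' ghi,PP jkl,'
# 34 ' pqr '

print(PATTERN.findall(simple))    # [' def ']
print(PATTERN.findall(repeated))  # [' def ', ' jkl ']

# The word boundaries stop PPA and FF8 from acting as boundary strings.
print(PATTERN.findall("PPA x FF"))  # []
print(PATTERN.findall("PP x FF8"))  # []
```

The matches keep the surrounding spaces and commas (strip them if needed), and the trailing " yza" after the last PP is not returned because no FF follows it.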
The regex engine performs the following operations:
(?<= : begin a positive lookbehind
\bPP\b : match 'PP'
) : end positive lookbehind
(?: : begin a non-capture group
(?! : begin a negative lookahead
\bFF\b : match 'FF'
) : end negative lookahead
. : match any character except a newline
) : end non-capture group
* : execute non-capture group 0+ times
(?= : begin positive lookahead
\bFF\b : match 'FF'
) : end positive lookahead
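The step-by-step breakdown above can also be kept inside the pattern itself. A sketch in Python using re.VERBOSE, which ignores unescaped whitespace and treats # as the start of a comment (PCRE's x flag and .NET's RegexOptions.IgnorePatternWhitespace work along the same lines); the names VERBOSE and COMPACT are illustrative.

```python
import re

# The same regex in verbose mode, one commented piece per line.
VERBOSE = re.compile(
    r"""
    (?<=\bPP\b)     # positive lookbehind: the match must be preceded by the word 'PP'
    (?:             # non-capture group, repeated 0+ times:
        (?!\bFF\b)  #   negative lookahead: the word 'FF' does not start here
        .           #   match one character (not a newline, since DOTALL is off)
    )*
    (?=\bFF\b)      # positive lookahead: the match must be followed by the word 'FF'
    """,
    re.VERBOSE,
)

# The compact form from the answer, for comparison.
COMPACT = re.compile(r"(?<=\bPP\b)(?:(?!\bFF\b).)*(?=\bFF\b)")

text = "abc FF def PP ghi,PP jkl,FF mno PP pqr FF,stu FF vwx,PP yza"
print(VERBOSE.findall(text))                           # [' ghi,PP jkl,', ' pqr ']
print(VERBOSE.findall(text) == COMPACT.findall(text))  # True
```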
This technique of matching one character at a time, following the preceding string, until the character is F and is followed by F (or, more generally, until the character begins the string that constitutes the following string), is called the Tempered Greedy Token Solution.
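The character-at-a-time idea can also be written out by hand. The following is a sketch, not a replacement for the regex: a hypothetical Python function, tempered_between, that walks the string the way the regex engine does for this pattern (left to right, non-overlapping, with "." refusing to cross a newline). Word characters are approximated as alphanumerics plus the underscore, which agrees with Python's \w on ASCII text; the helper names is_word, boundary and word_at are also illustrative.

```python
def is_word(c: str) -> bool:
    # Approximation of \w: letters, digits and underscore.
    return c.isalnum() or c == "_"


def boundary(s: str, k: int) -> bool:
    # \b at index k: exactly one side of position k is a word character.
    left = k > 0 and is_word(s[k - 1])
    right = k < len(s) and is_word(s[k])
    return left != right


def word_at(s: str, k: int, w: str) -> bool:
    # True if \bw\b matches starting at index k.
    return s.startswith(w, k) and boundary(s, k) and boundary(s, k + len(w))


def tempered_between(text: str, pre: str = "PP", post: str = "FF") -> list[str]:
    r"""Hypothetical hand-written counterpart of
    (?<=\bPP\b)(?:(?!\bFF\b).)*(?=\bFF\b), with non-empty pre/post for PP/FF."""
    results = []
    i = 0
    while i <= len(text):
        # Lookbehind: a whole-word `pre` must end exactly at i.
        if i >= len(pre) and word_at(text, i - len(pre), pre):
            j = i
            # Tempered greedy token: consume one character at a time as long as
            # `post` does not begin here and the character is not a newline.
            while j < len(text) and not word_at(text, j, post) and text[j] != "\n":
                j += 1
            # Lookahead: the loop must have stopped at a whole-word `post`.
            if word_at(text, j, post):
                results.append(text[i:j])
                # Resume at the end of the match (step past an empty match).
                i = j if j > i else i + 1
                continue
        i += 1
    return results


generic = "abc FF def PP ghi,PP jkl,FF mno PP pqr FF,stu FF vwx,PP yza"
print(tempered_between(generic))  # [' ghi,PP jkl,', ' pqr ']
```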
Naturally, the regex would have to be modified (if possible) if the assumptions I set out above are changed.
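For instance, the question's own boundary strings do not satisfy the word-boundary assumption at every edge: \b cannot match between ":" and the space after it, or on either side of " - ", because both neighbours are non-word characters, so substituting them directly into the regex finds nothing. Below is a minimal sketch of one possible modification in Python, assuming non-empty boundary strings: the hypothetical helper between escapes the boundary strings with re.escape and adds \b only next to a word character.

```python
import re


def between(text: str, pre: str, post: str) -> list[str]:
    # Hypothetical helper: the answer's tempered greedy token with literal
    # (escaped) boundary strings; assumes pre and post are non-empty.
    def anchored(s: str) -> str:
        # Add \b only on an edge that is a word character: \b between two
        # non-word characters (such as ':' and a space) never matches.
        lead = r"\b" if re.match(r"\w", s[0]) else ""
        trail = r"\b" if re.match(r"\w", s[-1]) else ""
        return lead + re.escape(s) + trail

    p, f = anchored(pre), anchored(post)
    return re.findall(rf"(?<={p})(?:(?!{f}).)*(?={f})", text)


question = "super example of string key : text I want to keep - end of my string"

# Direct substitution fails: there is no \b between ':' and the following space.
print(re.findall(r"(?<=\bkey :\b)(?:(?!\b-\b).)*(?=\b-\b)", question))  # []

print(between(question, "key :", "-"))                       # [' text I want to keep ']
print([s.strip() for s in between(question, "key :", "-")])  # ['text I want to keep']
```

Choosing the boundary strings (for example " - " rather than "-", so that a hyphenated word does not end the match) is itself one of the assumptions that shapes the regex.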