Regular expression to replace a word with a link

走了就别回头了 2020-12-18 11:51

I want to write a regular expression that replaces the word Paris with a link, but only where the word is not already part of a link.

Example:

    i'm l

7 answers
  •  不知归路
    2020-12-18 12:15

    This is hard to do in one step. Writing a single regex that does that is virtually impossible.

    Try a two-step approach.

    1. Put a link around every "Paris" there is, regardless of whether it already sits inside another link.
    2. Find all incorrectly nested links (<a ...><a ...>Paris</a></a>) and eliminate the inner link.

    Regex for step one is dead-simple:

    \bParis\b
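
As a rough illustration of step one, here is a minimal Python sketch. The link target `https://example.com/paris` and the helper name `wrap_word` are assumptions made for this example and are not part of the original answer. The sketch also assumes that "Paris" never appears inside a tag attribute, since `\b` cannot tell text from markup.

```python
import re

# Hypothetical link markup; the original post does not say where the link points.
LINK = '<a href="https://example.com/paris">Paris</a>'

def wrap_word(html: str) -> str:
    """Step 1: wrap every standalone 'Paris' in a link, nested or not."""
    # \b keeps longer words such as 'Parisian' untouched.
    # A function is used as the replacement so LINK is inserted literally.
    return re.sub(r"\bParis\b", lambda m: LINK, html)

print(wrap_word("I live in Paris, among Parisians."))
# An existing link now contains a second, surplus link that step 2 must remove.
print(wrap_word('<a href="/fr">in Paris</a>'))
```

The second call shows why step two is needed: the word inside the existing link gets wrapped as well.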
    

    Regex for step two is slightly more complex:

    (<a[^>]+>(?:(?!</a>).)*?)<a[^>]+>(Paris)</a>
    

    Use that one on the whole string and replace it with the content of match groups 1 and 2, effectively removing the surplus inner link. If an outer link contains the word more than once, repeat the replacement until nothing changes.

    Explanation of regex #2 in plain words:

    • Find every opening link tag (<a[^>]+>), optionally followed by any text that does not contain a closing link tag ((?:(?!</a>).)*?). Save it into match group 1.
    • Now look for the next opening link tag (<a[^>]+>). Make sure it is there, but do not save it.
    • Now look for the word Paris. Save it into match group 2.
    • Look for a closing link tag (</a>). Make sure it is there, but don't save it.
    • Replace everything with the content of groups 1 and 2, thereby losing everything you did not save.
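
The step-two replacement can be sketched in Python as follows. This assumes the pattern is written so that group 1 cannot run across a closing `</a>`; without that restriction, a separate link earlier in the text would be treated as the outer link. The names `NESTED` and `unwrap_once` are made up for this example.

```python
import re

# Group 1: an opening <a ...> tag plus any text that contains no </a>.
# Then an inner opening tag (not captured), the word (group 2), and </a>.
NESTED = re.compile(r"(<a[^>]+>(?:(?!</a>).)*?)<a[^>]+>(Paris)</a>")

def unwrap_once(html: str) -> str:
    """Replace each match with groups 1 and 2, dropping the inner link."""
    return NESTED.sub(r"\1\2", html)

nested = '<a href="/fr">in <a href="/paris">Paris</a></a>'
print(unwrap_once(nested))

# Two sibling links are not nested, so this string is left alone.
siblings = '<a href="/fr">France</a> and <a href="/paris">Paris</a>'
print(unwrap_once(siblings))
```

Note that `.` does not match newlines by default in Python, so links spanning several lines would need the `re.DOTALL` flag.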

    The approach assumes these side conditions:

    • Your input HTML is not horribly broken.
    • Your regex flavor supports non-greedy quantifiers (*?) and zero-width negative look-ahead assertions ((?!...)).
    • You wrap the word "Paris" only in a link in step 1, with no additional characters. Every "Paris" becomes "<a ...>Paris</a>", or step two will fail (until you change the second regex).
    • BTW: regex #2 explicitly allows for constructs like this:

      <a ...>in the capital of France, <a ...>Paris</a></a>

      The surplus link comes from step one; the replacement result of step 2 will be:

      <a ...>in the capital of France, Paris</a>
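
Putting both steps together, one possible end-to-end sketch in Python looks like this. The function name `link_word` and its parameters are hypothetical, and the side conditions listed above still apply (reasonably well-formed HTML, the word not occurring inside tag attributes). The step-two pattern is the form in which group 1 cannot cross a closing `</a>`.

```python
import re

def link_word(html: str, word: str, href: str) -> str:
    """Two-step replacement: wrap every occurrence, then undo nested links."""
    link = f'<a href="{href}">{word}</a>'
    # Step 1: wrap every standalone occurrence, even inside existing links.
    result = re.sub(rf"\b{re.escape(word)}\b", lambda m: link, html)
    # Step 2: drop inner links. Group 1 may not cross a closing </a>.
    nested = re.compile(
        rf"(<a[^>]+>(?:(?!</a>).)*?)<a[^>]+>({re.escape(word)})</a>"
    )
    # One pass removes at most one inner link per outer link,
    # so repeat until the string stops changing.
    previous = None
    while previous != result:
        previous = result
        result = nested.sub(r"\1\2", result)
    return result

print(link_word("I love Paris.", "Paris", "/paris"))
print(link_word('<a href="/fr">in the capital of France, Paris</a>', "Paris", "/paris"))
```

The first call adds a link; the second returns its input unchanged because the word already sits inside a link.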
