Regular Expression to match #hashtag but not #hashtag; (with semicolon)

Submitted by 主宰稳场 on 2020-08-04 04:41:48

Question


I have the current regular expression:

/(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)/g

Which I'm testing against the string:

Here's a #hashtag and here is #not_a_tag; which should be different. Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>

For my purposes, only two hashtags should be detected in this string. How can I alter the expression so that it doesn't match hashtags that are immediately followed by a semicolon (#not_a_tag; in my example)?
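
To make the starting point concrete, here is a minimal sketch that runs the pattern above against the test string. It assumes a JavaScript engine with lookbehind support (ES2018 or later); the variable names are only illustrative.

```javascript
// The question's original pattern, unchanged.
const originalPattern = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)/g;

const sample =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// With the g flag, match() returns every full match (capture groups are dropped).
const found = sample.match(originalPattern);
console.log(found); // [ '#hashtag', '#not_a_tag', '#hash' ]
```

Three hashtags come back, and #not_a_tag is the one that should be excluded.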

Cheers.


Answer 1:


How about the following:

\B(\#[a-zA-Z]+\b)(?!;)


  • \B -> Not a word boundary, so the # cannot directly follow a word character (this rejects Mid#hash)
  • (\#[a-zA-Z]+\b) -> Capturing group: a # followed by one or more letters a-z or A-Z, ending at a word boundary
  • (?!;) -> Not followed by ;
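
A quick check of this pattern against the question's test string, as a sketch in JavaScript. The g flag, the variable names, and the extra "#tag_1" input are additions for the demonstration and are not part of the answer.

```javascript
// Answer 1's pattern, with the g flag added so every match is returned.
const answer1Pattern = /\B(\#[a-zA-Z]+\b)(?!;)/g;

const sample1 =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

const matches1 = sample1.match(answer1Pattern);
console.log(matches1); // [ '#hashtag', '#hash' ]

// Hypothetical extra input: an underscore is a word character, so \b cannot
// hold right after "tag" and the whole tag is rejected.
const underscored = "#tag_1".match(answer1Pattern);
console.log(underscored); // null
```

Note that #not_a_tag is rejected here because of its underscores rather than because of the semicolon: [a-zA-Z]+ followed by \b only accepts tags made purely of letters.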



Answer 2:


You can use a negative lookahead regex:

/(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/
  • \b - word boundary ensures that we are at end of word
  • (?!;) - asserts that we don't have semi-colon at next position
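
As a sketch of how this behaves in JavaScript: the snippet below adds the g flag (the pattern above omits it) and uses matchAll to read the capture group, so the tag names come back without the leading #. The variable names and the "#snake_case_1 ok" input are illustrative assumptions.

```javascript
// Answer 2's pattern, with the g flag added because matchAll() requires it.
const answer2Pattern = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/g;

const sample2 =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// m[1] is the first capture group: the tag text without the '#'.
const tags2 = [...sample2.matchAll(answer2Pattern)].map((m) => m[1]);
console.log(tags2); // [ 'hashtag', 'hash' ]

// Hypothetical extra input: underscores and digits are still allowed
// when no semicolon follows the tag.
const tags2b = [..."#snake_case_1 ok".matchAll(answer2Pattern)].map((m) => m[1]);
console.log(tags2b); // [ 'snake_case_1' ]
```

The \b matters: without it, the engine could backtrack to a shorter tag such as #not_a_ta, which is not followed by a semicolon and would therefore match.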





Answer 3:


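Similar to anubhava's answer above, but with the two instances of \w* swapped for \d*, since the only difference between \w and [A-Za-z_] is the digits 0-9. Two behaviours change as a result: a tag with a digit between letters, such as #a1b, no longer matches, and because ^ was dropped from the lookbehind, neither does a hashtag at the very start of the string.

This reduces the number of steps from 588 to 90.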

(?<=[\s>])#(\d*[A-Za-z_]+\d*)\b(?!;)
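
The two patterns are close but not interchangeable. The following sketch compares them in JavaScript on a made-up input; the g flag, the variable names, and the input string are assumptions chosen to expose the differences.

```javascript
// Answer 2's pattern and Answer 3's pattern, both with the g flag added.
const withWordClass = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/g;
const withDigitClass = /(?<=[\s>])#(\d*[A-Za-z_]+\d*)\b(?!;)/g;

// Hypothetical input: a tag at the start of the string, a tag with a digit
// between letters, a tag with leading digits, and a tag followed by ';'.
const comparison = "#first then #a1b and #2020vision and #skip;";

const wordResult = comparison.match(withWordClass);
console.log(wordResult); // [ '#first', '#a1b', '#2020vision' ]

const digitResult = comparison.match(withDigitClass);
console.log(digitResult); // [ '#2020vision' ]
```

The \d* version misses #first because its lookbehind has no ^ alternative, and it misses #a1b because digits are only allowed before and after a single run of letters or underscores.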





Answer 4:


/(#(?:[^\x00-\x7F]|\w)+)/g

Starts with #, followed by one or more (+) non-ASCII characters ([^\x00-\x7F], a negated class that excludes the ASCII range) or word characters (\w).

This one should cover tags containing non-ASCII characters, such as "#їжак".
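
A small JavaScript sketch of what this pattern does and does not cover. The input string and variable names are invented for illustration.

```javascript
// Answer 4's pattern, unchanged.
const answer4Pattern = /(#(?:[^\x00-\x7F]|\w)+)/g;

// Hypothetical input mixing a Cyrillic tag, an ASCII tag, and a tag followed by ';'.
const mixed = "Ukrainian #їжак, ASCII #hedgehog, and #not_a_tag; too";

const matches4 = mixed.match(answer4Pattern);
console.log(matches4); // [ '#їжак', '#hedgehog', '#not_a_tag' ]
```

The Cyrillic tag is picked up, but so is #not_a_tag: this pattern has no lookahead, so on its own it does not address the semicolon requirement from the question.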




Answer 5:


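Another option, which matches hashtags in general but does not exclude a trailing semicolon. Note that the parentheses inside the character class are matched literally: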

(#+[a-zA-Z0-9(_)]{1,})
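
To see what this pattern accepts, here is a short JavaScript sketch. The g flag, the variable names, and the input string are assumptions added for the demonstration.

```javascript
// Answer 5's pattern, with the g flag added.
const answer5Pattern = /(#+[a-zA-Z0-9(_)]{1,})/g;

// Hypothetical input: parentheses in a tag, a doubled '#', and a tag followed by ';'.
const probe = "#tag(1) ##double #not_a_tag;";

const matches5 = probe.match(answer5Pattern);
console.log(matches5); // [ '#tag(1)', '##double', '#not_a_tag' ]
```

Inside a character class, ( and ) are ordinary characters, so they become part of the tag; #+ also allows several leading # signs, and the tag before the semicolon is still matched.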


Source: https://stackoverflow.com/questions/38506598/regular-expression-to-match-hashtag-but-not-hashtag-with-semicolon
