RegEx to extract text between a HTML tag

Frontend · Unresolved · 3 answers · 1361 views
我寻月下人不归 2020-12-21 01:44

I'm looking for a regular expression that extracts the text between HTML tags of different types.

For example:

Span 1 - O/p:

3 answers
  • 2020-12-21 02:07

    This should suit your needs:

    "<([a-zA-Z]+).*?>(.*?)</\\1>"
    

    The first group contains the tag name, and the second contains the text in between.

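A minimal Java sketch of how the first answer's pattern might be used with `java.util.regex`, assuming the goal is to collect every (tag name, text) pair from a single-line string. The class `TagTextExtractor`, the method `extract`, and the sample markup are hypothetical; the `<a>` and `<div>` tags are borrowed from the second answer. Inside the Java string literal, `\\1` becomes the regex backreference `\1`, so each closing tag has to repeat the name captured by group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTextExtractor {
    // First answer's regex; "\\1" in Java source is the backreference \1,
    // so the closing tag must repeat the tag name captured by group 1.
    private static final Pattern TAG = Pattern.compile("<([a-zA-Z]+).*?>(.*?)</\\1>");

    // Returns one {tagName, innerText} pair per match, in order of appearance.
    static List<String[]> extract(String html) {
        List<String[]> results = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            results.add(new String[] { m.group(1), m.group(2) });
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String html = "<span>Span 1</span><a href=\"#\">Link</a><div onclick=\"callMe()\">Div</div>";
        for (String[] pair : extract(html)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
        // Prints:
        // span -> Span 1
        // a -> Link
        // div -> Div
    }
}
```

Because `.*?` does not cross line breaks and the lazy group stops at the first matching closing tag, this sketch does not handle multi-line content, and for nested elements with the same name (e.g. `<div><div>x</div></div>`) it returns `<div>x` rather than the full inner markup.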
  • 2020-12-21 02:08

    A very specific way:

    (<span>|<a href="#">|<div onclick="callMe\(\)">)(.*)(</span>|</a>|</div>)
    

    But yeah, this only works for those three examples. For anything more general, you'll need an HTML parser.

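A small Java sketch of the second answer's alternation pattern, to show where it stops working. The class `SpecificTagMatcher` and the method `inner` are hypothetical names; the pattern is the answer's regex written as a Java string literal, so `\(` and `\)` become `\\(` and `\\)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecificTagMatcher {
    // Second answer's regex as a Java string literal. In the regex, \( and \)
    // escape the parentheses of callMe(); in Java source each backslash is doubled.
    private static final Pattern SPECIFIC = Pattern.compile(
            "(<span>|<a href=\"#\">|<div onclick=\"callMe\\(\\)\">)(.*)(</span>|</a>|</div>)");

    // Returns the text captured between the opening and closing tag, or null if no match.
    static String inner(String html) {
        Matcher m = SPECIFIC.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(inner("<div onclick=\"callMe()\">Div</div>")); // Div
        System.out.println(inner("<p>Para</p>"));                          // null: tag not listed
        System.out.println(inner("<span>A</span><a href=\"#\">B</a>"));    // A</span><a href="#">B
    }
}
```

The last call shows that the greedy `(.*)` runs from the first listed opening tag to the last listed closing tag on the line, and nothing forces the closing tag to agree with the opening one (`<span>x</a>` also matches). Both are reasons the answer points to an HTML parser for anything beyond these fixed cases.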
  • 2020-12-21 02:17

    Your comment shows that you have neglected to escape the backslashes in your regex string.

    And if you want to match lowercase letters, add a-z to the character classes or use Pattern.CASE_INSENSITIVE (or add (?i) to the beginning of the regex).

    "<([A-Za-z][A-Za-z0-9]*)\\b[^>]*>(.*?)</\\1>"
    

    If the tag contents may contain newlines, then use Pattern.DOTALL or add (?s) to the beginning of the regex to turn on dotall/singleline mode.

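A Java sketch, with hypothetical class and method names, of the two points made in the third answer: backslashes in a Java regex string have to be doubled, and `.` only crosses line breaks with `Pattern.DOTALL` or the inline `(?s)` flag. The single-backslash constant is included only to show what goes wrong: Java reads `"\b"` as a backspace character and `"\1"` as the octal escape for U+0001, so the compiled regex contains neither a word boundary nor a backreference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapingAndFlagsDemo {
    // Third answer's pattern. In Java source, "\\b" becomes the regex word
    // boundary \b and "\\1" becomes the backreference \1.
    static final String ESCAPED = "<([A-Za-z][A-Za-z0-9]*)\\b[^>]*>(.*?)</\\1>";

    // Same pattern with single backslashes (the mistake): "\b" is a backspace
    // character and "\1" is the octal escape for U+0001, not regex syntax.
    static final String UNESCAPED = "<([A-Za-z][A-Za-z0-9]*)\b[^>]*>(.*?)</\1>";

    // Compiles regex with the given flags and returns group 2 of the first match, or null.
    static String inner(String regex, int flags, String html) {
        Matcher m = Pattern.compile(regex, flags).matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String oneLine = "<b>bold</b>";
        String multiLine = "<div>line 1\nline 2</div>";

        System.out.println(inner(UNESCAPED, 0, oneLine));              // null
        System.out.println(inner(ESCAPED, 0, oneLine));                // bold
        System.out.println(inner(ESCAPED, 0, multiLine));              // null: . stops at \n
        System.out.println(inner(ESCAPED, Pattern.DOTALL, multiLine)); // line 1, then line 2
        System.out.println(inner("(?s)" + ESCAPED, 0, multiLine));     // same, via inline flag
    }
}
```

Passing the flag to `Pattern.compile` and prefixing `(?s)` are interchangeable here; the inline form is handy when the regex is stored as a plain string and the compile call cannot be changed.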