I'm looking for a regular expression that extracts the text between HTML tags of different types.
For example:
Span 1
- Output:
This should suit your needs:
<([a-zA-Z]+).*?>(.*?)</\\1>
The first group contains the tag name, the second one the text in between.
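As a rough sketch of how that pattern could be used from Java (Java is an assumption here, suggested by the doubled backslash in the string; the class name, the extract method, and the sample HTML are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTextExtractor {
    // Group 1 captures the tag name, group 2 the text between the tags.
    // "\\1" in the Java string literal becomes the backreference \1 in the regex.
    private static final Pattern TAG = Pattern.compile("<([a-zA-Z]+).*?>(.*?)</\\1>");

    // Hypothetical helper: collects group 2 of every match, in document order.
    static List<String> extract(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            texts.add(m.group(2));
        }
        return texts;
    }

    public static void main(String[] args) {
        // Sample input invented for this example.
        String html = "<span>Span 1</span><a href=\"#\">Link</a>"
                + "<div onclick=\"callMe()\">Div 1</div>";
        System.out.println(extract(html)); // prints [Span 1, Link, Div 1]
    }
}
```

Note that nested tags of the same type (a div inside a div) will not be matched correctly, because the lazy (.*?) stops at the first closing tag with that name.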
A very specific way:
(<span>|<a href="#">|<div onclick="callMe\(\)">)(.*)(</span>|</a>|</div>)
But yeah, this will only work for those 3 examples. For anything more general you'll need to use an HTML parser.
Your comment shows that you have neglected to escape the backslashes in your regex string.
And if you want to match lowercase letters, add a-z to the character classes or use Pattern.CASE_INSENSITIVE (or add (?i) to the beginning of the regex):
"<([A-Za-z][A-Za-z0-9]*)\\b[^>]*>(.*?)</\\1>"
If the tag contents may contain newlines, then use Pattern.DOTALL or add (?s) to the beginning of the regex to turn on dotall/singleline mode.
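To make the effect of the two flags concrete, here is a small sketch. The class name, the firstContent helper, and the sample input are hypothetical; only the regex and the Pattern flags come from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagMatchFlags {
    // Backslashes are doubled because this is a Java string literal.
    static final String REGEX = "<([A-Za-z][A-Za-z0-9]*)\\b[^>]*>(.*?)</\\1>";

    // CASE_INSENSITIVE lets <DIV> pair with </div>;
    // DOTALL lets '.' match line terminators inside the tag contents.
    static final Pattern WITH_FLAGS =
            Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Same regex with no flags, for comparison.
    static final Pattern NO_FLAGS = Pattern.compile(REGEX);

    // Hypothetical helper: returns group 2 of the first match, or null if none.
    static String firstContent(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Sample input invented for this example: mixed-case tags, multi-line body.
        String html = "<DIV class=\"x\">line 1\nline 2</div>";
        System.out.println(firstContent(WITH_FLAGS, html) != null); // prints true
        System.out.println(firstContent(NO_FLAGS, html) != null);   // prints false
    }
}
```

Writing the pattern as "(?is)" followed by the same regex should behave like passing both flags to Pattern.compile.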