I'm dealing with single HTML strings like this:
>>> s = 'u>
\n Some text
If I understand you right, you're looking to take this input:
u>
\n Some text
And receive this output:
\n Some text
This is done simply enough by only caring about what comes between the two inward-pointing brackets. We want:
> (so we know where to begin)
\n Some text (the content), which does not contain a left-bracket
< (so we know where to end)
You want:
>>> s = 'u>
\n Some text
>>> import re
>>> re.search(r'>([^<]+)<', s)
<_sre.SRE_Match object; span=(6, 55), match='>\n Some text <'>
(The captured group can be accessed via .group(1).)
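For a self-contained version of the same idea, here is a minimal sketch. The input string is a made-up stand-in (the `<p>` tag is only a placeholder, not your actual markup), and the `.strip()` call is an extra step I'm assuming you want in order to drop the surrounding whitespace:

```python
import re

# Hypothetical input: the <p> tag is a placeholder, not the original markup.
s = '<p>\n    Some text    </p>'

# '>' marks the start, [^<]+ captures everything up to the next '<'.
m = re.search(r'>([^<]+)<', s)

if m is not None:
    raw = m.group(1)       # captured text, whitespace included
    text = raw.strip()     # optional: trim leading/trailing whitespace
    print(repr(text))      # 'Some text'
```

Checking for `None` matters because `re.search` returns `None` when the string contains no `>...<` pair at all.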
Additionally, you may want to use re.findall if you expect there to be multiple matches per line:
>>> re.findall(r'>([^<]+)<', s)
['\n Some text ']
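To see what `re.findall` does when a string holds several tags, here is a small sketch with an invented input (the `<b>` and `<i>` tags are just placeholders):

```python
import re

# Hypothetical input with two tagged fragments and plain text between them.
s = '<b>first</b> and <i>second</i>'

# findall returns the captured group of every non-overlapping match.
parts = re.findall(r'>([^<]+)<', s)
print(parts)  # ['first', ' and ', 'second']
```

Note that the text between a closing tag and the next opening tag (`' and '` here) is captured too, since it also sits between a `>` and a `<`. Text before the first tag or after the last one is not captured, because it has no bracket on one side.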
EDIT: To address the comment: If you have multiple matches and you want to connect them into a single string (effectively removing all HTML-like tag things), do:
>>> s = 'nbsp;
Some text.
Some \n more text.
>>> ' '.join(re.findall(r'>([^<]+)<', s))
'Some text. Some \n more text.'
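If you do this often, it might be worth wrapping it in a helper. The sketch below is one possible variation, not exactly the one-liner above: `strip_tags` is a name I made up, the input string is invented, and it additionally collapses whitespace (including embedded newlines) and skips fragments that are whitespace only.

```python
import re

def strip_tags(s):
    """Join the text fragments found between '>' and '<' with single spaces."""
    pieces = re.findall(r'>([^<]+)<', s)
    # Collapse runs of whitespace (newlines included) inside each fragment.
    cleaned = [' '.join(p.split()) for p in pieces]
    # Drop fragments that were whitespace only, e.g. a newline between tags.
    return ' '.join(c for c in cleaned if c)

# Hypothetical input: the tags are placeholders, not the original markup.
s = '<div><p>Some text.</p>\n<p>Some \n more text.</p></div>'
print(strip_tags(s))  # Some text. Some more text.
```

This is still a regex shortcut for simple, well-behaved strings; it will not cope with things like a literal `<` inside the text or with comments and scripts.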