Let's say I have this string, which contains an HTML a tag:
<a href="...">Berlin-Treptow-Köpenick</a>
I have assumed that the string to be extracted consists of alphanumeric characters (including accented letters) and hyphens, and that it immediately follows the first instance of the character '>'.
string = '<a href="...">Berlin-Treptow-Köpenick</a>'
r = /
(?<=\>) # match '>' in a positive lookbehind
[\p{Alnum}-]+ # match one or more alphanumeric characters or hyphens
/x # extended or free-spacing mode
string[r] #=> "Berlin-Treptow-Köpenick"
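The assumption above can be checked with a small sketch. The helper name `first_tag_text`, the constant `TAG_TEXT`, and both sample strings (including the `href="#"` value) are hypothetical and only serve as an illustration; the regex is the same one as above, written on a single line.

```ruby
# Hypothetical helper: returns the run of alphanumeric characters and
# hyphens that directly follows a '>', or nil when nothing matches.
TAG_TEXT = /(?<=>)[\p{Alnum}-]+/

def first_tag_text(str)
  str[TAG_TEXT]
end

# Sample input with an invented href value.
with_tag    = first_tag_text('<a href="#">Berlin-Treptow-Köpenick</a>')
without_tag = first_tag_text('no tag here')

p with_tag     # the text between the opening and closing tag
p without_tag  # nil, because the lookbehind never finds a '>'
```

If the input contains no '>' at all, `String#[]` returns nil rather than raising, so the caller should be prepared for that case.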
Note that /[A-Za-z0-9]/ does not match accented characters such as 'ö'.
Alternatively, one can use the POSIX syntax:
r = /(?<=\>)[[[:alnum:]]-]+/
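To make the difference between the three character classes concrete, here is a minimal comparison on a single word. The variable names are hypothetical, and the word is written with an explicit `\u00F6` escape so that the ö is guaranteed to be one precomposed code point.

```ruby
# "Köpenick" with a precomposed ö (U+00F6).
word = "K\u00F6penick"

ascii_only = word[/[A-Za-z0-9-]+/]    # ASCII ranges only
unicode    = word[/[\p{Alnum}-]+/]    # Unicode property class
posix      = word[/[[[:alnum:]]-]+/]  # POSIX bracket form

puts ascii_only  # prints "K": the match stops before the ö
puts unicode     # prints "Köpenick"
puts posix       # prints "Köpenick"
```

On a UTF-8 string, both the `\p{Alnum}` and `[[:alnum:]]` forms treat 'ö' as a letter, whereas the ASCII ranges do not.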