How do I match all the opening tags in an XML document with a regex? I just need to collect the tag names used.
This is what I have:
(?<=<)(.*?)((?= \/>)|(?=>))
This matches both the opening and the closing tags.
Example:
<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>
The pattern above matches:
Habazutty
/Habazutty
Vogons
Targ
/Targ
I only need:
Habazutty
Vogons
Targ
I couldn't figure out a way to exclude the closing tags. I tried a negative lookahead, but it found nothing; I must have gotten it wrong.
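For reference, here is a minimal Python sketch that reproduces the behaviour described above. It assumes Python's standard `re` module; `SAMPLE` and `tag_names` are illustrative names, not from the question. It prints the first capturing group of every match:

```python
import re

# The three example lines from the question, joined into one string.
SAMPLE = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

# Original pattern: after a "<", lazily take characters up to " />" or ">".
PATTERN = re.compile(r"(?<=<)(.*?)((?= \/>)|(?=>))")


def tag_names(text):
    """Return capturing group 1 of every match, in document order."""
    return [m.group(1) for m in PATTERN.finditer(text)]


print(tag_names(SAMPLE))
# ['Habazutty', '/Habazutty', 'Vogons', 'Targ', '/Targ']
```

The closing tags show up because nothing prevents (.*?) from starting with the / that follows the < of a closing tag.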
You could change (?<=<)(.*?)((?= \/>)|(?=>)) to (?<=<)([^\/]*?)((?= \/>)|(?=>)), i.e. use ([^\/]*?) instead of (.*?) for the tag name. The / character is not allowed in tag names anyway, and since a closing tag has / right after the <, the name group can no longer match there.
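A quick check of this suggestion, as a sketch using Python's `re` module on the question's sample text (`SAMPLE`, `OPENING_TAG` and `names` are illustrative names):

```python
import re

SAMPLE = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

# Same pattern, but the tag name may not contain "/", so no match can start
# right after the "<" of a closing tag such as "</Targ>".
OPENING_TAG = re.compile(r"(?<=<)([^\/]*?)((?= \/>)|(?=>))")

names = [m.group(1) for m in OPENING_TAG.finditer(SAMPLE)]
print(names)  # ['Habazutty', 'Vogons', 'Targ']
```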
You can achieve this simply using:
<([^\/>]+)[/]*>
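The first capturing group holds the tag name. For a self-closing tag written with a space, such as <Vogons />, the group also captures that trailing space ("Vogons "), so you may want to trim it.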
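Trying this pattern in Python (a sketch assuming the standard `re` module; the `.strip()` call is an addition, not part of the answer) shows the trailing space on the self-closing tag and one way to remove it:

```python
import re

SAMPLE = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

# "<", then one or more characters that are neither "/" nor ">" (captured),
# then any number of "/" characters, then ">". A closing tag fails at once
# because the character after "<" is "/".
TAG = re.compile(r"<([^\/>]+)[/]*>")

# With a single group, findall returns just that group's text per match.
raw = TAG.findall(SAMPLE)
print(raw)  # ['Habazutty', 'Vogons ', 'Targ']

# Trim the space that "<Vogons />" leaves in the captured name.
names = [name.strip() for name in raw]
print(names)  # ['Habazutty', 'Vogons', 'Targ']
```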
Found another solution:
((?=<)(?!<\/)<)(.*?)((?= \/>)|(?=>))
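Basically, ((?=<)(?!<\/)<) matches a "<" only when it does not start a closing tag: (?=<) asserts that the next character is "<", and (?!<\/) rejects "</". Both are lookaheads, not lookbehinds, and (?=<) is redundant because the literal < that follows already requires it. The tag name ends up in the second capturing group.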
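A sketch of this solution in Python's `re` module, reading the tag name from group 2. The shorter `SIMPLER` pattern is an assumed simplification (not from the answer) that drops the redundant (?=<) and puts the negative lookahead right after the literal <:

```python
import re

SAMPLE = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

# Pattern from this answer. Group 1 consumes a "<" that does not start "</";
# group 2 is the tag name; group 3 wraps the lookaheads and captures "".
START_TAG = re.compile(r"((?=<)(?!<\/)<)(.*?)((?= \/>)|(?=>))")
names = [m.group(2) for m in START_TAG.finditer(SAMPLE)]
print(names)  # ['Habazutty', 'Vogons', 'Targ']

# Assumed simplification: a literal "<" not followed by "/", then the name
# up to " />" or ">". Only one capturing group, so findall returns names.
SIMPLER = re.compile(r"<(?!/)(.*?)(?= />|>)")
simpler = SIMPLER.findall(SAMPLE)
print(simpler)  # ['Habazutty', 'Vogons', 'Targ']
```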
@Redneb's answer is cleaner, though: fewer capturing groups, shorter and fancier.
Source: https://stackoverflow.com/questions/39329607/regex-find-all-xml-tags