Question
I am struggling to find a regex that matches a URN as described in RFC 8141. I have tried this one:
\A(?i:urn:(?!urn:)(?<nid>[a-z0-9][a-z0-9-]{1,31}):(?<nss>(?:[a-z0-9()+,-.:=@;$_!*']|%[0-9a-f]{2})+))\z
but it only matches the first part of the URN, without the optional components.
For example, let's say we have the following URN: urn:example:a123,0%7C00~&z456/789?+abc?=xyz#12/3
We should match the following groups:
- NID - example
- NSS - a123,0%7C00~&z456/789 (from the last ':' till we match '?+' or '?=' or '#')
- r-component - abc (from '?+' till '?=' or '#')
- f-component - 12/3 (from '#' till end)
Answer 1:
I haven't read the whole specification, so there may be other rules to implement, but this should put you on the right track for the optional components:
\A(?i:urn:(?!urn:)(?<nid>[a-z0-9][a-z0-9-]{1,31}):(?<nss>(?:[-a-z0-9()+,.:=@;$_!*'&~\/]|%[0-9a-f]{2})+)(?:\?\+(?<rcomponent>.*?))?(?:\?=(?<qcomponent>.*?))?(?:#(?<fcomponent>.*?))?)\z
Explanations:
- (?<nss>(?:[-a-z0-9()+,.:=@;$_!*'&~\/]|%[0-9a-f]{2})+): The '-' has been moved to the beginning of the character class so that it is unambiguously a literal; between two characters, as in ',-.', it means "range from ',' to '.'" (that range happens to include '-' itself, so the original class accepted it only by accident). The characters '&', '~' and '/' (the last one escaped with "\", which is needed where '/' delimits the regex) have also been added to the class, or else the pattern won't match your example.
- Optional components, e.g. (?:\?\+(?<rcomponent>.*?))?: each one sits inside an optional non-capturing group, (?:...)?, to keep the delimiter (the '?+', '?=' or '#' part) out of the capture. The chars '?' and '+' have to be escaped with "\". The named group captures anything ('.'), but in lazy mode ('*?'), or else the first component found would capture everything until the end of the string.
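The two points above can be checked in isolation. The following is a minimal sketch in Python, which is my assumption and not necessarily the flavor you are using: Python's re module spells named groups (?P<name>...) and the end anchor \Z, and the short group names r, q and f are only illustrative.

```python
import re

# 1) Hyphen placement. Between two characters, '-' builds a range:
#    ',-.' spans U+002C..U+002E, i.e. ',', '-' and '.'.
#    A leading '-' is always a literal.
chars = "+,-."
range_hits = [c for c in chars if re.fullmatch(r"[+,-.]", c)]
leading_hits = [c for c in chars if re.fullmatch(r"[-+,.]", c)]
print(range_hits)    # ['+', ',', '-', '.']
print(leading_hits)  # ['+', ',', '-', '.']

# 2) Greedy vs. lazy capture of the optional components.
greedy = re.compile(r"\A(?:\?\+(?P<r>.*))?(?:\?=(?P<q>.*))?(?:#(?P<f>.*))?\Z")
lazy = re.compile(r"\A(?:\?\+(?P<r>.*?))?(?:\?=(?P<q>.*?))?(?:#(?P<f>.*?))?\Z")

tail = "?+abc?=xyz#12/3"
greedy_groups = greedy.match(tail).groups()
lazy_groups = lazy.match(tail).groups()
print(greedy_groups)  # ('abc?=xyz#12/3', None, None)
print(lazy_groups)    # ('abc', 'xyz', '12/3')
```

With the greedy quantifier the r-component swallows the rest of the string, and the other two groups never get a chance to match.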
See working example in Regex101
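If you want to try it outside Regex101, here is a rough sketch of the same pattern in Python. Python is an assumption on my part, so the syntax is adapted: (?P<name>...) for named groups, \Z in place of \z, and '/' left unescaped since Python has no regex delimiter. The helper name parse_urn is hypothetical.

```python
import re

# The pattern from this answer, adapted to Python's re syntax.
URN_RE = re.compile(
    r"\A(?i:urn:(?!urn:)"
    r"(?P<nid>[a-z0-9][a-z0-9-]{1,31}):"
    r"(?P<nss>(?:[-a-z0-9()+,.:=@;$_!*'&~/]|%[0-9a-f]{2})+)"
    r"(?:\?\+(?P<rcomponent>.*?))?"
    r"(?:\?=(?P<qcomponent>.*?))?"
    r"(?:#(?P<fcomponent>.*?))?"
    r")\Z"
)


def parse_urn(text):
    """Return a dict of the named groups, or None if text does not match."""
    m = URN_RE.match(text)
    return m.groupdict() if m else None


parts = parse_urn("urn:example:a123,0%7C00~&z456/789?+abc?=xyz#12/3")
for name, value in parts.items():
    print(name, "=", value)
# nid = example
# nss = a123,0%7C00~&z456/789
# rcomponent = abc
# qcomponent = xyz
# fcomponent = 12/3
```

Components that are absent come back as None, and an input such as urn:urn:foo is rejected by the (?!urn:) lookahead.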
Hope that helps
Source: https://stackoverflow.com/questions/59032211/regex-which-matches-urn-by-rfc8141