Question
I want to validate Internet media types passed as input to my API.
Can you help me write a regex to match them?
Example types below, from http://en.wikipedia.org/wiki/Internet_media_type:
application/atom+xml
application/EDI-X12
application/xml-dtd
application/zip
application/vnd.openxmlformats-officedocument.presentationml.presentation
video/quicktime
Each must follow the standard form:
type / media type name [+suffix]
Thanks
Answer 1:
This is really straightforward:
\w+/[-+.\w]+
Demo: http://regex101.com/r/oH5bS7/1
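A minimal Python sketch of using this pattern to validate input. It assumes the whole string must match, so it uses re.fullmatch; the bare pattern used with re.search would also accept a string that merely contains a media type. The helper name is_media_type is hypothetical, and re.ASCII is an added assumption that restricts \w to ASCII letters, digits and underscore.

```python
import re

# Answer 1's pattern: word characters, a slash, then word characters or - + .
MEDIA_TYPE = re.compile(r"\w+/[-+.\w]+", re.ASCII)

def is_media_type(value: str) -> bool:
    """Hypothetical helper: True only if the entire string matches the pattern."""
    return MEDIA_TYPE.fullmatch(value) is not None

examples = [
    "application/atom+xml",
    "application/EDI-X12",
    "application/xml-dtd",
    "application/zip",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "video/quicktime",
]

print(all(is_media_type(ex) for ex in examples))   # True
print(is_media_type("text/html; charset=utf-8"))   # False: the parameter is not allowed
print(is_media_type("text"))                       # False: no slash or subtype
```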
And if you want to validate that there is at most one + sign:
\w+/[-.\w]+(?:\+[-.\w]+)?
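A short Python check of the stricter variant, again assuming the whole input must match (re.fullmatch). Without anchoring, an unanchored search would still find the prefix application/a+b inside application/a+b+c, so the anchoring is what actually enforces the single-suffix rule here.

```python
import re

# Stricter pattern: subtype without "+", then at most one "+suffix".
STRICT = re.compile(r"\w+/[-.\w]+(?:\+[-.\w]+)?", re.ASCII)

for value in ["application/atom+xml", "application/zip", "application/a+b+c"]:
    print(value, STRICT.fullmatch(value) is not None)
# application/atom+xml True
# application/zip True
# application/a+b+c False
```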
Answer 2:
I recently needed to validate media types somewhat more strictly than the existing answers do. Here's what I came up with, based on the intersection of the grammars in RFC 2045 Section 5.1 and RFC 7231 Section 3.1.1.1 (which disallows {} in tokens, and whitespace except between parameters). For a C-like language with (?:) non-capturing groups:
ows = "[ \t]*";
token = "[0-9A-Za-z!#$%&'*+.^_`|~-]+";
quotedString = "\"(?:[^\"\\\\]|\\\\.)*\"";
type = "(application|audio|font|example|image|message|model|multipart|text|video|x-(?:" + token + "))";
parameter = ";" + ows + token + "=" + "(?:" + token + "|" + quotedString + ")";
mediaType = type + "/" + "(" + token + ")((?:" + ows + parameter + ")*)";
This expands to the rather monstrous
"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)((?:[ \t]*;[ \t]*[0-9A-Za-z!#$%&'*+.^_`|~-]+=(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+|\"(?:[^\"\\\\]|\\\\.)*\"))*)"
which captures the type, subtype, and parameters, or just
"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)"
omitting parameters. Note that these could be made more forward-compatible (and less strict) by allowing any token for the type (as RFC 7231 does) rather than limiting it to "application", "audio", etc.
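A sketch of the same construction ported to Python, offered as an illustration rather than the answer's own code. Python raw strings remove one level of escaping, so the pieces are written without the doubled backslashes of the C-like version. QUOTED_STRING treats a backslash followed by any character as an escape (roughly RFC 7230's quoted-pair). It also builds the relaxed variant mentioned in the note above, which accepts any token as the type. The names build, STRICT, RELAXED and parse_media_type are hypothetical.

```python
import re

# Building blocks mirroring the C-like construction in this answer.
OWS = r"[ \t]*"                          # optional whitespace
TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"   # RFC 7230 token characters
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'     # backslash + any char is an escape
STRICT_TYPE = (r"(application|audio|font|example|image|message|model|multipart"
               r"|text|video|x-(?:" + TOKEN + r"))")
RELAXED_TYPE = "(" + TOKEN + ")"         # any token, as RFC 7231 allows
PARAMETER = ";" + OWS + TOKEN + "=(?:" + TOKEN + "|" + QUOTED_STRING + ")"

def build(type_pattern: str) -> re.Pattern:
    # Groups: 1 = type, 2 = subtype, 3 = raw parameter list (possibly empty).
    return re.compile(type_pattern + "/(" + TOKEN + ")((?:" + OWS + PARAMETER + ")*)")

STRICT = build(STRICT_TYPE)
RELAXED = build(RELAXED_TYPE)

def parse_media_type(value: str, pattern: re.Pattern = STRICT):
    """Return (type, subtype, parameters) or None if value does not fully match."""
    m = pattern.fullmatch(value)
    return m.groups() if m else None

print(parse_media_type("text/html; charset=utf-8"))
# ('text', 'html', '; charset=utf-8')
print(parse_media_type("chemical/x-pdb"))            # None: type not in the strict list
print(parse_media_type("chemical/x-pdb", RELAXED))   # ('chemical', 'x-pdb', '')
```

The third group is returned as one raw string; splitting it into individual name/value pairs would need a further pass.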
In practice, you may also want to restrict inputs to the IANA-registered media types, to those listed in mailcap, or to the specific types appropriate for your application's intended use.
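A minimal sketch of that last idea, assuming your API accepts only a fixed set of types and no parameters: check the syntax first, then normalize case (RFC 2045 and RFC 7231 both treat type and subtype names as case-insensitive) and test membership. The ALLOWED set and the is_allowed name are hypothetical.

```python
import re

TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"            # RFC 7230 token characters
MEDIA_TYPE = re.compile("(" + TOKEN + ")/(" + TOKEN + ")")  # type/subtype only

# Hypothetical allowlist for an API that accepts only these types.
ALLOWED = {"application/json", "application/zip", "image/png"}

def is_allowed(value: str) -> bool:
    m = MEDIA_TYPE.fullmatch(value)
    if m is None:
        return False
    # Type and subtype names are case-insensitive, so compare in lowercase.
    normalized = f"{m.group(1)}/{m.group(2)}".lower()
    return normalized in ALLOWED

print(is_allowed("Image/PNG"))         # True
print(is_allowed("image/gif"))         # False: not on the list
print(is_allowed("image/png; q=0.5"))  # False: this pattern rejects parameters
```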
Answer 3:
A more general regex, with support for a parameter, is:
(?P<main>\w+|\*)/(?P<sub>\w+|\*)(\s*;\s*(?P<param>\w+)\s*=\s*(?P<val>\S+))?
Demo: http://regex101.com/r/lQ3rX4/2
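A Python sketch of this pattern, which uses the (?P<name>...) named-group syntax Python's re module supports. It assumes the parameter separator is \s*=\s* (a single equals sign with optional surrounding whitespace) and that the whole string must match. Note that the pattern accepts at most one parameter, and because the subtype is \w+, subtypes containing -, + or . (for example application/atom+xml from the question) do not match.

```python
import re

# Named groups: main type, subtype, and one optional param=value pair.
MEDIA_RANGE = re.compile(
    r"(?P<main>\w+|\*)/(?P<sub>\w+|\*)(\s*;\s*(?P<param>\w+)\s*=\s*(?P<val>\S+))?"
)

m = MEDIA_RANGE.fullmatch("text/html; charset=utf-8")
print(m["main"], m["sub"])    # text html
print(m["param"], m["val"])   # charset utf-8

print(MEDIA_RANGE.fullmatch("*/*") is not None)                   # True
print(MEDIA_RANGE.fullmatch("application/atom+xml") is not None)  # False: \w+ stops at "+"
```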
Source: https://stackoverflow.com/questions/25201083/regex-to-match-and-validate-internet-media-type