Regex to match and validate internet media type?

六眼飞鱼酱① posted on 2019-12-10 20:42:10

Question


I want to validate internet media types submitted via my API.

Can you help me write a regex to match them?

Example types below from http://en.wikipedia.org/wiki/Internet_media_type

application/atom+xml
application/EDI-X12
application/xml-dtd
application/zip
application/vnd.openxmlformats-officedocument.presentationml.presentation
video/quicktime

Must meet standard:

type / media type name [+suffix]

Thanks


Answer 1:


This is really straightforward:

\w+/[-+.\w]+

Demo: http://regex101.com/r/oH5bS7/1

And if you want to validate there's at most one +:

\w+/[-.\w]+(?:\+[-.\w]+)?
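As a quick check, both patterns can be run against the sample types from the question. This is a minimal Python sketch, not a definitive validator: the names LOOSE, ONE_SUFFIX, SAMPLES, and is_media_type are made up for illustration, and re.fullmatch is used on the assumption that the whole input (not just a prefix) must be a media type.

```python
import re

# Hypothetical names; the two patterns are the ones given in this answer.
LOOSE = re.compile(r"\w+/[-+.\w]+")
ONE_SUFFIX = re.compile(r"\w+/[-.\w]+(?:\+[-.\w]+)?")

# Sample types copied from the question.
SAMPLES = [
    "application/atom+xml",
    "application/EDI-X12",
    "application/xml-dtd",
    "application/zip",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "video/quicktime",
]

def is_media_type(value, pattern=ONE_SUFFIX):
    """Return True if the entire string matches the given pattern."""
    return pattern.fullmatch(value) is not None

if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, is_media_type(sample, LOOSE), is_media_type(sample))
```

Note that in Python, \w on a str pattern also matches non-ASCII letters unless re.ASCII is passed, so a stricter check would probably want that flag.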




Answer 2:


I recently had a need to validate media types a bit more strictly than the existing answers. Here's what I came up with, based on the intersection of the grammar from RFC 2045 Section 5.1 and RFC 7231 Section 3.1.1.1 (which disallows {} in tokens and whitespace except between parameters). For a C-like language with (?:) non-capturing groups:

ows = "[ \t]*";
token = "[0-9A-Za-z!#$%&'*+.^_`|~-]+";
quotedString = "\"(?:[^\"\\\\]|\\\\.)*\"";
type = "(application|audio|font|example|image|message|model|multipart|text|video|x-(?:" + token + "))";
parameter = ";" + ows + token + "=" + "(?:" + token + "|" + quotedString + ")";
mediaType = type + "/" + "(" + token + ")((?:" + ows + parameter + ")*)";

This ends up with a rather monstrous

"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)((?:[ \t]*;[ \t]*[0-9A-Za-z!#$%&'*+.^_`|~-]+=(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+|\"(?:[^\"\\\\]|\\\\.)*\"))*)"

which captures type, subtype, and parameters, or just

"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)"

omitting parameters. Note that these could be made more forward-compatible (and less strict) by allowing any token for type (as RFC 7231 does) rather than limiting to "application", "audio", etc.
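The same composition can be sketched in Python, where raw strings avoid most of the double escaping. This is only an illustration of the approach described above: the constant names and the parse_media_type helper are hypothetical, and the quoted-string piece assumes a backslash may escape any single character.

```python
import re

# Building blocks, mirroring the variable names used in this answer.
OWS = r"[ \t]*"
TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"
# Quoted string: any character except quote or backslash,
# or a backslash followed by any single character.
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'
TYPE = (
    r"(application|audio|font|example|image|message|model|multipart|text|video"
    r"|x-(?:" + TOKEN + r"))"
)
PARAMETER = ";" + OWS + TOKEN + "=(?:" + TOKEN + "|" + QUOTED_STRING + ")"
MEDIA_TYPE = re.compile(TYPE + "/(" + TOKEN + ")((?:" + OWS + PARAMETER + ")*)")

def parse_media_type(value):
    """Return (type, subtype, raw_parameters), or None if the input is invalid."""
    match = MEDIA_TYPE.fullmatch(value)
    return match.groups() if match else None
```

For example, parse_media_type("text/html; charset=utf-8") yields the type, the subtype, and the raw parameter string as three separate groups, while an unlisted top-level type such as "foo/bar" is rejected.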

In practice you may want to additionally limit inputs to IANA Registered Media Types or mailcap or specific types appropriate for your application based on intended use.
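One way to combine a syntax check with such a restriction is sketched below in Python. The ALLOWED_TYPES set is a made-up placeholder, not a list taken from IANA or mailcap, and is_accepted is a hypothetical helper. The sketch assumes that any parameters after the first ";" should be ignored for the allowlist lookup (they are not validated here) and relies on type and subtype names being case-insensitive.

```python
import re

# Hypothetical allowlist; replace with the types your application accepts.
ALLOWED_TYPES = {
    "application/json",
    "application/zip",
    "image/png",
    "video/quicktime",
}

TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"
# Group 1 is "type/subtype"; anything after an optional ";" is not inspected.
ESSENCE = re.compile("(" + TOKEN + "/" + TOKEN + ")(?:[ \t]*;.*)?")

def is_accepted(value):
    """Check the type/subtype shape, then look it up case-insensitively."""
    match = ESSENCE.fullmatch(value)
    return match is not None and match.group(1).lower() in ALLOWED_TYPES
```

Keeping the allowlist separate from the regex means new types can be accepted later without touching the pattern.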




Answer 3:


A more general regex that also accepts wildcards and an optional parameter is:

(?P<main>\w+|\*)/(?P<sub>\w+|\*)(\s*;\s*(?P<param>\w+)\s*=\s*(?P<val>\S+))?

Demo: http://regex101.com/r/lQ3rX4/2
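The (?P<name>...) syntax is what Python's re module uses for named groups, so the pattern can be tried there directly. The sketch below uses a variant that assumes a single "=" between the parameter name and its value; MEDIA_RANGE and split_media_range are hypothetical names chosen for this example.

```python
import re

# Variant of the pattern from this answer, assuming one "=" between
# the parameter name and its value.
MEDIA_RANGE = re.compile(
    r"(?P<main>\w+|\*)/(?P<sub>\w+|\*)"
    r"(\s*;\s*(?P<param>\w+)\s*=\s*(?P<val>\S+))?"
)

def split_media_range(value):
    """Return the named groups as a dict, or None when the input does not match."""
    match = MEDIA_RANGE.fullmatch(value)
    return match.groupdict() if match else None
```

Because the subtype is matched with \w+, inputs from the question such as application/atom+xml do not match as a whole; widening the subtype to something like [-+.\w]+ would be one way to cover them.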



Source: https://stackoverflow.com/questions/25201083/regex-to-match-and-validate-internet-media-type
