Query string degenerate cases

Submitted by 折月煮酒 on 2019-12-02 13:13:40

I am looking [...] for a correct regular expression for [valid] URI query strings.

Sure thing, no prob. As per RFC 3986, appendix B, here it is:

^([^#]*)$
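The one-liner above is the query part of the full URI-reference regex from RFC 3986, appendix B. Below is a minimal Python sketch that applies the full appendix B regex (copied verbatim) and returns capture group 7, the query. The function name `extract_query` is my own, hypothetical choice. The sketch also distinguishes a URI with an empty query ("?" present) from one with no query at all.

```python
import re
from typing import Optional

# The URI-reference regex given in RFC 3986, appendix B.
# Group 7 captures the query: everything after the first "?" up to "#".
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def extract_query(uri: str) -> Optional[str]:
    """Return the query (without "?"), or None when the URI has no "?"."""
    # Every group is optional, so match() always succeeds.
    return URI_RE.match(uri).group(7)


print(repr(extract_query("http://example.com/?&&#top")))  # '&&'
print(repr(extract_query("http://example.com/?")))        # ''
print(repr(extract_query("http://example.com/")))         # None
```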

If you want something more elaborate, you can check section 3.4 for the allowed characters in addition to percent-encoded entities. The regex would look something like this:

^(%[[:xdigit:]]{2}|[[:alnum:]._~!$&'()*+,;=:@/?-])*$
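Python's `re` module has no POSIX bracket classes such as `[[:xdigit:]]`, so a strict check against the section 3.4 grammar, query = *( pchar / "/" / "?" ), has to spell the character set out. A minimal sketch follows; the names `QUERY_RE` and `is_valid_query` are hypothetical, and the input is assumed to be the query without its leading "?".

```python
import re

# RFC 3986, section 3.4:  query = *( pchar / "/" / "?" )
# pchar      = unreserved / pct-encoded / sub-delims / ":" / "@"
# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
# sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
QUERY_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def is_valid_query(query: str) -> bool:
    """True if the query (without the leading "?") matches the RFC 3986 grammar."""
    return QUERY_RE.fullmatch(query) is not None


for q in ["&&", "=a", "a=1&b=%C3%A9", "a b", "100%"]:
    print(repr(q), is_valid_query(q))  # True, True, True, False, False
```

Raw spaces, a lone "%", "#" and non-ASCII characters are rejected; they have to be percent-encoded first.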

As far as RFC 3986 is concerned, all your examples are valid. The RFC specifies how the query string has to be encoded but says little about how it has to be structured. Older RFCs keep shifting authority over the structure of query strings back and forth between CGI and HTTP without ever formally specifying a grammar (see e.g. RFC 3875, sec. 4.1.7, RFC 2396, sec. 3.4, RFC 1808, sec. 2.1, …).

An interesting note can be found in RFC 7320, section 2.4:

Applications MUST NOT directly specify the syntax of queries, as this can cause operational difficulties for deployments that do not support a particular form of a query. […] HTML constrains the syntax of query strings used in form submission. New form languages SHOULD NOT emulate it, but instead allow creation of a broader variety of URIs

For a full validity check on such query strings, you would have to implement the algorithm for decoding form data recommended by the W3C. It could be done with a regex, but I would advise against it for reasons of sanity.
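For comparison, the application/x-www-form-urlencoded parsing steps, as I understand them from the WHATWG URL Standard (the W3C form-data algorithm is similar), are short when written procedurally rather than as a regex. This is a minimal sketch, not a reference implementation; `parse_urlencoded` is a hypothetical name and the input is assumed to be the query without its leading "?".

```python
from urllib.parse import unquote_to_bytes


def parse_urlencoded(query: str) -> list[tuple[str, str]]:
    """Split a query (without the leading "?") into (name, value) pairs."""
    pairs = []
    for sequence in query.split("&"):
        if not sequence:
            continue  # empty sequences, e.g. from "&&", are skipped
        # Split on the first "=" only; without "=", the value is empty.
        name, _, value = sequence.partition("=")
        # "+" stands for a space; replace it before percent-decoding so that
        # "%2B" still decodes to a literal "+". Invalid escapes stay as-is,
        # and invalid UTF-8 becomes U+FFFD.
        name = unquote_to_bytes(name.replace("+", " ")).decode("utf-8", "replace")
        value = unquote_to_bytes(value.replace("+", " ")).decode("utf-8", "replace")
        pairs.append((name, value))
    return pairs


print(parse_urlencoded("&&"))                   # []
print(parse_urlencoded("=a"))                   # [('', 'a')]
print(parse_urlencoded("a=1&b"))                # [('a', '1'), ('b', '')]
print(parse_urlencoded("q=caf%C3%A9+au+lait"))  # [('q', 'café au lait')]
```

Note that this sketch never rejects an input string; it only decides how the string is split into pairs and decoded.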

With regard to your examples: I believe they are all valid. How they are interpreted should be left to the receiving application. Some are not as much of a fringe case as you may think, though: ?&& simply decodes to an empty dictionary, while ?=a could map to { "": "a" }.
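To illustrate that the interpretation is up to the receiver, here is how one particular receiving application, Python's `urllib.parse.parse_qs`, treats a few such strings. This is just one library's behavior, not something RFC 3986 prescribes: `parse_qs` maps each name to a list of values and drops blank values unless `keep_blank_values=True`. The helper `interpret` and the example URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def interpret(url: str, keep_blank_values: bool = False) -> dict[str, list[str]]:
    """Decode the query of a URL the way Python's standard library would."""
    query = urlsplit(url).query  # text between "?" and "#", without the "?"
    return parse_qs(query, keep_blank_values=keep_blank_values)


# Hypothetical URLs carrying degenerate query strings.
for url in ["http://example.com/?&&", "http://example.com/?=a", "http://example.com/?a="]:
    print(url, interpret(url), interpret(url, keep_blank_values=True))
```

Here ?&& gives {} either way and ?=a gives {'': ['a']}, while ?a= gives {} by default but {'a': ['']} with `keep_blank_values=True`: the same string, two interpretations chosen by the application.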
