XML Schema: multiple values from a list of valid attribute values

Submitted by 落花浮王杯 on 2019-12-04 10:34:22

The basic problem can be addressed with enumerations as well:

<xs:attribute name="methods" use="required">
    <xs:simpleType>
        <xs:restriction>
            <xs:simpleType>
                <xs:list>
                    <xs:simpleType>
                        <xs:restriction base="xs:token">
                            <xs:enumeration value="get"/>
                            <xs:enumeration value="post"/>
                            <xs:enumeration value="put"/>
                            <xs:enumeration value="delete"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:list>
            </xs:simpleType>
            <xs:minLength value="1"/>
        </xs:restriction>
    </xs:simpleType>
</xs:attribute>

This unfortunately has the same limitation as the <xs:pattern> solution: it cannot validate that each token in the list is unique. It does, however, address the whitespace issue (getpost would be rejected).
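Here is a rough Python sketch of how a validator treats the list type above. It approximates the schema and is not a schema validator. The hypothetical helper list_attr_is_valid splits the value on XML whitespace the way xs:list does. It then checks each item against the enumeration and applies minLength to the number of items.

```python
import re

# The enumeration facets from the schema above.
ALLOWED = {"get", "post", "put", "delete"}


def list_attr_is_valid(value: str) -> bool:
    """Hypothetical helper approximating the list-of-enumeration type.

    xs:list splits its value on XML whitespace (space, tab, CR, LF), each
    item must be one of the enumerated tokens, and minLength="1" counts
    list items. Nothing here checks for repeated items.
    """
    items = [item for item in re.split(r"[ \t\r\n]+", value) if item]
    return len(items) >= 1 and all(item in ALLOWED for item in items)


print(list_attr_is_valid("get post"))  # True
print(list_attr_is_valid("getpost"))   # False: not an enumerated token
print(list_attr_is_valid("get get"))   # True: duplicates are not caught
print(list_attr_is_valid(""))          # False: minLength="1" needs one item
```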

You can use regular expressions as a restriction on a simpleType: http://www.w3.org/TR/xmlschema-2/#dt-pattern

I'm not a regex expert, but it would be something like this:

<xs:attribute name="methods" use="required">
   <xs:simpleType>
      <xs:restriction base="xs:string">
         <!-- \s* makes the separator optional, so "getpost" and repeated tokens still pass -->
         <xs:pattern value='((get|post|put|delete)\s*){1,4}'/>
      </xs:restriction>
   </xs:simpleType>
</xs:attribute>
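Here is a quick Python check of what a pattern of this shape accepts. It assumes the separator is meant to be \s* and that one to four tokens are allowed: ((get|post|put|delete)\s*){1,4}. An XSD pattern must match the whole value, so re.fullmatch stands in for XSD matching. Python's \s is broader than XSD's, so treat this as an approximation. The helper name accepts is hypothetical.

```python
import re

# Assumed pattern shape: one to four tokens, each followed by optional whitespace.
PATTERN = r"((get|post|put|delete)\s*){1,4}"


def accepts(value: str) -> bool:
    # Hypothetical helper: XSD patterns are implicitly anchored, so use fullmatch.
    return re.fullmatch(PATTERN, value) is not None


for value in ["get post", "getpost", "get get", "get post put delete get", "foo"]:
    print(f"{value!r}: {accepts(value)}")
```

"getpost" and "get get" both pass. This shows the two weaknesses: the separator is optional, and repeated tokens are not caught.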

After periodically screwing around with this, I came up with this hulk of a pattern; first as a PCRE pretty-print (extended /x mode, so the whitespace is ignored):

^
(
  (get     (\s post)?    (\s put)?     (\s delete)?  (\s head)?    (\s options)?)
| (post    (\s put)?     (\s delete)?  (\s head)?    (\s options)?)
| (put     (\s delete)?  (\s head)?    (\s options)?)
| (delete  (\s head)?    (\s options)?)
| (head    (\s options)?)
| (options)
)
$

And in XML Schema-compatible form (no ^ or $, since an XSD pattern always has to match the whole value):

((get(\spost)?(\sput)?(\sdelete)?(\shead)?(\soptions)?)|(post(\sput)?(\sdelete)?(\shead)?(\soptions)?)|(put(\sdelete)?(\shead)?(\soptions)?)|(delete(\shead)?(\soptions)?)|(head(\soptions)?)|(options))

This will successfully match any non-empty combination of get, post, put, delete, head and options, further requiring that they appear in that order (which is kinda nice too).

Anyways, in summary:

"get post put delete head options" // match

"get put delete options"           // match

"get get post put"                 // fail; double get

"get foo post put"                 // fail; invalid token, foo

"post delete"                      // match

"options get"                      // fail; ordering
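The summary can also be checked mechanically. This Python sketch runs the XML-compatible pattern against the six cases with re.fullmatch. Fullmatch approximates XSD's whole-value matching, although Python's \s also matches non-ASCII whitespace and XSD's does not.

```python
import re

# The XML-compatible pattern from above, split only for readability.
XSD_PATTERN = (
    r"((get(\spost)?(\sput)?(\sdelete)?(\shead)?(\soptions)?)"
    r"|(post(\sput)?(\sdelete)?(\shead)?(\soptions)?)"
    r"|(put(\sdelete)?(\shead)?(\soptions)?)"
    r"|(delete(\shead)?(\soptions)?)"
    r"|(head(\soptions)?)"
    r"|(options))"
)

# The summary cases, with the expected outcome from the list above.
CASES = {
    "get post put delete head options": True,
    "get put delete options": True,
    "get get post put": False,
    "get foo post put": False,
    "post delete": True,
    "options get": False,
}

results = {}
for value, expected in CASES.items():
    # fullmatch approximates XSD's implicit whole-value anchoring
    results[value] = re.fullmatch(XSD_PATTERN, value) is not None
    verdict = "match" if results[value] else "fail"
    print(f"{value!r:36} {verdict} (expected {'match' if expected else 'fail'})")
```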

This pattern doesn't scale very well, since each new token has to be added to every existing branch, but given that the problem domain is HTTP methods, change is unlikely and I figure it should work just fine.


Also, here's a quick script (PHP) to generate the pattern:

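<?php
$tokens = ['get', 'post', 'put', 'delete', 'head', 'options'];

// array_map iterates over the original six tokens; each callback shifts the
// next token off $tokens (captured by reference) and appends every remaining
// token as an optional "(\s<token>)?" group.
echo implode('|', array_map(function ($token) use (&$tokens) {
    return sprintf('(%s%s)', array_shift($tokens),
        implode('', array_map(function ($token) {
            return sprintf('(\s%s)?', $token);
        }, $tokens)));
}, $tokens));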

It omits the outermost parentheses, which aren't needed: an XSD pattern always has to match the whole value, so a top-level | alternation works without grouping.
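For anyone not using PHP, here is the same generator sketched in Python. The function name ordered_subset_pattern is made up for this example. Instead of shifting tokens off a shared array, it slices the list after each position, which makes the "only later tokens" rule explicit. It assumes the tokens contain no regex metacharacters, which holds for HTTP method names.

```python
import re

TOKENS = ["get", "post", "put", "delete", "head", "options"]


def ordered_subset_pattern(tokens: list[str]) -> str:
    """Hypothetical helper: one branch per token, made of the token followed
    by each later token as an optional whitespace-prefixed group; branches
    are joined with '|'."""
    branches = []
    for i, head in enumerate(tokens):
        # rf-string: \s stays literal, {later} is interpolated
        tail = "".join(rf"(\s{later})?" for later in tokens[i + 1:])
        branches.append(f"({head}{tail})")
    return "|".join(branches)


pattern = ordered_subset_pattern(TOKENS)
print(pattern)
print(re.fullmatch(pattern, "get put delete options") is not None)  # True
```

Its output is the XML-compatible pattern above without the outer parentheses.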

You could deal with the whitespace issue like this:

(get|post|put|delete)(\sget|\spost|\sput|\sdelete){0,3}

It will not match getpost, although it still accepts duplicates such as "get get".
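Here is a quick Python check of this pattern, again using re.fullmatch as a stand-in for XSD's whole-value matching. The helper name accepts is hypothetical.

```python
import re

PATTERN = r"(get|post|put|delete)(\sget|\spost|\sput|\sdelete){0,3}"


def accepts(value: str) -> bool:
    # Hypothetical helper: fullmatch approximates XSD's implicit anchoring.
    return re.fullmatch(PATTERN, value) is not None


print(accepts("get post"))  # True
print(accepts("getpost"))   # False: a whitespace character is required
print(accepts("get get"))   # True: repeats are still allowed
```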

I needed something similar to what you wanted, but I didn't want any order to be enforced, and I didn't want the pattern to grow quickly as more possible values were added.

Using your enumeration as the example, the pattern I've come up with goes like this:

(?:get|post|put|delete|head|options)(?:\s(?:(?<!.*\bget\b.*)get|(?<!.*\bpost\b.*)post|(?<!.*\bput\b.*)put|(?<!.*\bdelete\b.*)delete|(?<!.*\bhead\b.*)head|(?<!.*\boptions\b.*)options))*

This part

(?:[values])

simply requires that at least one of the options is chosen. If an empty value should also be allowed, wrap the entire expression like this: (?:[...])?

The remainder

(?:\s(?:[values-with-restraints]))*

allows zero or more whitespace-plus-value combinations. Each value is written in this form:

(?<!.*\b[value]\b.*)[value]

which uses a negative lookbehind (?<![...]) to make sure the value doesn't already appear earlier in the text. I'm using the word-boundary marker \b so that options which are part of other options don't cause problems. For example, if you have the options foo, bar and foobar, you don't want the option foobar to prevent foo and bar from being legal.
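Python's built-in re module only allows fixed-width lookbehind, so it can't run this pattern directly. The sketch below first confirms that. It then spells out the same rules procedurally in a hypothetical helper, unordered_unique, and checks the foo/bar/foobar case. It illustrates what the pattern accepts and is not a regex.

```python
import re

# Does Python's built-in re accept the variable-length lookbehind?
try:
    re.compile(r"(?<!.*\bget\b.*)get")
    LOOKBEHIND_SUPPORTED = True
except re.error as exc:
    LOOKBEHIND_SUPPORTED = False
    print("re rejects it:", exc)


def unordered_unique(value: str, allowed: set[str]) -> bool:
    """Hypothetical procedural stand-in for the lookbehind pattern.

    Rules mirrored: values are separated by exactly one whitespace
    character, every value must be allowed, and no value may already
    appear earlier as a whole word. Assuming single-line input and values
    made of word characters, that whole-word check means "no value repeats".
    """
    tokens = re.split(r"\s", value)
    return all(t in allowed for t in tokens) and len(set(tokens)) == len(tokens)


OPTIONS = {"foo", "bar", "foobar"}
print(unordered_unique("foobar foo bar", OPTIONS))          # True: foobar does not block foo
print(unordered_unique("bar foo bar", OPTIONS))             # False: bar repeats
print(unordered_unique("options get", {"get", "options"}))  # True: no order enforced
```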

Just keep in mind that since this is going into XML, you'll have to replace the < character with &lt; when you put it into your schema.
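If you generate the schema with a library rather than by hand, the escaping is done for you. This small Python sketch uses the standard xml.etree.ElementTree module with a shortened two-value pattern as an example. The serialized attribute contains &lt;, and parsing it back yields the original pattern. Whether a given validator accepts lookbehind is a separate question.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Example pattern containing "<" from its lookbehinds (two values for brevity).
pattern = r"(?:get|post)(?:\s(?:(?<!.*\bget\b.*)get|(?<!.*\bpost\b.*)post))*"

# Setting the attribute through the API lets the serializer escape "<".
facet = ET.Element(f"{{{XS}}}pattern", {"value": pattern})
xml_text = ET.tostring(facet, encoding="unicode")
print(xml_text)

# Parsing the serialized element gives back the unescaped pattern.
round_trip = ET.fromstring(xml_text).get("value")
print(round_trip == pattern)  # True
```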

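Also, a final warning: not all regular expression engines support lookbehind, and fewer still allow variable-length lookbehind like (?<!.*\bget\b.*); Python's built-in re module, for example, rejects it. XML Schema's own regex dialect, which is what xs:pattern uses, has no lookbehind, no \b and no (?:...) groups at all, so a conforming XSD validator should reject this pattern.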
