Regex to accept 3 out of 4 rules

狂风中的少年 submitted on 2020-01-24 09:54:12

Question


I can't seem to get the regex correct for the following requirement: a string 8 to 20 characters long that must contain at least 1 uppercase letter, at least 1 lowercase letter, and either at least 1 digit or at least 1 special character (or both). Let's say special characters are restricted to just @, #, & and ~.

I wrote this initially:

^(?=.*?[A-Z])(?=.*?[a-z])(?=(.*?[0-9])|(.*?[@#&~])).{8,20}$

So as expected it successfully matches strings like 5abcdefG, Abc@defghi, 5abcdefG~, etc.

The problem is that it allows characters OTHER than the 4 special ones I mentioned. So strings like 1€abcdefG and Abc!defgh1 also match, but they shouldn't. What am I missing?


Answer 1:


The point is that your . matches any char but a newline, so it can match a lot of characters other than your 4 special chars, letters or digits.

Also, there is no need to split the OR condition into 2 alternative branches inside the lookahead, as in (?=(.*?[0-9])|(.*?[@#&~])). You can merge that condition into a single (?=.*?[0-9@#&~]). The ranges/chars inside a positive character class are already "OR'ed": [0-9@#&~] matches either a digit, or @, or #, or &, or ~.
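To see both points concretely, here is a minimal sketch in Python (the `re` module is my choice; the question names no language, and `original`, `merged` and `samples` are illustrative names). It compares the question's two-branch lookahead with the merged one, and shows that both still accept 1€abcdefG, because the trailing .{8,20} puts no restriction on which characters appear.

```python
import re

# Pattern from the question: two-branch lookahead, then `.{8,20}`.
original = re.compile(
    r"^(?=.*?[A-Z])(?=.*?[a-z])(?=(.*?[0-9])|(.*?[@#&~])).{8,20}$"
)
# Same pattern with the OR condition merged into one character class.
merged = re.compile(r"^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9@#&~]).{8,20}$")

# Illustrative inputs: three valid strings from the question, one string
# with a disallowed character, and one with no uppercase letter.
samples = ["5abcdefG", "Abc@defghi", "5abcdefG~", "1€abcdefG", "abcdefgh"]

for s in samples:
    a = original.match(s) is not None
    b = merged.match(s) is not None
    print(f"{s!r}: original={a}, merged={b}")
```

On these samples the two patterns should give the same verdicts, so merging the branches changes nothing about what is accepted; the unwanted match comes from the dot, not from the lookaheads.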

I suggest

^(?=[^A-Z]*[A-Z])(?=[^a-z]*[a-z])(?=[^0-9@#&~]*[0-9@#&~])[A-Za-z0-9@#&~]{8,20}$
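A quick way to check this pattern is a small Python sketch like the one below. The helper name `is_valid` and the use of `fullmatch` are my own assumptions, not part of the answer; `fullmatch` is used so that a trailing newline cannot slip past `$`.

```python
import re

# The suggested pattern, copied verbatim from the answer.
PASSWORD_RE = re.compile(
    r"^(?=[^A-Z]*[A-Z])(?=[^a-z]*[a-z])(?=[^0-9@#&~]*[0-9@#&~])"
    r"[A-Za-z0-9@#&~]{8,20}$"
)


def is_valid(candidate: str) -> bool:
    """Return True if the whole string satisfies the pattern."""
    return PASSWORD_RE.fullmatch(candidate) is not None


if __name__ == "__main__":
    for s in ["5abcdefG", "Abc@defghi", "5abcdefG~", "1€abcdefG", "Abcdefgh"]:
        print(f"{s!r}: {is_valid(s)}")
```

The first three strings are the valid examples from the question; 1€abcdefG is rejected because € is not in the allowed set, and Abcdefgh is rejected because it has neither a digit nor a special character.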

You may also use free-spacing (comment) mode, or build the pattern dynamically from blocks, to keep it readable and maintainable:

^                           # start of string
  (?=[^A-Z]*[A-Z])          # string must have an uppercase letter
  (?=[^a-z]*[a-z])          # string must have a lowercase letter
  (?=[^0-9@#&~]*[0-9@#&~])  # string must have a digit or defined special char
  [A-Za-z0-9@#&~]{8,20}     # The string should have 8 to 20 symbols from the defined set
$                           # end of string
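In Python, for instance, the commented layout above could be written with `re.VERBOSE`, and the character sets could be factored into variables so the pattern is built from blocks. This is only a sketch: the names `ALLOWED` and `DIGIT_OR_SPECIAL` are hypothetical, and it relies on Python treating `#` inside a character class as a literal even in verbose mode (other regex flavors may differ).

```python
import re

# Hypothetical building blocks: the contents of the character classes.
ALLOWED = "A-Za-z0-9@#&~"
DIGIT_OR_SPECIAL = "0-9@#&~"

# In an f-string, literal braces are doubled, so {{8,20}} becomes {8,20}.
PASSWORD_RE = re.compile(
    rf"""
    ^                                                 # start of string
      (?=[^A-Z]*[A-Z])                                # an uppercase letter
      (?=[^a-z]*[a-z])                                # a lowercase letter
      (?=[^{DIGIT_OR_SPECIAL}]*[{DIGIT_OR_SPECIAL}])  # a digit or special char
      [{ALLOWED}]{{8,20}}                             # 8 to 20 allowed chars
    $                                                 # end of string
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    for s in ["5abcdefG~", "1€abcdefG"]:
        print(f"{s!r}: {PASSWORD_RE.match(s) is not None}")
```

Changing the set of special characters then means editing the two variables rather than three places in the pattern.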

The [A-Za-z0-9@#&~] character class only allows letters, digits, and the special chars you list in it.

This regex also conforms to the principle of contrast (lookaheads fail or match quicker with negated character classes).




Answer 2:


The simple answer here is don't use a single regex. This will simplify everything:

  • 8 to 20 characters: Every language provides a standard way to get the string length. Use it and just check the number.
  • Contains an uppercase letter: Check that it matches [A-Z]. You may need to modify this for internationalization.
  • Contains a lowercase letter: Check that it matches [a-z]. You may need to modify this for internationalization.
  • Contains a digit: Check that it matches [0-9].
  • Contains a special character: Check that it matches [@#&~].
  • Consists of only allowed characters: Make it match ^[A-Za-z0-9@#&~]+$. (This seems like a dubious requirement, especially if this is for passwords.)

You'll have some extra conditionals around the last two checks to require only one, but that isn't a big deal.
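As a rough illustration, the checks listed above might look like this in Python. The function name `is_valid_password` is hypothetical, and the rules are the ones from the question (uppercase and lowercase both required, plus a digit or a special character).

```python
import re


def is_valid_password(candidate: str) -> bool:
    """Apply each rule from the list as its own small check."""
    # 8 to 20 characters: plain length check, no regex needed.
    if not 8 <= len(candidate) <= 20:
        return False
    # Consists of only allowed characters.
    if re.fullmatch(r"[A-Za-z0-9@#&~]+", candidate) is None:
        return False
    has_upper = re.search(r"[A-Z]", candidate) is not None
    has_lower = re.search(r"[a-z]", candidate) is not None
    has_digit = re.search(r"[0-9]", candidate) is not None
    has_special = re.search(r"[@#&~]", candidate) is not None
    # The "extra conditional": a digit or a special character is enough.
    return has_upper and has_lower and (has_digit or has_special)


if __name__ == "__main__":
    for s in ["5abcdefG", "Abc@defghi", "1€abcdefG", "Abcdefgh"]:
        print(f"{s!r}: {is_valid_password(s)}")
```

Each rule is one line, so a new requirement can be added or an old one relaxed without touching the others.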

The bottom line is that no one will be able to read a single regex for this. You'll have to document everything that it does, and every developer who touches that regex will either hate you or re-implement it as multiple checks like I've described here. Stop. Seriously. This is "parse HTML with regex" level of bad design. Just use multiple checks. It is the most sane approach.

Most importantly, it will be much easier to add new requirements later on, and you'll have to do it anyway if you run into something that isn't possible to check via regex.



Source: https://stackoverflow.com/questions/37954914/regex-to-accept-3-out-of-4-rules
