Regex to accept 3 out of 4 rules

Question

I can't seem to get the regex correct for the following requirement: a string 8 to 20 characters long that must contain at least 1 uppercase letter, at least 1 lowercase letter, and at least 1 digit or at least 1 special character (or both). Let's say the special characters are restricted to just @, #, &, and ~.

I wrote this initially:

^(?=.*?[A-Z])(?=.*?[a-z])(?=(.*?[0-9])|(.*?[@#&~])).{8,20}$

So as expected it successfully matches strings like 5abcdefG, Abc@defghi, 5abcdefG~, etc.

The problem is it allows characters OTHER than the 4 special ones I mentioned. So strings like 1€abcdefG and Abc!defgh1 also match, but they shouldn't. What am I missing?

Answer 1:

The problem is that the . in your final .{8,20} matches any character except a newline, so it accepts plenty of characters besides letters, digits, and your 4 special chars.
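A quick way to see this is to run the question's pattern in Python. This is only a sketch: the helper name original_accepts is made up for illustration, and the second "bad" sample is an assumed string containing ! plus a digit so that all three lookaheads pass.

```python
import re

# Pattern from the question, unchanged.
ORIGINAL = re.compile(
    r"^(?=.*?[A-Z])(?=.*?[a-z])(?=(.*?[0-9])|(.*?[@#&~])).{8,20}$"
)

def original_accepts(candidate):
    """Return True if the question's pattern matches the candidate."""
    return ORIGINAL.match(candidate) is not None

for candidate in ["5abcdefG", "Abc@defghi", "1€abcdefG", "Abc!defgh1"]:
    # The lookaheads only check for presence; the trailing .{8,20}
    # consumes any character, so nothing rejects the euro sign or "!".
    print(candidate, original_accepts(candidate))
```

All four strings are accepted, because no part of the pattern restricts which characters may appear.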

Also, there is no need to split the OR condition into 2 alternative branches inside the lookahead ((?=(.*?[0-9])|(.*?[@#&~]))). You can merge that condition into a single (?=.*?[0-9@#&~]). The ranges and chars inside a positive character class are already "OR'ed": [0-9@#&~] matches a digit, or @, or #, or &, or ~.
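As a rough check of that equivalence, the two lookaheads can be compared on a handful of inputs in Python. The sample strings and the helper name lookaheads_agree are assumptions made for this sketch, not part of the original answer.

```python
import re

SPLIT = re.compile(r"(?=(.*?[0-9])|(.*?[@#&~]))")  # two branches, as in the question
MERGED = re.compile(r"(?=.*?[0-9@#&~])")           # one character class

def lookaheads_agree(text):
    """Return True if both lookaheads give the same verdict at position 0."""
    return bool(SPLIT.match(text)) == bool(MERGED.match(text))

for text in ["abc1", "abc@", "1@", "abc", "", "a~b"]:
    print(repr(text), lookaheads_agree(text))
```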

I suggest

^(?=[^A-Z]*[A-Z])(?=[^a-z]*[a-z])(?=[^0-9@#&~]*[0-9@#&~])[A-Za-z0-9@#&~]{8,20}$


You may also use comment (free-spacing) mode, or build the pattern dynamically from separate blocks, to keep it readable and maintainable:

^                           # start of string
  (?=[^A-Z]*[A-Z])          # string must have an uppercase letter
  (?=[^a-z]*[a-z])          # string must have a lowercase letter
  (?=[^0-9@#&~]*[0-9@#&~])  # string must have a digit or defined special char
  [A-Za-z0-9@#&~]{8,20}     # The string should have 8 to 20 symbols from the defined set
$                           # end of string

The [A-Za-z0-9@#&~] will only allow letters, digits, and special chars you specify in this character class.
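In Python, the commented pattern above could be written with re.VERBOSE roughly as follows. This is a sketch, not the answer's own code: the names PASSWORD_RE and is_valid are made up here. Python treats # inside a character class as a literal even in verbose mode; other flavors may differ (Java's COMMENTS mode, for instance, expects the # to be escaped), so check your engine before copying the layout.

```python
import re

PASSWORD_RE = re.compile(
    r"""
    ^                           # start of string
      (?=[^A-Z]*[A-Z])          # must have an uppercase letter
      (?=[^a-z]*[a-z])          # must have a lowercase letter
      (?=[^0-9@#&~]*[0-9@#&~])  # must have a digit or an allowed special char
      [A-Za-z0-9@#&~]{8,20}     # 8 to 20 chars, all from the allowed set
    $                           # end of string
    """,
    re.VERBOSE,
)

def is_valid(candidate):
    """Return True if the candidate satisfies every rule."""
    # fullmatch is used because $ alone would also match just before
    # a trailing newline in Python.
    return PASSWORD_RE.fullmatch(candidate) is not None

for candidate in ["5abcdefG", "Abc@defghi", "1€abcdefG", "Abcdefgh"]:
    print(candidate, is_valid(candidate))
```

The first two strings pass; 1€abcdefG is rejected because € is outside the allowed set, and Abcdefgh is rejected because it has neither a digit nor a special character.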

This regex also follows the principle of contrast: lookaheads built on negated character classes such as [^A-Z]* fail or match more quickly than ones built on .*?.

Answer 2:

The simple answer here is: don't use a single regex. Splitting the rules into separate checks simplifies everything:

8 to 20 characters: Every language provides a standard way of fetching the string length. Use it and just check the number.
Contains an uppercase letter: Check that it matches [A-Z]. You may need to modify this for internationalization.
Contains a lowercase letter: Check that it matches [a-z]. You may need to modify this for internationalization.
Contains a digit: Check that it matches [0-9].
Contains a special character: Check that it matches [@#&~].
Consists of only allowed characters: Make it match ^[A-Za-z0-9@#&~]+$. (This seems like a dubious requirement, especially if this is for passwords.)

You'll have some extra conditionals around the last two checks to require only one, but that isn't a big deal.
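The checks listed above might look like this in Python. It is a minimal sketch under the question's stated rules; the function name check_password and the wording of the messages are assumptions for illustration.

```python
import re

ALLOWED = re.compile(r"[A-Za-z0-9@#&~]+")

def check_password(candidate):
    """Return a list of failed rules; an empty list means the string is valid."""
    problems = []
    if not 8 <= len(candidate) <= 20:
        problems.append("length must be 8 to 20")
    if not re.search(r"[A-Z]", candidate):
        problems.append("needs an uppercase letter")
    if not re.search(r"[a-z]", candidate):
        problems.append("needs a lowercase letter")
    # Only one of the last two "contains" checks has to pass.
    if not (re.search(r"[0-9]", candidate) or re.search(r"[@#&~]", candidate)):
        problems.append("needs a digit or one of @ # & ~")
    if not ALLOWED.fullmatch(candidate):
        problems.append("contains a character outside the allowed set")
    return problems

print(check_password("5abcdefG"))
print(check_password("1€abcdefG"))
```

Returning the list of failed rules rather than a single boolean makes it easy to tell the user what to fix, and a new requirement becomes one more if block.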

The bottom line is that no one will be able to read a single regex for this. You'll have to document everything that it does, and every developer who touches that regex will either hate you or re-implement it as multiple checks like I've described here. Stop. Seriously. This is "parse HTML with regex" level of bad design. Just use multiple checks. It is the most sane approach.

Most importantly, it will be much easier to add new requirements later on, and you'll have to do it anyway if you run into something that isn't possible to check via regex.

Source: https://stackoverflow.com/questions/37954914/regex-to-accept-3-out-of-4-rules

Tags

regex

conditional

match

special-characters