Can it cause harm to validate email addresses with a regex?

后端 未结 8 876
旧巷少年郎
旧巷少年郎 2020-12-06 03:55

I\'ve heard that it is a bad thing to validate email addresses with a regex, and that it actually can cause harm. Why is that? I thought it never could be a bad thing to val

8条回答
  •  南笙
    南笙 (楼主)
    2020-12-06 04:37

    TL;DR

    Constructing regexes for validating emails can be a good and fun exercise, but in general, you should really avoid it in production code. The proper way of verifying an email address is in most cases to send a verification mail. Trying to verify if a mail address matches the specification is very tricky, and even if you get it right, it's still often useless information unless you know that it's a mail address that you can send mails to and that someone reads. If you're just want to make sure that a user does not mix up input fields, check that the mail address contains a @ character. That's enough.

    Long version

    In a majority of the cases where you would want to use this, just knowing that the email address is valid does not mean a thing. What you really want to know is if it is the right email address. The proper way to verify this is by sending a mail with a verification link.

    If you have verified the email address with a verification link, there's often no point in checking if it is a correct email address, since you know it works. It could however be used for basically checking that the user is entering the email address in the correct field. My advice in this case is to be extremely forgiving. I'd say it's enough to just check that it is a @ in the field. It's a simple check and ALL email addresses includes a @. If you want to make it more complicated than that, I would suggest just warning the user that it might be something wrong with the address, but not forbidding it.

    But one worse concern is that a regex for accurately verifying an email address is actually a very complex matter. If you try to create a regex on your own, you will almost certainly make mistakes. One thing worth mentioning here is that the standard rfc5322 does allow comments within parentheses. To make things worse, nested comments are allowed. A standard regex cannot match nested patterns. You will need extended regex for this. While extended regexes are not unusual, it does say something about the complexity. And even if you get it right, will you update the regex when a new standard comes?

    And one more thing, even if you get it 100% right, that still may not be enough. An email address has the local part on the left side of the @ and domain part on the right. Everything in the local part is meant to be handled by the server. Sure, RFC 5322 is pretty detailed about what a valid local part looks like, but what if a particular email server accepts addresses that is not valid according to rfc5322? Are you really sure you don't want to allow a particular email address that does work just because it does not follow the standard? Do you want to lose customers for your buisness just because they have chosen an obscure email provider?

    If you really want to check if an address is correct in production code, then use MailAddress class or something equivalent. But first take a minute to ponder if this really is what you want. Ask yourself if the address has any value if it is not the correct address. If the answer is no, then you don't. Use verification links instead.

    That being said, it can be a good thing to validate input. The important thing is to know why you are doing it. Validating the email with a regex or (preferably) something like the Mailaddress class could give some protection against malicious input, such as SQL injections and such. But if this is the only method you have to protect you against malicious input, then you're doing something else very wrong.

提交回复
热议问题