C# Email Regular Expression — Any out there that adhere to the RFC 2822 guidelines?

一个人的身影 2021-01-01 02:43

I realize that there are a ton of regex email validators, but I can't seem to find one that adheres to the RFC 2822 standard.

The ones I find keep letting junk through.

4 Answers
  •  轮回少年
    2021-01-01 03:19

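    I wrote a post on this a short while ago. Yes, it is possible with .NET regular expressions, because they support a non-regular feature called "balancing groups", which can match nested constructs such as the parenthesized comments that email addresses allow.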
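As a rough illustration of what balancing groups do, here is a minimal sketch that checks whether parentheses are balanced. It is not part of the email regex itself; the class name `BalancingGroupDemo` and the sample inputs are made up for this example, and only the `paren` group name mirrors the one used in the regex in this answer.

```csharp
using System;
using System.Text.RegularExpressions;

class BalancingGroupDemo
{
    // (?'paren'\()    pushes a capture onto the "paren" stack for each "(".
    // (?'-paren'\))   pops one capture for each ")"; it fails if the stack is empty.
    // (?(paren)(?!))  forces a failure at the end if any "(" is still unclosed.
    static readonly Regex Balanced = new Regex(
        @"^(?:[^()]|(?'paren'\()|(?'-paren'\)))*(?(paren)(?!))$");

    static void Main()
    {
        foreach (var input in new[] { "(a(b)c)", "(a(b)c", "a)b(" })
        {
            // Prints True for the first input and False for the other two.
            Console.WriteLine($"{input}: {Balanced.IsMatch(input)}");
        }
    }
}
```

The same push, pop, and final-check trio appears repeatedly in the email regex, once for every place where a comment may occur.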

    The Perl RFC 822 regex that is often posted does not fully match email addresses on its own, since it requires a preprocessing step to remove comments. It also targets a very old RFC (from 1982!).

    This regex is for RFC 5322, the current standard, which superseded RFC 2822. It also handles all comments and folding whitespace correctly.

    Here is the regex:

    ^(?'localPart'((((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u
    0027\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u
    000e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|
    \\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c
    \u000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t
    ]+)?|((\r\n)[ \t]+)+))*?(([a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)|(
    "(([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?(([\u0021\u0023-\u
    005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u
    007f])|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000
    b\u000c\u000e-\u001f\u007f])))*([ \t]+((\r\n)[ \t]+)?|((\r\n
    )[ \t]+)+)?"))((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u00
    27\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u00
    0e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\
    ([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u
    000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+
    )?|((\r\n)[ \t]+)+))*?)(\.(((\((((?'paren'\()|(?'-paren'\))|
    ([\u0021-\u0027\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u0
    00b\u000c\u000e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n
    )[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\
    u000b\u000c\u000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+
    ((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?(([a-zA-Z0-9!#$%&'*+/=?^_
    `{|}~-]+)|("(([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?(([\u00
    21\u0023-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u00
    0e-\u001f\u007f])|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-
    \u0008\u000b\u000c\u000e-\u001f\u007f])))*([ \t]+((\r\n)[ \t
    ]+)?|((\r\n)[ \t]+)+)?"))((\((((?'paren'\()|(?'-paren'\))|([
    \u0021-\u0027\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000
    b\u000c\u000e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[
    \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u0
    00b\u000c\u000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((
    \r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?))*))@(?'domain'((((\((((?'
    paren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\
    u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|([ \t
    ]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|
    [\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(?
    (paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?(
    ([a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)|("(([ \t]+((\r\n)[ \t]+)?|
    ((\r\n)[ \t]+)+)?(([\u0021\u0023-\u005b\u005d-\u007e]|[\u000
    1-\u0008\u000b\u000c\u000e-\u001f\u007f])|\\([\u0021-\u007e]
    |[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007
    f])))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?"))((\((((?'pa
    ren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\u0
    07e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|([ \t]+
    ((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\
    r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(?(p
    aren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?)(\
    .(((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u0
    05b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u0
    07f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u0
    07e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\
    u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[
    \t]+)+))*?(([a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)|("(([ \t]+((\r\
    n)[ \t]+)?|((\r\n)[ \t]+)+)?(([\u0021\u0023-\u005b\u005d-\u0
    07e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|\\([\u0
    021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-
    \u001f\u007f])))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?"))
    ((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005
    b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007
    f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007
    e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u0
    07f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t
    ]+)+))*?))*)|(((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u00
    27\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u00
    0e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\
    ([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u
    000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+
    )?|((\r\n)[ \t]+)+))*?\[(([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]
    +)+)?([!-Z^-~]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f
    ]))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?\]((\((((?'paren
    '\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\u007e
    ]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|([ \t]+((\
    r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n
    \0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(?(pare
    n)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?))\z
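A minimal sketch of how a pattern with these group names might be used from C#. To keep the example short it uses a deliberately simplistic stand-in pattern, not the RFC 5322 regex; the class name, the constant name, and the one-second timeout are assumptions made for illustration. Two things to watch if you paste in the full regex: it appears to be wrapped across lines for display, so the line breaks would have to be removed first, and it contains double quotes, which must be doubled (`""`) inside a C# verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

class EmailRegexUsage
{
    // Stand-in for illustration only: this is NOT the RFC 5322 regex.
    // It merely exposes the same group names, 'localPart' and 'domain'.
    const string StandInPattern = @"^(?'localPart'[^@\s]+)@(?'domain'[^@\s]+)\z";

    static void Main()
    {
        // The timeout is a precaution for very large patterns; the value is arbitrary.
        var regex = new Regex(
            StandInPattern,
            RegexOptions.CultureInvariant,
            TimeSpan.FromSeconds(1));

        Match m = regex.Match("user@example.com");
        if (m.Success)
        {
            Console.WriteLine(m.Groups["localPart"].Value); // user
            Console.WriteLine(m.Groups["domain"].Value);    // example.com
        }
    }
}
```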
    

    There are some caveats, however. RFC 5322 is more liberal with domain names than the actual domain RFCs, and other RFCs impose further restrictions, such as the SMTP RFC itself, which specifies maximum lengths. So even though an address is correct according to RFC 5322, it can still be invalid by various other measures.
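As one example of such an extra restriction, the sketch below applies the length limits from RFC 5321 (64 octets for the local part in section 4.5.3.1.1, 255 octets for the domain in section 4.5.3.1.2) after the address has already been split. The class and method names are hypothetical, and the check ignores the separate limit on the whole path, so treat it as a starting point rather than a complete SMTP validity test.

```csharp
using System;
using System.Text;

static class SmtpLengthCheck
{
    const int MaxLocalPartOctets = 64;  // RFC 5321, section 4.5.3.1.1
    const int MaxDomainOctets = 255;    // RFC 5321, section 4.5.3.1.2

    // Returns true when both parts fit the SMTP length limits.
    // Lengths are counted in octets, so the strings are measured as UTF-8.
    public static bool FitsSmtpLimits(string localPart, string domain)
    {
        return Encoding.UTF8.GetByteCount(localPart) <= MaxLocalPartOctets
            && Encoding.UTF8.GetByteCount(domain) <= MaxDomainOctets;
    }

    static void Main()
    {
        Console.WriteLine(FitsSmtpLimits("user", "example.com"));              // True
        Console.WriteLine(FitsSmtpLimits(new string('a', 65), "example.com")); // False
    }
}
```

The `localPart` and `domain` groups captured by the regex could be passed to a check like this, although they may still contain comments and folding whitespace that a real implementation would strip first.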

    The gold-standard test is still to send an email containing a validation code to the address.
