Exact verification of email addresses by regular expression

    As everyone knows, one of the most convenient ways to check e-mail addresses is regular expressions. Recently I had to deal with the problem of the most accurate address verification. This check was necessary in the system of automatic spamming of questionnaires, where each list of addresses was automatically loaded with one large file. It was required to exclude the maximum number of obviously invalid addresses.
    The problem was that all the email verification templates that can be found on the Internet, MSDN and other sources did not meet the verification requirements. Turning to the original sources in the form of RFC 2821 and RFC 2821, I found out how exactly and correctly validate addresses.


    Email address = local part @ domain part

    Local part



    Characters allowed in local part:
    • +
    • -
    • . (except for local..part - two points in a row, .localPart - a point at the beginning and localPart. - a point at the end)
    • 0-9
    • AZ, az
    • -


    Characters NOT allowed in local part
    • !
    • #
    • $
    • %
    • (
    • )
    • ,
    • :
    • ;
    • <
    • >
    • [
    • \
    • ]
    • ''
    • |
    • SPACE, DEL, Control chars


    Characters that are undesirable to use in the local part, but which may be present. (Requires testing if their server accepts).
    • ?
    • ''
    • *
    • /
    • =
    • ?
    • ^
    • {
    • }
    • ~

    The reason you shouldn't use them in addresses is because many belong to the UNIX shell special characters group of characters.

    Domain part



    - it can be either in the form of an IP address, an IP address with a port, or just a literal expression containing only lowercase and uppercase Latin letters and a dash ('-', but there is a limitation: the dash cannot be either at the end or at the beginning ; nothing is said about the restriction on two dashes in a row), separated by dots. Accordingly, the expression domain..com is invalid.

    As a result, having modified one of the Internet templates, I got:

    ^ [a-zA-Z0-9 _ '+ * / ^ & =? ~ {} \ -] (\.? [A-zA-Z0-9 _' + * / ^ & =? ~ {} \ -]) * \ @ ((\ d {1,3} \. \ d {1,3} \. \ d {1,3} \. \ d {1,3} (\ : \ d {1,3})?) | ((((([a-zA-Z0-9] [a-zA-Z0-9 \ -] + [a-zA-Z0-9]) | ([ a-zA-Z0-9] {1,2})) [\.] {1}) + ([a-zA-Z] {2,6}))) $

    References:
    RFC 2821: www.remote. org / jochen / rfc / rfc821.txt
    RFC 2822: www.remote.org/jochen/rfc/rfc822.txt

    List of valid / invalid characters: www.remote.org/jochen/mail/info/chars.html

    If the regular expression is incomplete or in some cases incorrect, wishes and comments are welcome.
    I would like to emphasize that this is a special case in which the use of a serious and accurate check was required (as the client required). In other cases, you can not bother :)

    Also popular now: