Detecting mistyped email addresses in JavaScript

Submitted by 元气小坏坏 on 2019-12-03 15:45:42

Here's a quick-and-dirty implementation that does some simple checks using the Levenshtein distance. Credit for the "levenshteinenator" goes to this link. Add whatever popular domains you want to the domains array; the code then checks whether the distance between the host part of the entered email and each domain is 1 or 2, which is close enough to assume there's a typo somewhere.

var levenshteinenator = function(a, b) {
    // lengths of the two strings
    var m = a.length;
    var n = b.length;

    // swap so that a is the longer string; the distance is symmetric, so the
    // result is unchanged (note: the full (m+1) x (n+1) matrix is still built)
    if (m < n) {
        var tmp = a; a = b; b = tmp;
        var len = m; m = n; n = len;
    }

    // r[i][j] = edit distance between the first i chars of a and the first j chars of b
    var r = [];
    r[0] = [];
    for (var c = 0; c < n + 1; c++) {
        r[0][c] = c;
    }

    for (var i = 1; i < m + 1; i++) {
        r[i] = [];
        r[i][0] = i;
        for (var j = 1; j < n + 1; j++) {
            // substitution costs nothing when the characters already match
            var cost = (a.charAt(i - 1) === b.charAt(j - 1)) ? 0 : 1;
            r[i][j] = minimator(r[i - 1][j] + 1, r[i][j - 1] + 1, r[i - 1][j - 1] + cost);
        }
    }

    return r[m][n];
};

// return the smallest of the three values passed in
var minimator = function(x, y, z) {
    // Math.min also handles ties correctly, e.g. minimator(1, 1, 2) returns 1
    return Math.min(x, y, z);
};

var domains = ['yahoo.com', 'google.com', 'hotmail.com'];
var email = 'whatever@yahoo.om';
var parts = email.split('@');
var dist;
for (var x = 0; x < domains.length; x++) {
    dist = levenshteinenator(domains[x], parts[1]);
    // a distance of 1 or 2 means "close but not identical": probably a typo
    if (dist === 1 || dist === 2) {
        alert('did you mean ' + domains[x] + '?');
    }
}
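
The loop above alerts once for every domain that is close enough. If you would rather get a single best suggestion back, something like the following sketch might work. The names editDistance and suggestDomain and the maxDistance parameter are made up for this illustration; they are not part of the code above. The distance function here keeps only two rows of the table, which is one way to get the O(min(n,m)) space mentioned in the comment.

```javascript
// Two-row Levenshtein distance: only the previous and current rows are kept,
// so memory is proportional to the length of the shorter string.
function editDistance(a, b) {
    if (a.length < b.length) { var t = a; a = b; b = t; } // make b the shorter one
    var prev = [];
    for (var j = 0; j <= b.length; j++) prev[j] = j;
    for (var i = 1; i <= a.length; i++) {
        var curr = [i];
        for (var k = 1; k <= b.length; k++) {
            var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
            curr[k] = Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost);
        }
        prev = curr;
    }
    return prev[b.length];
}

// Hypothetical helper: returns the closest known domain within maxDistance,
// or null when the host is already a known domain or nothing is close enough.
function suggestDomain(email, knownDomains, maxDistance) {
    var at = email.lastIndexOf('@');
    if (at === -1) return null;
    var host = email.slice(at + 1).toLowerCase();
    var best = null;
    var bestDist = maxDistance + 1;
    for (var i = 0; i < knownDomains.length; i++) {
        var d = editDistance(host, knownDomains[i]);
        if (d === 0) return null; // exact match: nothing to suggest
        if (d < bestDist) { bestDist = d; best = knownDomains[i]; }
    }
    return best;
}
```

With the domains array from above, suggestDomain('whatever@yahoo.om', domains, 2) returns 'yahoo.com', and an address that already uses a listed domain returns null.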

In addition to soundex, you may also want to have a look at algorithms for determining Levenshtein distance.

Stefan

Check out soundex and Difference: if you use Ajax, you can have SQL Server check the soundex value of the words against "correct" domains and send suggestions back. It is also possible to write your own version of soundex (it's not that complicated).
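
As a rough idea of what a home-made soundex could look like in JavaScript, here is a minimal sketch of the classic American Soundex rules. The names soundex and SOUNDEX_CODES are just for this example, and the output may differ in edge cases from what a database's built-in SOUNDEX returns, so treat it as an illustration only.

```javascript
// Soundex digit for each consonant; vowels, y, h and w have no code.
var SOUNDEX_CODES = {
    b: '1', f: '1', p: '1', v: '1',
    c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
    d: '3', t: '3',
    l: '4',
    m: '5', n: '5',
    r: '6'
};

function soundex(word) {
    var letters = word.toLowerCase().replace(/[^a-z]/g, '');
    if (letters.length === 0) return '';
    var result = letters.charAt(0).toUpperCase(); // the first letter is kept as-is
    var lastCode = SOUNDEX_CODES[letters.charAt(0)] || '';
    for (var i = 1; i < letters.length && result.length < 4; i++) {
        var ch = letters.charAt(i);
        if (ch === 'h' || ch === 'w') continue; // h and w do not separate equal codes
        var code = SOUNDEX_CODES[ch] || '';
        if (code !== '' && code !== lastCode) result += code;
        lastCode = code; // a vowel resets lastCode, so the same code after it counts again
    }
    while (result.length < 4) result += '0'; // pad to one letter plus three digits
    return result;
}
```

For example, soundex('gmail') and soundex('gmial') both give 'G540', so comparing the soundex value of the entered domain name against those of known domains would flag that typo.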

SQL Server's SoundEx function on non-Latin character sets?

Data structure for soundex algorithm?

How do you implement a "Did you mean"?

Of course, as a first step, you could strip out the domain name and do a DNS lookup - that should at least tell you if it appears to be legitimate.
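
A browser can't do a DNS lookup by itself, so this check would have to run on the server. Here is a minimal sketch assuming a Node.js backend and its built-in dns module; extractDomain and domainAcceptsMail are hypothetical names for this example. Keep in mind that a mistyped domain may really exist, and that a domain without MX records can sometimes still receive mail, so a result here is only a hint.

```javascript
var dns = require('dns');

// Returns the part after the last '@', lower-cased, or null if there is none.
function extractDomain(email) {
    var at = email.lastIndexOf('@');
    if (at === -1 || at === email.length - 1) return null;
    return email.slice(at + 1).toLowerCase();
}

// Resolves to true when the domain publishes at least one MX record.
// Any lookup failure (unknown domain, no MX data, network error) resolves to false.
function domainAcceptsMail(email) {
    var domain = extractDomain(email);
    if (domain === null) return Promise.resolve(false);
    return dns.promises.resolveMx(domain).then(
        function (records) { return records.length > 0; },
        function () { return false; }
    );
}

// Usage (performs a real network lookup, so it is left commented out here):
// domainAcceptsMail('whatever@yahoo.om').then(function (ok) {
//     if (!ok) console.log('That domain does not seem to accept mail.');
// });
```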

As others have said, the Levenshtein distance is a reliable solution.

There is an excellent JavaScript library that does exactly what you want: Mailcheck from Kicksend.

https://github.com/DimitarChristoff/mailcheck

The library:

  • offers up suggestions for domains and top level domains.
  • can be customized (domains, top domains, string distance method).
  • can be used with jQuery
  • is decoupled from jQuery

The library uses the sift3 string similarity algorithm for speed. It has been reported that the Levenshtein distance produces better results (https://github.com/DimitarChristoff/mailcheck).

It might be possible to use a regex, but personally, it would take me way too long to write one I'd be happy with that could get all the possible permutations without causing too many false positives.

So, here's what I would do:

  • Hard-code a list of all the common typing errors.
  • Use a case-insensitive string comparison to compare the email to each string in the list.
  • If there's a match, display a warning - "Did you mean yahoo.com?"
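
A minimal sketch of those three steps could look like this. The entries in KNOWN_TYPOS and the name checkForKnownTypo are invented for the example, and it compares only the domain part (lower-cased) rather than the whole address, which is an assumption on my part.

```javascript
// Hypothetical lookup table: mistyped domain -> intended domain.
var KNOWN_TYPOS = {
    'yahoo.om': 'yahoo.com',
    'yaho.com': 'yahoo.com',
    'gmial.com': 'gmail.com',
    'gmail.con': 'gmail.com',
    'hotmial.com': 'hotmail.com'
};

// Returns a warning string when the domain is a listed typo, otherwise null.
function checkForKnownTypo(email) {
    var at = email.lastIndexOf('@');
    if (at === -1) return null;
    // lower-casing the host makes the comparison case-insensitive
    var host = email.slice(at + 1).toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(KNOWN_TYPOS, host)) return null;
    return 'Did you mean ' + KNOWN_TYPOS[host] + '?';
}
```

checkForKnownTypo('Someone@YAHOO.OM') returns 'Did you mean yahoo.com?', while an address with a correctly spelled domain returns null.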

Yeah, it's not very pretty, but it doesn't seem (at least from your question) like you'll have that many to check, so it should perform just fine. It also doesn't seem (at least to me) like something worth putting a whole lot of time into, so this is an incredibly simple solution that could be done in about 15-30 minutes.
