I want to match a string to make sure it contains only letters.
I\'ve got this and it works just fine:
var onlyLetters = /^[a-zA-Z]*$/.test(myString)
When I tried to implement @Debilski's solution JavaScript didn't like the extended Latin characters -- I had to code them as JavaScript escapes:
// The huge unicode escape string is equal to ÆÐƎƏƐƔIJŊŒẞÞǷȜæðǝəɛɣijŋœĸſßþƿȝĄƁÇĐƊĘĦ
// ĮƘŁØƠŞȘŢȚŦŲƯY̨Ƴąɓçđɗęħįƙłøơşșţțŧųưy̨ƴÁÀÂÄǍĂĀÃÅǺĄÆǼǢƁĆĊĈČÇĎḌĐƊÐÉÈĖÊËĚĔĒĘẸƎ
// ƏƐĠĜǦĞĢƔáàâäǎăāãåǻąæǽǣɓćċĉčçďḍđɗðéèėêëěĕēęẹǝəɛġĝǧğģɣĤḤĦIÍÌİÎÏǏĬĪĨĮỊ
// IJĴĶƘĹĻŁĽĿʼNŃN̈ŇÑŅŊÓÒÔÖǑŎŌÕŐỌØǾƠŒĥḥħıíìiîïǐĭīĩįịijĵķƙĸĺļłľŀʼnńn̈ňñ
// ņŋóòôöǒŏōõőọøǿơœŔŘŖŚŜŠŞȘṢẞŤŢṬŦÞÚÙÛÜǓŬŪŨŰŮŲỤƯẂẀŴẄǷÝỲŶŸȲỸƳŹŻŽẒŕřŗſśŝšşșṣßťţṭ
// ŧþúùûüǔŭūũűůųụưẃẁŵẅƿýỳŷÿȳỹƴźżžẓ
function isAlpha(string) {
var patt = /^[a-zA-Z\u00C6\u00D0\u018E\u018F\u0190\u0194\u0132\u014A\u0152\u1E9E\u00DE\u01F7\u021C\u00E6\u00F0\u01DD\u0259\u025B\u0263\u0133\u014B\u0153\u0138\u017F\u00DF\u00FE\u01BF\u021D\u0104\u0181\u00C7\u0110\u018A\u0118\u0126\u012E\u0198\u0141\u00D8\u01A0\u015E\u0218\u0162\u021A\u0166\u0172\u01AFY\u0328\u01B3\u0105\u0253\u00E7\u0111\u0257\u0119\u0127\u012F\u0199\u0142\u00F8\u01A1\u015F\u0219\u0163\u021B\u0167\u0173\u01B0y\u0328\u01B4\u00C1\u00C0\u00C2\u00C4\u01CD\u0102\u0100\u00C3\u00C5\u01FA\u0104\u00C6\u01FC\u01E2\u0181\u0106\u010A\u0108\u010C\u00C7\u010E\u1E0C\u0110\u018A\u00D0\u00C9\u00C8\u0116\u00CA\u00CB\u011A\u0114\u0112\u0118\u1EB8\u018E\u018F\u0190\u0120\u011C\u01E6\u011E\u0122\u0194\u00E1\u00E0\u00E2\u00E4\u01CE\u0103\u0101\u00E3\u00E5\u01FB\u0105\u00E6\u01FD\u01E3\u0253\u0107\u010B\u0109\u010D\u00E7\u010F\u1E0D\u0111\u0257\u00F0\u00E9\u00E8\u0117\u00EA\u00EB\u011B\u0115\u0113\u0119\u1EB9\u01DD\u0259\u025B\u0121\u011D\u01E7\u011F\u0123\u0263\u0124\u1E24\u0126I\u00CD\u00CC\u0130\u00CE\u00CF\u01CF\u012C\u012A\u0128\u012E\u1ECA\u0132\u0134\u0136\u0198\u0139\u013B\u0141\u013D\u013F\u02BCN\u0143N\u0308\u0147\u00D1\u0145\u014A\u00D3\u00D2\u00D4\u00D6\u01D1\u014E\u014C\u00D5\u0150\u1ECC\u00D8\u01FE\u01A0\u0152\u0125\u1E25\u0127\u0131\u00ED\u00ECi\u00EE\u00EF\u01D0\u012D\u012B\u0129\u012F\u1ECB\u0133\u0135\u0137\u0199\u0138\u013A\u013C\u0142\u013E\u0140\u0149\u0144n\u0308\u0148\u00F1\u0146\u014B\u00F3\u00F2\u00F4\u00F6\u01D2\u014F\u014D\u00F5\u0151\u1ECD\u00F8\u01FF\u01A1\u0153\u0154\u0158\u0156\u015A\u015C\u0160\u015E\u0218\u1E62\u1E9E\u0164\u0162\u1E6C\u0166\u00DE\u00DA\u00D9\u00DB\u00DC\u01D3\u016C\u016A\u0168\u0170\u016E\u0172\u1EE4\u01AF\u1E82\u1E80\u0174\u1E84\u01F7\u00DD\u1EF2\u0176\u0178\u0232\u1EF8\u01B3\u0179\u017B\u017D\u1E92\u0155\u0159\u0157\u017F\u015B\u015D\u0161\u015F\u0219\u1E63\u00DF\u0165\u0163\u1E6D\u0167\u00FE\u00FA\u00F9\u00FB\u00FC\u01D4\u016D\u016B\u0169\u0171\u016F\u0173\u1EE5\u01B0\u1E83\u1E81\u0175\u1E85\u01BF\u00FD\u1EF3\u0177\u00FF\u0233\u1EF9\u01B4\u017A\u017C\u017E\u1E93]+$/;
return patt.test(string);
}
You can't do this in JS. It has a very limited regex and normalizer support. You would need to construct a lengthy and unmaintainable character array with all possible latin characters with diacritical marks (I guess there are around 500 different ones). Rather delegate the validation task to the server side which uses another language with more regex capabilties, if necessary with help of ajax.
In a full fledged regex environment you could just test if the string matches \p{L}+
. Here's a Java example:
boolean valid = string.matches("\\p{L}+");
Alternatively, you could also normailze the text to get rid of the diacritical marks and check if it contains [A-Za-z]+
only. Here's again a Java example:
string = Normalizer.normalize(string, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
boolean valid = string.matches("[A-Za-z]+");
PHP supports similar functions.
I'm using a convertor before checking, but it's still not friendly for all languages. I'm not sure that's possible.
function noExtendedChars( input_name ){
var whitelist = [
['a', 'à','á','â','ä','æ','ã','å','ā'],
['c', 'ç', 'ć', 'č'],
['e', 'è','é','ê','ë','ē','ė','ę'],
['i', 'ï','ï','í','ī','į','î'],
['l', 'ł'],
['n', 'ñ', 'ń'],
['o', 'ô', 'ö', 'ò', 'ó', 'œ', 'ø', 'ō', 'õ' ],
['s', 'ß', 'ś', 'š' ],
['u', 'û', 'ü', 'ù', 'ú', 'ū'],
['y', 'ÿ'],
['z', 'ž', 'ź', 'ż']
];
for( b=0; b < blacklist.length; b++ ){
var r= blacklist[b];
for ( a=1; a < r.length; a++ ){
input_name = input_name.replace( new RegExp( r[a], "gi") , r[0]);
}
}
return input_name;
}
I don’t know the actual reason for doing this, but if you want to use it as a pre-check for, say, login names oder user nicknames, I’d suggest you enter the characters yourself and don’t use the whole ‘alpha’ characters you’ll find in unicode, because you probably won’t find an optical difference in the following letters:
А ≠ A ≠ Α # cyrillic, latin, greek
In such cases it’s better to specify the allowed letters manually if you want to minimise account faking and such.
Addition
Well, if it’s for a field which is supposed to be non-unique, I would allow greek as well. I wouldn’t feel well when I force users into changing their name to a latinised version.
But for unique fields like nicknames you need to give your other visitors of the site a hint, that it’s really the nickname they think it is. Bad enough that people will fake accounts with interchanging I and l already. Of course, it’s something that depends on your users; but to be sure I think it’s better to allow basic latin + diacritics only. (Maybe have a look at this list: Latin-derived_alphabet)
As an untested suggestion (with ‘-’, ‘_’ and ‘ ’):
/^[a-zA-Z\-_ ’'‘ÆÐƎƏƐƔIJŊŒẞÞǷȜæðǝəɛɣijŋœĸſßþƿȝĄƁÇĐƊĘĦĮƘŁØƠŞȘŢȚŦŲƯY̨Ƴąɓçđɗęħįƙłøơşșţțŧųưy̨ƴÁÀÂÄǍĂĀÃÅǺĄÆǼǢƁĆĊĈČÇĎḌĐƊÐÉÈĖÊËĚĔĒĘẸƎƏƐĠĜǦĞĢƔáàâäǎăāãåǻąæǽǣɓćċĉčçďḍđɗðéèėêëěĕēęẹǝəɛġĝǧğģɣĤḤĦIÍÌİÎÏǏĬĪĨĮỊIJĴĶƘĹĻŁĽĿʼNŃN̈ŇÑŅŊÓÒÔÖǑŎŌÕŐỌØǾƠŒĥḥħıíìiîïǐĭīĩįịijĵķƙĸĺļłľŀʼnńn̈ňñņŋóòôöǒŏōõőọøǿơœŔŘŖŚŜŠŞȘṢẞŤŢṬŦÞÚÙÛÜǓŬŪŨŰŮŲỤƯẂẀŴẄǷÝỲŶŸȲỸƳŹŻŽẒŕřŗſśŝšşșṣßťţṭŧþúùûüǔŭūũűůųụưẃẁŵẅƿýỳŷÿȳỹƴźżžẓ]$/.test(myString)
Another edit: I have added the apostrophe for people with names like O’Neill or O’Reilly. (And the straight and the reversed apostrophe for people who can’t enter the curly one correctly.)
var onlyLetters = /^[a-zA-Z\u00C0-\u00ff]+$/.test(myString)
I don't know anything about Javascript, but if it has proper unicode support, convert your string to a decomposed form, then remove the diacritics from it ([\u0300-\u036f\u1dc0-\u1dff]
). Then your letters will only be ASCII ones.