PHP Regex for human names

故里飘歌 2020-12-14 22:58

I've run into a bit of a problem with a regex I'm using for human names.

$rexName = '/^[a-z\' -]$/i';

Suppose a user with the name Jür

4 answers
  • 2020-12-14 23:17

    If you're trying to parse apart a human name in PHP, I recommend Keith Beckman's nameparse.php script.

  • 2020-12-14 23:24

    I would really say: don't try to validate names. One day or another, your code will meet a name that it thinks is "wrong"... And how do you think someone will react when an application tells them "your name is not valid"?

    Depending on what you really want to achieve, you might consider using some kind of blacklist or filters to exclude the "not-names" you thought about. It may let some "bad names" pass but, at least, it shouldn't prevent anyone with a real name from accessing your application.

    Here are a few examples of rules that come to mind:

    • no numbers
    • no special characters, like "~{()}@^$%?;:/*§£ø and probably some others
    • no more than 3 spaces?
    • none of "admin", "support", "moderator", "test", and a few other obvious non-names that people tend to use when they don't want to type in their real name...
      • (but if they don't want to give you their name, they still won't, even if you forbid them from typing random letters: they could just use a real name... that is not theirs)

    Yes, this is not perfect, and yes, it will let some non-names pass... But it's probably way better for your application than telling someone "your name is wrong" (yes, I insist ^^).
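
    The rules listed above could be sketched roughly as follows. This is only an illustration: the function name findNameProblems, the exact character set (an ASCII subset of the list above) and the placeholder list are assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: returns a list of reasons a submitted name looks like
// a "not-name". An empty array means the name passed every filter.
function findNameProblems(string $name): array
{
    $problems = [];

    // Rule: no digits.
    if (preg_match('/\d/', $name) === 1) {
        $problems[] = 'contains a digit';
    }

    // Rule: none of a few special characters (ASCII subset of the list above).
    if (preg_match('/["~{}()@^$%?;:\/*]/', $name) === 1) {
        $problems[] = 'contains a special character';
    }

    // Rule: no more than 3 spaces.
    if (substr_count($name, ' ') > 3) {
        $problems[] = 'contains more than 3 spaces';
    }

    // Rule: obvious placeholder names (assumed list, extend as needed).
    $placeholders = ['admin', 'support', 'moderator', 'test'];
    if (in_array(strtolower(trim($name)), $placeholders, true)) {
        $problems[] = 'is a placeholder name';
    }

    return $problems;
}

var_dump(findNameProblems('martin')); // empty array: nothing suspicious
var_dump(findNameProblems('Admin'));  // one problem: placeholder name
```

    Returning the reasons rather than a plain true/false makes it easier to log near-misses and relax a rule when a real name trips it.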


    And, to answer a comment you left under another answer:

    "I could just forbid the most common characters for SQL injection and XSS attacks,"

    About SQL injection: you must escape your data before sending it to the database. If you always escape it (you should!), you don't have to care about what users may or may not input: since it is always escaped, there is no risk for you.

    The same goes for XSS: as long as you always escape your data when outputting it (you should!), there is no risk of injection ;-)
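
    For the output side, a minimal sketch of escaping at display time might look like this. The helper name escapeForHtml is hypothetical, and UTF-8 is assumed as the page encoding.

```php
<?php
// Hypothetical helper: escape a user-supplied name for an HTML context.
function escapeForHtml(string $name): string
{
    // ENT_QUOTES also converts single quotes, so the value stays safe inside
    // single-quoted attributes; the charset is assumed to be UTF-8.
    return htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
}

$name = "O'Connor <script>alert(1)</script>";
echo '<p>Hello, ' . escapeForHtml($name) . '</p>', PHP_EOL;
```

    The name is stored untouched and only escaped when it is written into HTML, so a legitimate apostrophe survives while the markup is neutralised.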


    EDIT: if you use that regex as it is, it will not work well.

    The following code:

    $rexSafety = "/^[^<,\"@/{}()*$%?=>:|;#]*$/i";
    if (preg_match($rexSafety, 'martin')) {
        var_dump('bad name');
    } else {
        var_dump('ok');
    }
    

    will get you at least a warning:

    Warning: preg_match() [function.preg-match]: Unknown modifier '{'
    

    You must escape at least some of those special characters. I'll let you dig into PCRE Patterns for more information (there is really a lot to know about PCRE and regexes, and I won't be able to explain it all).

    If you actually want to check that none of those characters appears in a given piece of data, you might end up with something like this:

    $rexSafety = "/[\^<,\"@\/\{\}\(\)\*\$%\?=>:\|;#]+/i";
    if (preg_match($rexSafety, 'martin')) {
        var_dump('bad name');
    } else {
        var_dump('ok');
    }
    

    (This is a quick and dirty proposal, which has to be refined!)

    This one says "ok" (well, I definitely hope my own name is OK!).
    And the same example with a special character, like this:

    $rexSafety = "/[\^<,\"@\/\{\}\(\)\*\$%\?=>:\|;#]+/i";
    if (preg_match($rexSafety, 'ma{rtin')) {
        var_dump('bad name');
    } else {
        var_dump('ok');
    }
    

    will say "bad name".

    But please note that I have not fully tested this, and it probably needs more work! Do not use it on your site unless you have tested it very carefully!
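
    As an alternative to escaping each character by hand, the same character class could probably be built with preg_quote(). This is a sketch under that assumption; the helper name hasForbiddenChar is hypothetical.

```php
<?php
// Characters to reject, taken from the pattern above, written unescaped.
$forbidden = '^<,"@/{}()*$%?=>:|;#';

// preg_quote() backslash-escapes regex metacharacters; passing '/' as the
// second argument also escapes the delimiter.
$rexSafety = '/[' . preg_quote($forbidden, '/') . ']/';

// Hypothetical helper name, used only for this illustration.
function hasForbiddenChar(string $pattern, string $value): bool
{
    return preg_match($pattern, $value) === 1;
}

var_dump(hasForbiddenChar($rexSafety, 'martin'));  // bool(false)
var_dump(hasForbiddenChar($rexSafety, 'ma{rtin')); // bool(true)
```

    Keeping the forbidden characters in a plain string makes the list easier to read and extend than a hand-escaped pattern.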


    Also note that a single quote can be helpful when attempting an SQL injection... but it is probably a character that is legal in some names. So just excluding some characters might not be enough ;-)

  • 2020-12-14 23:25

    That's a problem with no easy general solution. You really can't predict which characters a name could contain. Probably the best solution is to define a negated character class that excludes the special characters you really don't want to end up in a name.

    You can do this using:

    $regexp = '/^[^<put unwanted characters here>]+$/';

  • 2020-12-14 23:28

    PHP’s PCRE implementation supports Unicode character properties, which cover a much larger set of characters. So you could use a combination of \p{L} (letter characters), \p{P} (punctuation characters) and \p{Zs} (space separator characters), with the u modifier so that the pattern and the subject are treated as UTF-8:

    /^[\p{L}\p{P}\p{Zs}]+$/u
    

    But some characters you want to allow may fall outside these categories, while others you don't want may be included.
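
    A short sketch of how this pattern behaves on a few sample inputs. The u modifier is used here on the assumption that the input is UTF-8, and the sample names are made up for illustration.

```php
<?php
// Pattern from the answer, with the u modifier so that the pattern and the
// subject are treated as UTF-8.
$rexName = '/^[\p{L}\p{P}\p{Zs}]+$/u';

$samples = [
    "J\u{00FC}rgen",                  // "Jürgen", written with an escape
    "O'Connor",
    'Anne-Marie',
    'R2D2',                           // digits are not in L, P or Zs
    "Robert'); DROP TABLE users;--",  // only letters, punctuation, spaces
];

foreach ($samples as $sample) {
    $ok = preg_match($rexName, $sample) === 1;
    echo $sample, ' => ', $ok ? 'accepted' : 'rejected', PHP_EOL;
}
```

    The last sample is accepted because quotes, parentheses, semicolons and hyphens are all punctuation, which illustrates why this kind of check is no substitute for escaping.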

    So I advise against using regular expressions on a datum with such a vague range of values as a real person’s name.


    Edit: Since you edited your question, I now see that you just want to prevent certain code injection attacks. You are better off escaping those characters than rejecting them as a potential attack attempt.

    Use mysql_real_escape_string or prepared statements for SQL queries, htmlspecialchars for HTML output, and the appropriate functions for other languages. (The mysql_* functions were removed in PHP 7; mysqli_real_escape_string and PDO or mysqli prepared statements are the current equivalents.)
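
    A minimal sketch of the prepared-statement route using PDO. An in-memory SQLite database stands in for the real one (this needs the pdo_sqlite extension), and the users table is hypothetical.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one, and
// the "users" table is hypothetical. Requires the pdo_sqlite extension.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT)');

// A name that would break a naively concatenated query.
$name = "Robert'); DROP TABLE users;--";

// The placeholder keeps the value out of the SQL text entirely.
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$insert->execute([':name' => $name]);

// The value comes back exactly as it was submitted.
$stored = $pdo->query('SELECT name FROM users')->fetchColumn();
var_dump($stored === $name);
```

    Because the value is bound rather than concatenated, no character in the name needs to be rejected for the query to stay safe.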
