exclude underscore from alpha numeric regex

轮回少年 2020-12-10 11:47

I want to use the \w regex to allow alphanumeric characters, but I don't want the underscore _ to be part of the match. The problem is that _ is included in \w.

5 answers
  • 2020-12-10 12:21

    Your proposed solution:

    (/^roger\w{2,3}[0-9a-z]/i)
    

    Means:

    \w{2,3} -- 2 or 3 alphanumeric, including the _

    [0-9a-z] (with the /i) -- a single character that is alphanumeric, not including the _

    I didn't see any mention of the acceptable 3 alphanumerics at the beginning. Does that belong?

    Both "roger54" and "roger4a" fail this, because the regex requires at least three characters after "roger". Conversely, "roger___a" succeeds, because "_" passes \w{2,3} (here \w{3} consumes the three underscores).
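A quick check of this reading, sketched with Python's `re` module as a stand-in for the Perl-style literal above. `re.IGNORECASE` plays the role of `/i`, and the helper name `accepts` and the sample strings are illustrative assumptions:

```python
import re

# Same pattern as the proposed regex; re.IGNORECASE stands in for the /i flag.
proposed = re.compile(r"^roger\w{2,3}[0-9a-z]", re.IGNORECASE)

def accepts(text):
    """Return True if the proposed pattern matches at the start of text."""
    return proposed.search(text) is not None

for sample in ["roger54", "roger4a", "roger54a", "roger___a"]:
    print(sample, accepts(sample))
```

The two-character suffixes are rejected, while the suffix made of three underscores plus a letter is accepted, which is the behaviour the question wanted to avoid.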

    Your request sounded like you wanted more of one of these:

    /^roger[0-9a-z]+/i
    /^roger[0-9a-z]*/i
    

    that is, "roger" (case insensitive) followed by one or more (+) or zero or more (*) letters and/or numbers.
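To see how the two suggestions differ, here is a small sketch in Python (an assumption made for illustration; the patterns themselves are unchanged). Neither pattern is anchored at the end, so whatever follows the matched prefix is not checked. The helper `strict` is a hypothetical addition that validates the whole string.

```python
import re

one_or_more = re.compile(r"^roger[0-9a-z]+", re.IGNORECASE)
zero_or_more = re.compile(r"^roger[0-9a-z]*", re.IGNORECASE)

def strict(text):
    """Hypothetical variant: the whole string must fit the '+' pattern."""
    return one_or_more.fullmatch(text) is not None

for sample in ["roger", "Roger54a", "roger_a", "roger5_"]:
    print(
        sample,
        one_or_more.search(sample) is not None,
        zero_or_more.search(sample) is not None,
        strict(sample),
    )
```

With `*`, "roger_a" still matches because the pattern is satisfied by the bare prefix "roger"; with `+`, it fails because the first character after "roger" is an underscore.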

  • 2020-12-10 12:22

    You could try something like:

    [^_\W]+
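The double negative reads as "any character that is neither an underscore nor a non-word character", which leaves the word characters minus `_`. A minimal sketch in Python (used here only for illustration; the helper name `is_alnum_only` is an assumption):

```python
import re

# Word characters with the underscore removed.
alnum_no_underscore = re.compile(r"[^_\W]+")

def is_alnum_only(text):
    """Return True if text is non-empty and has only non-underscore word characters."""
    return alnum_no_underscore.fullmatch(text) is not None

print(is_alnum_only("roger54a"))
print(is_alnum_only("roger_54"))
print(alnum_no_underscore.findall("foo_bar-42"))
```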
    
  • 2020-12-10 12:24

    Assuming the identifier must begin with an alpha character, and then may contain any number of alpha or numeric, I would do this:

    my $string = 'roger54a';
    print "Match\n" if $string =~ m/\A\p{alpha}[\p{alpha}\p{Number}]*\z/;
    

    That anchors to the start and end of the string, precluding any characters that don't match the specific set of a single alpha followed by any quantity of alpha and numerics.

    Update: I see tchrist just gave a great explanation of the Unicode properties. This answer provides the context of a full regexp.

    If you wanted two or three leading alphabetic characters followed by alphanumerics, just add the appropriate quantifier:

    $string =~ m/\A\p{alpha}{2,3}[\p{alpha}\p{Number}]*\z/

    Update2: I see a stronger definition of what you're looking for in a comment to one of the answers here. Here's my take on it after seeing your clarification:

    m/\Aroger[\p{alpha}\p{Number}]{2,3}\z/
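For readers outside Perl, a rough Python analogue of that last pattern might look like the sketch below. Python's built-in `re` module has no `\p{...}` properties, so `[^\W_]` (the Unicode-aware `\w` minus the underscore) is used as an approximation of `[\p{alpha}\p{Number}]`; it is not an exact equivalent, and the names `roger_id` and `is_valid` are assumptions.

```python
import re

# [^\W_] approximates [\p{alpha}\p{Number}]: letters and numeric characters,
# with the underscore explicitly excluded.
roger_id = re.compile(r"roger[^\W_]{2,3}")

def is_valid(text):
    """fullmatch() anchors at both ends, like \\A ... \\z in the Perl version."""
    return roger_id.fullmatch(text) is not None

for sample in ["roger54", "roger54a", "roger5", "roger_54", "roger54ab"]:
    print(sample, is_valid(sample))
```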

  • 2020-12-10 12:38

    I was trying to find a solution to this also and this solution did not work for me in C# when trying to do a regex replace. In case someone else is searching:

    c# Regex.Replace [^\w ] that also removes underscores?

    This is what I use in C#:

    cleaned_string = Regex.Replace(input_string, @"[_]+|[^\w]+", "");

    If you want to keep spaces:

    cleaned_string = Regex.Replace(input_string, @"[_]+|[^\w\s]+", "");
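The same idea, removing runs of underscores or runs of non-word characters, can be sketched in Python for comparison. The function name `strip_non_alnum` and its `keep_spaces` parameter are assumptions introduced for this example:

```python
import re

def strip_non_alnum(text, keep_spaces=False):
    """Remove underscores and non-word characters, optionally keeping whitespace."""
    # The first alternative removes underscores; the second removes everything
    # that \w (and, optionally, \s) does not cover.
    pattern = r"_+|[^\w\s]+" if keep_spaces else r"_+|[^\w]+"
    return re.sub(pattern, "", text)

print(strip_non_alnum("foo_bar, baz!"))
print(strip_non_alnum("foo_bar, baz!", keep_spaces=True))
```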

  • 2020-12-10 12:41
    • A numeric code point is \pN or \p{Number}.
    • A digit code point is \d, \p{digit}, \p{Nd}, \p{Decimal_Number}, or \p{Numeric_Type=Decimal}.
    • An alphabetic code point is \p{alpha} or \p{Alphabetic}. It includes all \p{Letter} and \p{Letter_Number} code points, as well as certain \p{Mark} and \p{Symbol} code points. Decimal digits are not alphabetic.
    • A programming-word code point is \w, or [\p{Alphabetic}\p{Digit}\p{Mark}\p{Connector_Punctuation}].

    An alphanumeric code point by the strictest definition is consequently and necessarily [\p{Alphabetic}\p{Number}], typically abbreviated [\p{alpha}\pN].
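The distinction between "number" and "digit" can be inspected through Unicode general categories. The sketch below uses Python's standard `unicodedata` module as an illustration; the helpers `is_number` and `is_decimal_digit` are assumptions that only approximate `\p{Number}` and `\p{Nd}`, and the sample characters are chosen for this example.

```python
import unicodedata

def is_number(ch):
    # Rough analogue of \p{Number}: general category Nd, Nl, or No.
    return unicodedata.category(ch).startswith("N")

def is_decimal_digit(ch):
    # Rough analogue of \p{Nd}: decimal digits only.
    return unicodedata.category(ch) == "Nd"

samples = {
    "7": "ASCII digit",
    "\u0663": "Arabic-Indic digit three",
    "\u00bd": "vulgar fraction one half",
    "\u2167": "Roman numeral eight",
    "_": "low line (underscore)",
}

for ch, name in samples.items():
    print(name, unicodedata.category(ch), is_number(ch), is_decimal_digit(ch))
```

The fraction and the Roman numeral are numbers but not decimal digits, and the underscore falls under connector punctuation, which is why `\w` admits it while `[\p{alpha}\pN]` does not.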
