I want to use \w in a regex to allow alphanumeric characters, but I don't want the underscore _ to be part of it, since _ is included in \w.
Your proposed solution:
(/^roger\w{2,3}[0-9a-z]/i)
Means:
\w{2,3}
-- 2 or 3 word characters (alphanumerics, including the _)
[0-9a-z]
(with the /i) -- a single character that is alphanumeric, not including the _
I didn't see any mention of the acceptable 3 alphanumerics at the beginning. Does that belong?
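Both "roger54" and "roger4a" would fail this, because the regex above requires at least three characters following "roger". The underscore, on the other hand, still gets through: "_" passes \w{2,3}, so "roger_4a" would succeed (\w{2} matches "_4" and [0-9a-z] matches "a"), while "roger_a" fails only because just two characters follow "roger". And since nothing anchors the end, trailing characters are never checked, so "roger123_" matches as well.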
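A quick Perl check of the proposed pattern makes this concrete. This is only a sketch: the helper name proposed_matches is made up, and "roger_4a" is my own sample input.

```perl
use strict;
use warnings;

# The asker's proposed pattern, copied from the question.
my $proposed = qr/^roger\w{2,3}[0-9a-z]/i;

# Hypothetical helper: 1 if $s matches the proposed pattern, else 0.
sub proposed_matches {
    my ($s) = @_;
    return $s =~ $proposed ? 1 : 0;
}

# "roger_4a" matches: \w{2} takes "_4" and [0-9a-z] takes "a".
for my $s (qw(roger54 roger4a roger_a roger_4a)) {
    printf "%-9s %s\n", $s, proposed_matches($s) ? 'match' : 'no match';
}
```

Only "roger_4a" matches; the other three have fewer than three characters after "roger".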
Your request sounded more like you wanted one of these:
/^roger[0-9a-z]+$/i
/^roger[0-9a-z]*$/i
that is, "roger" (case-insensitive) followed by one or more (+) or zero or more (*) letters and/or digits, and nothing else. The $ anchor is what keeps an underscore (or anything else) from following them.
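Here is a small Perl sketch of what an end anchor changes for the + variant. The helper names prefix_ok and whole_ok are hypothetical, and \z is used as Perl's strict end-of-string anchor.

```perl
use strict;
use warnings;

# Prefix-only check: anything at all may follow the letters and digits.
sub prefix_ok {
    my ($s) = @_;
    return $s =~ /^roger[0-9a-z]+/i ? 1 : 0;
}

# Whole-string check: \z anchors the end, so a trailing "_" fails.
sub whole_ok {
    my ($s) = @_;
    return $s =~ /^roger[0-9a-z]+\z/i ? 1 : 0;
}

for my $s (qw(Roger4a roger4a_ roger)) {
    printf "%-9s prefix=%d whole=%d\n", $s, prefix_ok($s), whole_ok($s);
}
```

"roger4a_" passes the prefix check but not the anchored one; plain "roger" fails both because + needs at least one character.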
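You could try something like:
[^_\W]+
that is, one or more characters that are neither an underscore nor a non-word character, which amounts to \w with the _ removed.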
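To see the double negative at work, here is a Perl sketch. The helper alnum_only is hypothetical, the \A...\z anchors are my addition so the whole string is checked, and the unicode_strings feature makes \W follow Unicode rules even for Latin-1 characters such as ï.

```perl
use strict;
use warnings;
use feature 'unicode_strings';    # Unicode rules for \W even on Latin-1 strings

# [^_\W] means "not an underscore and not a non-word character",
# i.e. every \w character except "_". \A and \z check the whole string.
sub alnum_only {
    my ($s) = @_;
    return $s =~ /\A[^_\W]+\z/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $s ('abc123', 'abc_123', "na\x{EF}ve", 'a b') {
    printf "%-8s %s\n", $s, alnum_only($s) ? 'ok' : 'rejected';
}
```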
Assuming the identifier must begin with an alphabetic character and may then contain any number of alphabetic or numeric characters, I would do this:
my $string = 'roger54a';
print "Match\n" if $string =~ m/\A\p{alpha}[\p{alpha}\p{Number}]*\z/;
\A and \z anchor the match to the start and end of the string, so any character outside that pattern (a single alphabetic character followed by any number of alphabetic and numeric ones), including _, makes the match fail.
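Wrapped in a reusable check, the same pattern can be tried against a few inputs. This Perl sketch uses a hypothetical is_identifier helper and sample strings of my own.

```perl
use strict;
use warnings;

# One alphabetic code point, then any number of alphabetic or numeric ones.
# \A and \z force the whole string to match, so "_" anywhere makes it fail.
sub is_identifier {
    my ($s) = @_;
    return $s =~ m/\A\p{alpha}[\p{alpha}\p{Number}]*\z/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
# "\x{3A9}" is GREEK CAPITAL LETTER OMEGA, an alphabetic code point.
for my $s ('roger54a', "\x{3A9}mega2", '54roger', 'roger_54') {
    printf "%s: %s\n", $s, is_identifier($s) ? 'Match' : 'no match';
}
```

Using \z rather than $ also rejects a string with a trailing newline, such as "roger54a\n".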
Update: I see tchrist just gave a great explanation of the Unicode properties. This answer provides the context of a full regexp.
If you wanted two or three leading alphabetic characters followed by alphanumerics, just add the appropriate quantifier:
$string =~ m/\A\p{alpha}{2,3}[\p{alpha}\p{Number}]*\z/
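A Perl sketch of this variant (leading_alphas_ok is a hypothetical name) shows one subtlety: because the trailing class also accepts letters, {2,3} in practice means "at least two leading letters", not "at most three".

```perl
use strict;
use warnings;

# Two or three alphabetic code points, then any alphanumerics.
sub leading_alphas_ok {
    my ($s) = @_;
    return $s =~ m/\A\p{alpha}{2,3}[\p{alpha}\p{Number}]*\z/ ? 1 : 0;
}

# "a1bc" fails (only one leading letter); "abcdef9" still passes
# because the trailing class picks up the extra letters.
for my $s (qw(ab12 abcdef9 a1bc ab_1)) {
    printf "%-8s %d\n", $s, leading_alphas_ok($s);
}
```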
Update2: I see a stronger definition of what you're looking for in a comment to one of the answers here. Here's my take on it after seeing your clarification:
m/\Aroger[\p{alpha}\p{Number}]{2,3}\z/
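Here is a Perl sketch exercising that pattern; roger_ok and the sample inputs are my own.

```perl
use strict;
use warnings;

# "roger" followed by exactly two or three alphanumeric code points.
sub roger_ok {
    my ($s) = @_;
    return $s =~ m/\Aroger[\p{alpha}\p{Number}]{2,3}\z/ ? 1 : 0;
}

for my $s (qw(roger54 roger4a1 roger5 roger_4 roger12345 Roger54)) {
    printf "%-10s %d\n", $s, roger_ok($s);
}
```

Only "roger54" and "roger4a1" pass. The pattern is case-sensitive, so add /i if "Roger54" should be accepted too.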
I was also trying to find a solution to this, and this solution did not work for me in C# when doing a regex replace. In case someone else is searching for "C# Regex.Replace [^\w ] that also removes underscores":
This is what I use in C#:
cleaned_string = Regex.Replace(input_string, @"[_]+|[^\w]+", "");
If you want to keep spaces (and other whitespace):
cleaned_string = Regex.Replace(input_string, @"[_]+|[^\w\s]+", "");
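For comparison, the same cleanup in Perl can fold both alternatives into a single class, [\W_], meaning "non-word character or underscore". This is a sketch; clean and clean_keep_spaces are hypothetical names.

```perl
use strict;
use warnings;

# Delete every run of characters that is a non-word character or "_".
sub clean {
    my ($s) = @_;
    (my $t = $s) =~ s/[\W_]+//g;
    return $t;
}

# Same idea, but whitespace survives: remove "_" runs and runs of
# characters that are neither word characters nor whitespace.
sub clean_keep_spaces {
    my ($s) = @_;
    (my $t = $s) =~ s/_+|[^\w\s]+//g;
    return $t;
}

my $in = 'foo_bar-baz 42!';
print "clean: ", clean($in), "\n";                   # prints "clean: foobarbaz42"
print "keep spaces: ", clean_keep_spaces($in), "\n"; # prints "keep spaces: foobarbaz 42"
```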
A numeric code point is \pN or \p{Number}.
A digit is \d, \p{digit}, \p{Nd}, \p{Decimal_Number}, or \p{Numeric_Type=Decimal}.
An alphabetic code point is \p{alpha} or \p{Alphabetic}. It includes all \p{Letter} and \p{Letter_Number} code points, as well as certain \p{Mark} and \p{Symbol} code points.
A word code point is \w, or [\p{Alphabetic}\p{Digit}\p{Mark}\p{Connector_Punctuation}].
An alphanumeric code point by the strictest definition is consequently and necessarily [\p{Alphabetic}\p{Number}], typically abbreviated [\p{alpha}\pN].
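To see how these definitions differ in practice, this Perl sketch tests single code points against \w, [^\W_] and [\p{alpha}\pN]. The sample characters and the in_class helper are my own picks: "_" is a word character but not alphanumeric, while U+00B2 SUPERSCRIPT TWO is a number (\p{No}) but not a digit, so only [\p{alpha}\pN] accepts it.

```perl
use strict;
use warnings;
use feature 'unicode_strings';    # Unicode rules for \w and \W

# Three candidate "alphanumeric" classes, each tested on one code point.
my %class = (
    word   => qr/\A\w\z/,                 # \w: includes "_"
    not_us => qr/\A[^\W_]\z/,             # \w minus "_"
    alnum  => qr/\A[\p{alpha}\pN]\z/,     # alphabetic or numeric
);

sub in_class {
    my ($name, $ch) = @_;
    return $ch =~ $class{$name} ? 1 : 0;
}

# "_" (U+005F), SUPERSCRIPT TWO (U+00B2), ARABIC-INDIC DIGIT THREE (U+0663)
for my $ch ('a', '_', "\x{B2}", "\x{663}") {
    printf "U+%04X word=%d not_us=%d alnum=%d\n", ord($ch),
        in_class('word', $ch), in_class('not_us', $ch), in_class('alnum', $ch);
}
```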