character-class | 易学教程

Python: POSIX character class in regex?

Question: How can I search for, say, a sequence of 10 isprint characters in a given string in Python? With GNU grep, I would simply do grep [[:print:]]{10}

Answer 1: Since POSIX character classes are not supported by Python's re module, you have to emulate them with an ordinary character class. You can use the one from regular-expressions.info and add a limiting quantifier {10}: [\x20-\x7E]{10} Alternatively, you can use Matthew Barnett's regex module, which claims to support POSIX character classes ( POSIX character
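
A minimal sketch of the emulation described in the answer, using the standard re module and the ASCII printable range [\x20-\x7E]. The helper name find_printable_run and the sample strings are illustrative assumptions, not part of the original answer. Note that this range covers ASCII only, whereas grep's [[:print:]] depends on the locale.

```python
import re

# ASCII stand-in for POSIX [[:print:]]: space (0x20) through tilde (0x7E).
PRINTABLE_RUN = re.compile(r"[\x20-\x7E]{10}")

def find_printable_run(text):
    """Return the first run of 10 printable ASCII characters, or None."""
    match = PRINTABLE_RUN.search(text)
    return match.group(0) if match else None

print(find_printable_run("\x00\x01hello, world\x02"))  # hello, wor
print(find_printable_run("\x00short\x01"))             # None
```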

Reusing a character class in a regular expression

Question: To keep a regular expression brief, is there a shorthand way to refer to a character class that occurs earlier in the same regular expression? Example: is there a way to shorten the following: [acegikmoqstz@#&].*[acegikmoqstz@#&].*[acegikmoqstz@#&]

Answer 1: Keep in mind that regex features depend on the language being used. With Java, you can do this: [acegikmoqstz@#&](?:.*[acegikmoqstz@#&]){2} But that's all; in Java you can't refer to a named subpattern. With PHP you can do
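
Where the regex flavor has no subpattern references, one common workaround is to keep the class in a variable and build the pattern from it. The Python sketch below does that and checks that the quantified form from the Java answer agrees with the long form on a few samples. The names CLASS, LONG_FORM, SHORT_FORM and has_three are illustrative assumptions.

```python
import re

# The character class is written once and reused when building the patterns.
CLASS = "[acegikmoqstz@#&]"

LONG_FORM = re.compile(CLASS + ".*" + CLASS + ".*" + CLASS)
SHORT_FORM = re.compile(CLASS + "(?:.*" + CLASS + "){2}")

def has_three(sample):
    """True if sample contains at least three characters from CLASS."""
    return SHORT_FORM.search(sample) is not None

for sample in ["a-c-e", "xaxcxex", "ab", "@b#d&"]:
    # Both spellings should accept and reject the same inputs.
    assert (LONG_FORM.search(sample) is not None) == has_three(sample)
    print(sample, has_three(sample))
```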

How to match Unicode vowels?

Question: What character class or Unicode property will match any Unicode vowel in Perl? Wrong answer: [aeiouAEIOU]. (sermon here, item #24 in the laundry list) perluniprops mentions vowels only for Hangul and Indic scripts. Let's set aside the question of what a vowel is. Yes, i may not be a vowel in some contexts. So, any character that can be a vowel will do.

Answer 1: There's no such property.
$ uniprops --all a
U+0061 <a> \N{LATIN SMALL LETTER A} \w \pL \p{LC} \p{L_} \p{L&} \p{Ll} AHex POSIX_XDigit All

Hyphen and underscore not compatible in sed

Question: I'm having trouble getting sed to recognize both hyphen and underscore in its pattern string. Does anyone know why [a-z|A-Z|0-9|\-|_] in the following example works like [a-z|A-Z|0-9|_] ?
$ cat /tmp/sed_undescore_hypen
lkjdaslf lkjlsadjfl dfpasdiuy service-type = service-1; jaldkfjlasdjflk address = address1; kldjfladsf
lkjdaslf lkjlsadjfl dfasdf service-type = service_1; jaldkfjlasdjflk address = address1; kldjfladsf
$ sed 's/.*\(service-type = [a-z|A-Z|0-9|\-|_]*\);.*\(address = .*\);.*/\1 \2/g' /tmp/sed_undescore_hypen
lkjdaslf lkjlsadjfl dfpasdiuy service-type = service-1; jaldkfjlasdjflk
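
One likely explanation, assuming POSIX bracket-expression rules (where a backslash inside [...] is an ordinary character) and code-point ordering as in the C locale: the sequence \-| is read as a range from \ to |, which contains _ but not -. The Python sketch below only checks the code points involved, then shows a class with the hyphen placed last. in_range is a hypothetical helper, and Python's re is used for illustration rather than sed itself.

```python
import re

def in_range(ch, low, high):
    """True if ch lies in the code-point range low..high, as in [low-high]."""
    return ord(low) <= ord(ch) <= ord(high)

print(in_range("-", "\\", "|"))  # False: hyphen (0x2D) is below backslash (0x5C)
print(in_range("_", "\\", "|"))  # True: underscore (0x5F) falls inside the range

# Placing the hyphen last makes it a literal; the | separators are not needed.
FIXED = re.compile(r"[a-zA-Z0-9_-]*")
print(FIXED.fullmatch("service-1") is not None)  # True
print(FIXED.fullmatch("service_1") is not None)  # True
```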

How word character is interpreted in character class?

Question: \w stands for the character class [A-Za-z0-9_], but I am not able to understand how it is interpreted inside a character class. When I use [\w-~]:
let test = (str) => /^[\w-~]+$/.test(str)
console.log(test("T|"))
it fails for T|, but when I use [A-Za-z0-9_-~]:
let test = (str) => /^[A-Za-z0-9_-~]+$/.test(str)
console.log(test("T|"))
it results in true. I am not able to understand how these two expressions differ from each other.

Answer: I believe that the main difference between your two examples is the location of the - character. What's happening here is that in this example: let test =
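
The effect of the hyphen's position can be checked with code points. In [A-Za-z0-9_-~] the hyphen sits between two literal characters, so _-~ forms a range that happens to contain |. The sketch below is in Python rather than JavaScript, and it escapes the hyphen in the \w variant to make the literal intent explicit; RANGE_FORM and LITERAL_FORM are illustrative names.

```python
import re

# "|" (124) lies between "_" (95) and "~" (126), so the range _-~ includes it.
print(ord("_"), ord("|"), ord("~"))  # 95 124 126

RANGE_FORM = re.compile(r"[A-Za-z0-9_-~]+")   # "_-~" is a range
LITERAL_FORM = re.compile(r"[\w\-~]+")        # word chars, literal "-", literal "~"

print(RANGE_FORM.fullmatch("T|") is not None)     # True
print(LITERAL_FORM.fullmatch("T|") is not None)   # False
print(LITERAL_FORM.fullmatch("T-~") is not None)  # True
```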

General approach for (equivalent of) “backreferences within character class”?

Question: In Perl regexes, expressions like \1, \2, etc. are usually interpreted as "backreferences" to previously captured groups, but not so when the \1, \2, etc. appear within a character class. In the latter case, the \ is treated as an escape character (and therefore \1 is just 1, etc.). Therefore, if (for example) one wanted to match a string (of length greater than 1) whose first character matches its last character, but does not appear anywhere else in the string, the following regex will not do:
/\A   # match beginning of string;
 (.)  # match and capture first character (referred to
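
A commonly used substitute for a "negated backreference class" is a negative lookahead checked before each consumed character. The sketch below applies that idea to the example task in Python; it is one possible approach under that assumption, not necessarily the one given in the original thread, and first_equals_last_only is a hypothetical helper name.

```python
import re

# (.)            captures the first character
# (?:(?!\1).)*   consumes characters one at a time, each checked to differ from it
# \1\Z           requires the captured character again, at the very end
PATTERN = re.compile(r"\A(.)(?:(?!\1).)*\1\Z", re.DOTALL)

def first_equals_last_only(s):
    """True if s starts and ends with the same character, which occurs nowhere else."""
    return PATTERN.match(s) is not None

for sample in ["abca", "aa", "abaca", "abcb", "a"]:
    print(sample, first_equals_last_only(sample))
```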

List of metacharacters for MySQL regexp square brackets

Question: Strangely, I can't seem to find anywhere a list of the characters that I can't safely use as literals within MySQL regular expression square brackets without escaping them or requiring the use of a [:character_class:] thing. (Also, the answer probably needs to be MySQL-specific, because MySQL regular expressions seem to be lacking compared to those in Perl/PHP/JavaScript etc.)

Answer: Almost all metacharacters (including the dot . , the + , * and ? quantifiers, the end-of-string anchor $ , etc.) have no special meaning in character classes, with a few notable exceptions: closing bracket ] , for obvious

Replace all characters not in range (Java String)

Question: How do you replace all of the characters in a string that do not fit a criterion? I'm having trouble specifically with the NOT operator. Specifically, I'm trying to remove all characters that are not a digit. I've tried this so far:
String number = "703-463-9281";
String number2 = number.replaceAll("[0-9]!", ""); // produces: "703-463-9281" (no change)
String number3 = number.replaceAll("[0-9]", ""); // produces: "--"
String number4 = number.replaceAll("![0-9]", ""); // produces: "703-463-9281" (no change)
String number6 = number.replaceAll("^[0-9]", ""); // produces: "03-463-9281"
tangens To
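
In regex character classes, negation is written by placing ^ directly after the opening bracket; outside the brackets, ^ is a start-of-string anchor, which explains the last attempt above. The sketch below shows the difference in Python, chosen here only for illustration; the class syntax [^0-9] is the same in Java's replaceAll. The variable names are assumptions.

```python
import re

number = "703-463-9281"

# ^ inside the brackets negates the class: remove everything that is not 0-9.
digits_only = re.sub(r"[^0-9]", "", number)

# ^ outside the brackets anchors at the start: only the leading digit is removed.
anchored = re.sub(r"^[0-9]", "", number)

print(digits_only)  # 7034639281
print(anchored)     # 03-463-9281
```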