character-class

Python: POSIX character class in regex?

耗尽温柔 posted on 2019-12-08 16:37:44
Question: How can I search for, say, a sequence of 10 isprint characters in a given string in Python? With GNU grep, I would simply do:

    grep [[:print:]]{10}

Answer 1: Since POSIX character classes are not supported by Python's re module, you have to emulate them with an ordinary character class. You can use the one from regular-expressions.info and add a limiting quantifier {10}:

    [\x20-\x7E]{10}

Alternatively, you can use Matthew Barnett's regex module, which claims to support POSIX character classes ( POSIX character
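The emulation described in the answer can be sketched with the standard re module as follows. The helper name `find_printable_runs` and the sample string are made up for illustration, and the range assumes ASCII only, which may not match what [[:print:]] covers in every locale.

```python
import re

# Printable ASCII, space (0x20) through tilde (0x7E), standing in for [[:print:]].
PRINTABLE_RUN = re.compile(r"[\x20-\x7E]{10}")


def find_printable_runs(text):
    """Return each non-overlapping run of exactly 10 printable ASCII characters."""
    return PRINTABLE_RUN.findall(text)


# Hypothetical sample: control bytes around a 13-character and a 5-character run.
sample = "\x00\x01Hello, World!\x02short\x03"
print(find_printable_runs(sample))
```

Only the first ten characters of the longer run are reported; the five-character run is too short to match.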

Reusing a character class in a regular expression

这一生的挚爱 posted on 2019-12-07 09:30:13
Question: To keep a regular expression brief, is there a shorthand way to refer to a character class that occurs earlier in the same regular expression? For example, is there a way to shorten the following?

    [acegikmoqstz@#&].*[acegikmoqstz@#&].*[acegikmoqstz@#&]

Answer 1: Keep in mind that regex features depend on the language being used. With Java, you can do this:

    [acegikmoqstz@#&](?:.*[acegikmoqstz@#&]){2}

But that's all; with Java you can't refer to a named subpattern. With PHP you can do
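When the regex flavor offers no way to reference a subpattern, one common workaround is to build the pattern from a string in the host language. The sketch below does this in Python; the names `SPECIAL`, `LONG_FORM`, and `SHORT_FORM` are illustrative and not from the original answer.

```python
import re

# The character class is written once and reused by string interpolation.
SPECIAL = "[acegikmoqstz@#&]"

# The long form from the question and the shortened form from the answer.
LONG_FORM = re.compile(f"{SPECIAL}.*{SPECIAL}.*{SPECIAL}")
SHORT_FORM = re.compile(f"{SPECIAL}(?:.*{SPECIAL}){{2}}")  # {{2}} renders as {2}

for candidate in ["a1c2e", "a1c", "@b#d&", "xyz"]:
    # Both patterns should agree on every candidate.
    print(candidate, bool(LONG_FORM.search(candidate)), bool(SHORT_FORM.search(candidate)))
```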

How to match Unicode vowels?

六月ゝ 毕业季﹏ posted on 2019-12-04 03:07:25
Question: What character class or Unicode property will match any Unicode vowel in Perl? Wrong answer: [aeiouAEIOU]. (Sermon here, item #24 in the laundry list.) perluniprops mentions vowels only for Hangul and Indic scripts. Let's set aside the question of what a vowel is. Yes, i may not be a vowel in some contexts. So any character that can be a vowel will do.

Answer 1: There's no such property.

    $ uniprops --all a
    U+0061 <a> \N{LATIN SMALL LETTER A}
    \w \pL \p{LC} \p{L_} \p{L&} \p{Ll} AHex POSIX_XDigit All
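Since no ready-made property exists, any solution has to define "vowel" itself. The following Python sketch shows one possible approximation, not the approach from the answer: it decomposes a character (NFD) and checks whether the base letter is one of a, e, i, o, u. The function name `is_latin_vowel` and the vowel set are assumptions, and the check only covers Latin letters that have a canonical decomposition.

```python
import unicodedata

# Assumption: "vowel" means a Latin letter whose base character is a, e, i, o, or u.
BASE_VOWELS = set("aeiou")


def is_latin_vowel(ch):
    """Approximate vowel test for a single character via canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0].lower() in BASE_VOWELS


for ch in ["a", "é", "Ü", "ñ", "ø"]:
    print(ch, is_latin_vowel(ch))
```

Letters without a canonical decomposition, such as ø, are not recognized by this check, which illustrates why a simple character class falls short.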

Hyphen and underscore not compatible in sed

岁酱吖の posted on 2019-12-02 11:27:55
I'm having trouble getting sed to recognize both hyphen and underscore in its pattern string. Does anyone know why [a-z|A-Z|0-9|\-|_] in the following example works like [a-z|A-Z|0-9|_]?

    $ cat /tmp/sed_undescore_hypen
    lkjdaslf lkjlsadjfl dfpasdiuy service-type = service-1; jaldkfjlasdjflk address = address1; kldjfladsf
    lkjdaslf lkjlsadjfl dfasdf service-type = service_1; jaldkfjlasdjflk address = address1; kldjfladsf
    $ sed 's/.*\(service-type = [a-z|A-Z|0-9|\-|_]*\);.*\(address = .*\);.*/\1 \2/g' /tmp/sed_undescore_hypen
    lkjdaslf lkjlsadjfl dfpasdiuy service-type = service-1; jaldkfjlasdjflk
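A likely explanation, assuming POSIX bracket-expression rules apply here: a backslash inside [...] is a literal character, so \-| is read as a range from backslash to |, which happens to contain the underscore but not the hyphen. The Python sketch below only checks that range by code point; it does not run sed. The `SERVICE` pattern is an illustrative Python equivalent with the hyphen placed last so that it is taken literally.

```python
import re

# Code points from backslash (0x5C) to pipe (0x7C): what "\-|" covers if the
# backslash is a literal range endpoint, as assumed above.
range_chars = {chr(c) for c in range(ord("\\"), ord("|") + 1)}
print("-" in range_chars)  # hyphen is 0x2D, outside the range
print("_" in range_chars)  # underscore is 0x5F, inside the range

# A hyphen placed last in the class is literal; the | separators are not needed.
SERVICE = re.compile(r"service-type = [A-Za-z0-9_-]*;")
for line in ["x service-type = service-1; y", "x service-type = service_1; y"]:
    print(SERVICE.search(line).group(0))
```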

How word character is interpreted in character class?

梦想与她 posted on 2019-12-01 11:04:35
\w stands for the character class [A-Za-z0-9_], but I am not able to understand how it is interpreted inside a character class. When I use [\w-~]:

    let test = (str) => /^[\w-~]+$/.test(str)
    console.log(test("T|"))

it fails for T|, but when I use [A-Za-z0-9_-~]:

    let test = (str) => /^[A-Za-z0-9_-~]+$/.test(str)
    console.log(test("T|"))

it results in true. I am not able to understand how these two expressions differ from each other.

I believe that the main difference between your two examples is the location of the - character. What's happening here is that in this example: let test =
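The difference can be reproduced in Python under one adjustment: the hyphen after \w is written as \- so that it is unambiguously literal, mirroring how the JavaScript pattern behaves. In the expanded form, _-~ is a range from underscore (0x5F) to tilde (0x7E), which includes | (0x7C). The names below are illustrative.

```python
import re

# "_-~" forms a range from underscore to tilde, which contains "|".
RANGE_FORM = re.compile(r"[A-Za-z0-9_-~]+")

# Here the hyphen is escaped, so the class is: word characters, "-", and "~".
LITERAL_FORM = re.compile(r"[\w\-~]+")

in_range = [chr(c) for c in range(ord("_"), ord("~") + 1)]
print("|" in in_range)
print(RANGE_FORM.fullmatch("T|") is not None)
print(LITERAL_FORM.fullmatch("T|") is not None)
```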

General approach for (equivalent of) “backreferences within character class”?

狂风中的少年 posted on 2019-11-30 17:38:59
In Perl regexes, expressions like \1, \2, etc. are usually interpreted as "backreferences" to previously captured groups, but not when they appear within a character class. In the latter case, the \ is treated as an escape character (and therefore \1 is just 1, etc.). Therefore, if (for example) one wanted to match a string (of length greater than 1) whose first character matches its last character but does not appear anywhere else in the string, the following regex will not do:

    /\A   # match beginning of string;
     (.)  # match and capture first character (referred to
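A common workaround for the example in the question is a negative lookahead in front of each middle character, in place of a negated character class that contains a backreference. The sketch below uses Python's re module and is not taken from the excerpt; the function name is hypothetical.

```python
import re

# (.)            capture the first character
# (?:(?!\1).)*   any run of characters, none of which equals the captured one
# \1             the last character must equal the first
FIRST_EQUALS_LAST = re.compile(r"(.)(?:(?!\1).)*\1", re.DOTALL)


def first_is_last_and_unique(s):
    """True if s has length > 1, starts and ends with the same character,
    and that character appears nowhere in between."""
    return FIRST_EQUALS_LAST.fullmatch(s) is not None


for s in ["abca", "abaca", "aa", "a", "ab"]:
    print(s, first_is_last_and_unique(s))
```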

Hyphen and underscore not compatible in sed

こ雲淡風輕ζ posted on 2019-11-30 06:09:13
Question: I'm having trouble getting sed to recognize both hyphen and underscore in its pattern string. Does anyone know why [a-z|A-Z|0-9|\-|_] in the following example works like [a-z|A-Z|0-9|_]?

    $ cat /tmp/sed_undescore_hypen
    lkjdaslf lkjlsadjfl dfpasdiuy service-type = service-1; jaldkfjlasdjflk address = address1; kldjfladsf
    lkjdaslf lkjlsadjfl dfasdf service-type = service_1; jaldkfjlasdjflk address = address1; kldjfladsf
    $ sed 's/.*\(service-type = [a-z|A-Z|0-9|\-|_]*\);.*\(address = .*\);.*/\1

General approach for (equivalent of) “backreferences within character class”?

只谈情不闲聊 posted on 2019-11-30 01:29:40
Question: In Perl regexes, expressions like \1, \2, etc. are usually interpreted as "backreferences" to previously captured groups, but not when they appear within a character class. In the latter case, the \ is treated as an escape character (and therefore \1 is just 1, etc.). Therefore, if (for example) one wanted to match a string (of length greater than 1) whose first character matches its last character but does not appear anywhere else in the string, the following regex will

List of metacharacters for MySQL regexp square brackets

ε祈祈猫儿з posted on 2019-11-29 16:37:45
Strangely, I can't seem to find anywhere a list of the characters that I can't safely use as literals within MySQL regular expression square brackets without escaping them or resorting to a [:character_class:] construct. (Also, the answer probably needs to be MySQL-specific, because MySQL regular expressions seem to be lacking compared to those in Perl/PHP/JavaScript, etc.)

Almost all metacharacters (including the dot ., the +, * and ? quantifiers, the end-of-string anchor $, etc.) have no special meaning in character classes, with a few notable exceptions: closing bracket ], for obvious

Replace all characters not in range (Java String)

两盒软妹~` posted on 2019-11-28 20:20:14
How do you replace all of the characters in a string that do not fit a criterion? I'm having trouble specifically with the NOT operator. I'm trying to remove all characters that are not digits, and I've tried this so far:

    String number = "703-463-9281";
    String number2 = number.replaceAll("[0-9]!", "");  // produces: "703-463-9281" (no change)
    String number3 = number.replaceAll("[0-9]", "");   // produces: "--"
    String number4 = number.replaceAll("![0-9]", "");  // produces: "703-463-9281" (no change)
    String number6 = number.replaceAll("^[0-9]", "");  // produces: "03-463-9281"

tangens To
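The attempts above place the negation outside the brackets. In most regex flavors, including Java's and Python's, a caret negates a character class only when it is the first character inside the brackets; outside them it is a start-of-input anchor, which is why the last attempt strips only the leading digit. A minimal Python sketch of the distinction (the variable names are illustrative; the Java counterpart would be replaceAll("[^0-9]", "")):

```python
import re

number = "703-463-9281"

# Caret inside the brackets: match anything that is NOT a digit.
digits_only = re.sub(r"[^0-9]", "", number)

# Caret outside the brackets: anchor at the start, so only the first digit goes.
leading_digit_removed = re.sub(r"^[0-9]", "", number)

print(digits_only)
print(leading_digit_removed)
```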