regex

Regex negative lookbehind in R

Submitted by 佐手、 on 2021-02-08 07:42:36
Question: I'm trying to write a negative-lookbehind regex with stringr in R. Basically, I have text data that looks something like this:

    See item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data.

I want to select everything from the "Item 7" that comes right after the "BlahBlahBlah." sentence up to "Item 8-Financial Statements and Supplementary Data". So I want "Item 7 Management's Discussion and …
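
The thread itself is about R and stringr, and the excerpt stops before any answer. Purely as a language-neutral sketch of one way to grab the span the asker describes (anchoring on the "BlahBlahBlah." that precedes the wanted "Item 7" instead of relying on a lookbehind), here is a hypothetical Python illustration; the pattern and variable names are assumptions, not the accepted answer.

    import re

    text = ("See item 7 Management's Discussion and Analysis. BlahBlahBlah. "
            "Item 7 Management's Discussion and Analysis. BlahBlahBlah. "
            "Item 8 Financial Statements and Supplementary Data.")

    # Anchor on the "BlahBlahBlah." sentence, then capture lazily from the
    # following "Item 7" heading up to (but not including) "Item 8".
    m = re.search(r"BlahBlahBlah\.\s*(Item 7 .*?)(?=Item 8)", text, re.DOTALL)
    if m:
        print(m.group(1))
        # Item 7 Management's Discussion and Analysis. BlahBlahBlah.

A comparable stringr call (str_extract() with the same pattern) should behave similarly, since stringr's ICU regex engine supports lookahead and lazy quantifiers.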

Positive lookbehind regex obvious maximum length

Submitted by ♀尐吖头ヾ on 2021-02-08 07:39:40
Question: I have been experimenting with regex in order to parse the following strings:

    INFO: Device 6: Time 20.11.2015 06:28:00 - [Script] FunFehlerButton: Execute [0031 text]
    INFO: Device 0: Time 09.12.2015 03:51:44 - [Replication] FunFehlerButton: Execute
    INFO: Device 6: Time 20.11.2015 06:28:00 - FunFehlerButton: Execute

The regexes I tried are (?<=\\d{1,2}:\\d{2}:\\d{2} - ).* and (?<=\\[\\w*\\]).*, of which the first one runs correctly and the second one throws an exception. My …
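
The excerpt cuts off before the regex engine is named, but the title points at one where a lookbehind needs an obvious maximum length, which (?<=\[\w*\]) does not have. As a hedged sketch of a common workaround, the bracketed tag can be matched outside the lookbehind and the interesting part captured instead; the grouping pattern below is an assumption, not the thread's accepted answer.

    import re

    lines = [
        "INFO: Device 6: Time 20.11.2015 06:28:00 - [Script] FunFehlerButton: Execute [0031 text]",
        "INFO: Device 0: Time 09.12.2015 03:51:44 - [Replication] FunFehlerButton: Execute",
        "INFO: Device 6: Time 20.11.2015 06:28:00 - FunFehlerButton: Execute",
    ]

    # A variable-length lookbehind like (?<=\[\w*\]) is rejected by Python's re
    # as well; capturing what follows the optional [...] tag avoids the issue.
    pattern = re.compile(r"\d{1,2}:\d{2}:\d{2} - (?:\[\w*\] )?(.*)")

    for line in lines:
        m = pattern.search(line)
        if m:
            print(m.group(1))
    # FunFehlerButton: Execute [0031 text]
    # FunFehlerButton: Execute
    # FunFehlerButton: Execute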

Modify NLTK word_tokenize to prevent tokenization of parenthesis

Submitted by 巧了我就是萌 on 2021-02-08 07:32:48
Question: I have the following main.py:

    #!/usr/bin/env python
    # vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:
    import nltk
    import string
    import sys
    for token in nltk.word_tokenize(''.join(sys.stdin.readlines())):
        #print token
        if len(token) == 1 and not token in string.punctuation or len(token) > 1:
            print token

The output is the following:

    ./main.py <<< 'EGR1(-/-) mouse embryonic fibroblasts'
    EGR1
    -/-
    mouse
    embryonic
    fibroblasts

I want to slightly change the tokenizer so …
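
The excerpt ends before the desired output is stated, but the script suggests the asker wants EGR1(-/-) kept as a single token. One possible sketch, assuming NLTK is installed and that a plain regexp-based tokenizer is acceptable in place of word_tokenize (this is not necessarily what the answers proposed):

    from nltk.tokenize import RegexpTokenizer

    # A token is a run of word characters optionally followed by a parenthesised
    # group, or any single non-space character; this keeps "(-/-)" attached to
    # the preceding word instead of splitting it off.
    tokenizer = RegexpTokenizer(r"\w+(?:\([^)]*\))?|\S")

    print(tokenizer.tokenize("EGR1(-/-) mouse embryonic fibroblasts"))
    # ['EGR1(-/-)', 'mouse', 'embryonic', 'fibroblasts']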

sre_constants.error: unexpected end of regular expression - Should Work Fine

Submitted by 情到浓时终转凉″ on 2021-02-08 07:29:56
Question: I'm doing a little bit of testing, and I need a way to split a string into groups of two (e.g. 'abcdef' => ['ab','cd','ef']). I'm trying to use the regex pattern [^]{2} to do this. Whenever I try to compile this pattern, I get the error message:

    sre_constants.error: unexpected end of regular expression

The exact line of code is:

    pat = re.compile(r'[^]{2}')

Could someone please tell me what I'm doing wrong here? I've done a lot of searching, but a lot of the …
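
[^] is accepted by JavaScript regexes as "any character", but Python's re reads a ] placed directly after [^ as a literal character, so the class is never closed and the compile fails with "unexpected end of regular expression". A minimal sketch of the usual substitutes, assuming the goal is exactly the pairs shown in the question:

    import re

    s = "abcdef"

    # '.' with re.DOTALL (or the class [\s\S]) matches any character,
    # including newlines, which is what [^] means in JavaScript.
    pat = re.compile(r".{2}", re.DOTALL)      # or r"[\s\S]{2}"
    print(pat.findall(s))                     # ['ab', 'cd', 'ef']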

Extract specific words from a text file?

Submitted by 霸气de小男生 on 2021-02-08 06:51:33
Question: I have a text file with over 10,000 lines. Each line has a word that starts with CDID_ followed by 10 more characters with no spaces, as below:

    a <- c("Test CDID_1254WE_1023 Sky","CDID_1254XE01478 Blue","This File named as CDID_ZXASWE_1111")

I would like to extract only the words that start with CDID_, so that the lines above look like this:

    CDID_1254WE_1023
    CDID_1254XE01478
    CDID_ZXASWE_1111

Answer 1: Here are three base R options. Option 1: Use sub(), removing everything except the CDID_* …
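
The answer excerpt stops partway through the first of the three base R options. Purely as a language-neutral illustration of the same extraction (not one of those R options), here is a hedged Python sketch that keeps the whitespace-delimited token beginning with CDID_ on each line:

    import re

    lines = [
        "Test CDID_1254WE_1023 Sky",
        "CDID_1254XE01478 Blue",
        "This File named as CDID_ZXASWE_1111",
    ]

    # Keep only the token that starts with CDID_, i.e. everything from CDID_
    # up to the next whitespace character.
    for line in lines:
        m = re.search(r"CDID_\S+", line)
        if m:
            print(m.group(0))
    # CDID_1254WE_1023
    # CDID_1254XE01478
    # CDID_ZXASWE_1111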

AWK set multiple delimiters for comma and quotes with commas

Submitted by こ雲淡風輕ζ on 2021-02-08 06:43:33
Question: I have a CSV file where columns are comma separated, and columns with textual data that contain commas are quoted. Sometimes quotes also appear inside the quoted text, to mean things like inches, resulting in doubled quotes. Textual data without embedded commas is not quoted. For example:

    A,B,C
    1,"hello, how are you",hello
    2,car,bike
    3,13.3 inch tv,"tv 13.3"""

How do I use awk to print the number of columns for each row, for which I should get 3 3 3? I thought of using awk -F'[,"]' but I'm getting …
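
The question asks specifically for an awk solution, and the excerpt stops before any answer. Purely as a cross-check of the expected field counts, here is a hedged Python sketch using the standard csv module, which already understands quoted fields and doubled quotes; it is an illustration of the desired parsing, not the awk answer.

    import csv
    import io

    data = '''A,B,C
    1,"hello, how are you",hello
    2,car,bike
    3,13.3 inch tv,"tv 13.3"""
    '''

    # csv.reader treats quoted fields and "" escapes correctly, so embedded
    # commas and inch marks do not split columns.
    for row in csv.reader(io.StringIO(data)):
        print(len(row))   # prints 3 for every row, header included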

Regular expressions in POS tagged NLTK corpus

Submitted by 荒凉一梦 on 2021-02-08 06:29:14
Question: I'm loading a POS-tagged corpus in NLTK, and I would like to find certain patterns involving POS tags. These patterns can be quite complex, including many different combinations of POS tags. Example input string:

    We/PRP spent/VBD some/DT time/NN reading/NN about/IN the/DT historical/JJ importance/NN of/IN tea/NN in/IN Korea/NNP and/CC China/NNP and/CC then/RB tasted/VBD the/DT most/JJS expensive/JJ green/JJ tea/NN I/PRP have/VBP ever/RB seen/VBN ./.

In this case the POS pattern is …
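
The excerpt is cut off before the actual POS pattern is stated, so the pattern below is only a hypothetical example of the general technique: because the corpus line is plain word/TAG text, an ordinary regex over those tokens can express sequences such as "one or more adjectives followed by a noun".

    import re

    tagged = ("We/PRP spent/VBD some/DT time/NN reading/NN about/IN the/DT "
              "historical/JJ importance/NN of/IN tea/NN in/IN Korea/NNP and/CC "
              "China/NNP and/CC then/RB tasted/VBD the/DT most/JJS expensive/JJ "
              "green/JJ tea/NN I/PRP have/VBP ever/RB seen/VBN ./.")

    # One or more adjectives (JJ/JJR/JJS) directly followed by a noun (NN*),
    # expressed over the word/TAG tokens themselves.
    pattern = r"(?:\S+/JJ[RS]?\s+)+\S+/NN\S*"
    for match in re.findall(pattern, tagged):
        print(match)
    # historical/JJ importance/NN
    # most/JJS expensive/JJ green/JJ tea/NN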

javascript search string contains ' + '

Submitted by ≯℡__Kan透↙ on 2021-02-08 06:17:04
Question: I would like to search for one string inside another string, but I'm facing an issue. Here is my code:

    reference = "project/+bug/1234";
    str = "+bug/1234";
    alert(reference.search(str)); // it should alert 8 (the index of the match)

but it alerts -1, so str wasn't found in reference. I've found what the problem is: it seems to be the "+" character in str, because .search() appears to evaluate the searched string as a regex, in which "+" is special.

Answer 1: Just use string.indexOf(). It takes a literal …
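
The answer excerpt recommends string.indexOf() and is cut off after "It takes a literal". The underlying point is that a literal substring search never interprets "+" as a regex operator, while a regex-based search needs the needle escaped first. A hedged Python analogue of both options (str.find() and re.escape()), shown only to illustrate the idea rather than the JavaScript answer:

    import re

    reference = "project/+bug/1234"
    needle = "+bug/1234"

    # A literal substring search never treats '+' as a quantifier.
    print(reference.find(needle))                  # 8

    # If a regex search is required, escape the needle first; an unescaped
    # leading '+' is a regex syntax error ("nothing to repeat").
    m = re.search(re.escape(needle), reference)
    print(m.start() if m else -1)                  # 8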