regex | 易学教程

$ Windows newline symbol in Python bytes regex

Question: $ matches at the end of a line, which is defined as either the end of the string or any position followed by a newline character. However, the Windows newline sequence consists of two characters, '\r\n'. How can I make '$' recognize '\r\n' as a newline in bytes? Here is what I have: # Python 3.4.2 import re input = b''' //today is a good day \r\n //this is Windows newline style \r\n //unix line style \n ...other binary data... ''' L = re.findall(rb'//.*?$', input, flags = re.DOTALL | re
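The excerpt is cut off before any answer, so the following is only a sketch of one possible approach, not the asker's or an answerer's solution. It replaces `$` with a lookahead that accepts an optional `\r` before `\n`, or the end of the data. The sample bytes and the names `data` and `pattern` are assumptions made for illustration.

```python
import re

# Sample data modeled on the question: two CRLF-terminated comments and one
# LF-terminated comment, followed by unrelated bytes (illustrative only).
data = (
    b"//today is a good day\r\n"
    b"//this is Windows newline style\r\n"
    b"//unix line style\n"
    b"...other binary data..."
)

# Instead of `$`, stop just before an optional \r followed by \n, or at the
# very end of the data. Without re.DOTALL, `.` still matches \r, so the lazy
# quantifier together with the lookahead is what keeps \r out of the match.
pattern = re.compile(rb"//.*?(?=\r?\n|\Z)")

comments = pattern.findall(data)
print(comments)
```

Because the lookahead does not consume the line ending, each match ends at the last byte of the comment text for both newline styles.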

Regex to match address and optional suffix

Question: I have addresses in two formats: SomeHouse, Holbrook, Belper, Derbyshire, DE56 0RR and SomeHouse, Holbrook, Belper, Derbyshire, DE56 0RR(123123123123) The number only ever appears right at the end, is always in brackets, and is always 12 digits. I am trying to get a regex to match two groups: the address and the number (if it is there). It is a head-banger (for my inregexperienced self) since I can't get my expression to work on both types of address. I have (?<address>.*)(?<bracketsandnum>\((?
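The asker's pattern is truncated above, so this is not a repair of it. As a rough sketch of the general idea, a lazy address group followed by an optional bracketed group anchored at the end handles both formats. The group name `number` and the helper `split_address` are hypothetical, and Python spells named groups `(?P<name>...)` rather than `(?<name>...)`.

```python
import re

# Lazy address, then an optional 12-digit number in brackets, anchored at the end.
# The lazy `.*?` lets the optional group claim the bracketed suffix when present.
ADDRESS = re.compile(r"^(?P<address>.*?)(?:\((?P<number>\d{12})\))?$")


def split_address(text):
    """Return (address, number); number is None when no bracketed suffix exists."""
    m = ADDRESS.match(text)
    if m is None:  # only happens if the text contains a line break
        return None
    return m.group("address"), m.group("number")


plain = "SomeHouse, Holbrook, Belper, Derbyshire, DE56 0RR"
with_number = "SomeHouse, Holbrook, Belper, Derbyshire, DE56 0RR(123123123123)"
print(split_address(plain))
print(split_address(with_number))
```

With a greedy `.*` the address group would swallow the bracketed number, because the optional group is allowed to match nothing.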

How do I count all occurrences of a phrase in a text file using regular expressions?

Question: I am reading in multiple files from a directory and attempting to find how many times a specific phrase (in this instance "at least") occurs in each file (not just whether it occurs, but how many times it occurs in each text file). My code is as follows: import glob import os path = 'D:/Test' k = 0 for filename in glob.glob(os.path.join(path, '*.txt')): if filename.endswith('.txt'): f = open(filename) data = f.read() data.split() data.lower() S = re.findall(r' at least ', data, re.MULTILINE) count
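No answer is included in this excerpt. As a minimal sketch of one way to count a phrase, the helper below uses word boundaries instead of literal surrounding spaces and ignores case through a flag; note that in the code above the results of `data.split()` and `data.lower()` are discarded, since Python strings are immutable. The function name `count_phrase` and the sample text are assumptions for illustration.

```python
import re


def count_phrase(text, phrase="at least"):
    """Count whole-word, case-insensitive occurrences of a phrase in text."""
    # Escape each word, allow any whitespace (including line breaks) between
    # words, and require word boundaries at both ends of the phrase.
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


sample = "At least once. We need at least two, at\nleast three. Not 'at leastwise'."
print(count_phrase(sample))
```

To apply this per file, the same function could be called on the result of `f.read()` inside the existing `glob` loop.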

Remove weird symbols from string

Question: I have a vector that contains strings in Polish. The first 8 are shown in the picture below. As you can see, the strings contain weird symbols. I used dput() to reveal the hidden symbols. The output is as follows: > music & more spa<U+0093>a\201ka z ograniczonä<U+0084> odpowiedzialnoa<U+009A>ciä<U+0084> > hemmersbach central support spa3a<U+0082>ka z o.o. sp. k. > grupa kapitaa<U+0082>owa naprza3d", "automex spa3a<U+0082>ka z o.o. spa3a<U+0082>ka k. > spa3a<U+0082>dzielnia usa<U+0082>ugowa vig ekspert, vienna

How to delete lines before a match preserving it?

Question: I have the following script to remove all lines before a line that matches a word: str=' 1 2 3 banana 4 5 6 banana 8 9 10 ' echo "$str" | awk -v pattern=banana ' print_it {print} $0 ~ pattern {print_it = 1} ' It returns: 4 5 6 banana 8 9 10 But I want to include the first match too. This is the desired output: banana 4 5 6 banana 8 9 10 How could I do this? Do you have any better idea using another command? I've also tried sed '0,/^banana$/d' , but it seems it only works with files, and I
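The script above drops the first match because the print rule runs before the flag is set for that line. The excerpt ends before any answer, so as an illustrative sketch in Python (not the awk or sed solution the asker was looking for), the same behavior can be expressed by skipping lines until the first one matches and keeping everything from there on. The helper name `from_first_match` is hypothetical.

```python
import re
from itertools import dropwhile


def from_first_match(lines, pattern="banana"):
    """Drop lines until one matches the pattern; keep that line and the rest."""
    regex = re.compile(pattern)
    # dropwhile stops discarding at the first line where the predicate is
    # false, so the matching line itself is preserved.
    return list(dropwhile(lambda line: not regex.search(line), lines))


text = "1\n2\n3\nbanana\n4\n5\n6\nbanana\n8\n9\n10"
result = from_first_match(text.splitlines())
print("\n".join(result))
```

Like `$0 ~ pattern` in awk, `regex.search` matches anywhere in the line; anchoring the pattern as `^banana$` would require the whole line to equal the word.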

Regular expression works on regex101.com, but not on prod

Question: https://regex101.com/r/sB9wW6/1 (?:(?<=\s)|^)@(\S+) <-- the problem is in the positive lookbehind. It works like this on prod: (?:\s|^)@(\S+) , but I need a correct start index (without the space). Here it is in JS: var regex = new RegExp(/(?:(?<=\s)|^)@(\S+)/g); Error parsing regular expression: Invalid regular expression: /(?:(?<=\s)|^)@(\S+)/ What am I doing wrong? UPDATE: OK, no lookbehind in JS :( But anyway, I need a regex to get the proper start and end index of my match, without the leading space. Answer 1:
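The answer text is cut off in this excerpt. One common workaround when lookbehind is unavailable is to let the pattern consume the leading whitespace but read the indices from a capture group rather than from the whole match. The sketch below shows that idea in Python for illustration only; moving the `@` inside the group and the helper name `find_mentions` are assumptions, not part of the original question.

```python
import re

# The whitespace (or start of string) is consumed outside the capture group,
# so the group's own span excludes the leading space.
MENTION = re.compile(r"(?:\s|^)(@\S+)")


def find_mentions(text):
    """Return (mention, start, end) tuples using the capture group's span."""
    return [(m.group(1), m.start(1), m.end(1)) for m in MENTION.finditer(text)]


sample = "@alice hi @bob a@b"
print(find_mentions(sample))
```

In JavaScript the equivalent index can be derived from the match position plus the length of the consumed prefix, and engines that implement ES2018 also accept lookbehind directly.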

Split camelCase word into words with php preg_match (Regular Expression)

Question: How would I go about splitting the word oneTwoThreeFour into an array so that I can get: one Two Three Four with preg_match ? I tried this, but it just gives the whole word: $words = preg_match("/[a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/", $string, $matches); Answer 1: You can also use preg_match_all as: preg_match_all('/((?:^|[A-Z])[a-z]+)/',$str,$matches); Explanation: ( - Start of capturing parenthesis. (?: - Start of non-capturing parenthesis. ^ - Start anchor. | -
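The explanation is truncated above. To illustrate how the pattern from the answer behaves, here is a small Python sketch using the same expression (without the outer capturing group, which `re.findall` does not need). The function name `split_camel` is hypothetical.

```python
import re


def split_camel(word):
    """Split a camelCase word into its lowercase-led and capital-led pieces."""
    # Each piece starts either at the beginning of the string or at a capital
    # letter, and continues through the following run of lowercase letters.
    return re.findall(r"(?:^|[A-Z])[a-z]+", word)


print(split_camel("oneTwoThreeFour"))
```

This mirrors `preg_match_all`, which collects every match, whereas `preg_match` stops after the first one.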

What's missing from this regex to match the lines of Apache logs?

Question: I have these lines:

5.10.80.69 - - [21/Jun/2019:15:46:20 -0700] "PATCH /niches/back-end HTTP/2.0" 406 15834
11.57.203.39 - carroll8889 [21/Jun/2019:15:46:21 -0700] "HEAD /visionary/cultivate HTTP/1.1" 404 15391
124.137.187.175 - - [21/Jun/2019:15:46:22 -0700] "DELETE /expedite/exploit/cultivate/web-enabled HTTP/1.0" 403 2606
203.36.55.39 - collins6322 [21/Jun/2019:15:46:23 -0700] "PATCH /efficient/productize/disintermediate HTTP/1.1" 504 13377
175.5.52.40 - - [21/Jun/2019:15:46:24 -0700] "POST
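The excerpt ends before the asker's regex appears, so nothing here diagnoses what that pattern was missing. As a hedged sketch, one pattern that matches the complete sample lines shown above could look like the following; the group names and the constant `LOG_LINE` are assumptions chosen for illustration.

```python
import re

# One named group per field of the sample lines: host, identity, user,
# bracketed timestamp, quoted request, status code, and response size.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

sample = (
    '11.57.203.39 - carroll8889 [21/Jun/2019:15:46:21 -0700] '
    '"HEAD /visionary/cultivate HTTP/1.1" 404 15391'
)

fields = LOG_LINE.match(sample).groupdict()
print(fields)
```

Using `\S+` for the user field covers both the `-` placeholder and names such as `carroll8889`, and `[^\]]+` keeps the timestamp match from running past its closing bracket.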