regex

$ Windows newline symbol in Python bytes regex

让人想犯罪 __ submitted on 2021-02-02 08:56:37
Question: $ matches at the end of a line, which is defined as either the end of the string or any position followed by a newline character. However, the Windows newline sequence consists of two characters, '\r\n'. How can I make '$' recognize '\r\n' as a newline in bytes? Here is what I have: # Python 3.4.2 import re input = b''' //today is a good day \r\n //this is Windows newline style \r\n //unix line style \n ...other binary data... ''' L = re.findall(rb'//.*?$', input, flags = re.DOTALL | re
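One possible workaround, sketched here as an assumption rather than the accepted answer, is to stop relying on `$` and use an explicit lookahead that accepts `\r\n`, `\n`, or the end of the input. The sample bytes and the names `data` and `LINE_COMMENT` are illustrative only.

```python
import re

# Sample input modelled on the question: two CRLF lines, one LF line,
# then trailing data that is not a comment.
data = (
    b"//today is a good day \r\n"
    b"//this is Windows newline style \r\n"
    b"//unix line style \n"
    b"...other binary data..."
)

# Lazily consume a comment up to, but not including, \r\n, \n,
# or the very end of the input.
LINE_COMMENT = re.compile(rb"//.*?(?=\r?\n|\Z)")

comments = LINE_COMMENT.findall(data)
print(comments)
```

Because the line terminator sits inside a lookahead, it is never part of the match, so neither `\r` nor `\n` ends up in the results.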

Regex to match address and optional suffix

空扰寡人 submitted on 2021-02-01 05:11:13
Question: I have addresses in two formats: SomeHouse, Holbrook, Belper, Derbyshire, DE56 0RR and SomeHouse, Holbrook, Belper, Derbyshire, DE56 0RR(123123123123) The number only ever appears right at the end, is always in brackets, and is always 12 digits. I am trying to write a regex that matches two groups: the address and the number (if it is there). It is a head banger (for my inregexperienced self) since I can't get my expression to work on both types of address. I have (?<address>.*)(?<bracketsandnum>\((?
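A minimal sketch of one way to handle both formats: make the address group lazy and the bracketed number an optional group anchored at the end. Python spells named groups `(?P<name>...)` rather than the `(?<name>...)` form in the question, and the function name `split_address` is hypothetical.

```python
import re

# Lazy address, then an optional "(12 digits)" suffix right before the end.
ADDRESS = re.compile(r"^(?P<address>.*?)(?:\((?P<number>\d{12})\))?$")

def split_address(text):
    """Return (address, number); number is None when no suffix is present."""
    m = ADDRESS.match(text)
    if m is None:
        return None
    return m.group("address"), m.group("number")

plain = "SomeHouse, Holbrook, Belper, Derbyshire, DE56 0RR"
with_number = "SomeHouse, Holbrook, Belper, Derbyshire, DE56 0RR(123123123123)"

print(split_address(plain))
print(split_address(with_number))
```

The lazy `.*?` matters here: a greedy `.*` would swallow the bracketed number, since the optional group is allowed to match nothing.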

How do I count all occurrences of a phrase in a text file using regular expressions?

半城伤御伤魂 submitted on 2021-01-29 22:44:21
Question: I am reading multiple files from a directory and trying to find how many times a specific phrase (in this instance "at least") occurs in each file (not just whether it occurs, but how many times it occurs in each text file). My code is as follows: import glob import os path = 'D:/Test' k = 0 for filename in glob.glob(os.path.join(path, '*.txt')): if filename.endswith('.txt'): f = open(filename) data = f.read() data.split() data.lower() S = re.findall(r' at least ', data, re.MULTILINE) count
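A hedged sketch of one way to count a phrase per file. It assumes the goal is a case-insensitive, whole-word count in which the words may be separated by any whitespace, including a line break. The helper names `count_phrase` and `count_in_directory` are illustrative and do not come from the question.

```python
import re
from pathlib import Path

def count_phrase(text, phrase):
    """Count case-insensitive, whole-word occurrences of phrase in text."""
    words = [re.escape(word) for word in phrase.split()]
    # Allow any run of whitespace (including newlines) between the words.
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def count_in_directory(directory, phrase):
    """Map each .txt file name in directory to its phrase count."""
    return {
        path.name: count_phrase(path.read_text(encoding="utf-8", errors="replace"), phrase)
        for path in sorted(Path(directory).glob("*.txt"))
    }

sample = "At least once. We need at\nleast two, at least."
print(count_phrase(sample, "at least"))
```

Note that in the question's code the results of `data.split()` and `data.lower()` are discarded, and `re` is never imported. The sketch avoids both by passing `re.IGNORECASE` instead of lowercasing the text.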

Remove weird symbols from string

社会主义新天地 submitted on 2021-01-29 22:36:12
Question: I have a vector containing strings in Polish. The first 8 are shown in the picture below. As you can see, the strings contain weird symbols. I used dput() to reveal the hidden symbols. The output is as follows: > music & more spa<U+0093>a\201ka z ograniczonä<U+0084> odpowiedzialnoa<U+009A>ciä<U+0084> > hemmersbach central support spa3a<U+0082>ka z o.o. sp. k. > grupa kapitaa<U+0082>owa naprza3d", "automex spa3a<U+0082>ka z o.o. spa3a<U+0082>ka k. > spa3a<U+0082>dzielnia usa<U+0082>ugowa vig ekspert, vienna

Remove weird symbols from string

大兔子大兔子 submitted on 2021-01-29 22:33:31
Question: I have a vector containing strings in Polish. The first 8 are shown in the picture below. As you can see, the strings contain weird symbols. I used dput() to reveal the hidden symbols. The output is as follows: > music & more spa<U+0093>a\201ka z ograniczonä<U+0084> odpowiedzialnoa<U+009A>ciä<U+0084> > hemmersbach central support spa3a<U+0082>ka z o.o. sp. k. > grupa kapitaa<U+0082>owa naprza3d", "automex spa3a<U+0082>ka z o.o. spa3a<U+0082>ka k. > spa3a<U+0082>dzielnia usa<U+0082>ugowa vig ekspert, vienna

How do I count all occurrences of a phrase in a text file using regular expressions?

我的梦境 submitted on 2021-01-29 22:33:14
Question: I am reading multiple files from a directory and trying to find how many times a specific phrase (in this instance "at least") occurs in each file (not just whether it occurs, but how many times it occurs in each text file). My code is as follows: import glob import os path = 'D:/Test' k = 0 for filename in glob.glob(os.path.join(path, '*.txt')): if filename.endswith('.txt'): f = open(filename) data = f.read() data.split() data.lower() S = re.findall(r' at least ', data, re.MULTILINE) count

How to delete lines before a match while preserving it?

怎甘沉沦 submitted on 2021-01-29 20:50:38
Question: I have the following script to remove all lines before a line that matches a word: str=' 1 2 3 banana 4 5 6 banana 8 9 10 ' echo "$str" | awk -v pattern=banana ' print_it {print} $0 ~ pattern {print_it = 1} ' It returns: 4 5 6 banana 8 9 10 But I want to include the first match too. This is the desired output: banana 4 5 6 banana 8 9 10 How could I do this? Do you have a better idea using another command? I've also tried sed '0,/^banana$/d' , but it seems to work only with files, and I
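The desired behaviour (drop everything before the first matching line, keep that line and all later ones) can be sketched in Python with `itertools.dropwhile`. This is an illustrative alternative, not the asker's awk script, and the function name `from_first_match` is hypothetical. In the awk script itself, placing the `$0 ~ pattern` rule before the `print_it` rule should give the same result, because the flag is then set before the matching line is considered for printing.

```python
from itertools import dropwhile

def from_first_match(lines, word):
    """Drop lines until one equals word; keep that line and everything after."""
    return list(dropwhile(lambda line: line.strip() != word, lines))

# The question's sample, one item per line.
lines = ["1", "2", "3", "banana", "4", "5", "6", "banana", "8", "9", "10"]
kept = from_first_match(lines, "banana")
print("\n".join(kept))
```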

Regular expression works on regex101.com, but not on prod

旧街凉风 submitted on 2021-01-29 20:23:33
Question: https://regex101.com/r/sB9wW6/1 (?:(?<=\s)|^)@(\S+) <-- the problem is in the positive lookbehind. It works like this on prod: (?:\s|^)@(\S+) , but I need the correct start index (without the space). Here it is in JS: var regex = new RegExp(/(?:(?<=\s)|^)@(\S+)/g); Error parsing regular expression: Invalid regular expression: /(?:(?<=\s)|^)@(\S+)/ What am I doing wrong? UPDATE: OK, no lookbehind in JS :( But anyway, I need a regex to get the proper start and end index of my match, without the leading space. Answer 1:
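One lookbehind-free approach is to keep the `(?:\s|^)` prefix and derive the offsets from the captured group instead of from the whole match. The sketch below shows the idea in Python for illustration only. The question targets JavaScript, where the offsets would have to be computed from the match object differently, and the function name `mention_spans` is hypothetical.

```python
import re

# Same pattern as the "works on prod" variant: no lookbehind needed.
MENTION = re.compile(r"(?:\s|^)@(\S+)")

def mention_spans(text):
    """Return (start, end, name) per mention; start points at the '@'."""
    spans = []
    for m in MENTION.finditer(text):
        start = m.start(1) - 1  # the '@' sits immediately before group 1
        spans.append((start, m.end(1), m.group(1)))
    return spans

spans = mention_spans("@alice hi @bob")
print(spans)
```

Here `text[start:end]` gives the mention including its `@`, with no leading whitespace.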

Split camelCase word into words with php preg_match (Regular Expression)

ⅰ亾dé卋堺 submitted on 2021-01-29 20:22:13
Question: How would I go about splitting the word oneTwoThreeFour into an array so that I get: one Two Three Four with preg_match ? I tried this, but it just gives the whole word: $words = preg_match("/[a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/", $string, $matches); Answer 1: You can also use preg_match_all: preg_match_all('/((?:^|[A-Z])[a-z]+)/',$str,$matches); Explanation: ( - Start of capturing parenthesis. (?: - Start of non-capturing parenthesis. ^ - Start anchor. | -
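For comparison, the pattern from the answer can be tried in Python, where `re.findall` plays roughly the role of `preg_match_all`. This is a sketch for illustration, not PHP code, and the function name `split_camel_case` is hypothetical. The outer capturing group is dropped because `findall` already returns the whole match.

```python
import re

def split_camel_case(word):
    """Split a camelCase word at each uppercase letter."""
    # Each piece starts at the beginning of the string or at an uppercase
    # letter, followed by one or more lowercase letters.
    return re.findall(r"(?:^|[A-Z])[a-z]+", word)

parts = split_camel_case("oneTwoThreeFour")
print(parts)
```

As written, the pattern expects every piece to contain at least one lowercase letter, so runs of capitals such as acronyms are not handled.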

What is this regex missing to match lines of Apache logs?

狂风中的少年 submitted on 2021-01-29 20:02:43
Question: I have these lines: 5.10.80.69 - - [21/Jun/2019:15:46:20 -0700] "PATCH /niches/back-end HTTP/2.0" 406 15834 11.57.203.39 - carroll8889 [21/Jun/2019:15:46:21 -0700] "HEAD /visionary/cultivate HTTP/1.1" 404 15391 124.137.187.175 - - [21/Jun/2019:15:46:22 -0700] "DELETE /expedite/exploit/cultivate/web-enabled HTTP/1.0" 403 2606 203.36.55.39 - collins6322 [21/Jun/2019:15:46:23 -0700] "PATCH /efficient/productize/disintermediate HTTP/1.1" 504 13377 175.5.52.40 - - [21/Jun/2019:15:46:24 -0700] "POST
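The asker's own regex is cut off in this excerpt, so the following is only a sketch of a pattern that fits the sample lines shown, assuming one log entry per line in this common-log-style layout. The group names and the function name `parse_log_line` are illustrative.

```python
import re

LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '   # client, identity, user
    r'\[(?P<time>[^\]]+)\] '                         # [timestamp with zone]
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'             # status code, byte count
)

def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None

line = ('11.57.203.39 - carroll8889 [21/Jun/2019:15:46:21 -0700] '
        '"HEAD /visionary/cultivate HTTP/1.1" 404 15391')
fields = parse_log_line(line)
print(fields)
```

The user field is matched with `\S+` so that both `-` and names such as `carroll8889` are accepted, which is a common place for hand-written log patterns to fail.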