regex

Python beautifulsoup extract value without identifier

蹲街弑〆低调 submitted on 2021-02-17 04:47:50
Question: I am facing a problem and don't know how to solve it properly. I want to extract the price (130€ in the first example, 130€ in the second). The problem is that the attributes change all the time, so I am unable to do something like the following, because I am scraping hundreds of sites and on each site the first 2 characters of the "id" attribute may differ: tag = soup_expose_html.find('span', attrs={'id' : re.compile(r'(07_content$)')}) Even if I used something like this it won't work,

extract filename between last slash and question mark

别说谁变了你拦得住时间么 submitted on 2021-02-17 04:45:20
Question: I want to extract the filename between the last slash and the question mark using a regex. I read some related answers suggesting ([^/]*$), but I have several domain names, so I want to extract the filename only for specific domains, and I am looking for a regex that works for all domains. How can I limit it to certain domains? My goal is to replace the domain name: http://old.domain.com/asda/dsdasd/fsdfd/bvc/filename.mp4?fdgfsdgfgfsgf http://new.domain.com/filename.mp4 Sincerely Answer 1: You could try preg_replace('/(.*:\/\/).*\/(.*?)(\?.*
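One way to restrict the rewrite to certain domains is to put an escaped allow-list of hosts into the pattern itself. The following is a minimal Python sketch, not the PHP answer quoted above (which is cut off); `OLD_HOSTS`, `NEW_BASE`, and `rewrite` are names invented for illustration.

```python
import re

# Hypothetical allow-list: only URLs on these hosts are rewritten.
OLD_HOSTS = ("old.domain.com",)
# Hypothetical target prefix for the rewritten URLs.
NEW_BASE = "http://new.domain.com/"

# Escape each host so the dots are matched literally.
host_alt = "|".join(re.escape(h) for h in OLD_HOSTS)

# scheme://host/ + any directories + (filename) + optional ?query
URL_RE = re.compile(
    rf"https?://(?:{host_alt})/(?:[^/?#\s]*/)*([^/?#\s]+)(?:\?[^\s#]*)?"
)

def rewrite(text):
    """Replace matching URLs with NEW_BASE + filename, dropping the query."""
    return URL_RE.sub(lambda m: NEW_BASE + m.group(1), text)
```

URLs on hosts outside the allow-list are left untouched, so adding another domain only means adding another entry to `OLD_HOSTS`.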

How to extract commas in between square brackets in notepad++?

隐身守侯 submitted on 2021-02-17 03:39:11
Question: For example: [TEXT1,TEXT2,TEXT3] My expression, [\[].*,.*[\]], finds strings with commas between brackets, but I only want to match the commas themselves inside the square brackets. I need to replace the commas with spaces, but only inside the square brackets. I've tried [\[],[\]] but that doesn't work. \[(.*?)\] will find the text in between as well, but I do not want the entire string. Can anyone suggest what I need to do to find just the commas between the brackets? Answer 1: Find what
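The quoted answer is cut off, so the following is only a Python sketch of two common ways to target commas inside brackets, assuming the brackets are balanced and not nested. `commas_to_spaces` and the pattern names are illustrative. The lookahead form may also be usable in editors whose regex engine supports lookahead.

```python
import re

# Approach 1: match each bracketed group, then replace commas inside it.
BRACKETED = re.compile(r"\[[^\[\]]*\]")

def commas_to_spaces(text):
    return BRACKETED.sub(lambda m: m.group(0).replace(",", " "), text)

# Approach 2: match only a comma that is followed by a closing bracket
# with no other bracket in between (relies on balanced brackets).
COMMA_IN_BRACKETS = re.compile(r",(?=[^\[\]]*\])")

def commas_to_spaces_lookahead(text):
    return COMMA_IN_BRACKETS.sub(" ", text)
```

Both functions leave commas outside the brackets alone.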

pandas extract regex allowing mismatches

人盡茶涼 submitted on 2021-02-17 03:30:16
Question: Pandas has a very fast and nice string method, extract(). This method works perfectly with a regex such as this one: strict_pattern = r"^(?P<pre_spacer>ACGAG)(?P<UMI>.{9,13})(?P<post_spacer>TGGAGTCT)" test_df R1 21 ACGAGTTTTCGTATTTTTGGAGTCTTGTGG 22 ACGAGTAGGGAGGGGGGTGGAGTCTCAGCG 23 ACGAGGGGGGGGAGGCTGGAGTCTCCGGGT 24 ACGAGAATAACGTTTGGTGGAGTCTACCAC 25 ACGAGGGGAATAAATATTGGAGTCTCCTCC 26 ACGAGATTGGGTATGCTGGAGTCTCTGTTC 27 ACGAGGTACCCGCGCCATGGAGTCTCTCTG 28 ACGAGTGGTTTTTGTCGTGGAGTCTCACCA 29
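The excerpt ends before any answer, so this is only one possible approach, sketched with the standard `re` module: expand each spacer into an alternation that tolerates a single substituted base. It does not handle insertions or deletions, and `one_mismatch` and `loose_pattern` are names made up for this sketch. Since the result is an ordinary pattern string with named groups, it could presumably be passed to `Series.str.extract` in the same way as `strict_pattern`.

```python
import re

def one_mismatch(seq):
    """Regex fragment matching seq with at most one substituted character."""
    variants = [seq[:i] + "." + seq[i + 1:] for i in range(len(seq))]
    return "(?:" + "|".join(variants) + ")"

# Same structure as strict_pattern from the question, with tolerant spacers.
loose_pattern = (
    rf"^(?P<pre_spacer>{one_mismatch('ACGAG')})"
    r"(?P<UMI>.{9,13})"
    rf"(?P<post_spacer>{one_mismatch('TGGAGTCT')})"
)
strict_pattern = r"^(?P<pre_spacer>ACGAG)(?P<UMI>.{9,13})(?P<post_spacer>TGGAGTCT)"

exact = "ACGAGTTTTCGTATTTTTGGAGTCTTGTGG"   # row 21 from the question
mutated = "ACGTGTTTTCGTATTTTTGGAGTCTTGTGG"  # made-up read, one base changed

exact_match = re.match(loose_pattern, exact)
mutated_match = re.match(loose_pattern, mutated)
```

The alternation grows linearly with the spacer length, which stays manageable for short spacers like these.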

group name can't start with number?

我怕爱的太早我们不能终老 submitted on 2021-02-17 02:56:47
Question: It looks like I can't use a regex like this one: (?P<74xxx>[0-9]+) With the re package it raises an error: sre_constants.error: bad character in group name u'74xxx' It looks like I can't use group names that start with a number. Why? P.S. golang does not have this problem, and neither do many other languages. Answer 1: According to the docs: Group names must be valid Python identifiers. As with variables, identifiers must not start with a number in Python. See more about identifiers here: identifier ::= (letter|"_")
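The behaviour described above can be reproduced in a few lines, and a simple workaround is to prefix the name so that it becomes a valid identifier. This is a minimal sketch; `compile_named` is a hypothetical helper, and the underscore prefix is just one possible convention.

```python
import re

# A group name starting with a digit is rejected at compile time.
try:
    re.compile(r"(?P<74xxx>[0-9]+)")
    digit_name_ok = True
except re.error:
    digit_name_ok = False

def compile_named(name, body):
    """Hypothetical helper: prefix names that are not valid identifiers."""
    safe = name if name.isidentifier() else "_" + name
    return safe, re.compile(rf"(?P<{safe}>{body})")

name, pattern = compile_named("74xxx", "[0-9]+")
match = pattern.match("12345")
```

Callers then look the group up under the returned name (`_74xxx` here) instead of the original one.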

Regex for a number followed by a word

时光毁灭记忆、已成空白 submitted on 2021-02-17 02:48:49
Question: In JavaScript, what would be the regular expression for a number followed by a word? I need to catch the number AND the word and replace them both after some calculation. Here are the conditions, given as examples: 123 dollars => Catch the '123' and the 'dollars'. foo bar 0.2 dollars => 0.2 and dollars foo bar.5 dollar => 5 and dollar (notice the dot before 5) foo bar.5.6 dollar => 5.6 and dollar foo bar.5.6.7 dollar => skip (could be only 0 or 1 dot) foo bar5 dollar => skip foo bar 5dollar
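The list of conditions is cut off, so the following Python sketch covers only the cases that are fully visible, and it assumes that at least one whitespace character must separate the number from the word. `NUMBER_WORD` and `find_pairs` are illustrative names. The same lookbehind-based pattern should carry over to JavaScript engines that support lookbehind.

```python
import re

NUMBER_WORD = re.compile(
    r"(?<!\w)"           # number must not be glued to a preceding word char
    r"(?<!\d\.)"         # and must not continue an earlier "digits." run
    r"(\d+(?:\.\d+)?)"   # integer or decimal with exactly one dot
    r"\s+"               # assumed: whitespace between number and word
    r"([A-Za-z]+)\b"     # the word
)

def find_pairs(text):
    """Return (number, word) tuples for every match in text."""
    return NUMBER_WORD.findall(text)
```

Each pair can then be converted and substituted with `NUMBER_WORD.sub` and a callback that performs the calculation.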

Regex for a number followed by a word

懵懂的女人 submitted on 2021-02-17 02:47:45
Question: In JavaScript, what would be the regular expression for a number followed by a word? I need to catch the number AND the word and replace them both after some calculation. Here are the conditions, given as examples: 123 dollars => Catch the '123' and the 'dollars'. foo bar 0.2 dollars => 0.2 and dollars foo bar.5 dollar => 5 and dollar (notice the dot before 5) foo bar.5.6 dollar => 5.6 and dollar foo bar.5.6.7 dollar => skip (could be only 0 or 1 dot) foo bar5 dollar => skip foo bar 5dollar

How to use REGEX to split text to chunks, broken on specific chars?

↘锁芯ラ submitted on 2021-02-17 02:42:29
Question: I wish to split a long text into chunks of at most 1000 characters, taking as many characters as possible in each chunk, but importantly I want each chunk to end at a line break in order to avoid splitting a word in the middle. If there is no line break anywhere in the 1000 characters, the regex should still capture them, even though that splits a word across 2 chunks. This regex, /.{1,1000}/gs, splits the text into chunks of 1000 characters but may break a word in the middle. What regex will give me the wanted result? Answer 1: You can use
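The quoted answer is cut off, so here is a hedged Python sketch of one pattern that behaves as described: take as many characters as possible while ending on a line break (or at the end of the text), and fall back to a hard cut only when no line break is available. It assumes line breaks are "\n"; `chunk` and its `limit` parameter are names chosen for this example.

```python
import re

def chunk(text, limit=1000):
    """Split text into pieces of at most `limit` characters.

    First alternative: the longest run that ends right after "\n" or at
    the end of the text. Second alternative: a hard cut of `limit` chars.
    """
    pattern = re.compile(
        rf".{{1,{limit}}}(?:(?<=\n)|\Z)|.{{1,{limit}}}",
        re.DOTALL,  # let "." match newlines, like the /s flag
    )
    return pattern.findall(text)
```

Because every character is consumed by one alternative or the other, joining the chunks reproduces the original text.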

Splitting a String by number of delimiters

旧城冷巷雨未停 submitted on 2021-02-17 02:40:29
Question: I am trying to split a string into a string array, and there may be a number of combinations. I tried: String strExample = "A, B"; //possible options are: 1. A,B 2. A, B 3. A , B 4. A ,B String[] parts; parts = strExample.split("/"); //Splits the string but doesn't remove the space in between, so the 2nd item in the string array is a space and B ( B) parts = strExample.split("/| "); parts = strExample.split(",|\\s+"); Any guidance would be appreciated. Answer 1: To split with comma enclosed with optional
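The answer is cut off after "optional"; assuming it continues along the lines of "optional whitespace", the idea can be sketched in Python as a split on a comma together with any whitespace around it. `split_items` is a made-up name. In Java the equivalent pattern string would need doubled backslashes.

```python
import re

def split_items(text):
    """Split on a comma surrounded by optional whitespace."""
    # strip() removes leading/trailing whitespace of the whole string,
    # which the delimiter pattern alone would not touch.
    return re.split(r"\s*,\s*", text.strip())
```

All four spacing variants listed in the question then yield the same two items.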

Regex for name extraction on text file

落花浮王杯 submitted on 2021-02-17 02:37:06
Question: I've got a plain text file containing a list of authors and abstracts and I'm trying to extract just the author names to use for network analysis. My text follows this pattern and contains 500+ abstracts: 2010 - NUCLEAR FORENSICS OF SPECIAL NUCLEAR MATERIAL AT LOS ALAMOS: THREE RECENT STUDIES Purchase this article David L. Gallimore, Los Alamos National Laboratory Katherine Garduno, Los Alamos National Laboratory Russell C. Keller, Los Alamos National Laboratory Nuclear forensics of special