regex

How to replace \r\n characters in a text string specifically in R

房东的猫 submitted on 2021-02-11 14:38:33
Question: For the life of me, I am unable to strip out some escape characters from a text string (prior to further processing). I've tried stringi and gsub, but I just cannot get the correct syntax. Here is my text string: txt <- "c(\"\\r\\n Stuff from a webpage: That I scraped using webcrawler\\r\\n\", \"\\r\\n \", \"\\r\\n \", \"\\r\\n \", \"\\r\\n\\r\\n \", \"\\r\\n\\r\\n \", \"\\r\\n \\r\\n \", \"\\r\\n \")" I'd like to strip out "\\r\\n" from this string. I've tried gsub("[\\\r\\\n]", "", txt) (leaves

How to know which option in Regex was matched?

我只是一个虾纸丫 submitted on 2021-02-11 14:32:24
Question: I have a regex that matches multiple options, e.g. ^0x[\da-fA-F]+|-?\d+$, a regex that matches either decimal or hex literals. Is there a way to know which option eventually matched the pattern? So for -10, the decimal option was matched; for 0x1Af, the hex option was matched. Answer 1: I think you meant this regex: ^(?:-?\d+|0x[\da-fA-F]+)$ with the start and end anchors not part of the alternatives. You can capture the different alternatives: ^(?:(-?\d+)|(0x[\da-fA-F]+))$ Now you just need to
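The answer above is cut off, so what follows is only a minimal sketch of one way to use the two capture groups, written in Python's re module (the question does not say which regex flavor it targets). The helper name classify is hypothetical. It relies on the fact that a group which did not take part in the match is reported as None.

```python
import re

# Same pattern as in the answer: one capture group per alternative.
LITERAL = re.compile(r'^(?:(-?\d+)|(0x[\da-fA-F]+))$')

def classify(text):
    """Return 'decimal', 'hex', or None, depending on which alternative matched.

    `classify` is an illustrative helper name, not part of the original answer.
    """
    m = LITERAL.match(text)
    if m is None:
        return None
    # Group 1 is the decimal alternative, group 2 the hex alternative.
    # Exactly one of them is non-None after a successful match.
    return 'decimal' if m.group(1) is not None else 'hex'

print(classify('-10'))
print(classify('0x1Af'))
```

Other engines expose the same information differently (for example, named groups or an empty-string group value), so the check for "did this group participate" has to be adapted to the flavor in use.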

Selenium Regex: Match literal string with dynamic date

元气小坏坏 submitted on 2021-02-11 14:30:47
Question: I'm writing some Selenium tests and need to confirm the presence of a text string that has dynamic date and currency components. Example: "This is the date dd/dd/dd and this is the amount $ddd.dd." Is this possible with the Selenium regex implementation? Thanks, Richard Answer 1: If I'm not mistaken, Selenium supports the full power of JavaScript. Date dd/mm/yyyy: 01/01/1900 through 31/12/2099. Matches invalid dates such as February 31st. Accepts dashes, spaces, forward slashes and dots as date
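As a rough illustration of the pattern the question asks for, here is a sketch in Python's re module rather than Selenium's own pattern syntax. It assumes the page text has already been read into a string, and the helper name has_expected_text is hypothetical. The literal parts are written out, the special characters $ and . are escaped, and each d placeholder becomes \d.

```python
import re

# Literal sentence with \d in place of each "d" placeholder.
# "$" and "." are escaped because they are regex metacharacters.
PATTERN = re.compile(
    r'This is the date \d{2}/\d{2}/\d{2} and this is the amount \$\d{3}\.\d{2}\.'
)

def has_expected_text(page_text):
    """Return True if the sentence appears anywhere in page_text.

    `has_expected_text` is an illustrative name, not a Selenium API.
    """
    return PATTERN.search(page_text) is not None

print(has_expected_text('This is the date 11/02/21 and this is the amount $123.45.'))
```

Like the pattern described in the answer, this only checks the shape of the date, not whether it is a valid calendar date.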

regex for catching abbreviations

雨燕双飞 submitted on 2021-02-11 14:27:49
Question: I am trying to make a regex that matches abbreviations and their full forms in a string. I have a regex that catches some cases, but on the example below it catches more words than it should. Could anyone please help me fix this? x = 'Confirmatory factor analysis (CFA) is a special case of what is known as structural equation modelling (SEM).' re.findall(r'\b([A-Za-z][a-z]+(?:\s[A-Za-z][a-z]+)+)\s+\(([A-Z][A-Z]*[A-Z]\b\.?)',x) out: [('Confirmatory factor analysis', 'CFA'), ('special case of
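No answer survives in this excerpt, so the following is only one possible approach, not the accepted fix. Instead of one large regex, it finds each parenthesised acronym, takes as many preceding words as the acronym has letters, and keeps the pair only when the initials agree. The function name find_abbreviations is hypothetical, and the sketch assumes one word per acronym letter, which will miss forms such as "United States of America (USA)".

```python
import re

# An acronym is two or more capital letters inside parentheses.
ACRONYM = re.compile(r'\(([A-Z]{2,})\)')

def find_abbreviations(text):
    """Return (full form, acronym) pairs; illustrative helper, one word per letter."""
    pairs = []
    for m in ACRONYM.finditer(text):
        acronym = m.group(1)
        # All alphabetic words that appear before the opening parenthesis.
        words = re.findall(r'[A-Za-z]+', text[:m.start()])
        # Take as many trailing words as the acronym has letters.
        candidate = words[-len(acronym):]
        initials = ''.join(w[0].upper() for w in candidate)
        if initials == acronym:
            pairs.append((' '.join(candidate), acronym))
    return pairs

x = ('Confirmatory factor analysis (CFA) is a special case of what is '
     'known as structural equation modelling (SEM).')
print(find_abbreviations(x))
```

Anchoring on the acronym first avoids the over-matching in the question, where the word-sequence group is free to start as far left as it can.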

Java split string with regex, anything inside Double Quotes [duplicate]

こ雲淡風輕ζ submitted on 2021-02-11 13:58:56
Question: This question already has answers here: Regex for splitting a string using space when not surrounded by single or double quotes (15 answers). Closed 10 months ago. I am trying to split a string such as String s = "do not split this \"split this\""; String[] split = s.split("(?<=\\s)| (?=\") | ((?=[^A-Za-z0-9])|(?<=[^A-Za-z0-9])); will give me ["do", " ", "not", " ", "split", "this", " ", "split this"]; I would like to keep all words, white spaces as well, but ignore anything inside double
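The question is truncated, so the exact desired output is an assumption here: the sketch below treats a double-quoted span as a single token and otherwise emits words and whitespace runs separately. It is written in Python for illustration rather than Java, and it matches tokens instead of splitting on boundaries, which tends to be easier to reason about than a chain of lookarounds. The name tokenize is hypothetical.

```python
import re

# Alternatives are tried left to right at each position:
#   1. a complete double-quoted span (quotes kept),
#   2. a run of whitespace,
#   3. a run of characters that are neither whitespace nor a quote.
TOKEN = re.compile(r'"[^"]*"|\s+|[^\s"]+')

def tokenize(s):
    """Split s into words, whitespace runs, and quoted spans (illustrative)."""
    return TOKEN.findall(s)

print(tokenize('do not split this "split this"'))
```

An unbalanced quote matches none of the alternatives and is silently skipped, so real input may need an explicit check for that case.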

Regular Expression R: Select the above or below lines of a regexp selection while meeting another regexp criteria

无人久伴 submitted on 2021-02-11 13:58:28
Question: I am working with a text document similar to the examples below. File <- c("Location Name Code and Label Frequency Percentage", " During the past 30 days, on how many days did you carry a weapon", "44-44 Q13 such as a gun, knife, or club on school property?", " 1 0 days 1,610 94.5", " 2 1 day 71 4.3", " 3 2 or 3 days 6 0.4", " 4 4 or 5 days 3 0.2", " 5 6 or more days 12 0.7", " Missing 48", "45-45 Q14 During the past 12 months, on how many days did you carry a gun?", " 1 0 days 1,602 91.3", "

Regex - non-contiguous range of repetitions

一个人想着一个人 submitted on 2021-02-11 13:24:35
Question: I am trying to build a regex that matches a pattern a certain number of times, e.g. 3 or 5. [a-z]{3,5} will match [a-z] 3, 4 or 5 times, but I don't want the 4. I know I could do something like ([a-z]{3})([a-z]{2})?, but that means that for cases where I want to match the pattern 3, 5, 7, 13 or 29 times, the resulting regex would be particularly nasty. Is there any better way to do this? (I used [a-z] as an example, but it could be anything else.) Answer 1: Regular expressions don't support an
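The answer is cut off in this excerpt. One workaround, sketched here in Python and not taken from the answer, is to generate the alternation programmatically so that the unwieldy pattern never has to be written by hand. The helper name repeat_counts is hypothetical.

```python
import re

def repeat_counts(unit, counts):
    """Build a pattern matching `unit` repeated exactly n times, for any n in counts.

    Illustrative helper. Longer counts come first so that an unanchored
    search tries the longest allowed run before shorter ones.
    """
    alternatives = '|'.join(
        f'(?:{unit}){{{n}}}' for n in sorted(counts, reverse=True)
    )
    return f'(?:{alternatives})'

pattern = re.compile(repeat_counts('[a-z]', [3, 5]))

for s in ('abc', 'abcd', 'abcde'):
    print(s, bool(pattern.fullmatch(s)))
```

When the whole string must consist of the repeated unit, another option is to match with an open-ended quantifier and check the length of the match in code instead.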