pypi-regex

How can I find the best fuzzy string match?

被刻印的时光 ゝ submitted on 2020-01-02 07:09:14
Question: Python's new regex module supports fuzzy string matching. Sing praises aloud (now). Per the docs: "The ENHANCEMATCH flag makes fuzzy matching attempt to improve the fit of the next match that it finds. The BESTMATCH flag makes fuzzy matching search for the best match instead of the next match." The ENHANCEMATCH flag is set using (?e), as in regex.search("(?e)(dog){e<=1}", "cat and dog")[1], which returns "dog", but there's nothing on actually setting the BESTMATCH flag. How's it done?
Answer 1: Documentation

Creating fuzzy matching exceptions with Python's new regex module

孤者浪人 submitted on 2019-12-19 09:41:18
Question: I'm testing the new Python regex module, which allows for fuzzy string matching, and have been impressed with its capabilities so far. However, I've been having trouble making certain exceptions with fuzzy matching. The following is a case in point. I want ST LOUIS, and all variations of ST LOUIS within an edit distance of 1, to match ref. However, I want to make one exception to this rule: the edit cannot consist of an insertion to the left of the leftmost character containing the letters N

Ambiguous substring with mismatches

南楼画角 submitted on 2019-12-11 07:01:27
Question: I'm trying to use regular expressions to find a substring in a string of DNA. This substring has ambiguous bases, such as ATCGR, where R could be A or G. Also, the script must allow x number of mismatches. So this is my code:
    import regex
    s = 'ACTGCTGAGTCGT'
    regex.findall(r"T[AG]T" + '{e<=1}', s, overlapped=True)
So, with one mismatch I would expect 3 substrings: AC**TGC**TGAGTCGT and ACTGC**TGA**GTCGT and ACTGCTGAGT**CGT**. The expected result should be like this: ['TGC', 'TGA', 'AGT', 'CGT
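One way to see which windows the question is after is a plain sliding-window scan that counts substitutions only. This is a minimal sketch in pure Python, not the regex module's own matching; the helper name `find_with_mismatches` and the small `IUPAC` table (only a few ambiguity codes filled in) are assumptions made for illustration. The pattern `TRT` stands in for `T[AG]T`.

```python
# Hypothetical helper: not part of the regex module.
# Partial IUPAC table; extend it with further ambiguity codes as needed.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine: A or G
    "Y": "CT",    # pyrimidine: C or T
    "N": "ACGT",  # any base
}


def find_with_mismatches(pattern, sequence, max_mismatches):
    """Return every overlapping window of len(pattern) in sequence whose
    number of mismatching positions is at most max_mismatches.
    Only substitutions are considered, never insertions or deletions."""
    width = len(pattern)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(
            base not in IUPAC[code] for code, base in zip(pattern, window)
        )
        if mismatches <= max_mismatches:
            hits.append(window)
    return hits


print(find_with_mismatches("TRT", "ACTGCTGAGTCGT", 1))
# prints ['TGC', 'TGA', 'AGT', 'CGT']
```

Because only substitutions are counted here, the closest fuzzy-regex equivalent would be a substitution-only constraint such as `{s<=1}` rather than `{e<=1}`, which also admits insertions and deletions.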

How can I find the best fuzzy string match?

丶灬走出姿态 submitted on 2019-12-05 23:37:07
Python's new regex module supports fuzzy string matching. Sing praises aloud (now). Per the docs: "The ENHANCEMATCH flag makes fuzzy matching attempt to improve the fit of the next match that it finds. The BESTMATCH flag makes fuzzy matching search for the best match instead of the next match." The ENHANCEMATCH flag is set using (?e), as in regex.search("(?e)(dog){e<=1}", "cat and dog")[1], which returns "dog", but there's nothing on actually setting the BESTMATCH flag. How's it done? Documentation on the BESTMATCH flag functionality is partial (but improving). Poke-n-hope shows that BESTMATCH is set

Fuzzy regex (e.g. {e<=2}) correct usage in Python

本小妞迷上赌 submitted on 2019-12-04 03:43:29
Question: I am trying to find strings which are at most two mistakes 'away' from the original pattern string (i.e. they differ by at most two letters). However, the following code isn't working as I would expect, at least not from my understanding of fuzzy regex:
    import regex
    res = regex.findall("(ATAGGAGAAGATGATGTATA){e<=2}", "ATAGAGCAAGATGATGTATA", overlapped=True)
    print(res)
    # prints ['ATAGAGCAAGATGATGTATA'], i.e. the second string
As you can see, the two strings differ on three letters rather than at most two
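The result is less surprising once insertions and deletions are taken into account: `e` in `{e<=2}` counts errors of any type, not just substituted letters. The sketch below, in pure Python and independent of the regex module, compares a position-by-position count with the edit distance for the two strings from the question. The function names `hamming` and `levenshtein` are illustrative choices, not part of any library.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions
    needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, free when equal
            ))
        prev = cur
    return prev[-1]


pattern = "ATAGGAGAAGATGATGTATA"
text = "ATAGAGCAAGATGATGTATA"
print(hamming(pattern, text))      # prints 3
print(levenshtein(pattern, text))  # prints 2
```

Three positions differ, but one deletion (a G) plus one insertion (a C) is enough to turn the pattern into the text, so the match stays within two errors. To allow substitutions only, a constraint such as `{s<=2}` is the usual choice.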

compiling a fuzzy regexp with python regex

时光毁灭记忆、已成空白 submitted on 2019-12-01 06:59:59
Question: When I found out that the Python regex module allows fuzzy matching I was very happy, as it seemed a simple solution to many of my problems. But now I am having a problem for which I did not find any answers in the documentation. How can I compile strings into regexps while also using the new fuzziness feature? To illustrate my usual needs and give a sample, here is a little piece of code:
    import regex
    f = open('liner.fa', 'r')
    nosZ2f='TTCCGACTACCAAGGCAAATACTGCTTCTCGAC'
    nosZ2r=
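A common approach is to build the fuzzy pattern as an ordinary string and compile that. The sketch below shows only the string-building step, using the standard library; the helper name `fuzzy_pattern` and the error limit of 2 are assumptions for illustration, and the final compile call is left as a comment because it needs the third-party regex module.

```python
import re


def fuzzy_pattern(literal, max_errors):
    """Build a pattern string allowing up to max_errors errors in literal."""
    # re.escape guards against regex metacharacters in the literal;
    # for plain DNA letters it leaves the string unchanged.
    return "(?:%s){e<=%d}" % (re.escape(literal), max_errors)


nosZ2f = "TTCCGACTACCAAGGCAAATACTGCTTCTCGAC"
pattern = fuzzy_pattern(nosZ2f, 2)  # error limit of 2 is an assumed value
print(pattern)
# prints (?:TTCCGACTACCAAGGCAAATACTGCTTCTCGAC){e<=2}

# With the third-party regex module installed, the string compiles as usual:
#   import regex
#   compiled = regex.compile(pattern)
```

Wrapping the literal in a non-capturing group matters: without it, the `{e<=2}` constraint would apply only to the last character of the sequence.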

Python “regex” module: Fuzziness value

倾然丶 夕夏残阳落幕 submitted on 2019-12-01 04:11:02
Question: I'm using the "fuzzy match" functionality of the regex module. How can I get the "fuzziness value" of a match, which indicates how different the pattern is from the string, just like the "edit distance" in Levenshtein? I thought I could get the value from the Match object, but it's not there. The official docs say nothing about it either. For example, with a = regex.match('(?:foo){e}', 'for'), a.captures() tells me that the word "for" is matched, but I'd like to know the fuzziness value, which should be 1 in
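To make the idea of a "fuzziness value" concrete, the sketch below computes a per-type breakdown of the cheapest edit script between a pattern and a string in pure Python. It is an illustration of the concept, not the regex module's implementation, and the name `edit_counts` is hypothetical. As far as I know, newer releases of the regex module expose a comparable `fuzzy_counts` attribute on match objects, but that should be checked against the installed version.

```python
def edit_counts(pattern, text):
    """Return (substitutions, insertions, deletions) for one cheapest way
    of turning pattern into text. An insertion is a character present in
    text but not in pattern; a deletion is the reverse."""
    # Each cell holds (total, subs, ins, dels); tuples compare by total first.
    prev = [(j, 0, j, 0) for j in range(len(text) + 1)]
    for i, p in enumerate(pattern, 1):
        cur = [(i, 0, 0, i)]  # text prefix is empty: delete i pattern chars
        for j, t in enumerate(text, 1):
            diag, up, left = prev[j - 1], prev[j], cur[j - 1]
            if p == t:
                best = diag  # characters agree, no extra cost
            else:
                best = min(
                    (diag[0] + 1, diag[1] + 1, diag[2], diag[3]),  # substitute
                    (left[0] + 1, left[1], left[2] + 1, left[3]),  # insert t
                    (up[0] + 1, up[1], up[2], up[3] + 1),          # delete p
                )
            cur.append(best)
        prev = cur
    return prev[-1][1:]


counts = edit_counts("foo", "for")
print(counts)       # prints (1, 0, 0)
print(sum(counts))  # prints 1
```

The sum of the three counts is the edit distance, which is 1 for "foo" against "for", in line with the value the question expects.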