Python re match only letters from word

时间秒杀一切 submitted on 2019-12-02 21:44:18

Question


I am new to Python's re module and need help. I searched here, on Google, and in the documentation, but nothing worked. Here is what I am trying to do.

I have a word (for example, "string") and a list of words:

strings, string, str, ing, in, ins, rs, stress

I want to match: string, str, ing, in, ins, rs.

I don't want to match: stress, strings (because they contain more than one s, while the word string contains only one).

  • Simply put: match only words made from the letters of string, each used no more often than it appears there.

Sorry for my bad English, and if I didn't explain this well enough.

Also, some of the letters are Unicode (non-ASCII).


Answer 1:


In the spirit of the question, here's a regex answer.

Here's the regex:

^(?=[string]{1,6}$)(?!.*(.).*\1).*$

The first lookahead, (?=[string]{1,6}$), checks that the whole candidate is 1 to 6 characters long and that every character is one of the letters of string. The negative lookahead, (?!.*(.).*\1), ensures that no character appears twice. Of course, this approach breaks down if the original word contains repeated letters, and it isn't particularly efficient for long strings.

The code to build and apply it for an arbitrary input word:

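import re

mylist = ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]
word = "string"
# Raw string so that \1 is a backreference, not the octal escape "\x01";
# re.escape guards against letters that are special inside [...]
r = re.compile(r"^(?=[%s]{1,%d}$)(?!.*(.).*\1).*$" % (re.escape(word), len(word)))
print(list(filter(r.match, mylist)))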

This prints:

['string', 'str', 'ing', 'in', 'ins', 'rs']
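To illustrate the limitation mentioned above (a target word with repeated letters), here is a minimal sketch that reuses the same pattern with the target stress. The helper name build_pattern is hypothetical. Since stress contains three s's, ss ought to be accepted, but the no-duplicate lookahead rejects it; a word without repeats, such as rest, still passes.

```python
import re

def build_pattern(word):
    # Hypothetical helper: the same pattern as above, built for any target word.
    # First lookahead: only letters of `word`, length 1..len(word).
    # Negative lookahead: no character may occur twice.
    return re.compile(r"^(?=[%s]{1,%d}$)(?!.*(.).*\1).*$" % (re.escape(word), len(word)))

pattern = build_pattern("stress")
print(bool(pattern.match("ss")))    # False, even though "stress" has three s's
print(bool(pattern.match("rest")))  # True
```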





Answer 2:


I don't think you can do this with regex, but I do think you can do it with collections:

>>> from collections import Counter
>>> target = "string"
>>> words = ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]
>>> [word for word in words if not Counter(word) - Counter(target)]
['string', 'str', 'ing', 'in', 'ins', 'rs']
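Why this works: subtracting one Counter from another keeps only positive counts, so the difference is empty (and therefore falsy) exactly when a word never uses a letter more often than the target does. A small sketch; the helper name fits is hypothetical. Unlike the regex above, this also handles targets that repeat letters.

```python
from collections import Counter

def fits(word, target):
    # Hypothetical helper: Counter subtraction drops zero and negative counts,
    # so the result is empty only if `word` needs no letter more often than `target` has it.
    return not (Counter(word) - Counter(target))

print(Counter("strings") - Counter("string"))  # Counter({'s': 1}), so rejected
print(Counter("stress") - Counter("string"))   # Counter({'s': 2, 'e': 1}), so rejected
print(fits("ss", "stress"))                    # True: the target has three s's
```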



Answer 3:


Regular expressions may not be the best solution. Here is one algorithm:

  • Make a dictionary of your target word, with each letter as a key and the number of times that letter occurs in the word as the value. For example, for string the key:value pair for s would be {'s':1}.
  • For each word you want to test, check that every letter is in the dictionary AND that the letter counts do not exceed the counts in the target word.
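The two steps above can be sketched with plain dictionaries as follows. The function names letter_counts and can_build are hypothetical; this is one straightforward reading of the algorithm, not the answerer's own code.

```python
def letter_counts(word):
    # Step 1: map each letter to the number of times it occurs in `word`
    counts = {}
    for ch in word:
        counts[ch] = counts.get(ch, 0) + 1
    return counts

def can_build(candidate, target_counts):
    # Step 2: every letter must exist in the target and must not occur
    # more often than it does there (a missing letter counts as 0)
    for ch, n in letter_counts(candidate).items():
        if target_counts.get(ch, 0) < n:
            return False
    return True

target_counts = letter_counts("string")
words = ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]
print([w for w in words if can_build(w, target_counts)])
# ['string', 'str', 'ing', 'in', 'ins', 'rs']
```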



Answer 4:


I don't think you need Python re at all. If I understood you correctly, you want only the words in which no letter repeats.

This problem can be solved with a couple of lines of Python:

str_list = ['strings', 'string', 'str', 'ing', 'in', 'ins', 'rs', 'stress']
# set(i) drops repeated letters, so its length equals len(i) only if no letter repeats
new_list = [i for i in str_list if len(set(i)) == len(i)]
print(new_list)

The output of the program is:

['string', 'str', 'ing', 'in', 'ins', 'rs']

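For Unicode input you must use Unicode strings (the unicode type in Python 2, str in Python 3) or a single-byte code page; you cannot use the UTF-8 byte representation, because a non-ASCII character then occupies several bytes. The set() function builds a collection of unique elements from any iterable, and a string is an iterable of its characters, so repeated letters are removed. If anything was removed, the length of the set cannot equal the length of the original string.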
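Note that the length test alone only rejects repeated letters; it does not check that the letters come from string (a hypothetical word such as xyz would pass). A minimal sketch adding a subset test, assuming, as this answer does, that the target word has no repeated letters; the helper name no_repeats_from is hypothetical. The last line shows the UTF-8 issue: ü is one character but two bytes.

```python
def no_repeats_from(word, target):
    letters = set(word)
    # len(set) == len(word): no letter repeats
    # issubset(target): every letter also occurs in the target word
    return len(letters) == len(word) and letters.issubset(target)

str_list = ['strings', 'string', 'str', 'ing', 'in', 'ins', 'rs', 'stress', 'xyz']
print([w for w in str_list if no_repeats_from(w, 'string')])
# ['string', 'str', 'ing', 'in', 'ins', 'rs']

# Why UTF-8 bytes don't work: one character, two bytes
print(len('\u00fc'), len('\u00fc'.encode('utf-8')))  # 1 2
```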



Source: https://stackoverflow.com/questions/44781243/python-re-match-only-letters-from-word
