Python re match only letters from word

醉酒当歌, submitted 2019-12-02 08:36:46

In the spirit of the question, here's a regex answer.

Here's the regex:

^(?=[string]{1,6}$)(?!.*(.).*\1).*$

The first lookahead checks that the whole input consists of 1 to 6 characters, each taken from the characters of string. The negative lookahead (?!.*(.).*\1) ensures that no character appears twice. Of course, this approach breaks down if your original string contains repeated characters, and it isn't particularly efficient for long strings.
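To see what each half of the pattern does on its own, here is a small sketch, assuming Python 3. It splits the regex into its two checks and reports which one rejects a word. The names `allowed`, `repeated` and `explain` are illustrative, not part of the original answer.

```python
import re

target = "string"

# First half: the whole input is 1 to len(target) characters,
# each drawn from the characters of the target word.
allowed = re.compile(r"^[%s]{1,%d}$" % (target, len(target)))

# Second half: some character appears twice, which is exactly what
# the negative lookahead (?!.*(.).*\1) forbids.
repeated = re.compile(r"(.).*\1")

def explain(word):
    """Report which part of the combined regex rejects `word`, if any."""
    if not allowed.match(word):
        return "rejected: not 1-%d letters from %r" % (len(target), target)
    if repeated.search(word):
        return "rejected: a character repeats"
    return "accepted"

for w in ["str", "ss", "stress", "strings"]:
    print(w, "->", explain(w))
```

"ss" passes the first check but fails the second. "stress" and "strings" already fail the first: the former contains an "e", and the latter is 7 characters long.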

The code to build and run it for any target word:

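import re
mylist = ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]
word = "string"
# Raw string, so that \1 reaches the regex engine as a backreference
# (in a normal string literal "\1" is the control character chr(1)).
r = re.compile(r"^(?=[%s]{1,%d}$)(?!.*(.).*\1).*$" % (word, len(word)))
print(list(filter(r.match, mylist)))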

This prints:

['string', 'str', 'ing', 'in', 'ins', 'rs']
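If the target word may contain characters that are special inside a character class, such as ']', '^' or '-', the pattern above can break or match too much. The following is a hedged sketch, assuming Python 3, that passes the word through `re.escape()` first. `build_pattern` and `matching_words` are hypothetical helper names.

```python
import re

def build_pattern(word):
    """Compile the answer's regex for an arbitrary target word.

    re.escape() keeps characters such as ']', '^' or '-' in `word`
    from changing the meaning of the character class.
    """
    return re.compile(r"^(?=[%s]{1,%d}$)(?!.*(.).*\1).*$"
                      % (re.escape(word), len(word)))

def matching_words(word, candidates):
    """Return the candidates accepted by the pattern built from `word`."""
    pattern = build_pattern(word)
    return [c for c in candidates if pattern.match(c)]

print(matching_words("string", ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]))
# ['string', 'str', 'ing', 'in', 'ins', 'rs']

# Unescaped, "a-z" would become the range [a-z] and accept "b".
print(matching_words("a-z", ["a", "b", "z-a", "-"]))
# ['a', 'z-a', '-']
```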


I don't think you can do this with regex, but I do think you can do it with collections:

>>> from collections import Counter
>>> target = "string"
>>> words = ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]
>>> [word for word in words if not Counter(word) - Counter(target)]
['string', 'str', 'ing', 'in', 'ins', 'rs']
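One difference between this approach and the regex answer shows up when the target itself has repeated letters. Below is a sketch comparing the two, assuming Python 3. The target "letter" and the word list are hypothetical.

```python
import re
from collections import Counter

target = "letter"  # hypothetical target with repeated letters (e, t)
words = ["let", "letter", "tell", "teeth", "tree"]

# Counter approach: a word fits if it needs no letter more often than
# the target supplies it (Counter subtraction drops non-positive counts).
by_counter = [w for w in words if not Counter(w) - Counter(target)]

# Regex approach from the first answer: forbids any repeated character.
pattern = re.compile(r"^(?=[%s]{1,%d}$)(?!.*(.).*\1).*$"
                     % (re.escape(target), len(target)))
by_regex = [w for w in words if pattern.match(w)]

print(by_counter)  # ['let', 'letter', 'tree']
print(by_regex)    # ['let']
```

The Counter version accepts "tree" and "letter" because "letter" really does contain two e's and two t's. The regex rejects both.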

Regular expressions may not be the best solution. Here is one algorithm:

  • Make a dictionary from your target word, with each letter as a key and the number of times it occurs as the value. For example, for string, the entry for s would be {'s': 1}.
  • For each word you want to test, check that every letter is in the dictionary AND that no letter occurs more often than it does in the target word.
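The two steps above can be sketched with a plain dict, assuming Python 3. `letter_counts` and `fits` are hypothetical helper names. This is essentially what `collections.Counter` does in the previous answer.

```python
def letter_counts(word):
    """Step 1: map each letter of `word` to how many times it occurs."""
    counts = {}
    for ch in word:
        counts[ch] = counts.get(ch, 0) + 1
    return counts

def fits(candidate, target_counts):
    """Step 2: True if every letter of `candidate` is in the target and
    no letter is used more often than it appears in the target."""
    for ch, n in letter_counts(candidate).items():
        if target_counts.get(ch, 0) < n:
            return False
    return True

target_counts = letter_counts("string")
words = ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]
print([w for w in words if fits(w, target_counts)])
# ['string', 'str', 'ing', 'in', 'ins', 'rs']
```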

I don't think you need Python's re at all. If I understand you correctly, you want only the words in which no letter repeats.

This problem can be solved with a few lines of Python:

str_list = ['strings', 'string', 'str', 'ing', 'in', 'ins', 'rs', 'stress']
# set() keeps one copy of each letter, so the lengths match only if no letter repeats
new_list = [i for i in str_list if len(set(i)) == len(i)]
print(new_list)

The output of the program is:

['string', 'str', 'ing', 'in', 'ins', 'rs']

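In Python 3, str is already a Unicode type, so this works for non-ASCII letters as written. In Python 2 you must use unicode strings (u'...' literals) rather than UTF-8-encoded byte strings. In UTF-8 a single non-ASCII letter is stored as several bytes, so the test would compare bytes instead of letters. set() builds a set of the unique items of any iterable, and a string is an iterable of its characters, so repeated letters collapse into one. If anything was collapsed, the set is shorter than the original string.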
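Here is a short sketch of both points, assuming Python 3. First, the same length test gives a different answer on text than on its UTF-8 bytes. Second, the test only looks for repeats and does not require the letters to come from the target word. A subset check covers that if you need it. `no_repeats` and the word list are hypothetical.

```python
def no_repeats(s):
    """True if no item of `s` occurs twice: set() keeps one copy of each."""
    return len(set(s)) == len(s)

# Text vs bytes: "\u00e9\u00e0" is the two distinct letters é and à.
word = "\u00e9\u00e0"
print(no_repeats(word))                  # True: compares the two letters
print(no_repeats(word.encode("utf-8")))  # False: both letters encode with a leading 0xC3 byte

# The set-length test alone accepts "xyz", so a subset check is added
# to also require every letter to come from the target word.
target = "string"
words = ["strings", "string", "str", "xyz", "ins"]
print([w for w in words if no_repeats(w)])
# ['string', 'str', 'xyz', 'ins']
print([w for w in words if no_repeats(w) and set(w) <= set(target)])
# ['string', 'str', 'ins']
```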
