Match words of permutations of another word using regex [duplicate]

Posted by 喜你入骨 on 2019-12-12 06:55:57

Question


I have a list of words, all of them valid English words, that I am going to query with a regular expression.

What I need is to match words that contain all the letters of a specified word, in any order.

Example (a segment of the list):

...
peloton
pelt
pelta
peltae
peltast
....

I should be able to fill in a regex for "leap" and collect "pelta", "peltae" and "peltast", along with other words in the database (such as "selfpreservatory").
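
To pin down the expected result, here is a minimal sketch of the requirement written as a plain set check rather than a regex. The helper name contains_letters and the list name segment are hypothetical, and the check assumes the specified word has no repeated letters.

```python
def contains_letters(word: str, letters: str) -> bool:
    """Hypothetical reference check: every letter of `letters` occurs in `word`."""
    return set(letters) <= set(word)


# Words taken from the example segment in the question.
segment = ["peloton", "pelt", "pelta", "peltae", "peltast", "selfpreservatory"]

# "peloton" and "pelt" have no "a", so they are filtered out.
expected = [w for w in segment if contains_letters(w, "leap")]
print(expected)
```

A working regex for "leap" should select the same words from this segment.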

What I have:

/^([chars]).*(?:\1|([chars])).*(?:\1|\2|([chars])).*{et cetera}.*(?:\1|\2|{et cetera}|\{n-1}|([chars]))(?{n})$/

(Fill in {et cetera}, {n} and {n-1} according to the word length.)

This is how it is supposed to work:

You start with a pool of the characters in your word, which hopefully contains no repeated characters. (This pool is [chars].) The first group matches the first character it sees that is in [chars]. When the pattern then looks for the next character in [chars], it either matches the first capture again and captures nothing, or it matches another character from the pool and captures that one. Essentially, the second (?:) group removes the first match from the pool of characters. After n steps, the pattern checks whether the nth group has actually captured a character. If it has not, the word does not match.

This version does not really work, though. What would be a correct approach?

Note: I am not grepping, so I need to use ^ and $ instead of \b.

Thanks in advance!

Edit: I have also tried this approach. It does not work at all.

/^(([chars]).*(?!\1|\2)){n}$/

Answer 1:


Using lookaheads, with "leap" as an example:

\b(?=[a-z]*l)(?=[a-z]*e)(?=[a-z]*a)(?=[a-z]*p)[a-z]+\b

Fiddle: http://refiddle.com/12u4

EDIT: I added \b anchors (word boundaries). The leading one is especially important: without it the pattern can also match starting in the middle of a word, so "appeal" could be matched as "appeal", "ppeal" or "peal". Feel free to use other anchors where appropriate (e.g. ^...$).
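
As a rough illustration of the anchoring choices, the sketch below applies the "leap" pattern with Python's re module. The variable names and the sample strings are assumptions made for the example. The capitalized "Appeal" is used because [a-z] does not match "A", which makes the effect of the missing leading anchor visible.

```python
import re

# Lookahead core for "leap", without any anchors.
LEAP = r"(?=[a-z]*l)(?=[a-z]*e)(?=[a-z]*a)(?=[a-z]*p)[a-z]+"

# One word per line (as in the question): anchor with ^...$ in MULTILINE mode.
words = "peloton\npelt\npelta\npeltae\npeltast\nselfpreservatory"
line_matches = re.findall(rf"^{LEAP}$", words, flags=re.MULTILINE)

# Running text: anchor with \b on both sides.
text_matches = re.findall(rf"\b{LEAP}\b", "an appeal to leap over the pelt")

# Without a leading anchor, the match may start in the middle of a word.
unanchored = re.findall(LEAP, "Appeal")
anchored = re.findall(rf"\b{LEAP}\b", "Appeal")

print(line_matches)  # ['pelta', 'peltae', 'peltast', 'selfpreservatory']
print(text_matches)  # ['appeal', 'leap']
print(unanchored)    # ['ppeal']
print(anchored)      # []
```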

By the way, this approach also works when the same letter has to occur more than once. Say you want to match all words containing the letters of "pop" (i.e. at least two "p" and at least one "o"):

\b(?=[a-z]*p[a-z]*p)(?=[a-z]*o)[a-z]+\b

Or with a quantifier:

\b(?=([a-z]*p){2})(?=[a-z]*o)[a-z]+\b

Both will match "pop", "pope", "oppress", but not "poke".
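
The quantifier form lends itself to generating the pattern from any word. The following is a minimal sketch, assuming lowercase ASCII words and ^...$ anchoring as requested in the question; build_pattern is a hypothetical helper name, not something from the original answer.

```python
import re
from collections import Counter


def build_pattern(letters: str) -> str:
    """Hypothetical helper: one counted lookahead per distinct letter."""
    lookaheads = []
    for ch, count in Counter(letters).items():
        # e.g. (?=(?:[a-z]*p){2}) requires at least two "p" somewhere in the word.
        lookaheads.append(rf"(?=(?:[a-z]*{re.escape(ch)}){{{count}}})")
    return "^" + "".join(lookaheads) + "[a-z]+$"


pop = re.compile(build_pattern("pop"))
candidates = ["pop", "pope", "oppress", "poke"]
matched = [w for w in candidates if pop.match(w)]
print(matched)  # ['pop', 'pope', 'oppress']
```

A non-capturing group is used inside each lookahead so that the generated pattern creates no capture groups, which keeps re.findall returning whole words.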



Source: https://stackoverflow.com/questions/24183856/match-words-of-permutations-of-another-word-using-regex
