Matching whole words while ignoring affixes of words using regex

Submitted by 久未见 on 2021-02-05 11:12:33

Question


I am learning a new language and have created a DB with approx. 2500 words and 2500 example sentences for those words. I built a PHP/MySQL web UI which basically shows a picture for each word and plays the audio of the word when you click it. There is also a context menu that triggers a pop-up div, which finds and displays all the examples where the word occurs.

I have been using REGEXP '[[:<:]]$word[[:>:]]', but several prefixes and suffixes add no real meaning to a word (like the suffix -ing in English), and I want the match to ignore them. One workaround has been to put a hyphen in the word where the affix starts, so the regex still matches the word, but that isn't true to how the language is actually spelled. There are also combinations that I do not want to match, because the meaning is completely different. Without getting into specifics, here are some pseudo-examples: the matched word is just "WORD", the prefixes and suffixes I want to ignore are pre1, pre2... and suf1, suf2..., and anything that must not be ignored is xxx.

1. Xxx xxx WORDsuf1 xxx xxx xxx.
2. Xxx xxx WORDsuf2 xxx xxx xxx.
3. Xxx xxx pre1WORDsuf1 xxx xxx xxx.
4. Xxx xxx WORD xxx xxx xxx.
5. Xxx xxx pre1WORD xxx xxx xxx.
6. Xxx xxx pre2WORDxxx xxx xxx xxx.
7. Xxx xxx xxxWORDxxx xxx xxx xxx.
8. Xxx xxx pre1WORDxxxsuf1 xxx xxx xxx.
9. Xxx xxx pre1xxxWORDsuf1 xxx xxx xxx.
10. Xxx xxx xxxWORDxxx xxx xxx xxx.

In the examples above I want to match 1, 2, 3, 4 and 5, but not 6, 7, 8, 9 or 10. I started by just adding OR clauses, for example:

REGEXP '[[:<:]]$word[[:>:]]|[[:<:]]$word$suffix[[:>:]]'

This works fine for one exception but with multiple exceptions it gets messy.

Admittedly I'm pretty inexperienced with regex and most of what I manage to work out are simple examples that I have to read up on. Can this be done with a short and efficient regex?


Answer 1:


Is this what you are looking for?

(\b(pre1|pre2)?WORD(suf1|suf2)?\b)
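
To check the pattern against the ten pseudo-examples from the question, here is a minimal sketch in Python. The helper names `build_pattern` and `build_mysql_pattern` are hypothetical and not part of the original answer. The second helper only assembles the equivalent string for the `[[:<:]]` / `[[:>:]]` syntax used in the question, and it assumes the word and affixes contain no regex metacharacters. MySQL 8.0.4 and later use a different regex engine in which `\b` replaces those markers, so treat that part as an assumption about the older server the question appears to target.

```python
import re


def build_pattern(word, prefixes=(), suffixes=()):
    """Compile a regex for `word` as a whole word with optional allowed affixes."""
    def optional_group(affixes):
        if not affixes:
            return ""
        # Non-capturing alternation, made optional with "?"
        return "(?:" + "|".join(re.escape(a) for a in affixes) + ")?"

    return re.compile(
        r"\b" + optional_group(prefixes) + re.escape(word)
        + optional_group(suffixes) + r"\b"
    )


def build_mysql_pattern(word, prefixes=(), suffixes=()):
    """Assemble the same pattern with [[:<:]] / [[:>:]] word-boundary markers.

    Assumption: word and affixes contain no regex metacharacters.
    """
    pre = "(" + "|".join(prefixes) + ")?" if prefixes else ""
    suf = "(" + "|".join(suffixes) + ")?" if suffixes else ""
    return "[[:<:]]" + pre + word + suf + "[[:>:]]"


# The ten pseudo-examples from the question, in order.
examples = [
    "Xxx xxx WORDsuf1 xxx xxx xxx.",
    "Xxx xxx WORDsuf2 xxx xxx xxx.",
    "Xxx xxx pre1WORDsuf1 xxx xxx xxx.",
    "Xxx xxx WORD xxx xxx xxx.",
    "Xxx xxx pre1WORD xxx xxx xxx.",
    "Xxx xxx pre2WORDxxx xxx xxx xxx.",
    "Xxx xxx xxxWORDxxx xxx xxx xxx.",
    "Xxx xxx pre1WORDxxxsuf1 xxx xxx xxx.",
    "Xxx xxx pre1xxxWORDsuf1 xxx xxx xxx.",
    "Xxx xxx xxxWORDxxx xxx xxx xxx.",
]

pattern = build_pattern("WORD", ["pre1", "pre2"], ["suf1", "suf2"])
matched = [i for i, sentence in enumerate(examples, 1) if pattern.search(sentence)]
mysql_pattern = build_mysql_pattern("WORD", ["pre1", "pre2"], ["suf1", "suf2"])

print(matched)        # [1, 2, 3, 4, 5]
print(mysql_pattern)  # [[:<:]](pre1|pre2)?WORD(suf1|suf2)?[[:>:]]
```

Adding another affix means extending a list rather than chaining another OR clause, which is what keeps the pattern short as the number of exceptions grows.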

If you want the whole line as the match, try the regex below and read it from the matched group at index 1:

(.*(\b(pre1|pre2)?WORD(suf1|suf2)?\b).*)


Use preg_match_all to get all the matched groups.
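
As a rough illustration of collecting every match, the sketch below uses Python's `re.finditer` and `re.findall`, which play a role similar to PHP's `preg_match_all`. The sample text is invented for the illustration, and the groups are written as non-capturing, so the whole match is read directly instead of from group index 1.

```python
import re

# Hypothetical sample text, one example sentence per line.
text = (
    "Xxx WORDsuf1 xxx pre1WORD xxx.\n"
    "Xxx xxxWORDxxx xxx.\n"
    "Xxx WORD xxx."
)

word_pattern = re.compile(r"\b(?:pre1|pre2)?WORD(?:suf1|suf2)?\b")

# Every matched word form, in order of appearance.
words = [m.group(0) for m in word_pattern.finditer(text)]

# Whole lines that contain at least one match; "." does not cross newlines,
# and re.MULTILINE lets ^ and $ anchor at each line.
line_pattern = re.compile(
    r"^.*\b(?:pre1|pre2)?WORD(?:suf1|suf2)?\b.*$", re.MULTILINE
)
lines = line_pattern.findall(text)

print(words)  # ['WORDsuf1', 'pre1WORD', 'WORD']
print(lines)  # ['Xxx WORDsuf1 xxx pre1WORD xxx.', 'Xxx WORD xxx.']
```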



Source: https://stackoverflow.com/questions/24719591/matching-whole-words-while-ignoring-affixes-of-words-using-regex
