Regular expression not extracting the exact pattern

Posted by 一个人想着一个人 on 2020-01-16 11:26:30

Question


I am working in Java and reading a string of over 100000 characters. I have a list of keywords that I search the string for, and if a keyword is present I call a function which does some internal processing.

A typical keyword is "face". For that keyword I want to match "face" and "faces" but not "facebook". Whitespace next to the word is acceptable, so if the string contains " face", " faces" or "face " I can accept that too. However, I cannot accept "duckface", "duckface ", etc.

I have written the regex

Pattern p = Pattern.compile("\\s+"+keyword+"s\\s+|\\s+");

where keyword is taken from my list of keywords, but I am not getting the desired results. Can you read my description and suggest what the issue might be and how I can fix it?

Also, if someone could share a pointer to a really good page on regex for Java, I would appreciate that as well.

Thank you, contributors.

Edit

The reason I know it is not working is that I have used the following code:

Pattern p = Pattern.compile("\\s+"+keyword+"s\\s+|\\s+");
Matcher m = p.matcher(myInputDataSting);
if (m.find())
{
    System.out.println("Its a Match: "+m.group());
}

This returns a blank string...


Answer 1:


If keyword is "face", then your current regex is

\s+faces\s+|\s+

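which matches either (a) one or more whitespace characters, followed by faces, followed by one or more whitespace characters, or (b) just one or more whitespace characters. (The pipe | has very low precedence, so it splits the whole pattern in two.) Because of alternative (b), find() stops at the first run of whitespace that is not followed by faces, which is why you are seeing a blank string.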
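To see this behaviour concretely, here is a minimal sketch that runs the pattern from the question against a short input. The class name AlternationDemo, the helper firstMatch and the sample sentence are made up for illustration; only the pattern itself comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Returns the first match of the question's pattern, or null if nothing matches.
    static String firstMatch(String keyword, String input) {
        // Same pattern as in the question: "\s+<keyword>s\s+" OR plain "\s+".
        Pattern p = Pattern.compile("\\s+" + keyword + "s\\s+|\\s+");
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The first whitespace in the input is followed by "like", not "faces",
        // so the second alternative matches that single space.
        System.out.println("[" + firstMatch("face", "I like faces a lot") + "]"); // prints "[ ]"
    }
}
```

The brackets in the output are only there to make the matched space visible.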

What you really want is

\bfaces?\b

which matches a word boundary, followed by face, optionally followed by s, followed by a word boundary.

So, you can write:

Pattern p = Pattern.compile("\\b"+keyword+"s?\\b");

(though obviously this will only work for words like face that form their plurals by simply adding s).
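As a rough sketch of how the word-boundary pattern behaves on the examples from the question, the snippet below collects every match in a sample string. The class name KeywordMatcher, the helper findAll and the sample input are assumptions for illustration. Wrapping the keyword in Pattern.quote is an extra precaution, not part of the pattern above: it keeps a keyword that happens to contain regex metacharacters from being interpreted as regex syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordMatcher {
    // Returns every whole-word occurrence of the keyword, with an optional trailing "s".
    static List<String> findAll(String keyword, String input) {
        // \b = word boundary; Pattern.quote treats the keyword as literal text.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "s?\\b");
        Matcher m = p.matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // "facebook" and "duckface" are skipped: there is no word boundary
        // between "face" and the adjacent letters.
        System.out.println(findAll("face", "face faces facebook duckface")); // prints [face, faces]
    }
}
```

Using a while loop over find(), rather than a single if, lets the caller process every occurrence in a long input instead of only the first.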

You can find a comprehensive listing of Java's regular-expression support at http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html, but it's not much of a tutorial. For that, I'd recommend just Googling "regular expression tutorial", and finding one that suits you. (It doesn't have to be Java-specific: most of the tutorials you'll find are for flavors of regular-expression that are very similar to Java's.)




Answer 2:


You should use

Pattern p = Pattern.compile("\\b"+keyword+"s?\\b");

where keyword is the singular form. \\b means that the keyword must appear as a complete word in the searched string, and s? means that the keyword may optionally be followed by s.
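One detail worth checking when writing this in Java source: the backslash has to be doubled inside a string literal. A single "\b" compiles, but Java turns it into a backspace character (U+0008) before the regex engine ever sees it, so the pattern looks for literal backspaces instead of word boundaries. The sketch below contrasts the two spellings; the class name EscapeDemo and both method names are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // "\b" in a Java string literal is a backspace character, not a regex word boundary.
    static boolean findWithSingleBackslash(String input) {
        return Pattern.compile("\bfaces?\b").matcher(input).find();
    }

    // "\\b" reaches the regex engine as \b, the word-boundary assertion.
    static boolean findWithDoubledBackslash(String input) {
        return Pattern.compile("\\bfaces?\\b").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(findWithSingleBackslash("two faces"));  // prints false
        System.out.println(findWithDoubledBackslash("two faces")); // prints true
    }
}
```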

If you are not familiar enough with regular expressions, I recommend reading http://docs.oracle.com/javase/tutorial/essential/regex/index.html, because it contains examples and explanations.



Source: https://stackoverflow.com/questions/9342985/regular-expression-not-extracting-the-exact-pattern
