Regex to search for 5 words before and after a given word

Submitted by 早过忘川 on 2021-01-01 06:39:55

Question


I need to write an AS3 program that searches for a certain "keyword" in the RSS feeds of certain blogs.

I've written logic using String.indexOf(), but it is EXTREMELY slow and does not scale. I've been trying to write a regular expression that finds the keyword and also returns the 5 words before and after it (to show the context of the search result).

I suppose overlapping matches can be ignored.

I've come up with (?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}keyword(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}

The only problem is that it does not restrict the keyword to whole words. So, using that regex on

the quick brown fox jumps over the lazy dog, and then goes for his afternoon nap in the kennel

for the keyword "the", would match

the quick brown fox jumps over
the lazy dog, and the
n goes for his afternoon nap in the kennel

Notice the "then" gets split.
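The split can be reproduced outside ActionScript. The following is a minimal sketch in Python (an assumption made for illustration; the question itself targets AS3) that runs the pattern above on the sample sentence. The names WORD, GAP and matches are hypothetical helpers, not part of the original post.

```python
import re

# Building blocks of the pattern from the question (names are illustrative).
WORD = r"[a-zA-Z'-]+"    # a run of word characters
GAP = r"[^a-zA-Z'-]+"    # a run of separators

# Up to 5 words, then the bare keyword "the", then up to 5 words.
pattern = re.compile(rf"(?:{WORD}{GAP}){{0,5}}the(?:{GAP}{WORD}){{0,5}}")

text = ("the quick brown fox jumps over the lazy dog, and then goes "
        "for his afternoon nap in the kennel")

matches = [m.group(0) for m in pattern.finditer(text)]
for match in matches:
    print(repr(match))
```

Under Python's re module, the second match ends in the middle of "then", as described. The last match comes out as "for his afternoon nap in the kennel", since the pattern allows at most five words before the keyword.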

How do I match the whole word only?

I tried requiring whitespace or punctuation before and after the keyword, but that breaks for keywords at the very start or end of the search text.

Any way around this?

Take the real-world text to be

These new wedges from SCOR offer varying degrees of loft, giving you more accuracy when you're close to the green. Photo by Ariel Zambelich/Wired For most golfers, well over half of their shots are played within 100 yards of the hole. That figure runs about 60 to 65 percent, depending on your game. While it’s sexy to crush the ball off the tee, for most of us, getting better with a putter or wedge in our hands would do much more for our scores than adding 15 yards to our drive. SCOR Golf recently released a system the company calls SCOR4161. The 4161 is for the available lofts of the clubs it makes, from 41 degrees of loft all the way up to 61 degrees, in one-degree increments. That’s a range that lets golfers replace their standard 9 iron and pitching wedge from their set of irons, as well as other wedges they might be using, with a set of clubs designed for precision from 130 yards and closer. SCOR’s claims here are that the shots we normally hit with a 9 iron and pitching wedge are much more like sand wedge shots than 5 iron shots; therefore, it makes sense to design those clubs to be more like a sand wedge. In my testing, I wasn’t as convinced of that notion. Usually, I’m hitting a 9 iron and PW more as a full shot. There’s not the same need for feel and finesse with those shots, and for me, I tended to be more consistent with the clubs from my iron set than with the SCOR clubs. But when I got to the three lofts that would be more traditionally thought of as wedges (49, 54, 59 degrees), I was very, very impressed with the SCOR clubs.Photo by Ariel Zambelich/Wired A couple of things stood out. Wedges traditionally include a certain amount of bounce on the club — the trailing edge is lower than the leading edge. This keeps the club from digging into the ground when you swing. Usually, different lies call for different bounce angles: higher bounce for sand or softer turf, lower bounce for firmer, tighter lies. 
The SCOR clubs include a bounce angle innovation (the company calls it V-Sole) that lets you use each of the clubs in different situations. The very leading edge of the club is ground with a very high bounce angle — sometimes 25 degrees or more — to keep the club from digging in. But the rest of the sole is ground at a much lower bounce, in the 5- to 9-degree range, which gives good performance on harder turf. The result for me was a versatile club that felt like it handled different conditions without much issue. The clubs aren’t quite as easy to open up as traditional sand wedges for big flop shots, but that was more than made up for by their consistency in different lies. Other thoughtful touches abound. The grips on the club come marked with places for your thumb spaced out every inch. The idea is that you can take a little distance off by choking down on the club, and begin to understand exactly how far you hit it based on your hand position. SCOR even provides an e-book outlining the method, and a bag tag that lets you jot down your results. It’s a great way to get better control over your distance, which is a key to good rounds. Overall, there’s not much to quibble with here. The clubheads are soft, which is good for feel, although they’ve gotten a little nicked up during my testing. The design of the clubhead is classic and confidence-inspiring, but the graphics on the shafts and the grips are a little distracting; something more understated would be nice. But by far the most important thing about SCOR is that it’s trying to build a system that helps your short game, from clubs to technique. Most golfers buy wedges by feel — they pick a motley group, trying out a few and seeing how they hit. If SCOR can get you to think about how all your clubs come together to influence your game, it will have done the whole golfing world a huge service. WIRED Beautiful build quality. Consistent feel across clubs, consistent performance across conditions. 
Sensible advice in owner’s manual to help improve short game. Small company leads to great customer service. TIRED Ever so slightly garish design. Clubs are $150 each, and five clubs will set you back $640.


Answer 1:


I am not familiar with ActionScript syntax, but the following regex should work for you:

'/((?:\w+\W*){5})\b' . $keyword . '\b((?:\W*\w+){5})/'

If you want I can provide a demo for you in PHP.

EDIT: The original regexp (?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}keyword(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5} works properly; you just need to add \b before and after the keyword: (?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}\bkeyword\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}
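As a rough illustration of the \b fix, here is a minimal Python sketch (Python is an assumption; the question targets AS3, whose RegExp class also supports \b). The helper name context_pattern and the use of re.escape to neutralise special characters in the keyword are additions for this example, not part of the answer.

```python
import re

WORD = r"[a-zA-Z'-]+"    # a run of word characters
GAP = r"[^a-zA-Z'-]+"    # a run of separators


def context_pattern(keyword: str) -> "re.Pattern":
    """Hypothetical helper: whole-word keyword with up to 5 words each side."""
    # re.escape guards against regex metacharacters in the keyword.
    # Note: \b only behaves as expected if the keyword starts and ends
    # with a letter, digit or underscore.
    kw = re.escape(keyword)
    return re.compile(rf"(?:{WORD}{GAP}){{0,5}}\b{kw}\b(?:{GAP}{WORD}){{0,5}}")


text = ("the quick brown fox jumps over the lazy dog, and then goes "
        "for his afternoon nap in the kennel")

the_matches = [m.group(0) for m in context_pattern("the").finditer(text)]
end_matches = [m.group(0) for m in context_pattern("kennel").finditer(text)]

for match in the_matches + end_matches:
    print(repr(match))
```

With the boundaries in place, "then" is no longer split, and a keyword at the very start ("the") or very end ("kennel") of the text still matches, because \b also succeeds at the edges of the string.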




Answer 2:


Maybe this one, which is close to your expression, will fit your needs.

After the keyword, either consume the characters you are not interested in or match the end of the string:

(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}keyword(?:[^a-zA-Z'-]+|$)(?:[a-zA-Z'-]+[^a-zA-Z'-]*){0,5}
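The behaviour of this variant can be checked with a small Python sketch (Python is an assumption; the helper name find_context, the re.escape call and the final strip() are additions for illustration). It suggests that the right-hand side of the keyword is now anchored, while the left-hand side is not.

```python
import re

WORD = r"[a-zA-Z'-]+"    # a run of word characters
GAP = r"[^a-zA-Z'-]+"    # a run of separators


def find_context(keyword: str, text: str) -> list:
    """Hypothetical helper: run the pattern from this answer over text."""
    kw = re.escape(keyword)
    pattern = re.compile(
        rf"(?:{WORD}{GAP}){{0,5}}{kw}(?:{GAP}|$)(?:{WORD}[^a-zA-Z'-]*){{0,5}}"
    )
    # The trailing [^a-zA-Z'-]* can leave separators at the end of a match,
    # so they are stripped here.
    return [m.group(0).strip() for m in pattern.finditer(text)]


text = ("the quick brown fox jumps over the lazy dog, and then goes "
        "for his afternoon nap in the kennel")

print(find_context("the", text))          # "then" is no longer split
print(find_context("the", "nap in the"))  # keyword at the end of the string
print(find_context("the", "bathe daily")) # still matches inside "bathe"
```

The last call returns "the daily": nothing in the pattern requires a separator before the keyword, so a word that merely ends in the keyword is still picked up. Adding \b in front of the keyword is one way to close that gap.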




Answer 3:


So after a little less than a month of regular use, I find that

"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}\W)(" + keyword + ")(\W(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"

works best.

I replaced the word boundaries (\b) with a non-word character (\W). This allows for punctuation immediately before or after the keyword.
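To see what the three capture groups return, here is a literal transcription of that pattern into Python (an assumption; the answer is written for AS3, where escaping inside string literals may differ). The helper name context_groups and the re.escape call are additions for illustration.

```python
import re

WORD = r"[a-zA-Z'-]+"    # a run of word characters
GAP = r"[^a-zA-Z'-]+"    # a run of separators


def context_groups(keyword: str, text: str) -> list:
    """Hypothetical helper: (before, keyword, after) tuples for each match."""
    kw = re.escape(keyword)
    pattern = re.compile(
        rf"((?:{WORD}{GAP}){{0,5}}\W)({kw})(\W(?:{GAP}{WORD}){{0,5}})"
    )
    return [m.groups() for m in pattern.finditer(text)]


# Punctuation next to the keyword: context is captured on both sides.
print(context_groups("the", "stop, the, end here"))

# Single spaces only: each \W consumes the lone separator, so the
# context groups hold just that space. "then" is not matched.
print(context_groups("the", "over the lazy dog, and then goes"))

# Keyword at the very start of the text: \W has nothing to match.
print(context_groups("the", "the end"))
```

In this transcription the mandatory \W on each side means the keyword needs a real character before and after it, so a keyword at the very start or end of the text is not found. It also means that when only a single space separates the keyword from its neighbours, the surrounding words are not captured. Whether that trade-off is acceptable depends on the feed text being searched.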



Source: https://stackoverflow.com/questions/9756644/regex-to-search-for-5-words-before-and-after-a-given-word
