Extracting whole words

借酒劲吻你 2020-12-03 15:38

I have a large set of real-world text that I need to pull words out of to input into a spell checker. I'd like to extract as many meaningful words as possible with

4 answers

无人及你 (original poster)

2020-12-03 16:27
Are you familiar with word boundaries (\b)? You can extract words by placing \b on either side of the sequence and matching letters in between:
```
\b([a-zA-Z]+)\b
```
For instance, this will grab whole words but stop at tokens such as hyphens, periods, semi-colons, etc.
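As a rough illustration of the pattern above, here is a minimal Python sketch using the standard `re` module. The sample sentence and the names `WORD_RE` and `words` are made up for this example. Note that digits and underscores count as word characters, so a token like `abc123` has no boundary between `c` and `1` and is skipped entirely.

```python
import re

# The pattern from the answer: a run of ASCII letters with a word
# boundary on each side.
WORD_RE = re.compile(r"\b([a-zA-Z]+)\b")

# Hypothetical sample text, not taken from the question.
text = "The quick-brown fox; it jumps. abc123 stays out."

# findall returns the contents of the single capture group per match.
words = WORD_RE.findall(text)
print(words)
# ['The', 'quick', 'brown', 'fox', 'it', 'jumps', 'stays', 'out']
```

The hyphenated "quick-brown" is split into two words because the hyphen is not a word character, which may or may not be what a spell checker wants.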

You can read about the \b sequence, and others, in the Python manual.

EDIT: Also, if you want to avoid a number directly following or preceding the match, you can use a negative look-ahead/look-behind:
```
(?!\d)   # negative look-ahead for numbers
(?<!\d)  # negative look-behind for numbers
```
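One way these lookarounds might be combined with the letter class is sketched below in Python. This is an assumption about how the pieces fit together, not something the answer spells out, and the names `NAIVE_RE`, `STRICT_RE`, and `sample` are invented for the example. It also shows a pitfall: if the lookarounds only exclude digits, the regex engine can backtrack and return a fragment of a word.

```python
import re

# Hypothetical sample text.
sample = "abc123 42nd snake_case plain"

# Lookarounds that only reject digits. The engine backtracks from "abc"
# to "ab" (followed by "c", not a digit), so fragments slip through.
NAIVE_RE = re.compile(r"(?<!\d)[a-zA-Z]+(?!\d)")
naive = NAIVE_RE.findall(sample)
print(naive)   # ['ab', 'd', 'snake', 'case', 'plain']

# Assumption: a word must not touch a digit OR another letter.
# Adding the letter class to the lookarounds blocks partial matches.
STRICT_RE = re.compile(r"(?<![a-zA-Z\d])[a-zA-Z]+(?![a-zA-Z\d])")
strict = STRICT_RE.findall(sample)
print(strict)  # ['snake', 'case', 'plain']
```

Unlike the \b version, the stricter pattern treats an underscore as a separator, so `snake_case` yields two words. Whether that is desirable depends on the text being fed to the spell checker.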