Extracting whole words

借酒劲吻你 2020-12-03 15:38

I have a large set of real-world text that I need to pull words out of to input into a spell checker. I'd like to extract as many meaningful words as possible with

4 answers
  •  无人及你
    2020-12-03 16:27

    Are you familiar with word boundaries (\b)? You can extract words by placing \b around the sequence and matching the letters within:

    \b([a-zA-Z]+)\b
    

    For instance, this will grab whole words but stop at characters such as hyphens, periods, semicolons, etc.
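    As a quick check of that behavior, here is a minimal sketch using Python's standard `re` module; `extract_words` is a hypothetical helper name and the sample strings are made up for illustration. Because `\b` treats digits and underscores as word characters, letter runs glued to them (as in `abc123` or `foo_bar`) are skipped entirely rather than split.

```python
import re

# The pattern from the answer: a run of ASCII letters between word boundaries.
WORD_RE = re.compile(r"\b([a-zA-Z]+)\b")


def extract_words(text):
    """Return every whole ASCII-letter word in text, in order of appearance."""
    # With exactly one capture group, findall returns a list of that group's text.
    return WORD_RE.findall(text)


if __name__ == "__main__":
    print(extract_words("Well-known facts; e.g. the cat sat."))
    # ['Well', 'known', 'facts', 'e', 'g', 'the', 'cat', 'sat']
    print(extract_words("abc123 foo_bar and 2nd"))
    # ['and']
```

    Note that contractions such as `don't` come out as `don` and `t`, which may be worth handling before the words reach a spell checker.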

    You can read about the \b sequence, and others, in the Python manual.

    EDIT: Also, if you want to avoid matching when a number follows or precedes the match, you can use a negative lookahead/lookbehind:

    (?!\d)   # negative look-ahead for numbers
    (?<!\d)  # negative look-behind for numbers
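    On its own, `\b([a-zA-Z]+)\b` already refuses letter runs that touch a digit, since digits count as word characters for `\b`. The negative lookahead/lookbehind from the EDIT becomes useful when you use it in place of `\b`, for example to treat underscores as separators. The sketch below is a hypothetical variant, not the answer's exact pattern: `LETTERS_ONLY_RE`, `NAIVE_RE`, and `extract_letter_words` are illustrative names, and the pattern rejects a letter run if a letter or digit sits immediately on either side.

```python
import re

# Hypothetical variant: explicit lookarounds instead of \b.
# A match must not be preceded or followed by a letter or digit,
# so underscores and punctuation act as separators.
LETTERS_ONLY_RE = re.compile(r"(?<![A-Za-z0-9])[A-Za-z]+(?![A-Za-z0-9])")

# Digit-only lookarounds for comparison: the engine can backtrack to a
# shorter run that is followed by a letter instead of a digit.
NAIVE_RE = re.compile(r"(?<!\d)[A-Za-z]+(?!\d)")


def extract_letter_words(text):
    """Return letter-only words that no letter or digit touches on either side."""
    return LETTERS_ONLY_RE.findall(text)


if __name__ == "__main__":
    print(extract_letter_words("foo_bar abc123 2nd ok"))
    # ['foo', 'bar', 'ok']
    print(NAIVE_RE.findall("abc1"))
    # ['ab']
```

    The letter classes inside the lookarounds are what stop the backtracking shown by `NAIVE_RE`, which returns `ab` from `abc1` because `ab` is followed by `c` rather than a digit.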
