How to get the first 10 words in a string in R?

清歌不尽 2020-12-17 00:00

I have a string in R as

x <- "The length of the word is going to be of nice use to me"

I want the first 10 words of the above specifi

4 answers
  •  慢半拍i (OP)
    2020-12-17 00:03

    Regular expression (regex) answer using \w (word character) and its negation \W:

    gsub("^((\\w+\\W+){9}\\w+).*$","\\1",x)
    
    1. ^ Beginning of the string (zero-width)
    2. ((\\w+\\W+){9}\\w+) Ten words separated by non-word characters.
      1. (\\w+\\W+){9} A word followed by one or more non-word characters, 9 times
        1. \\w+ One or more word characters (i.e. a word)
        2. \\W+ One or more non-word characters (e.g. a space)
        3. {9} Nine repetitions
      2. \\w+ The tenth word
    3. .* Anything else, including any following words
    4. $ End of the string (zero-width)
    5. \\1 When the pattern matches, replace the whole match with the first captured group (the 10 words)
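
To see the pattern in action, here is a small sketch that runs the regex above on the string from the question and compares it with a split-and-rejoin approach. The helper `first_n_words` and the `short` test string are illustrative names that are not part of the original answer. The split version assumes words are separated by whitespace only, whereas the regex treats any non-word character (punctuation such as an apostrophe included) as a separator.

```r
x <- "The length of the word is going to be of nice use to me"

# Regex from the answer: capture the first 10 words, discard the rest.
pattern <- "^((\\w+\\W+){9}\\w+).*$"
first10_regex <- gsub(pattern, "\\1", x)
print(first10_regex)
# [1] "The length of the word is going to be of"

# Alternative sketch (an assumption, not from the answer):
# split on runs of whitespace, keep the first n pieces, and rejoin them.
first_n_words <- function(s, n = 10) {
  words <- strsplit(trimws(s), "\\s+")[[1]]
  paste(head(words, n), collapse = " ")
}
first10_split <- first_n_words(x)
print(first10_split)

# A string with fewer than 10 words: the regex does not match,
# so gsub() returns the input unchanged.
short <- "only three words"
short_regex <- gsub(pattern, "\\1", short)
print(short_regex)
```

When the input has fewer than 10 words the pattern cannot match, so `gsub` leaves the string as it is; the split version likewise returns all the words it finds.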
