Breaking a paragraph into a vector of sentences in R

十年热恋 submitted on 2021-01-28 23:19:21

Question


I have the following paragraph:

Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)

To apply the calculate_total_presence_sentiment function from the RSentiment package, I would like to break this paragraph into a vector of sentences as follows:

[1] "Well, um...such a personal topic."                                       
[2] "No wonder I am the first to write a review."                             
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."           
[5] "'Nuff said."                                                             
[6] ":-)"

Would appreciate your help on this.


Answer 1:


qdap has a convenient function for this:

sent_detect_nlp - Detect and split sentences on endmark boundaries using openNLP & NLP utilities, matching the behavior of the now-removed sentDetect function from the old version of the openNLP package.

library(qdap)

txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"

sent_detect_nlp(txt)
#[1] "Well, um...such a personal topic."                                       
#[2] "No wonder I am the first to write a review."                             
#[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
#[4] "And I had, well, major problems in this area and now I don't."           
#[5] "'Nuff said."                                                             
#[6] ":-)"



Answer 2:


Dirty Solution

    > data <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"
    > ?"regular expression"
    > strsplit(data, "(?<=[^.][.][^.])", perl=TRUE)
    [[1]]
    [1] "Well, um...such a personal topic. "                                       
    [2] "No wonder I am the first to write a review. "                             
    [3] "Suffice to say this stuff does just what they claim and tastes pleasant. "
    [4] "And I had, well, major problems in this area and now I don't. "           
    [5] "'Nuff said. "                                                             
    [6] ":-)"                                                                      

For anything more robust, use the tools listed in the CRAN Task View on natural language processing: https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
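
The regex split above returns a list, and each piece keeps its trailing space. If a plain character vector is needed (as the question asks), a minimal sketch could flatten and trim the result. The variable names `pieces` and `sentences` are only illustrative, and the pattern is the same one used above, so it still splits only where a single period is followed by another non-period character:

```r
txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"

# Split after "<non-dot>.<non-dot>", i.e. after a lone period plus the next character
pieces <- strsplit(txt, "(?<=[^.][.][^.])", perl = TRUE)

# strsplit() returns a list with one element per input string;
# unlist() flattens it and trimws() drops the trailing spaces
sentences <- trimws(unlist(pieces))

print(sentences)
```

Note that this will not split on "?" or "!" and will misfire on abbreviations such as "e.g. this", which is why it remains a dirty solution.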




Answer 3:


You can save your text in a .txt file. Make sure that each line in the .txt file contains one sentence that you want as an element of the vector. Then use the base function readLines('filepath/filename.txt'). The result is a character vector (not a data frame) in which each line of the original text file is one element.

> mylines <- readLines('text.txt')
Warning message:
In readLines("text.txt") : incomplete final line found on 'text.txt'
> mylines
[1] "Well, um...such a personal topic."                                       
[2] "No wonder I am the first to write a review."                             
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."           
[5] "'Nuff said'."                                                            
[6] ":-)"

> mylines[3]
[1] "Suffice to say this stuff does just what they claim and tastes pleasant."
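
To try the readLines approach without preparing a file by hand, a small round trip might look like the sketch below. The temporary file and the two sample sentences are assumptions made for illustration. Because writeLines ends the file with a newline, the "incomplete final line" warning shown above should not appear here:

```r
# Two sample sentences, one per line in the file
sentences <- c("Well, um...such a personal topic.",
               "No wonder I am the first to write a review.")

# Hypothetical temporary file standing in for 'text.txt'
path <- tempfile(fileext = ".txt")
writeLines(sentences, path)

# Each line of the file becomes one element of a character vector
mylines <- readLines(path)

print(is.character(mylines))
print(mylines[2])
```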


Source: https://stackoverflow.com/questions/40479496/breaking-a-paragraph-into-a-vector-of-sentences-in-r
