发表新帖

发表新帖

Stemming with R Text Analysis

前端未结

关注

 3  2006

花落未央 2020-12-08 08:34

I am doing a lot of analysis with the TM package. One of my biggest problems are related to stemming and stemming-like transformations.

Let\'s say I hav

3条回答

眼角桃花 (楼主)

2020-12-08 09:06
This question inspired me to attempt to write a spell check for the qdap package. There's an interactive version that may be useful here. It's available in qdap >= version 2.1.1. That means you'll need the dev version at the moment.. here are the steps to install:
```
library(devtools)
install_github("qdapDictionaries", "trinker")
install_github("qdap", "trinker")
library(tm); library(qdap)
```
## Recreate a Corpus like you describe.
```
terms <- c("accounts", "account", "accounting", "acounting", "acount", "acounts", "accounnt")

fake_text <- unlist(lapply(terms, function(x) {
    paste(sample(c(x, sample(DICTIONARY[[1]], sample(1:5, 1)))), collapse=" ")
}))

fake_text

inspect(myCorp <- Corpus(VectorSource(fake_text)))
```
## The interactive spell checker (check_spelling_interactive)
```
m <- check_spelling_interactive(as.data.frame(myCorp)[[2]])
preprocessed(m)
inspect(myCorp <- tm_map(myCorp, correct(m)))
```
The correct function merely grabs a closure function from the output of check_spelling_interactive and allows you to then apply the "correcting" to any new text string(s).
0 讨论(0)

查看其它3个回答
发布评论:

提交评论
- 加载中...

热议问题