Stemming with R Text Analysis

花落未央 2020-12-08 08:34

I do a lot of analysis with the tm package. One of my biggest problems is stemming and stemming-like transformations.

Let's say I hav

3 Answers
  •  眼角桃花
    2020-12-08 09:06

    This question inspired me to write a spell checker for the qdap package. There's an interactive version that may be useful here. It's available in qdap >= 2.1.1, which means you'll need the dev version at the moment. Here are the steps to install it:

    library(devtools)
    # Current devtools expects the "user/repo" form rather than separate
    # repo and username arguments.
    install_github("trinker/qdapDictionaries")
    install_github("trinker/qdap")
    library(tm); library(qdap)
    

    ## Recreate a Corpus like the one you describe.

    terms <- c("accounts", "account", "accounting", "acounting", "acount", "acounts", "accounnt")
    
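    # Build one fake document per term: the term plus 1-5 random words
    # drawn from qdapDictionaries' DICTIONARY, in shuffled order.
    fake_text <- unlist(lapply(terms, function(x) {
        paste(sample(c(x, sample(DICTIONARY[[1]], sample(1:5, 1)))), collapse=" ")
    }))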
    
    fake_text
    
    inspect(myCorp <- Corpus(VectorSource(fake_text)))
    

    ## The interactive spell checker (check_spelling_interactive)

    m <- check_spelling_interactive(as.data.frame(myCorp)[[2]])
    preprocessed(m)
    inspect(myCorp <- tm_map(myCorp, correct(m)))
    

    The correct function simply retrieves a closure from the output of check_spelling_interactive, which lets you apply the same corrections to any new text string(s).
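
    To make the closure idea concrete, here is a minimal base R sketch. It is not qdap's implementation: make_corrector and fix_accounts are hypothetical names, and the lookup table is typed in by hand rather than produced by an interactive session. It only shows how a function can carry a set of corrections with it and be reused on new text.

```r
# Hypothetical helper (not part of qdap): returns a closure that remembers
# a misspelling -> correction lookup and applies it to any character vector.
make_corrector <- function(lookup) {
    # `lookup` is a named character vector: names are misspellings,
    # values are the replacements.
    function(text) {
        for (wrong in names(lookup)) {
            # \\b limits the match to whole words, so "acount" does not
            # rewrite part of "acounts".
            pattern <- paste0("\\b", wrong, "\\b")
            text <- gsub(pattern, lookup[[wrong]], text, perl = TRUE)
        }
        text
    }
}

# Hand-written corrections for the misspellings used in the example above.
fix_accounts <- make_corrector(c(
    acounting = "accounting",
    acount    = "account",
    acounts   = "accounts",
    accounnt  = "account"
))

# The closure can now be applied to text it has never seen before.
fix_accounts(c("the acount is closed", "acounting rules", "two acounts"))
```

    Because the lookup lives inside the returned function, the same corrections can be applied to later documents without repeating the interactive step.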
