Why isn't stemDocument stemming?
I am using the 'tm' package in R to create a term document matrix using stemmed terms. The process is completing, but the resulting matrix includes terms that don't appear to have been stemmed, and I'm trying to understand why that is and how to fix it.

Here is the script for the process, which uses a couple of online news stories as the sandbox:

    library(boilerpipeR)
    library(RCurl)
    library(tm)

    # Pull the relevant parts of the news stories using 'boilerpipeR' and 'RCurl'
    url <- "http://blogs.wsj.com/digits/2015/07/14/google-mozilla-disable-flash-over-security-concerns/"
    extract <-