Inconsistent behaviour with tm_map transformation functions when using multiple cores
Another potential title for this post could be "When parallel processing in r, does the ratio between number of cores, loop chunk size and object size matter?" I have a corpus I am running some transformations on using tm package. Since the corpus is large I'm using parallel processing with doparallel package. Sometimes the transformations do the task, but sometimes they do not. For example, tm::removeNumbers() . The very first document in the corpus has a content value of "n417". So if preprocessing is successful then this doc will be transformed to just "n". Sample corpus is below for