How does removeSparseTerms in R work?

我在风中等你 2020-12-12 22:50

I am using the removeSparseTerms function in R, and it requires a threshold value as input. I also read that the higher the value, the more terms are retained…

3 answers
  •  一向 (original poster)
    2020-12-12 23:22

    In removeSparseTerms(), the argument sparse = x means:
    "remove all terms whose sparsity is at or above the threshold (x)", where a term's sparsity is the proportion of documents in which the term does not appear.
    E.g., removeSparseTerms(my_dtm, sparse = 0.90) removes every term in my_dtm that is absent from 90% or more of the documents.

    For example, a term that appears in just 4 documents of a 1000-document corpus has a document frequency of 4/1000 = 0.004.

    This term's sparsity is (1000 - 4)/1000 = 1 - 0.004 = 0.996 = 99.6%.
    Therefore, if the threshold is set to sparse = 0.90, this term is removed, because its sparsity (0.996) exceeds the upper bound (0.90).
    However, if the threshold is set to sparse = 0.999, this term is kept, because its sparsity (0.996) is below the upper bound (0.999).
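
The arithmetic above can be reproduced in base R. This is a minimal sketch, not the tm implementation: the toy matrix, its term names, and the helper keep_terms() are made up for illustration, and the rule it applies (keep a term only if its sparsity is below the threshold) is an assumption that mirrors the description above.

```r
# Toy document-term matrix: rows are documents, columns are terms (counts).
# The documents and terms are hypothetical, chosen only for illustration.
dtm <- matrix(
  c(1, 1, 0, 0,
    2, 0, 1, 0,
    1, 1, 0, 1,
    1, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4),
                  c("apple", "banana", "cherry", "date"))
)

n_docs   <- nrow(dtm)
doc_freq <- colSums(dtm > 0)        # number of documents containing each term
sparsity <- 1 - doc_freq / n_docs   # share of documents where the term is absent

# Hypothetical helper: keep terms whose sparsity is below the threshold.
keep_terms <- function(sparsity, sparse) names(sparsity)[sparsity < sparse]

kept_060 <- keep_terms(sparsity, 0.60)  # drops terms absent from 60%+ of documents
kept_090 <- keep_terms(sparsity, 0.90)  # a looser threshold keeps more terms

# The worked example from the answer: a term found in 4 of 1000 documents.
example_sparsity <- 1 - 4 / 1000
removed_at_090 <- !(example_sparsity < 0.90)    # TRUE: removed
removed_at_0999 <- !(example_sparsity < 0.999)  # FALSE: kept

print(sparsity)
print(kept_060)
print(kept_090)
```

With the tm package, the equivalent call on a real matrix would be removeSparseTerms(my_dtm, sparse = 0.60). Raising sparse toward 1 retains more terms, and lowering it retains fewer.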
