How does the removeSparseTerms in R work?

我在风中等你 2020-12-12 22:50

I am using the removeSparseTerms method in R, and it requires a threshold value as input. I also read that the higher the value, the more terms are retained.

3 Answers

一向 (OP)

2020-12-12 23:22

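In the function removeSparseTerms(), the argument sparse = x means:
"remove all terms whose sparsity is greater than the threshold x", where a term's sparsity is the proportion of documents in the corpus that do not contain it.
For example, removeSparseTerms(my_dtm, sparse = 0.90) removes all terms whose sparsity is greater than 90%, i.e. terms that are absent from more than 90% of the documents.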
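To make the rule concrete without depending on tm, here is a minimal base-R sketch on a plain matrix with documents in rows and terms in columns. It assumes tm keeps a term only if it occurs in more than N * (1 - sparse) documents (N = number of documents); that is my reading of the tm source, so treat it as an assumption. The names remove_sparse_sketch and toy_dtm are hypothetical and made up for this illustration.

```r
# Hypothetical helper (NOT part of tm): applies the sparsity rule to a plain
# base-R matrix with documents in rows and terms in columns.
remove_sparse_sketch <- function(dtm, sparse) {
  stopifnot(is.numeric(sparse), sparse > 0, sparse < 1)
  n_docs <- nrow(dtm)
  # Document frequency: number of documents in which each term appears at least once
  doc_freq <- colSums(dtm > 0)
  # Assumed keep rule: the term must occur in more than n_docs * (1 - sparse)
  # documents, i.e. it is dropped when its sparsity reaches the threshold
  keep <- doc_freq > n_docs * (1 - sparse)
  dtm[, keep, drop = FALSE]
}

# Toy corpus: 5 documents x 3 terms (made-up counts for illustration)
# "common" is in 5/5 docs (sparsity 0.0), "rare" in 1/5 (0.8), "medium" in 2/5 (0.6)
toy_dtm <- matrix(
  c(2, 0, 1,
    1, 0, 0,
    3, 1, 0,
    1, 0, 0,
    2, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("doc", 1:5), c("common", "rare", "medium"))
)

colnames(remove_sparse_sketch(toy_dtm, sparse = 0.7))  # "common" "medium"
```

On this toy matrix, sparse = 0.5 keeps only "common", 0.7 keeps "common" and "medium", and 0.9 keeps all three terms, which matches the observation in the question that a higher value retains more terms.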

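For example, a term that appears in just 4 documents of a corpus of 1000 documents has a document frequency of 4/1000 = 0.004. (Sparsity counts documents, not occurrences: a term used 4 times within a single document appears in only 1 document.)

This term's sparsity is (1000 - 4)/1000 = 1 - 0.004 = 0.996 = 99.6%.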
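The same arithmetic in general form, with N the number of documents in the corpus and df(t) the number of documents that contain term t (notation introduced here for convenience):

$$
\operatorname{sparsity}(t) = \frac{N - \mathrm{df}(t)}{N} = 1 - \frac{\mathrm{df}(t)}{N},
\qquad
\operatorname{sparsity}(t) = \frac{1000 - 4}{1000} = 0.996 .
$$

Under the rule stated above, t is removed when sparsity(t) is greater than the value passed as sparse.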
Therefore, if the sparsity threshold is set to sparse = 0.90, this term will be removed, because its sparsity (0.996) is greater than the threshold (0.90).
However, if the threshold is set to sparse = 0.999, this term will be kept, because its sparsity (0.996) is lower than the threshold (0.999).
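The worked example can be reproduced in a few lines of base R. This is a sketch: n_docs, doc_freq and is_kept are names invented for illustration, and the keep rule doc_freq > n_docs * (1 - sparse) is my assumption about how tm implements the comparison.

```r
# Worked example: 1000 documents, the term appears in 4 of them
n_docs   <- 1000
doc_freq <- 4

term_sparsity <- (n_docs - doc_freq) / n_docs  # 0.996

# Assumed keep rule: keep the term only if it appears in more than
# n_docs * (1 - sparse) documents (equivalently, its sparsity is below sparse)
is_kept <- function(sparse) doc_freq > n_docs * (1 - sparse)

is_kept(0.90)   # FALSE: 4 is not more than 100 documents, so the term is removed
is_kept(0.999)  # TRUE: 4 is more than 1 document, so the term is kept
```

Comparing document counts rather than sparsity values gives the same answer here; it just mirrors the form I believe tm uses internally.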