Inverse Document Frequency Formula
Question: I'm having trouble manually calculating tf-idf values. Python's scikit-learn keeps producing different values than I'd expect. I keep reading that idf(term) = log(# of docs / # of docs with term). If so, won't you get a divide-by-zero error if no docs contain the term? To solve that problem, I read that you use log(# of docs / (# of docs with term + 1)). But then if the term is in every document, you get log(n / (n + 1)), which is negative, and that doesn't really make sense to me. What am
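
For comparison, here is a minimal sketch of the two variants mentioned in the question, alongside the smoothed form that the scikit-learn documentation gives for `TfidfTransformer` with its default `smooth_idf=True`, namely ln((1 + n) / (1 + df)) + 1. The function names are hypothetical and exist only for illustration; the natural log is assumed throughout.

```python
import math


def idf_textbook(n_docs, df):
    """Plain idf: log(n / df). Undefined when no document contains the term."""
    if df == 0:
        raise ZeroDivisionError("term appears in no document")
    return math.log(n_docs / df)


def idf_plus_one_denominator(n_docs, df):
    """Variant from the question: log(n / (df + 1)).

    Avoids division by zero, but goes negative when df == n_docs.
    """
    return math.log(n_docs / (df + 1))


def idf_smoothed(n_docs, df):
    """Smoothed form as documented for scikit-learn's default settings:
    ln((1 + n) / (1 + df)) + 1.

    Adding 1 to both numerator and denominator keeps the log non-negative
    for any df <= n_docs, and the trailing + 1 keeps the weight above zero
    even for a term that occurs in every document.
    """
    return math.log((1 + n_docs) / (1 + df)) + 1


if __name__ == "__main__":
    n = 4
    for df in (1, 2, 4):
        print(df, idf_textbook(n, df), idf_plus_one_denominator(n, df), idf_smoothed(n, df))
```

With a term that appears in all 4 of 4 documents, the second variant is negative while the smoothed form is exactly 1. Note also that scikit-learn L2-normalizes each document vector by default (`norm='l2'`), which is another likely reason hand-computed values differ from its output.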