Cosine similarity vs Hamming distance [closed]
To compute the similarity between two documents, I create a feature vector containing the term frequencies. But then, for the next step, I can't decide between "Cosine similarity" and "Hamming distance".

My question: Do you have experience with these algorithms? Which one gives you better results?

In addition to that: Could you tell me how to code the Cosine similarity in PHP? For Hamming distance, I've already got the code:

    function check($terms1, $terms2) {
        $counts1 = array_count_values($terms1);
        $totalScore = 0;
        foreach ($terms2 as $term) {
            if (isset($counts1[$term])) $totalScore +=