I am looking for an algorithm, hint or any source code that can solve my following problem.
I have a folder that contains many text files. I read them and store all te
As GregS said, use a HashMap. I'm not posting any code, because I think this is homework and I want to give you the opportunity to work it out on your own, but the outline is:
For example, if you have:
DocA: Brown fox jump
DocB: Fox jump dog
You would open DocA and traverse its contents. 'brown' is not in your hashmap, so you would add a new element with key 'brown' and value 'DocA'. The same goes for 'fox' and 'jump'. Then you would open DocB. 'fox' is already in your hashmap, so you would add DocB to its value (the value would become 'DocA DocB'). Using an ArrayList (in Java) as the value might help.
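For readers who have already tried it themselves and want something to compare against, the outline above could be sketched roughly as follows. This is only one possible reading of it: the class name `InvertedIndexSketch` and the method `addDocument` are made up for illustration, the document contents are passed in as plain strings instead of being read from files, and words are lower-cased and split on whitespace, which is an assumption about what counts as a word.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InvertedIndexSketch {

    // Adds every word of one document to the index (word -> documents containing it).
    // The name and signature are hypothetical; they are not from the answer above.
    static void addDocument(Map<String, List<String>> index, String docName, String text) {
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // a blank document yields one empty token
            }
            // Create the list the first time a word is seen, then reuse it.
            List<String> docs = index.computeIfAbsent(word, k -> new ArrayList<>());
            if (!docs.contains(docName)) { // record each document once per word
                docs.add(docName);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        addDocument(index, "DocA", "Brown fox jump");
        addDocument(index, "DocB", "Fox jump dog");
        // A word is shared if its list holds more than one document.
        index.forEach((word, docs) -> {
            if (docs.size() > 1) {
                System.out.println(word + " -> " + docs);
            }
        });
    }
}
```

With the two example documents, 'fox' and 'jump' each end up with the list [DocA, DocB], while 'brown' and 'dog' have a single entry. The order in which a HashMap prints its entries is not guaranteed.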
This code returns every distinct word as a key, with the number of times that word occurs in a sentence as the value. Just create a String from a file or from the command prompt and pass it to the method below.
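import java.util.HashMap;
import java.util.Map;

// Returns each distinct word in the sentence mapped to the number of times it occurs.
public Map<String, Integer> getWordsWithCount(String sentence)
{
    Map<String, Integer> wordsWithCount = new HashMap<String, Integer>();
    // Split on runs of whitespace so repeated spaces do not produce empty "words".
    String[] words = sentence.trim().split("\\s+");
    for (String word : words)
    {
        if (word.isEmpty())
        {
            continue; // a blank input still yields one empty token
        }
        if (wordsWithCount.containsKey(word))
        {
            // Seen before: store the old count plus one.
            wordsWithCount.put(word, wordsWithCount.get(word) + 1);
        }
        else
        {
            // First occurrence of this word.
            wordsWithCount.put(word, 1);
        }
    }
    return wordsWithCount;
}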
It might be helpful to think about the problem in terms of 'I have this set of words for all documents together' and 'I could somehow store in which of the documents each of these words appears'. Given such a representation of your data, it would be very easy to determine whether a given word appears in multiple documents. As for how to do this, others have provided hints here.
Use a HashMap mapping Strings to Integers. Integers are immutable, so there is a bit of hassle to "increment" a value, but not too much. You can override the put() method to do that.
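As an alternative to overriding put(), here is a small sketch of the "increment" step using `Map.merge`, which has been part of `java.util.Map` since Java 8. The class name `WordCountMergeSketch` and the method `countWords` are illustrative only, and splitting on whitespace is an assumption about how words are separated.

```java
import java.util.HashMap;
import java.util.Map;

class WordCountMergeSketch {

    // Counts how often each whitespace-separated word occurs in the text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // Stores 1 for a new word; otherwise replaces the old Integer with old + 1.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("fox jump fox"));
    }
}
```

Because an Integer cannot be changed in place, `merge` puts a new Integer into the map on every call. That is the same work the containsKey/put version does, just written in one line.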
Just another idea, different from all the valuable answers. I admit the hash approach looks better; I just wanted to look at the problem from another angle.
I would sort all the words in each document and then compare the documents with each other.
For example:
docA -> brown, fox, jump
docB -> doc, jump, not
docC -> dog, fox, jump
The comparison goes like this:
Until only a single document still has words, take the first element of each document and find the one that comes first alphabetically. If that element is the first element of more than one document, reserve it. Then throw it away (in my case, always the alphabetically smallest one).
So after the first comparison ('brown' is thrown away):
docA -> fox, jump
docB -> doc, jump, not
docC -> dog, fox, jump
After the second comparison ('doc' is thrown away):
docA -> fox, jump
docB -> jump, not
docC -> dog, fox, jump
After the third comparison ('dog' is thrown away):
docA -> fox, jump
docB -> jump, not
docC -> fox, jump
The 4th comparison reserves 'fox' and the 5th comparison reserves 'jump'.
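The sort-and-compare idea could be sketched roughly like this. It is an interpretation of the steps described, not a definitive implementation: the class name `SortedMergeSketch` and the method `sharedWords` are invented for the example, each document is assumed to be already split into a list of words, and duplicates inside one document are dropped while sorting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class SortedMergeSketch {

    // Returns the words that occur in at least two of the given documents.
    static List<String> sharedWords(List<List<String>> documents) {
        // Sort each document and drop duplicates inside it.
        List<List<String>> sorted = new ArrayList<>();
        for (List<String> doc : documents) {
            sorted.add(new ArrayList<>(new TreeSet<>(doc)));
        }
        int[] pos = new int[sorted.size()]; // index of the current first word per document
        List<String> shared = new ArrayList<>();

        while (true) {
            // Find the alphabetically smallest first word; count documents that still have words.
            String smallest = null;
            int remaining = 0;
            for (int i = 0; i < sorted.size(); i++) {
                if (pos[i] < sorted.get(i).size()) {
                    remaining++;
                    String head = sorted.get(i).get(pos[i]);
                    if (smallest == null || head.compareTo(smallest) < 0) {
                        smallest = head;
                    }
                }
            }
            if (remaining < 2) {
                break; // words left in a single document cannot be shared
            }
            // Throw the smallest word away wherever it is first; reserve it if that happened more than once.
            int hits = 0;
            for (int i = 0; i < sorted.size(); i++) {
                if (pos[i] < sorted.get(i).size() && sorted.get(i).get(pos[i]).equals(smallest)) {
                    pos[i]++;
                    hits++;
                }
            }
            if (hits > 1) {
                shared.add(smallest);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("brown", "fox", "jump"),
                Arrays.asList("doc", "jump", "not"),
                Arrays.asList("dog", "fox", "jump"));
        System.out.println(sharedWords(docs)); // prints [fox, jump]
    }
}
```

Compared with the hash-based approach, this needs every document sorted first, but it only ever looks at the front of each list, so it never has to hold one big map of all words.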
Hint: HashMap mapping Strings to Lists of files.