How to count words in Java

心在旅途

I am looking for an algorithm, hint or any source code that can solve my following problem.

I have a folder that contains many text files. I read them and store all te

6 answers

感动是毒

2020-12-06 21:34
As GregS said, use a HashMap. I'm not posting any code, because I think this is homework and I want to give you the opportunity to write it on your own, but the outline is:
1. Open a new document.
2. For every word, check whether it is already in your HashMap. If it isn't, create a new key in the HashMap for this word and store the document (the filename) as its value. If it is, just add the document's filename to the existing value.

For example, suppose you have DocA: "Brown fox jump" and DocB: "Fox jump dog".

You would open DocA and traverse its contents. 'brown' is not in your HashMap, so you would add a new entry with key 'brown' and value 'DocA'. The same goes for 'fox' and 'jump'. Then you would open DocB. 'fox' is already in your HashMap, so you would add DocB to its value (the value would become 'DocA DocB'). Using an ArrayList (in Java) as the value would help.
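For readers who want to check their own attempt against something, here is a minimal sketch of the outline above. It is only one possible reading: the class name `InvertedIndexSketch` and its methods are made up for illustration, words are lower-cased so that 'Fox' and 'fox' match as in the example, and documents are passed in as strings rather than read from files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not from the original answer.
final class InvertedIndexSketch {
    // Maps each lower-cased word to the documents that contain it.
    private final Map<String, List<String>> index = new HashMap<>();

    // Adds every word of one document to the index.
    void addDocument(String docName, String text) {
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // split yields "" for empty input or leading whitespace
            }
            // Create the list on first sight of the word, then reuse it.
            List<String> docs = index.computeIfAbsent(word, k -> new ArrayList<>());
            if (!docs.contains(docName)) {
                docs.add(docName); // record each document at most once per word
            }
        }
    }

    // Returns the documents containing the word, or an empty list.
    List<String> documentsFor(String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.emptyList());
    }
}
```

With DocA and DocB from the example added in that order, `documentsFor("fox")` would hold DocA followed by DocB, and `documentsFor("brown")` only DocA.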

粉色の甜心

2020-12-06 21:40

This code returns every distinct word as a key and the number of times it occurs as the value. Just create a String from a file or the command prompt and pass it to the method below.

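import java.util.HashMap;
import java.util.Map;

// Counts how often each whitespace-separated word occurs in the given text.
public Map<String, Integer> getWordsWithCount(String sentences)
{
    Map<String, Integer> wordsWithCount = new HashMap<String, Integer>();

    // Split on any run of whitespace so repeated spaces and line breaks are handled.
    String[] words = sentences.trim().split("\\s+");
    for (String word : words)
    {
        if (word.isEmpty())
        {
            continue; // empty input yields a single empty token
        }
        // Insert 1 for a new word, otherwise add 1 to the existing count.
        wordsWithCount.merge(word, 1, Integer::sum);
    }

    return wordsWithCount;
}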

野趣味

2020-12-06 21:41

It might help to think about the problem as 'I have the set of all words across all documents' and 'I store, for each of these words, which documents it appears in'. Given such a representation of your data, it is easy to determine whether a given word appears in multiple documents. Others have provided hints here on how to do this.
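To make that representation concrete, here is a rough sketch, assuming the documents are already in memory as a map from document name to text. The class `SharedWordsSketch` and both method names are invented for this example, and lower-casing the words is my assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not from the original answer.
final class SharedWordsSketch {
    // Builds word -> set of document names from docName -> text.
    static Map<String, Set<String>> wordToDocuments(Map<String, String> documents) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (String word : doc.getValue().toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    // A set ignores repeated occurrences within one document.
                    result.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return result;
    }

    // Returns the words that occur in at least two documents, sorted.
    static Set<String> wordsInMultipleDocuments(Map<String, Set<String>> wordToDocs) {
        Set<String> shared = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : wordToDocs.entrySet()) {
            if (entry.getValue().size() > 1) {
                shared.add(entry.getKey());
            }
        }
        return shared;
    }
}
```

Once the map is built, "does this word appear in multiple documents" is just a size check on the set stored for that word.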

一向

2020-12-06 21:42

Use a HashMap mapping Strings to Integers. Integers are immutable, so there is a bit of hassle to "increment" them, but not too much. You can override the put() method to do that.
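A small sketch of what overriding put() could look like. The class name `CountingMapSketch` is hypothetical, and treating the value passed to put() as an amount to add (rather than the value to store) is my interpretation of the suggestion.

```java
import java.util.HashMap;

// Hypothetical helper: a HashMap whose put() adds to the stored count
// instead of replacing it.
final class CountingMapSketch extends HashMap<String, Integer> {
    private static final long serialVersionUID = 1L;

    @Override
    public Integer put(String word, Integer amount) {
        Integer previous = get(word);
        // Integer is immutable, so compute a new value and store it.
        int updated = (previous == null ? 0 : previous) + amount;
        super.put(word, updated);
        return previous; // same return convention as HashMap.put
    }
}
```

Calling `counts.put(word, 1)` for each word then accumulates the counts. Note that this changes what callers normally expect from put(); on a plain HashMap, `map.merge(word, 1, Integer::sum)` gives the same increment without subclassing.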

后悔当初

2020-12-06 21:48
Just another idea, different from all the valuable answers. I admit a hash looks better; I just wanted to look at it from another angle.

I would sort all the words in each document and compare the documents with each other.

For example: docA -> brown, fox, jump; docB -> doc, jump, not; docC -> dog, fox, jump

The comparison goes like this:
```
 until only a single document still has words
   take the first element of each document
   find the smallest of these first elements; if it occurs in more than one document, reserve it
   discard that smallest element (in my case)
```
So after the first comparison:

docA -> fox, jump; docB -> doc, jump, not; docC -> dog, fox, jump

After the second comparison:

docA -> fox, jump; docB -> jump, not; docC -> dog, fox, jump

After the third comparison:

docA -> fox, jump; docB -> jump, not; docC -> fox, jump

The fourth comparison reserves fox, and the fifth reserves jump.

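Here is a rough Java sketch of the sorted comparison described above, as I read it. The class `SortedMergeSketch` and the method `sharedWords` are names invented for this example, and using a TreeSet per document (which also drops duplicate words within a document) is my choice, not part of the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not from the original answer.
final class SortedMergeSketch {
    // Returns the words that occur in more than one document.
    // Each inner list holds one document's words, in any order.
    static List<String> sharedWords(List<List<String>> documents) {
        // Sort each document's words.
        List<TreeSet<String>> sorted = new ArrayList<>();
        for (List<String> words : documents) {
            sorted.add(new TreeSet<>(words));
        }
        List<String> reserved = new ArrayList<>();
        while (true) {
            // Find the smallest first element among documents that still have words.
            String smallest = null;
            int nonEmpty = 0;
            for (TreeSet<String> doc : sorted) {
                if (doc.isEmpty()) {
                    continue;
                }
                nonEmpty++;
                if (smallest == null || doc.first().compareTo(smallest) < 0) {
                    smallest = doc.first();
                }
            }
            if (nonEmpty < 2) {
                break; // at most one document still has words, so nothing more can be shared
            }
            // Discard that element everywhere and count how many documents had it.
            int holders = 0;
            for (TreeSet<String> doc : sorted) {
                if (!doc.isEmpty() && doc.first().equals(smallest)) {
                    doc.pollFirst();
                    holders++;
                }
            }
            if (holders > 1) {
                reserved.add(smallest);
            }
        }
        return reserved;
    }
}
```

For the docA, docB and docC example, this reserves fox and then jump, matching the walk-through.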
不知归路

2020-12-06 21:54

Hint: HashMap mapping Strings to Lists of files.
