Java: Most efficient way to check if a String is in a wordlist

Submitted by 限于喜欢 on 2019-12-10 16:44:32

Question


I have an array of strings, String[] words, and a word list of 28,000 words.

I want to check whether any member of the array is in the word list (the word list is stored in a text file, wordlist.txt).

What is the most efficient way to go about this?


Answer 1:


Put the strings into a HashSet<String> rather than an array, then iterate through the file and call contains on the set for each word you read. You won't improve on O(1) average-time lookup. This also minimizes the memory used to store the strings, should any duplicates exist.
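A minimal sketch of that approach follows. It assumes the word list holds one word per line and that matching is case-sensitive; the class and method names are hypothetical, and the method takes a Reader so it can be fed either a file or an in-memory string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class WordListScan {
    /**
     * Returns true if any line of the word list equals one of the given words.
     * Assumption: one word per line; each line is trimmed before the lookup.
     */
    static boolean anyWordInList(String[] words, Reader wordList) {
        // Building the set once makes every later lookup O(1) on average.
        Set<String> wanted = new HashSet<>(Arrays.asList(words));
        try (BufferedReader reader = new BufferedReader(wordList)) {
            // anyMatch short-circuits, so reading stops at the first match.
            return reader.lines().map(String::trim).anyMatch(wanted::contains);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To run it against the file from the question, something like WordListScan.anyWordInList(words, Files.newBufferedReader(Paths.get("wordlist.txt"))) should work. Only the small array is held in memory; the 28,000-word file is streamed.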




Answer 2:


You can try a suffix array (or suffix tree) algorithm, but you will need to implement it yourself. See this question:

Longest palindrome in a string using suffix tree




Answer 3:


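Step 1: Don't use a string array; use a HashSet instead.

Step 2: Load the contents of the file (wordlist.txt) into another HashSet.

Step 3: Look up each word of the first set in the second set.

import java.util.HashSet;
import java.util.Set;

Set<String> set1 = new HashSet<String>(); // load the string array into this set
Set<String> set2 = new HashSet<String>(); // load the file contents into this set
// For a case-insensitive match, lowercase every word before adding it to either set.
boolean found = false;
for (String str : set1) {
    if (set2.contains(str)) { // O(1) average lookup; no inner loop over set2 needed
        found = true;
        break;
    }
}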
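Once both collections are sets, the check can also be written with the standard collection utilities instead of an explicit loop. The sketch below is one way to do that: it lowercases every word with Locale.ROOT to approximate a case-insensitive comparison (an assumption; it is not identical to equalsIgnoreCase in every corner case), and the class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class SetOverlap {
    /** Copies the words into a HashSet, lowercased so that matching ignores case. */
    static Set<String> toLowerCaseSet(Collection<String> words) {
        Set<String> result = new HashSet<>();
        for (String word : words) {
            result.add(word.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    /** True if at least one word appears in both collections, ignoring case. */
    static boolean shareAnyWord(Collection<String> words, Collection<String> wordList) {
        // disjoint() is true when the sets have no element in common.
        return !Collections.disjoint(toLowerCaseSet(words), toLowerCaseSet(wordList));
    }

    /** The words present in both collections, in lowercase. */
    static Set<String> commonWords(Collection<String> words, Collection<String> wordList) {
        Set<String> common = toLowerCaseSet(words);
        common.retainAll(toLowerCaseSet(wordList)); // keeps only the intersection
        return common;
    }
}
```

shareAnyWord answers the yes/no question from the original post, while commonWords is useful when you also need to know which words matched.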



Answer 4:


You can use a HashSet<String> or an ArrayList<String>; both have a contains method that checks whether your String is stored.
The difference between them is that a HashSet does not allow duplicate values and does not maintain order, while an ArrayList allows duplicates and is an ordered collection. A HashSet is also more efficient for search operations: its contains runs in O(1) time on average, whereas ArrayList.contains scans the list in O(n).




Answer 5:


Create a HashSet of Strings:

HashSet<String> wordSet = new HashSet<String>(Arrays.asList(words));

Then check for a word with the HashSet.contains(Object o) method, passing the word whose presence you want to test.




Answer 6:


Instead of the original wordlist.txt, store a serialized HashSet, generated in a separate step before the application runs.

The application then only needs to load the hash set once.
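A rough sketch of the two halves, using Java's built-in object serialization. The class name, method names, and file path are hypothetical; a build step would call save once, and the application would call load at start-up.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;

final class WordSetCache {
    /** Build step: write the set to disk with Java's built-in serialization. */
    static void save(HashSet<String> words, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(words);
        }
    }

    /** Application start-up: read the set back. Only load files you trust. */
    @SuppressWarnings("unchecked")
    static HashSet<String> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            // The cast is unchecked: the file is assumed to have been written by save().
            return (HashSet<String>) in.readObject();
        }
    }
}
```

Whether this is actually faster than parsing a 28,000-line text file is worth measuring rather than assuming, since deserialization still rebuilds the hash table entry by entry.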




Answer 7:


HashSet's add() returns false if the word is already present in the set.

for (String str : words) {
  if (!wordSet.add(str)) {
    System.out.println("The word " + str + " is already contained.");
  }
}

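This performs the lookup and the insertion in a single call, instead of pairing contains() with a separate add(). Note that add() also inserts every word that was not yet present, so the set is modified as a side effect.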




Answer 8:


A HashSet will suffice if your list of words can fit in memory.

If memory size is a concern, use a Bloom filter. A Bloom filter can give a wrong answer, but only in one direction: it may report a word as present when it is not, never the reverse. You can tune the probability with which that happens.
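To make the idea concrete, here is a hand-rolled sketch of a Bloom filter built on java.util.BitSet. It is illustrative only: the class name is hypothetical, and the bit positions are derived from String.hashCode() by simple double hashing, which is weaker than the hash functions a production library would use.

```java
import java.util.BitSet;

final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // number of bit positions set per word

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Derives the i-th bit position for a word from two base hashes. */
    private int position(String word, int i) {
        int h1 = word.hashCode();
        int h2 = ((h1 * 0x9E3779B9) >>> 16) | 1; // second hash, forced odd so it is never zero
        return Math.floorMod(h1 + i * h2, size);  // floorMod keeps the index non-negative
    }

    void add(String word) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(word, i));
        }
    }

    /** False means definitely absent; true means present or a false positive. */
    boolean mightContain(String word) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(word, i))) {
                return false;
            }
        }
        return true;
    }
}
```

The tuning knobs are the two constructor arguments. As a rule of thumb, roughly 10 bits per stored word with 7 hash positions gives a false-positive rate near 1% when the hashes are well distributed, which for 28,000 words is only a few tens of kilobytes. In real code, an existing implementation such as Guava's BloomFilter is likely a better choice than this sketch.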



Source: https://stackoverflow.com/questions/18658315/java-most-efficient-way-to-check-if-a-string-is-in-a-wordlist
