Creating a Java Program to Search a File for a Specific Word

Submitted by 人盡茶涼 on 2019-11-29 16:02:42

Question


I am just learning the language and was wondering what a more experienced Java programmer would do in the following situation.

I would like to create a Java program that will search a specified file for all instances of a specific word.

How would you go about this? Does the Java API come with a class that provides file scanning capabilities, or would I have to write my own class to do this?

Thanks for any input,
Dom.


Answer 1:


The Java API does offer the java.util.Scanner class, which will allow you to scan across an input file.

Depending on how you intend to use this, however, it might not be the best idea. Is the file very large? Are you searching only one file, or are you trying to keep a database of many files and search across them? In the latter case, you might want to use a more fleshed-out engine such as Lucene.
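If the file is large, one option is to read it a line at a time instead of loading it whole. The following is only a rough sketch of that idea using BufferedReader and java.util.regex; the class name LineSearch and the method countOccurrences are made up for illustration, and a StringReader stands in for a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineSearch {
    // Hypothetical helper: counts whole-word, case-insensitive occurrences
    // of `word`, holding only one line in memory at a time.
    static int countOccurrences(Reader source, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    count++;
                }
            }
        } catch (IOException e) {
            // Wrapped so callers do not have to declare a checked exception.
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // For a real file, pass e.g. new java.io.FileReader("data.txt") instead.
        Reader demo = new StringReader(
                "Java is fun.\nI like java.\nJavaScript is different.");
        System.out.println(countOccurrences(demo, "java"));
    }
}
```

Because matching is done per line, this sketch cannot find a match that spans a line break, which does not matter for single words.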




Answer 2:


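Unless the file is very large, I would read the whole file into a string and search that:

// IOUtils is presumably from Apache Commons IO; Pattern is java.util.regex.Pattern.
String text = IOUtils.toString(new FileReader(filename));
// String.matches() only succeeds if the entire text matches, so use find() instead.
boolean foundWord = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();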

To find the text between occurrences of your word, you can use split() and use the lengths of the resulting strings to determine the position of each occurrence.
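As an alternative to adding up the lengths from split(), java.util.regex.Matcher can report each match position directly. This is a minimal sketch, not part of the original answer; the class name WordPositions and the method findPositions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordPositions {
    // Hypothetical helper: returns the start index of every whole-word,
    // case-sensitive occurrence of `word` in `text`.
    static List<Integer> findPositions(String text, String word) {
        // Pattern.quote makes the word literal even if it contains regex metacharacters.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = p.matcher(text);
        List<Integer> positions = new ArrayList<>();
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "a cat, a catalog, cat";
        // "catalog" is skipped because no word boundary follows its "cat".
        System.out.println(findPositions(text, "cat")); // prints [2, 18]
    }
}
```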




Answer 3:


As others have pointed out, you could use the Scanner class.

I put your question in a file, data.txt, and ran the following program:

import java.io.*;
import java.util.Scanner;
import java.util.regex.MatchResult;

public class Test {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner s = new Scanner(new File("data.txt"));
        while (null != s.findWithinHorizon("(?i)\\bjava\\b", 0)) {
            MatchResult mr = s.match();
            System.out.printf("Word found: %s at index %d to %d.%n", mr.group(),
                    mr.start(), mr.end());
        }
        s.close();
    }
}

The output is:

Word found: Java at index 74 to 78.
Word found: java at index 153 to 157.
Word found: Java at index 279 to 283.

The pattern searched for, (?i)\bjava\b, means the following:

  • (?i) turns on the case-insensitive switch
  • \b means a word boundary
  • java is the string searched for
  • \b is a word boundary again.

If the search term comes from the user, or may contain special characters for some other reason, I suggest you put \Q and \E around the string, which quotes all characters in between (and if you're really picky, make sure the input doesn't contain \E itself).
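One way to get the \Q...\E quoting without building it by hand is Pattern.quote, which also copes with input that itself contains \E. The sketch below is only an illustration of that approach; the class name QuotedSearch and its methods are hypothetical, and the Scanner reads from a String rather than a file to keep the example self-contained.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class QuotedSearch {
    // Hypothetical helper: case-insensitive whole-word pattern that treats
    // `term` as literal text rather than as a regular expression.
    static Pattern literalWord(String term) {
        return Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
    }

    // Counts matches with the same findWithinHorizon loop as the program above.
    static int countMatches(String input, String term) {
        Pattern p = literalWord(term);
        int count = 0;
        try (Scanner s = new Scanner(input)) {
            while (s.findWithinHorizon(p, 0) != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Unquoted, the "." in "a.b" would also match the "x" in "axb".
        System.out.println(countMatches("Is a.b the same as axb? No, a.b is not.", "a.b")); // prints 2
    }
}
```

Note that the \b anchors only behave as expected when the term starts and ends with a word character.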



Source: https://stackoverflow.com/questions/4338450/creating-a-java-program-to-search-a-file-for-a-specific-word
