Retrieve text from an HTML file in Java

深忆病人 2020-12-16 03:37

I want to extract the text from an HTML file in Java.

My HTML file is:



vishal

patel &l
2 Answers
  • 2020-12-16 04:09

    It is better to use an HTML parser. I prefer jsoup, an open-source library:

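    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;

    import org.jsoup.Jsoup;

    public class HTMLUtils {

        // Reads the whole stream into memory, then lets jsoup strip the markup.
        public static String extractText(Reader reader) throws IOException {
            StringBuilder sb = new StringBuilder();
            BufferedReader br = new BufferedReader(reader);
            String line;
            while ((line = br.readLine()) != null) {
                // Keep the line break so words on adjacent lines are not glued together.
                sb.append(line).append('\n');
            }
            return Jsoup.parse(sb.toString()).text();
        }

        public static void main(String[] args) throws Exception {
            // try-with-resources closes the file even if parsing fails.
            try (Reader reader = new FileReader("C:/RealHowTo/topics/java-language.html")) {
                System.out.println(HTMLUtils.extractText(reader));
            }
        }
    }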
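
    If adding a dependency is not an option, a rougher version of the same idea is possible with the HTML parser that ships in the JDK's Swing module (java.desktop). The sketch below is an alternative under stated assumptions, not the jsoup approach shown above: the class name SwingHtmlText and the method extract are made up for illustration, and the Swing parser only targets HTML 3.2, so jsoup remains the safer choice for real-world pages.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingHtmlText {

    // Hypothetical helper: collects every text run the Swing parser reports.
    public static String extract(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate text runs with a space so adjacent elements do not merge.
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(data);
            }
        };
        // true = ignore any charset declared inside the document;
        // the Reader has already decoded the bytes.
        new ParserDelegator().parse(reader, callback, true);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample markup invented for this sketch.
        String html = "<html><body><p>vishal</p><p>patel</p></body></html>";
        System.out.println(extract(new StringReader(html)));
    }
}
```

    Passing a FileReader instead of the StringReader makes it work on a file, in the same way as HTMLUtils.extractText above.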
    
  • 2020-12-16 04:13

    I have used a library called jsoup.
    Retrieving the text-only part of an HTML file takes a single call:

    Jsoup.parse(html).text();
    

    This returns the text of the document, where html is a String holding the markup.
