I want to extract the text from an HTML file in Java.
My HTML file is:
vishal
patel
&l
It is better to use an HTML parser. I prefer jsoup, an open-source library.
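import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.jsoup.Jsoup;

public class HTMLUtils {
    // Reads all HTML from the reader and returns only its text content.
    public static String extractText(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Keep the line break so words on adjacent lines are not merged.
            sb.append(line).append('\n');
        }
        return Jsoup.parse(sb.toString()).text();
    }

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the file even if parsing fails.
        try (Reader reader = new FileReader("C:/RealHowTo/topics/java-language.html")) {
            System.out.println(HTMLUtils.extractText(reader));
        }
    }
}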
I have used a library called jsoup.
It makes retrieving the text-only part of an HTML file very simple:
Jsoup.parse(html).text();
This returns the text content of the HTML, where html is a String holding the markup.
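For anyone who wants to try this end to end, here is a minimal sketch, assuming jsoup is on the classpath. The class name JsoupTextDemo, the sample markup, and the UTF-8 encoding are my own assumptions for illustration, not the asker's actual file. It also uses the Jsoup.parse(File, String) overload, which lets jsoup read the file itself instead of building the string by hand.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;

public class JsoupTextDemo {
    // Parses an HTML string and returns its text content.
    static String textOf(String html) {
        return Jsoup.parse(html).text();
    }

    // Lets jsoup read and decode the file; UTF-8 is an assumed encoding.
    static String textOf(File file) throws IOException {
        return Jsoup.parse(file, "UTF-8").text();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample markup, not the asker's actual file.
        String html = "<html><body><p>vishal</p><p>patel</p></body></html>";
        System.out.println(textOf(html));

        // Write the same markup to a temporary file and parse it from disk.
        File tmp = File.createTempFile("sample", ".html");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));
        System.out.println(textOf(tmp));
    }
}
```

Both calls should print the same text with the tags stripped. If your file is not UTF-8, pass its real charset name instead.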