Are there better ways to read an entire HTML file into a single string variable than:
String content = "";
try {
BufferedReader in = new Buff
There's the IOUtils.toString(..) utility from Apache Commons IO.
If you're using Guava, there's also Files.readLines(..) and Files.toString(..).
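If adding a dependency is not an option, the JDK alone can do the same job. A minimal sketch, assuming Java 7 or later and a UTF-8 encoded file; the class name FileToString and the helpers readFile and roundTrip are illustrative names, not part of any library:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileToString {
    // Reads the whole file into one String, decoding it as UTF-8 (an assumption).
    static String readFile(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes html to a temporary file, reads it back, then deletes the file.
    static String roundTrip(String html) {
        try {
            Path tmp = Files.createTempFile("page", ".html");
            try {
                Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
                return readFile(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<html><body>Hello</body></html>"));
    }
}
```

Typical usage would be FileToString.readFile(Paths.get("mypage.html")). On Java 11 and later, Files.readString(path) does the same in a single call.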
Here's a solution that retrieves the HTML of a web page using only standard Java libraries:
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;
String urlToRead = "https://google.com";
StringBuilder result = new StringBuilder(); // Accumulates all the HTML
try {
    URL url = new URL(urlToRead); // The URL to read
    HttpURLConnection conn = (HttpURLConnection) url.openConnection(); // The actual connection to the web page
    conn.setRequestMethod("GET");
    // try-with-resources closes the reader even if reading fails.
    // UTF-8 is an assumption; the original relied on the platform default charset.
    try (BufferedReader rd = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
        String line; // An individual line of the web page HTML
        while ((line = rd.readLine()) != null) {
            result.append(line).append('\n'); // readLine() strips the line terminator
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println(result);
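A line-based loop like the one above cannot reproduce the text exactly, since readLine() strips line terminators. A sketch of a reusable variant that reads fixed-size chunks from any Reader and so returns the characters unchanged; the class name ReadAllSketch and the method readAll are illustrative names, not part of the answer above:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadAllSketch {
    // Drains the reader into a String and closes it when done.
    static String readAll(Reader source) {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[8192];
        try (Reader in = source) {
            int n;
            // read() returns the number of chars placed in the buffer, or -1 at end of stream.
            while ((n = in.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An in-memory reader stands in for the network stream here.
        System.out.println(readAll(new StringReader("<html>\n<body>Hello</body>\n</html>")));
    }
}
```

For the web page case this would be called as readAll(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)), assuming the page is served as UTF-8.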
You can use JSoup.
It's a very capable HTML parser for Java.
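I prefer using Guava:
import com.google.common.base.Charsets;
import com.google.common.io.Files;
import java.io.File;
File file = new File("/path/to/file");
// The charset is an argument of Files.toString, not of the File constructor.
// Newer Guava releases deprecate Files.toString in favor of Files.asCharSource(file, Charsets.UTF_8).read().
String content = Files.toString(file, Charsets.UTF_8);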
For string operations, use the StringBuilder or StringBuffer class to accumulate blocks of string data. Do not use += on String objects. The String class is immutable, so every += creates a new String object at runtime, and doing that in a loop hurts performance. Use the .append() method of a StringBuilder/StringBuffer instance instead.
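To make the difference concrete, here is a minimal sketch that builds the same text both ways. The class name ConcatSketch and both method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;

class ConcatSketch {
    // Each += copies everything accumulated so far into a brand-new String.
    static String joinWithPlus(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // StringBuilder appends into one growable buffer; the String is created once at the end.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("<html>", "<body>", "</body>", "</html>");
        System.out.println(joinWithPlus(parts).equals(joinWithBuilder(parts))); // prints "true"
    }
}
```

Both methods return identical text. What differs is the amount of copying: with += it grows roughly quadratically with the number of pieces, while the builder stays close to linear.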
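You should use a StringBuilder:
StringBuilder contentBuilder = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader in = new BufferedReader(new FileReader("mypage.html"))) {
    String str;
    while ((str = in.readLine()) != null) {
        // readLine() strips the line terminator, so add one back
        contentBuilder.append(str).append('\n');
    }
} catch (IOException e) {
    e.printStackTrace(); // don't swallow the error silently
}
String content = contentBuilder.toString();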