I'm trying to use JSoup to scrape some pages that are on a staging server. To view the pages on the staging server with a browser I need to be connected to a VPN.
You can set the Java system properties for the proxy:
// For HTTPS URLs, set https.proxyHost and https.proxyPort in the same way
System.setProperty("http.proxyHost", ""); // proxy server host name or IP address
System.setProperty("http.proxyPort", ""); // proxy port, given as a string
Document doc = Jsoup.connect("http://your.url.here").get(); // Jsoup now connects via the proxy
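If you want to check that the JVM actually picks up those properties before pointing Jsoup at the staging server, a small sketch like the following may help. It only asks the JVM's default proxy selector which proxy it would use for a URL and makes no network request. The class name `ProxyPropertyCheck`, the proxy host `proxy.example.com`, the port `3128`, and the staging URL are all made-up placeholders, not values from the question.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

final class ProxyPropertyCheck {

    /** Returns the proxies the JVM's default selector would use for the given URL. */
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        // Hypothetical values; replace with your real proxy host and port.
        System.setProperty("http.proxyHost", "proxy.example.com");
        System.setProperty("http.proxyPort", "3128");

        // Print the type and address of every proxy that would be tried.
        for (Proxy proxy : proxiesFor("http://staging.example.com/")) {
            System.out.println(proxy.type() + " " + proxy.address());
        }
    }
}
```

If the output shows a direct connection instead of your proxy, the properties were probably not set for the right scheme (http versus https), or the target host matches `http.nonProxyHosts`.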
Alternatively, download the page into a string through the proxy yourself and then parse it:
// Needs java.io.*, java.net.*, java.nio.charset.StandardCharsets, org.jsoup.Jsoup and org.jsoup.nodes.Document
final URL website = new URL("http://your.url.here"); // the page you want to fetch
// -- Set up the connection through the proxy
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("", 1234)); // fill in proxy server and port
HttpURLConnection httpUrlConnection = (HttpURLConnection) website.openConnection(proxy);
httpUrlConnection.connect();
// -- Download the page into a buffer (UTF-8 is an assumption; use the page's actual charset)
StringBuilder buffer = new StringBuilder();
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(httpUrlConnection.getInputStream(), StandardCharsets.UTF_8))) {
    String str;
    while ((str = br.readLine()) != null) {
        buffer.append(str).append('\n'); // readLine() strips line breaks, so add them back
    }
}
// -- Parse the buffer with Jsoup
Document doc = Jsoup.parse(buffer.toString());
You can use HttpClient for this solution as well.
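As a rough sketch of that idea, here is what it could look like with the `java.net.http.HttpClient` that ships with Java 11 and later. That is an assumption on my part: the remark above may just as well refer to Apache HttpClient, which has a different API. The class name `ProxiedFetcher`, the proxy host `proxy.example.com`, and the port `3128` are placeholders.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class ProxiedFetcher {

    /** Builds a client that sends every HTTP and HTTPS request through the given proxy. */
    static HttpClient newProxiedClient(String proxyHost, int proxyPort) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)))
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    /** Downloads the response body of the given URL as a string. */
    static String fetchHtml(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical proxy host and port; replace with your own values.
        HttpClient client = newProxiedClient("proxy.example.com", 3128);
        String html = fetchHtml(client, "http://your.url.here");
        System.out.println(html.length() + " characters downloaded");
    }
}
```

The string returned by `fetchHtml` can then be handed to `Jsoup.parse(html, url)`, where the second argument is the base URI Jsoup uses to resolve relative links.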