jsoup

Cannot fetch image URL (defined with data-original) inside specific class (JSOUP)

半城伤御伤魂 submitted on 2020-01-05 12:11:47
Question: HTML source (note that it uses the lazy load jQuery plugin): 1) When I run the code below, it fetches all image URLs from the website: Elements images = document.select("img[src~=(?i)\\.(png|jpe?g|gif)]"); 2) But when I specify the class, it fails, as below: Elements images = document.select("div.newscat img[src~=(?i)\\.(png|jpe?g|gif)]"); I then use the following loop (in the second case it throws an OutOfBoundsException): for (int i = 0; i < images.size(); i++) { imageUrl[i] = images.get(i).attr("src"); } Could, anyhow, lazy load
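A minimal sketch of one way to read lazy-loaded image addresses, assuming the plugin stores the real URL in the data-original attribute (as the title says) and that the images inside div.newscat have no usable src. The class name LazyImageUrls, the method lazyImageUrls, and the sample HTML are hypothetical. Collecting into a List instead of a pre-sized array also avoids index errors when the number of matches differs from the array length.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class LazyImageUrls {
    // Collects the data-original value of every image inside div.newscat
    // whose data-original attribute looks like an image file name.
    static List<String> lazyImageUrls(Document document) {
        List<String> urls = new ArrayList<>();
        String query = "div.newscat img[data-original~=(?i)\\.(png|jpe?g|gif)]";
        for (Element img : document.select(query)) {
            urls.add(img.attr("data-original"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the page from the question.
        String html = "<div class=\"newscat\"><img data-original=\"/img/a.jpg\"></div>"
                + "<div class=\"other\"><img data-original=\"/img/b.png\"></div>";
        Document document = Jsoup.parse(html);
        System.out.println(lazyImageUrls(document)); // prints [/img/a.jpg]
    }
}
```

If the document was loaded with a base URI (for example through Jsoup.connect), img.attr("abs:data-original") should return the absolute form of the same value.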

How to post data to an ajax function with Jsoup

假装没事ソ submitted on 2020-01-05 10:28:24
Question: I want to post a string to <li id="coz"><a onclick="doRequest('zemberek.jsp','YAZI_COZUMLE');">Cozumle</a></li> using Jsoup. How can I do that? Here is the original site: http://zemberek-web.appspot.com/ <html> <head> <script> function doRequest(url, islem) { var ajaxRequest = new AjaxRequest(url); var hiddenField = document.getElementById("islem"); hiddenField.value = islem; ajaxRequest.addNamedFormElements("giris", "islem"); ajaxRequest.sendRequest(); } </script> </head> <body> <big>Zemberek Demo</big

Connection with JSoup via proxy

百般思念 submitted on 2020-01-05 09:33:58
Question: System.setProperty("http.proxyHost", "<proxyip>"); // set proxy server System.setProperty("http.proxyPort", "<proxyport>"); //set proxy port Document doc = Jsoup.connect("http://your.url.here").get(); // Jsoup now connects via proxy I have a script that logs in to a website through a proxy. To check that it works, I assigned a fake proxy to a specific user. The problem is that the login succeeds even though the proxy is fake, when it should fail to log in or post. I use the code above to set the proxy. Answer 1:
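A side note on the snippet in the question, offered as a sketch and not as the original answer (which is cut off here): the http.proxyHost and http.proxyPort properties only apply to http:// URLs, while https:// URLs use https.proxyHost and https.proxyPort. If the login page is served over HTTPS, that alone could explain why a fake proxy is ignored. The class ProxySettings and its method names are hypothetical, and 127.0.0.1:8080 is a placeholder address.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

class ProxySettings {
    // Sets the JVM-wide proxy properties for both plain HTTP and HTTPS requests.
    static void useProxy(String host, int port) {
        String portText = Integer.toString(port);
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", portText);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", portText);
    }

    // Builds a per-connection proxy object without resolving the host name.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        useProxy("127.0.0.1", 8080); // placeholder address
        System.out.println(System.getProperty("https.proxyHost")); // prints 127.0.0.1
        System.out.println(httpProxy("127.0.0.1", 8080).type());   // prints HTTP
    }
}
```

A Proxy object like the one from httpProxy can be handed to URL.openConnection(Proxy), which scopes the proxy to one connection instead of the whole JVM. Recent jsoup releases also offer a per-request proxy setting on Connection.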

Java how to find out if a URL is http or https?

ぃ、小莉子 submitted on 2020-01-05 09:33:33
Question: I am writing a web crawler tool in Java. When I type the website name, how can I make it connect to that site over http or https without my defining the protocol? try { Jsoup.connect("google.com").get(); } catch (IOException ex) { Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex); } But I get the error: java.lang.IllegalArgumentException: Malformed URL: google.com What can I do? Are there any classes or libraries that do this? What I'm trying to do is I have a
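The "Malformed URL" error arises because the string passed to Jsoup.connect has no scheme. One possible approach, sketched here with only JDK classes, is to prepend a default scheme when the user did not type one. The class UrlSchemes and the method withScheme are hypothetical names, not part of jsoup.

```java
import java.net.URI;

class UrlSchemes {
    // Returns the input unchanged if it already starts with http:// or https://,
    // otherwise prefixes it with the given default scheme.
    static String withScheme(String input, String defaultScheme) {
        String trimmed = input.trim();
        String candidate = trimmed.matches("(?i)^https?://.*")
                ? trimmed
                : defaultScheme + "://" + trimmed;
        // URI.create throws IllegalArgumentException if the result is still not a valid URI.
        return URI.create(candidate).toString();
    }

    public static void main(String[] args) {
        System.out.println(withScheme("google.com", "https"));           // prints https://google.com
        System.out.println(withScheme("http://example.com/a", "https")); // prints http://example.com/a
    }
}
```

A crawler could then try withScheme(name, "https") first and, if the connection fails with an IOException, retry with "http". Whether that fallback is acceptable depends on the sites being crawled.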

jSoup get title from img tag

核能气质少年 submitted on 2020-01-05 09:03:08
Question: I have a scenario where I need to pull the title from an img tag like the one below. <img alt="Bear" border="0" src="/images/teddy/5433.gif" title="Bear"/> I was able to get the image URL, but how do I get the title from the img tag? In the example above, title = "Bear". I want to extract this. Answer 1: Use Element#attr() to extract arbitrary element attributes. Element img = selectItSomehow(); String title = img.attr("title"); // ... See also: Jsoup Cookbook - Extract attributes, text, and HTML from elements Answer 2:

Java Jsoup downloading torrent file

♀尐吖头ヾ submitted on 2020-01-05 08:42:25
Question: I have a problem: I want to connect to this website (https://ww2.yggtorrent.is) to download a torrent file. I wrote a method that connects to the website with Jsoup, and it works well, but when I try to use it to download the torrent file, the website returns "You must be connected to download file". Here is my code to connect: Response res = Jsoup.connect("https://ww2.yggtorrent.is/user/login") .data("id", "<MyLogin>", "pass", "<MyPassword>") .method(Method.POST) .execute(); and here is my code to

Jsoup .select returns empty value but element does contains text

拥有回忆 submitted on 2020-01-05 07:44:08
Question: I'm trying to get the text of the "link" tag element in this XML: http://www.istana.gov.sg/latestupdate/rss.xml I have written code to get the first article. URL = getResources().getString(R.string.istana_home_page_rss_xml); // URL = "http://www.istana.gov.sg/latestupdate/rss.xml"; try { doc = Jsoup.connect(URL).ignoreContentType(true).get(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } // retrieve the link of the article links = doc.select("link"); // retrieve the
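A likely cause of the empty result is that jsoup's default HTML parser treats <link> as an empty (void) element, so the URL text does not end up inside it. The sketch below parses the feed with jsoup's XML parser instead. The class RssLinks, the method firstItemLink, and the sample feed are hypothetical; the real feed's structure is assumed to follow the usual RSS item/link layout.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

class RssLinks {
    // Parses the text as XML and returns the text of the first <link> inside an <item>,
    // or an empty string if there is none.
    static String firstItemLink(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        Element link = doc.select("item > link").first();
        return link == null ? "" : link.text();
    }

    public static void main(String[] args) {
        // Hypothetical feed standing in for the RSS document from the question.
        String xml = "<rss><channel><item><title>First</title>"
                + "<link>http://example.com/a</link></item></channel></rss>";
        System.out.println(firstItemLink(xml)); // prints http://example.com/a
    }
}
```

When fetching over the network, the same parser can be selected on the connection, for example Jsoup.connect(URL).parser(Parser.xmlParser()).get().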

How to remove only html tags from text with Jsoup?

我们两清 submitted on 2020-01-04 11:00:36
Question: I want to remove ONLY HTML tags from text with JSOUP. I used the solution from here (my previous question about JSOUP). But after some checks I discovered that JSOUP throws a Java heap exception, OutOfMemoryError, for big HTML documents, though not for all of them. For example, it fails on a 2 MB HTML document with 10000 lines. The code throws the exception on the last line (NOT on Jsoup.parse): public String StripHtml(String html){ html = html.replace("&lt;", "<").replace("&gt;", ">"); String[] tags = getAllStandardHtmlTags; Document thing =