jsoup

How to create a Jsoup selector with an AND operation?

久未见 posted on 2020-01-02 05:50:04
Question: I want to find the following tag in an HTML document: <a href="http://www.google.com/AAA" class="link">AAA</a>. I know I can use a selector like a[href^=http://www.google.com/] or a[class=link], but how can I combine these two conditions? Or is there a better way to do this, such as a regex, and if so, how? Thanks!
Answer 1: Just combine them in a single CSS selector. Elements links = document.select("a[href^=http://www.google.com/][class=link]"); // ... or Elements links = document.select("a.link[href^=http://www
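The answer's idea (chaining both conditions in one selector with no space or comma between them) can be sketched as follows. This is a minimal illustration, assuming jsoup is on the classpath; the class name, the helper method, and the sample markup are hypothetical and not part of the original answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class AndSelectorSketch {

    // Returns the text of every <a> that has class "link" AND an href
    // starting with http://www.google.com/ (both conditions must hold).
    static List<String> matchingLinkTexts(String html) {
        Document document = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element link : document.select("a.link[href^=http://www.google.com/]")) {
            texts.add(link.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup: only the first anchor meets both conditions.
        String html = "<a href=\"http://www.google.com/AAA\" class=\"link\">AAA</a>"
                + "<a href=\"http://www.google.com/BBB\">BBB</a>"
                + "<a href=\"http://example.com/CCC\" class=\"link\">CCC</a>";
        System.out.println(matchingLinkTexts(html));
    }
}
```

Writing the conditions back to back means "all of them"; separating selectors with a comma would instead mean "any of them".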

How to export HTML table data displayed in a web page to .csv format

老子叫甜甜 posted on 2020-01-01 22:08:10
Question: How can I export the table (table id="cross_rate_markets_stocks_1") at https://in.investing.com/equities/india? After any option is selected from the drop-down menu, the table that appears needs to be saved in .csv format. It's for my final-year project. I have tried third-party websites, but they capture all the data on the site, and I only need the data of (table id="cross_rate_markets_stocks_1"). First, one needs to select any value from the drop-down, whose default value is BSE Sensex 30, then

Using JSoup to save the contents of this url: http://www.aw20.co.uk/images/logo.png to a file

二次信任 posted on 2020-01-01 18:25:51
Question: I am trying to use JSoup to get the contents of this URL, http://www.aw20.co.uk/images/logo.png, which is the image logo.png, and save it to a file. So far I have used JSoup to connect to http://www.aw20.co.uk and get a Document. I then found the absolute URL of the image I am looking for, but I am not sure how to use it to get the actual image. I was hoping someone could point me in the right direction. Also, is there any way I could use Jsoup.connect("http://www.aw20.co.uk

How to properly encode this URL

自闭症网瘾萝莉.ら posted on 2020-01-01 09:44:51
Question: I am trying to get this URL using JSoup: http://betatruebaonline.com/img/parte/330/CIGUEÑAL.JPG. Even with encoding, I get an exception, and I don't understand why the encoding is wrong. It returns http://betatruebaonline.com/img/parte/330/CIGUEN%C3%91AL.JPG instead of the correct http://betatruebaonline.com/img/parte/330/CIGUEN%CC%83AL.JPG. How can I fix this? Thanks. private static void GetUrl() { try { String url = "http://betatruebaonline.com/img/parte/330/"; String encoded = URLEncoder.encode(
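The excerpt ends before any answer, so what follows is only one plausible reading of the symptom. The sequence N%CC%83 in the URL the asker calls correct is the UTF-8 encoding of a plain N followed by a combining tilde (U+0303), which is the decomposed (NFD) form of Ñ, while %C3%91 is the precomposed (NFC) letter. Assuming the server stores the file name in decomposed form, normalizing before encoding would produce the expected bytes. The class and method names below are hypothetical; only standard-library calls are used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.text.Normalizer;

class UnicodeFormEncodingSketch {

    // Percent-encodes one path segment as UTF-8 after applying the given
    // Unicode normalization form.
    static String encodeSegment(String segment, Normalizer.Form form)
            throws UnsupportedEncodingException {
        String normalized = Normalizer.normalize(segment, form);
        // URLEncoder targets form data and turns spaces into '+';
        // in a URL path a space should be %20 instead.
        return URLEncoder.encode(normalized, "UTF-8").replace("+", "%20");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The file name from the question, with the precomposed letter written as an escape.
        String fileName = "CIGUE\u00D1AL.JPG";
        System.out.println(encodeSegment(fileName, Normalizer.Form.NFC)); // CIGUE%C3%91AL.JPG
        System.out.println(encodeSegment(fileName, Normalizer.Form.NFD)); // CIGUEN%CC%83AL.JPG
    }
}
```

Only the file name is encoded here; passing the whole URL to URLEncoder would also escape the slashes and the colon.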

Jsoup - How to clean HTML by escaping, not deleting, the unwanted HTML?

时光毁灭记忆、已成空白 posted on 2020-01-01 04:54:07
Question: Is there a way of getting jsoup to clean a string containing HTML by escaping the unwanted HTML rather than removing it completely? My example: String dirty = "This is <b>REALLY</b> dirty code from <a href=\"www.rubbish.url.zzzz\">haxors-r-us</a>"; String clean = Jsoup.clean(dirty, new Whitelist().addTags("a").addAttributes("a", "href", "name", "rel", "target")); This gives a "clean" string of: This is REALLY dirty code from <a href="www.rubbish.url.zzzz">haxors-r-us</a> What I am wanting is the

Why Jsoup cannot select td element?

守給你的承諾、 posted on 2019-12-31 04:47:27
Question: I made a little test (with Jsoup 1.6.1): String s = "" +Jsoup.parse("<td></td>").select("td").size(); System.out.println("Selected elements count : " + s); It outputs: Selected elements count : 0 But it should return 1, because I parsed HTML containing a td element. What is wrong with my code, or is there a bug in Jsoup?
Answer 1: Because Jsoup is an HTML5-compliant parser and you fed it invalid HTML. A <td> has to go inside at least a <table>. int size = Jsoup.parse("<table><td></td></table>
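The answer's point can be checked with a small sketch: wrapping the cell in a <table> lets the HTML parser keep it. The second helper is an addition not taken from the answer: it uses jsoup's XML parser, which does not apply HTML tree-building rules. It assumes a jsoup release newer than the 1.6.1 mentioned in the question, since Parser.xmlParser() is not available there. Class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.parser.Parser;

class TdParsingSketch {

    // Counts <td> elements after parsing the markup with the HTML parser.
    static int countTdAsHtml(String markup) {
        return Jsoup.parse(markup).select("td").size();
    }

    // Counts <td> elements after parsing with the XML parser, which keeps
    // tags as written instead of enforcing HTML table structure.
    static int countTdAsXml(String markup) {
        return Jsoup.parse(markup, "", Parser.xmlParser()).select("td").size();
    }

    public static void main(String[] args) {
        System.out.println(countTdAsHtml("<table><td></td></table>")); // cell inside a table is kept
        System.out.println(countTdAsXml("<td></td>"));                 // bare cell, XML parsing
    }
}
```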

OR operator for JSOUP select() method

|▌冷眼眸甩不掉的悲伤 posted on 2019-12-31 04:19:08
Question: I am trying to find all <div class="name1"> or <div class="name2"> tags in one page/document. How can I use an OR operator in doc.select("div.name1 OR div.name2")?
Answer 1: The select method of JSoup implements, more or less, the CSS selector syntax. So you can simply use the CSS way of specifying alternatives, i.e. a comma. This should work: doc.select("div.name1,div.name2");
Source: https://stackoverflow.com/questions/24485013/or-operator-for-jsoup-select-method
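A small runnable version of the answer's selector, assuming jsoup is on the classpath. The class name, helper method, and sample markup are hypothetical and only serve to show that the comma matches elements satisfying either alternative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class OrSelectorSketch {

    // Counts <div> elements whose class is name1 OR name2.
    static int countNameDivs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div.name1,div.name2").size();
    }

    public static void main(String[] args) {
        // Hypothetical sample markup: two of the three divs match.
        String html = "<div class=\"name1\">one</div>"
                + "<div class=\"name2\">two</div>"
                + "<div class=\"other\">three</div>";
        System.out.println(countNameDivs(html));
    }
}
```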

Jsoup getting background image path from CSS

僤鯓⒐⒋嵵緔 posted on 2019-12-31 00:44:12
Question: I am looking for all of the images on a given website. For this purpose I need to find the ones that are referenced in the CSS, for example: .gk-crop { background-image: url("../images/style1/g_rss-2.png"); } Now my question is: how can I get all of these URLs with JSoup? So far I've tried the following: Document doc = Jsoup.connect(url).get(); Elements imagePath = doc.select("[src]"); imagePath.select("*[style*='background-image']"); but so far no luck. Does anyone know how I can achieve it?
Answer 1:
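The answer text is missing from this excerpt, so the following is a sketch of one possible approach rather than the original answer. jsoup selectors work on HTML elements, not on stylesheet rules, so once the CSS text has been obtained (for example from a <style> element or a downloaded .css file), the url(...) values can be pulled out with a regular expression and resolved against the stylesheet's location. The class name, method names, regular expression, and example.com address are all assumptions for illustration; only the standard library is used.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CssBackgroundUrlSketch {

    // Matches a background or background-image declaration and captures the
    // first url(...) argument, with or without surrounding quotes.
    private static final Pattern BACKGROUND_URL = Pattern.compile(
            "background(?:-image)?\\s*:[^;}]*?url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)",
            Pattern.CASE_INSENSITIVE);

    // Returns the raw (possibly relative) URLs found in the CSS text.
    static List<String> extractBackgroundUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = BACKGROUND_URL.matcher(css);
        while (matcher.find()) {
            urls.add(matcher.group(2));
        }
        return urls;
    }

    // Relative URLs in CSS are relative to the stylesheet, not to the page.
    static String resolve(String stylesheetUrl, String relativeUrl) {
        return URI.create(stylesheetUrl).resolve(relativeUrl).toString();
    }

    public static void main(String[] args) {
        String css = ".gk-crop { background-image: url(\"../images/style1/g_rss-2.png\"); }";
        for (String found : extractBackgroundUrls(css)) {
            // Hypothetical stylesheet location, used only to show the resolution step.
            System.out.println(resolve("http://example.com/css/style.css", found));
        }
    }
}
```

A regular expression is not a full CSS parser: this pattern only picks up the first url(...) in each declaration and does not account for comments, so a dedicated CSS parsing library may be a better fit for complex stylesheets.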

Output JSoup without added spaces and line breaks around the elements

大憨熊 posted on 2019-12-30 18:49:19
Question: I am parsing and outputting an XML file using JSoup (and modifying the elements in between, of course). The output file has some extra spaces and line breaks. I was wondering if I can print it in the original format. Original: <attributes> <divisions>4</divisions> <key> <fifths>0</fifths> <mode>major</mode> </key> ... New: <attributes> <divisions> 4 </divisions> <key> <fifths> 0 </fifths> <mode> major </mode> </key> ... Any idea how to remove the spaces/line breaks from the elements? I
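The excerpt stops before any answer, so this is only a hedged suggestion: jsoup pretty-prints its output by default, and switching that off through the document's output settings is one way to avoid the added whitespace. The sketch assumes a jsoup version that provides Parser.xmlParser(); the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

class CompactXmlOutputSketch {

    // Parses the input as XML and serializes it again with jsoup's
    // pretty-printing (added indentation and line breaks) turned off.
    static String roundTrip(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        doc.outputSettings().prettyPrint(false);
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        String xml = "<attributes><divisions>4</divisions><key><fifths>0</fifths>"
                + "<mode>major</mode></key></attributes>";
        System.out.println(roundTrip(xml));
    }
}
```

With pretty-printing disabled, whitespace that is already present in the input is kept as it was parsed, so the original layout of the file is what ends up in the output.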

JSoup not showing all the html in Java (td and tr tags missing)

匆匆过客 posted on 2019-12-30 12:38:47
Question: I'm having trouble getting all the HTML code under the tags. Here is my current code: Document document = Jsoup.connect("http://stackoverflow.com/questions/2971155/what-is-the-fastest-way-to-scrape-html-webpage-in-android").get(); Elements desc = document.select("tr"); System.out.println(desc.toString()); It's for that question, and I'm trying to get the text from the question's description. But I'm not getting certain tr or td tags, such as the ones for the question. Here is td tag I'm