html-parsing

How to get the value from a specific cell C# Html-Agility-Pack

你。 submitted on 2019-12-11 04:33:35
Question: How do I get a value from a specific location in the second table in the document? I need the value from the second cell down and third column over in the HTML document below. How do I do this? <html> <head> <title>Tables</title> </head> <body> <table border="1"> <tr> <th>Room</th> <th>Location</th> </tr> <tr> <td>Paint</td> <td>A4</td> </tr> <tr> <td>Stock</td> <td>B3</td> </tr> <tr> <td>Assy</td> <td>N9</td> </tr> </table> <p></p> <table border="1"> <tr> <th>Product</th> <th>Mat'l</th> <th
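The excerpt above is cut off before any answer, so the following is only a minimal sketch of the general idea (collect every table, then index by table, row, and column). It uses Python's standard-library `html.parser` rather than Html Agility Pack, and the names `TableCollector` and `cell_at` are hypothetical. All indices are zero-based, header rows count as rows, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every table as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
            self.tables[-1].append(self._row)
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            self._row = None


def cell_at(html, table, row, column):
    """Return the text of one cell; all three indices are zero-based."""
    collector = TableCollector()
    collector.feed(html)
    return collector.tables[table][row][column]
```

With this sketch, `cell_at(html, 1, 1, 2)` would address the second table, second row, third column. Whether the header row should be counted depends on how "second cell down" is meant.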

Trying to extract some data from a webpage (scraping beginner)

▼魔方 西西 submitted on 2019-12-11 04:14:17
Question: I'm trying to extract some data from a webpage using Requests and then BeautifulSoup. I started by getting the HTML code with Requests and then "putting it" in BeautifulSoup: from bs4 import BeautifulSoup import requests result = requests.get("https://XXXXX") #print(result.status_code) #print(result.headers) src = result.content soup = BeautifulSoup(src, 'lxml') Then I singled out some pieces of code: tags = soup.findAll('ol',{'class':'activity-popup-users'}) print(tags) Here is a part of

PHP HTML parsing: How can I get the charset value of a webpage with Simple HTML DOM Parser?

和自甴很熟 submitted on 2019-12-11 04:00:58
Question: PHP: How can I get the charset value of a webpage with Simple HTML DOM Parser (utf-8, windows-255, etc.)? Note: it has to be done with the HTML DOM parser http://simplehtmldom.sourceforge.net Example 1, webpage charset input: <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> result: utf-8 Example 2, webpage charset input: <meta content="text/html; charset=windows-255" http-equiv="Content-Type"> result: windows-255 Edit: I tried this (but it doesn't work): $html = file_get_html('http:/
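The question asks for Simple HTML DOM Parser specifically, and the excerpt ends before any answer. As a language-neutral illustration of the logic only (find the `<meta>` tag, then split the `content` value on `;` and read the `charset=` part), here is a minimal sketch using Python's standard-library `html.parser`. The names `CharsetFinder` and `find_charset` are hypothetical, and handling the HTML5 `<meta charset="...">` form is an addition not mentioned in the question.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Record the first charset declared in a <meta> tag, if any."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        # HTML5 form: <meta charset="utf-8">
        if attributes.get("charset"):
            self.charset = attributes["charset"].strip().lower()
            return
        # Legacy form: <meta http-equiv="Content-Type" content="text/html; charset=...">
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            for part in (attributes.get("content") or "").split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset" and value:
                    self.charset = value.strip().lower()


def find_charset(html):
    """Return the declared charset in lower case, or None if none is found."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset
```

For the two inputs quoted in the question this returns `utf-8` and `windows-255` respectively. It reads only the markup; a charset sent in the HTTP `Content-Type` header is not considered.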

get elements from html parser

隐身守侯 submitted on 2019-12-11 03:59:30
Question: I'm using jsoup and trying to get the div elements whose id starts with a particular string. For example: <div id="test123">. I need to check whether the id starts with the string "test" and get all such elements. I looked at http://jsoup.org/cookbook/extracting-data/selector-syntax and tried multiple variations using: doc.select("div:matches(test(*))"); But it still didn't work. Any help would be much appreciated. Answer 1: Use the attribute-starts-with selector [attr^=value]. Elements elements
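The answer's code is cut off. In jsoup the selector for this case would presumably be `div[id^=test]`. To show what "attribute starts with" actually checks, here is a minimal sketch in Python's standard-library `html.parser` instead of jsoup; `IdPrefixCollector` and `ids_starting_with` are hypothetical names.

```python
from html.parser import HTMLParser


class IdPrefixCollector(HTMLParser):
    """Collect the ids of elements of one tag whose id starts with a prefix."""

    def __init__(self, tag, prefix):
        super().__init__()
        self.tag = tag
        self.prefix = prefix
        self.matches = []

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id") or ""
        # Same test an attribute-starts-with selector performs on the id.
        if tag == self.tag and element_id.startswith(self.prefix):
            self.matches.append(element_id)


def ids_starting_with(html, tag, prefix):
    """Return matching ids in document order."""
    collector = IdPrefixCollector(tag, prefix)
    collector.feed(html)
    return collector.matches
```

The sketch returns ids rather than element objects, which keeps it short. A real selector engine returns the elements themselves.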

How to read a commented out HTML table using readHTMLTable in R

白昼怎懂夜的黑 submitted on 2019-12-11 03:44:26
Question: In the past, I have been able to use readHTMLTable in R to pull some football stats. When trying to do so again this year, the tables aren't showing up, even though they are visible on the webpage. Here is an example: http://www.pro-football-reference.com/boxscores/201609080den.htm When I view the source for the page, the tables are all commented out (which I suspect is why readHTMLTable didn't find them). Example: search for "team_stats" in source code... <!-- <div class="table_outer
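No answer survives in this excerpt. One common approach to commented-out tables is to pull the comment bodies out of the page and parse those as HTML in a second pass. The sketch below shows that two-pass idea with Python's standard-library `html.parser` rather than R's readHTMLTable. The class and function names are hypothetical, and the approach assumes each comment contains a complete table.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """First pass: collect the raw text of every HTML comment."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class TableRowParser(HTMLParser):
    """Second pass: turn table markup into a list of rows of cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_from_commented_tables(html):
    """Return rows from tables that appear only inside HTML comments."""
    collector = CommentCollector()
    collector.feed(html)
    rows = []
    for comment in collector.comments:
        if "<table" in comment:
            parser = TableRowParser()
            parser.feed(comment)
            rows.extend(parser.rows)
    return rows
```

Tables that are not inside a comment are deliberately ignored here, since an ordinary table reader already finds those.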

Jsoup - How to extract every element

安稳与你 submitted on 2019-12-11 03:43:40
Question: I'm trying to get font information using Jsoup. For example, below is my code: result = rtfToHtml(new StringReader(streamToString((InputStream)contents.getTransferData(dfRTF)))); // Example of text extraction from html // Parse html // String test = result.toString(); Document doc = Jsoup.parse(result); // Select first bold text String strdoc = doc.toString(); String words[] = strdoc.split("font-family"); Element firstBoldElt = doc.select("b").first(); Elements ele = doc.select("body");

Getting the website title from a link in a string

时光怂恿深爱的人放手 submitted on 2019-12-11 03:38:55
Question: string: "Here is the badges, https://stackoverflow.com/badges bla bla bla" If the string contains a link (see above), I want to parse the website title of that link. It should return: Badges - Stack Overflow. How can I do that? Thanks. Answer 1: #!/usr/bin/perl -w require LWP::UserAgent; my $ua = LWP::UserAgent->new; $ua->timeout(10); $ua->env_proxy; my $response = $ua->get('http://search.cpan.org/'); if ($response->is_success) { print $response->title(); } else { die $response->status_line; } See
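The Perl answer fetches a fixed URL and does not show how to pull the link out of the string first. Here is a minimal sketch of both steps, written in Python's standard library rather than Perl and LWP. The URL regex is a deliberately simple assumption, not a full URL grammar, and `first_url`, `title_from_html`, and `fetch_title` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen

# Simple assumption: a URL runs until whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = " ".join("".join(self._parts).split())
        return text or None


def first_url(text):
    """Return the first http(s) URL in text, minus trailing punctuation."""
    match = URL_PATTERN.search(text)
    return match.group(0).rstrip(".,;:!?)") if match else None


def title_from_html(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def fetch_title(url, timeout=10):
    """Download the page (network access required) and return its title."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return title_from_html(response.read().decode(charset, errors="replace"))
```

A caller would combine the pieces as `fetch_title(first_url(text))`, after checking that `first_url` did not return `None`.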

Iterating through a DOM with BeautifulSoup/Python

冷暖自知 submitted on 2019-12-11 03:38:54
Question: I have this DOM: <h2>Main Section</h2> <p>Bla bla bla<p> <h3>Subsection</h3> <p>Some more info</p> <h3>Subsection 2</h3> <p>Even more info!</p> <h2>Main Section 2</h2> <p>bla</p> <h3>Subsection</h3> <p>Some more info</p> <h3>Subsection 2</h3> <p>Even more info!</p> I'd like to generate an iterator that returns 'Main Section', 'Bla bla bla', 'Subsection', etc. Is there a way to do this with BeautifulSoup? Answer 1: Here's one way to do it. The idea is to iterate over main sections (h2 tag) and for
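The answer's code is cut off, and it apparently nests the iteration by section. A simpler variant, sketched here, walks the document once and yields the text of each `h2`, `h3`, and `p` in document order. It uses Python's standard-library `html.parser` rather than BeautifulSoup, and `TextInOrder` and `iter_section_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextInOrder(HTMLParser):
    """Collect the text of h2, h3 and p elements in document order."""

    TAGS = {"h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None

    def flush(self):
        # Store the pending block, skipping blocks that hold only whitespace.
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append(text)
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            self.flush()  # a new block implicitly closes an unclosed one
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.TAGS:
            self.flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def iter_section_text(html):
    """Yield heading and paragraph texts one by one."""
    parser = TextInOrder()
    parser.feed(html)
    parser.close()
    parser.flush()
    yield from parser.items
```

The flush on every new start tag is what lets the sketch cope with the unclosed `<p>Bla bla bla<p>` in the question's markup. The whole document is parsed before the first item is yielded, so this is an iterator in interface only, not a streaming parser.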

How to parse an ASP.NET MVC Razor view (cshtml) like an HTML parser in C#

五迷三道 submitted on 2019-12-11 03:36:23
Question: I want to parse a Razor view file in C#. I have used Html Agility Pack to parse the Razor view file, but it failed to save the correct file contents. Basically, I want to change the inner HTML of some HTML elements on the server side using C#: <div id="content1"> <p>this contents i want to change </p> <span>contes</span> </div> I want to change the inner HTML of content1 from C# like this: <div id="content1"> <span>@Function.gethtml()</span> </div> I have used Html Agility Pack to change the inner HTML contents but it is

How to get text which is not part of any element using jsoup?

混江龙づ霸主 submitted on 2019-12-11 03:17:45
Question: How to get the text which is not part of any element? <br><b>Price:</b>   Rs. 24,900.00   <br> Here, how can one get the text Rs. 24,900.00? Is this possible using jsoup? Answer 1: I assume there is a parent element, so select that first and then select the "b" element, as in the following code. Basically, find the element that comes right before your text. Document doc = Jsoup.parse( "<br><b>Price:</b>   Rs. 24,900.00   <br>"); Element el = doc.select("b").first(); String text = ((TextNode) el
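The jsoup code is cut off at the cast to `TextNode`. The idea it describes, finding the element just before the loose text and then reading the text that follows it, can be sketched with Python's standard-library `html.parser` as below. This is not the jsoup API, and `TextAfterTag` and `text_after_tag` are hypothetical names.

```python
from html.parser import HTMLParser


class TextAfterTag(HTMLParser):
    """Collect bare text that directly follows the closing tag of an element."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.texts = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        # Any new element ends the run of loose text.
        self._capture = False

    def handle_endtag(self, tag):
        self._capture = (tag == self.tag)

    def handle_data(self, data):
        if self._capture:
            # str.split() also treats non-breaking spaces as whitespace.
            text = " ".join(data.split())
            if text:
                self.texts.append(text)
            self._capture = False


def text_after_tag(html, tag):
    """Return the loose text found right after each closing </tag>."""
    parser = TextAfterTag(tag)
    parser.feed(html)
    parser.close()
    return parser.texts
```

For the markup in the question, `text_after_tag(html, "b")` would give a one-item list containing the price text, with surrounding spaces and non-breaking spaces trimmed.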