html-parsing | 易学教程

How to get the value from a specific cell C# Html-Agility-Pack

Question: How do I get a value from a specific location in the second table in the document? I need the value from the second cell down and third column over in the HTML document below. <html> <head> <title>Tables</title> </head> <body> <table border="1"> <tr> <th>Room</th> <th>Location</th> </tr> <tr> <td>Paint</td> <td>A4</td> </tr> <tr> <td>Stock</td> <td>B3</td> </tr> <tr> <td>Assy</td> <td>N9</td> </tr> </table> <p></p> <table border="1"> <tr> <th>Product</th> <th>Mat'l</th> <th
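A minimal sketch of one way to address a cell by table, row, and column position. It uses Python's standard-library `html.parser` rather than Html Agility Pack, and the names `TableCollector`, `cell_at`, and `SAMPLE` are hypothetical. The second table's contents in `SAMPLE` are invented for illustration, since the question's markup is cut off. Indexes are zero-based and the header row counts as row 0; nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per table, in document order
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None


def cell_at(html, table, row, col):
    """Return the text of one cell, using zero-based indexes."""
    parser = TableCollector()
    parser.feed(html)
    return parser.tables[table][row][col]


# Hypothetical sample: the second table's rows are made up for this sketch.
SAMPLE = """
<table><tr><th>Room</th><th>Location</th></tr>
<tr><td>Paint</td><td>A4</td></tr></table>
<table><tr><th>Product</th><th>Material</th><th>Qty</th></tr>
<tr><td>Bracket</td><td>Steel</td><td>12</td></tr>
<tr><td>Panel</td><td>Aluminium</td><td>7</td></tr></table>
"""

# Second table (index 1), first data row (index 1), third column (index 2).
value = cell_at(SAMPLE, table=1, row=1, col=2)
```

Whether "second cell down" should count the header row is ambiguous in the question, so the row index may need adjusting by one.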

Trying to extract some data from a webpage (scraping beginner)

Question: I'm trying to extract some data from a webpage using Requests and then BeautifulSoup. I started by getting the HTML with Requests and then passing it to BeautifulSoup: from bs4 import BeautifulSoup import requests result = requests.get("https://XXXXX") #print(result.status_code) #print(result.headers) src = result.content soup = BeautifulSoup(src, 'lxml') Then I singled out some pieces of code: tags = soup.findAll('ol',{'class':'activity-popup-users'}) print(tags) Here is a part of

PHP HTML parsing: How can the charset value of a webpage be read with Simple HTML DOM Parser?

Question: In PHP, how can the charset value of a webpage (utf-8, windows-255, etc.) be read with Simple HTML DOM Parser? Note: it has to be done with the HTML DOM parser at http://simplehtmldom.sourceforge.net Example 1, webpage charset input: <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> result: utf-8 Example 2, webpage charset input: <meta content="text/html; charset=windows-255" http-equiv="Content-Type"> result: windows-255 Edit: I tried this (but it does not work): $html = file_get_html('http:/
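A minimal sketch of the extraction step the question describes: find the `<meta>` tag and pull the charset out of its `content` attribute. It uses Python's standard-library `html.parser`, not Simple HTML DOM Parser, so it does not meet the question's library constraint. The names `CharsetFinder` and `find_charset` are hypothetical, and support for the shorter `<meta charset="...">` form is an addition not mentioned in the question.

```python
import re
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Record the first charset declared by a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # Short form: <meta charset="utf-8">
            self.charset = attrs["charset"].strip().lower()
            return
        if (attrs.get("http-equiv") or "").lower() == "content-type":
            # Long form: content="text/html; charset=utf-8"
            match = re.search(
                r"charset\s*=\s*([^\s;\"']+)", attrs.get("content") or "", re.I
            )
            if match:
                self.charset = match.group(1).lower()


def find_charset(html):
    """Return the declared charset in lowercase, or None if none is found."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


example_1 = find_charset(
    '<meta content="text/html; charset=utf-8" http-equiv="Content-Type">'
)
example_2 = find_charset(
    '<meta content="text/html; charset=windows-255" http-equiv="Content-Type">'
)
```

The sketch only reads the document itself; a charset sent in the HTTP `Content-Type` response header would need to be checked separately.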

get elements from html parser

Question: I'm using jsoup and trying to get the div elements whose id starts with a particular string. For example: <div id="test123">. I need to check whether the id starts with the string "test" and get all such elements. I looked at http://jsoup.org/cookbook/extracting-data/selector-syntax and tried multiple variations using: doc.select("div:matches(test(*))"); But it still didn't work. Any help would be much appreciated. Answer 1: Use the attribute-starts-with selector [attr^=value]. Elements elements
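In jsoup, the starts-with selector from the answer would be written as `div[id^=test]`. The sketch below shows the same prefix match in Python, using the standard-library `html.parser` instead of a CSS selector engine; `IdPrefixFinder` and `ids_with_prefix` are hypothetical names, and the sample markup is invented.

```python
from html.parser import HTMLParser


class IdPrefixFinder(HTMLParser):
    """Collect the ids of <tag> elements whose id starts with a prefix."""

    def __init__(self, tag, prefix):
        super().__init__()
        self.tag = tag
        self.prefix = prefix
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        element_id = dict(attrs).get("id") or ""
        # Elements without an id never match, even for an empty prefix.
        if element_id and element_id.startswith(self.prefix):
            self.ids.append(element_id)


def ids_with_prefix(html, tag, prefix):
    """Return matching ids in document order."""
    finder = IdPrefixFinder(tag, prefix)
    finder.feed(html)
    return finder.ids


# Invented sample markup for illustration.
SAMPLE = (
    '<div id="test123">a</div><div id="other">b</div>'
    '<div id="test9">c</div><span id="testX">d</span>'
)
matches = ids_with_prefix(SAMPLE, "div", "test")
```

Note that `:matches(...)` in the question tests an element's text against a regular expression, which is why it does not help with attribute values.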

How to read a commented out HTML table using readHTMLTable in R

Question: In the past, I have been able to use readHTMLTable in R to pull some football stats. When trying to do so again this year, the tables aren't showing up, even though they are visible on the webpage. Here is an example: http://www.pro-football-reference.com/boxscores/201609080den.htm When I view the source for the page, the tables are all commented out (which I suspect is why readHTMLTable didn't find them). Example: search for "team_stats" in source code... <!-- <div class="table_outer
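One general approach to commented-out tables is to pull the comment bodies out of the page first and then hand each body to a table reader as ordinary HTML. The sketch below illustrates only that first step, in Python with the standard-library `html.parser` rather than R; `CommentCollector` and `commented_fragments` are hypothetical names, and the sample page is invented rather than taken from the site in the question.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the raw text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is everything between "<!--" and "-->".
        self.comments.append(data)


def commented_fragments(html, marker="<table"):
    """Return the comment bodies that contain the given marker."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return [c.strip() for c in collector.comments if marker in c]


# Invented sample page: one commented-out table and one unrelated comment.
PAGE = (
    '<div><!-- <table id="team_stats"><tr><td>First downs</td>'
    "<td>21</td></tr></table> --></div><!-- just a note -->"
)
fragments = commented_fragments(PAGE)
```

Each returned fragment is plain markup again, so it can be parsed a second time by whatever table parser is in use.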

Jsoup - How to extract every element

Question: I'm trying to get font information using jsoup. For example, below is my code: result = rtfToHtml(new StringReader(streamToString((InputStream)contents.getTransferData(dfRTF)))); // Example of text extraction from html // Parse html // String test = result.toString(); Document doc = Jsoup.parse(result); // Select first bold text String strdoc = doc.toString(); String words[] = strdoc.split("font-family"); Element firstBoldElt = doc.select("b").first(); Elements ele = doc.select("body");

Getting the website title from a link in a string

Question: string: "Here is the badges, https://stackoverflow.com/badges bla bla bla" If the string contains a link (see above), I want to parse the website title of that link. It should return: Badges - Stack Overflow. How can I do that? Thanks. Answer 1: #!/usr/bin/perl -w require LWP::UserAgent; my $ua = LWP::UserAgent->new; $ua->timeout(10); $ua->env_proxy; my $response = $ua->get('http://search.cpan.org/'); if ($response->is_success) { print $response->title(); } else { die $response->status_line; } See
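The task has two parts: find the link inside the string, then read the `<title>` of the page behind it. The sketch below covers both in Python with the standard library, as an alternative to the Perl answer; `URL_PATTERN`, `first_url`, `TitleFinder`, and `title_of` are hypothetical names, and the URL pattern is a deliberately simple assumption, not a full URL grammar. The download step is left out so the example runs offline.

```python
import re
from html.parser import HTMLParser

# Simplified pattern: a scheme followed by anything up to whitespace or quotes.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def first_url(text):
    """Return the first http(s) link in text, or None."""
    match = URL_PATTERN.search(text)
    # Trailing punctuation usually belongs to the sentence, not the link.
    return match.group(0).rstrip(".,;:!?)") if match else None


class TitleFinder(HTMLParser):
    """Record the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.title = None
        self._in_title = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            # Collapse runs of whitespace and newlines into single spaces.
            self.title = " ".join("".join(self._parts).split())
            self._in_title = False


def title_of(html):
    """Return the page title, or None if the document has no <title>."""
    finder = TitleFinder()
    finder.feed(html)
    return finder.title


url = first_url("Here is the badges, https://stackoverflow.com/badges bla bla bla")
title = title_of("<html><head><title>Badges - Stack Overflow</title></head></html>")
```

To use it on a live page, the HTML passed to `title_of` could come from `urllib.request.urlopen(url).read()` decoded with the page's charset.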

Iterating through a DOM with BeautifulSoup/Python

Question: I have this DOM: <h2>Main Section</h2> <p>Bla bla bla<p> <h3>Subsection</h3> <p>Some more info</p> <h3>Subsection 2</h3> <p>Even more info!</p> <h2>Main Section 2</h2> <p>bla</p> <h3>Subsection</h3> <p>Some more info</p> <h3>Subsection 2</h3> <p>Even more info!</p> I'd like to generate an iterator that returns 'Main Section', 'Bla bla bla', 'Subsection', etc. Is there a way to do this with BeautifulSoup? Answer 1: Here's one way to do it. The idea is to iterate over main sections ( h2 tag) and for
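A minimal sketch of a flat, document-order walk that yields the text of each heading and paragraph. It is not the answer's section-by-section approach and it uses the standard-library `html.parser` instead of BeautifulSoup; `TextInOrder` and `iter_text` are hypothetical names. An unclosed element, like the first `<p>` in the question, is treated as ending where the next tracked element starts.

```python
from html.parser import HTMLParser


class TextInOrder(HTMLParser):
    """Collect the text of selected elements in document order."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.items = []
        self._parts = None  # text pieces of the open element, or None

    def flush(self):
        """Store the text gathered so far, skipping whitespace-only runs."""
        if self._parts is not None:
            text = "".join(self._parts).strip()
            if text:
                self.items.append(text)
            self._parts = None

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.flush()  # closes any element left open before this one
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.tags:
            self.flush()


def iter_text(html, tags=("h2", "h3", "p")):
    """Yield the text of each tracked element; parsing happens up front."""
    parser = TextInOrder(tags)
    parser.feed(html)
    parser.close()
    parser.flush()
    yield from parser.items


DOM = (
    "<h2>Main Section</h2> <p>Bla bla bla<p> <h3>Subsection</h3> "
    "<p>Some more info</p>"
)
texts = list(iter_text(DOM))
```

The generator interface matches what the question asks for, although the whole document is parsed before the first item is yielded.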

How to parse an ASP.NET MVC Razor view (cshtml) like an HTML parser in C#

Question: I want to parse a Razor view file in C#. I have used Html Agility Pack to parse the Razor view file, but it failed to save the correct file contents. Basically, I want to change the inner HTML of some elements on the server side using C#: <div id="content1"> <p>this contents i want to change </p> <span>contes</span> </div> I want to change the inner HTML of content1 from C# like this: <div id="content1"> <span>@Function.gethtml()</span> </div> I have used Html Agility Pack to change the inner HTML contents but it is

How to get text which is not part of any element using jsoup?

Question: How do I get text that is not part of any element? <br><b>Price:</b> Rs. 24,900.00 <br> Here, how can one get the text Rs.24,900.00? Is this possible using jsoup? Answer 1: I suppose there is a parent element, so you should select that first and then select the "b", as in the following code. Basically, just find the element in front of your text. Document doc = Jsoup.parse( "<br><b>Price:</b> Rs. 24,900.00 <br>"); Element el = doc.select("b").first(); String text = ((TextNode) el
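The answer's idea, find the element in front of the loose text and then read what follows it, can be sketched without jsoup. The version below uses Python's standard-library `html.parser` and captures the first non-empty text run that directly follows a closing tag; `TextAfterTag` and `text_after` are hypothetical names, and the truncated Java line above is not being completed here.

```python
from html.parser import HTMLParser


class TextAfterTag(HTMLParser):
    """Capture the text that directly follows each closing </tag>."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.found = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        # Any new element ends the window in which loose text is expected.
        self._capture = False

    def handle_endtag(self, tag):
        self._capture = tag == self.tag

    def handle_data(self, data):
        if self._capture and data.strip():
            self.found.append(data.strip())
            self._capture = False


def text_after(html, tag):
    """Return the loose text runs that immediately follow </tag>."""
    parser = TextAfterTag(tag)
    parser.feed(html)
    parser.close()
    return parser.found


prices = text_after("<br><b>Price:</b> Rs. 24,900.00 <br>", "b")
```

Text inside the `<b>` element itself ("Price:") is ignored, because capturing only starts once the closing tag has been seen.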