html-parsing | 易学教程

R readHTMLTable() function error

Question: I'm running into a problem when trying to use the readHTMLTable function in the R package XML. When running

    library(XML)
    baseurl <- "http://www.pro-football-reference.com/teams/"
    team <- "nwe"
    year <- 2011
    theurl <- paste(baseurl, team, "/", year, ".htm", sep="")
    readurl <- getURL(theurl)
    readtable <- readHTMLTable(readurl)

I get the error message:

    Error in names(ans) = header : 'names' attribute [27] must be the same length as the vector [21]

I'm running 64-bit R 2.15.1 through RStudio 0.96.330.

beautifulsoup with an invalid html document

Question: I am trying to parse the document http://www.consilium.europa.eu/uedocs/cms_data/docs/pressdata/en/ecofin/5923en8.htm. I want to extract everything before Commission: . (I need BeautifulSoup because the second step is to extract countries and person names.) If I do:

    import urllib
    import re
    from bs4 import BeautifulSoup
    url = "http://www.consilium.europa.eu/uedocs/cms_data/docs/pressdata/en/ecofin/5923en8.htm"
    soup = BeautifulSoup(urllib.urlopen(url))
    print soup.find_all(text=re.compile(

Search block of text, return MP3 links using PHP

Question: I've just run into a bit of trouble with some PHP on my latest project. Basically, I have a block of text ($text) and I would like to search through it and return all of the MP3 links. I know it has something to do with regular expressions, but I just cannot get it working. Here's my current code:

    if(preg_match_all(".mp3", $text, $matches, PREG_SET_ORDER)) {
        foreach($matches as $match) {
            echo $match[2];
            echo $text;
        }
    }

Answer 1: Once again, regex is extremely poor at parsing HTML.
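In the spirit of the answer's warning about regex, here is a minimal Python sketch of the parser-based alternative, assuming the block of text is HTML and the MP3 links live in `href` or `src` attributes. The names `Mp3LinkCollector` and `find_mp3_links` and the sample markup are made up for illustration; they are not from the original thread.

```python
from html.parser import HTMLParser


class Mp3LinkCollector(HTMLParser):
    """Collects href/src attribute values that end in .mp3 (case-insensitive)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("href", "src") and value and value.lower().endswith(".mp3"):
                self.links.append(value)


def find_mp3_links(text):
    """Return all .mp3 URLs found in href/src attributes of the given HTML text."""
    collector = Mp3LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.links


# Hypothetical sample input.
sample = (
    '<p><a href="http://example.com/a.mp3">A</a> '
    '<a href="/b.html">B</a> <audio src="c.MP3"></audio></p>'
)
print(find_mp3_links(sample))
```

This sketch only checks the end of the attribute value, so a URL with a query string after `.mp3` would be missed; that case would need URL parsing on top.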

Parse an HTML combo box in C#

Question: I need to parse a select value in an HTML file. I have this HTML file:

    <html>
    <head></head>
    <body>
    <select id="region" name="region">
      <option value="0" selected>Všetky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1">Banskobystrický kraj</option>
      <option value="1">Banská Bystrica</option>
      <option value="3">Banská Štiavnica</option>
      <option value="18">Brezno</option>
      <option value="22">Detva</option>
      <option value="58">Dudince</option>
    </select>
    </body>
    </html>

I need
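The excerpt is cut off before it says exactly what is needed, so the following is only a rough Python sketch of one plausible reading: collecting the `(value, label)` pairs of every `<option>` inside the `<select id="region">` element. The class and function names are hypothetical, and the sketch assumes every `<option>` has an explicit closing tag, as in the sample.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collects (value, label) pairs for <option> tags inside one <select>."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.current_value = None  # value of the currently open <option>, if any
        self.current_text = []
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self.current_value = attrs.get("value", "")
            self.current_text = []

    def handle_data(self, data):
        # Text outside an open <option> (for example inside <optgroup>) is ignored.
        if self.current_value is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self.current_value is not None:
            label = "".join(self.current_text).strip()
            self.options.append((self.current_value, label))
            self.current_value = None
        elif tag == "select":
            self.in_select = False


def parse_options(html_text, select_id):
    collector = OptionCollector(select_id)
    collector.feed(html_text)
    collector.close()
    return collector.options


# A shortened copy of the markup from the question.
sample = """
<select id="region" name="region">
  <option value="0" selected>Všetky regiony</option>
  <optgroup>Banskobystrický kraj</optgroup>
  <option value="k_1">Banskobystrický kraj</option>
  <option value="18">Brezno</option>
</select>
"""
for value, label in parse_options(sample, "region"):
    print(value, label)
```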

php DOMDocument class: node tree

Question: I want to convert HTML syntax into a node tree (<ul> structure). How do I do this using the DOMDocument class?

    $html = ' <div> <p> <a> </p> </div> ';

result:

    <ul>
      <li> div
        <ul>
          <li> p
            <ul>
              <li>a</li>
            </ul>
          </li>
        </ul>
      </li>
    </ul>

Answer 1:

    <?php
    $xml = '<div> <p> <a>#</a> </p> </div> ';
    function xml2array($xml,&$result = '') {
        foreach($xml->children() as $name => $xmlchild) {
            xml2array($xmlchild,$result);
        }
        $result = "<ul><li>".$xml->getName().$result."</li></ul>";
    }
    $result='';
    $dd = xml2array
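For comparison, the same idea (walk the element tree, emit one nested list per level) can be sketched in Python with the standard-library parser. This is not the DOMDocument solution the question asks for, only an illustration of the recursion; `TreeBuilder`, `render`, and `html_to_ul` are hypothetical names, and the sketch assumes well-formed input where every element has a closing tag (unclosed void elements such as `<br>` would confuse the nesting).

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a tree of (tag_name, children) tuples from well-formed markup."""

    def __init__(self):
        super().__init__()
        self.root = ("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)  # attach to the current parent
        self.stack.append(node)         # descend into the new element

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()


def render(children):
    """Render sibling nodes as one <ul>, recursing into each node's children."""
    if not children:
        return ""
    items = "".join(
        "<li>" + name + render(kids) + "</li>" for name, kids in children
    )
    return "<ul>" + items + "</ul>"


def html_to_ul(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return render(builder.root[1])


print(html_to_ul("<div> <p> <a>#</a> </p> </div>"))
```

Unlike the PHP answer, sibling elements here share a single `<ul>` instead of each getting their own.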

HTML special character parsing

Question: I'm looking for a Java class to parse all HTML special characters. I guess it's a common problem, but I cannot find a fast solution right now. What I want to get is: input: thè --> output: thè input: » input: &lraquo; ... Do you know anything useful for me? Answer 1: Try the StringEscapeUtils utility class. Check the docs for the StringEscapeUtils.unescapeHtml() method. Docs here: http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html Download here: http:/
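The question is about Java, where the answer points to `StringEscapeUtils.unescapeHtml()`. As a cross-language illustration of what such an unescape step does, here is a short Python sketch using the standard library's `html.unescape`. The sample inputs and the `decode_entities` wrapper are assumptions chosen for the example, not taken from the thread.

```python
import html


def decode_entities(text):
    """Replace named and numeric HTML character references with their characters."""
    return html.unescape(text)


# Illustrative inputs: a named entity, a decimal reference, and a hex reference.
for raw in ["th&egrave;", "&raquo; next", "caf&#233;", "&#x3C;b&#x3E;"]:
    print(repr(raw), "->", repr(decode_entities(raw)))
```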

Can we create an image from an HTML string in Ruby on Rails?

Question: I would like to know whether it is possible, in Ruby on Rails, to create an image from an HTML string that contains HTML tags along with formatting, or from HTML content coming from web editors like CKEditor or TinyMCE. Thanks, Nishant. Answer 1: Are you referring to achieving: <%= "<img src='http://domain.tld/some_image.png' />".html_safe %> You can also interpolate any strings by doing <%= "#{url_string}".html_safe %> where url_string = "<img src='http://domain.tld/some_image

Html Dom parser get first element

Question: Hi, I'm using the simple_html_dom PHP library to get content from another website. I have the HTML structure below:

    <h1 class="nik_product_title" style="color: #000;">
      DSLR D7100
      <span class="new_big_parent">
        <span class="new_big_child">
          <span class="new_big_child1">new</span>
        </span>
      </span>
    </h1>

Using this:

    @$html->find ( 'div[class=nik_block_product_main_info_component_inner] h1',0)->plaintext;

But I'm getting the output DSLR+D7100new. How do I get only the first plain text, i.e., I need to fetch only DSLR
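The underlying issue is that `plaintext` concatenates the text of all descendants, while the wanted value is only the element's own first text node. A minimal Python sketch of that distinction follows, using the standard-library parser and tracking nesting depth. It is not simple_html_dom code; `FirstOwnText` and `first_own_text` are hypothetical names, and the depth counter assumes every nested element has a closing tag.

```python
from html.parser import HTMLParser


class FirstOwnText(HTMLParser):
    """Finds the first non-blank text node that is a direct child of a given tag."""

    def __init__(self, tag):
        super().__init__()
        self.target = tag
        self.depth = 0    # 1 means directly inside the target element
        self.text = None

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.depth += 1          # a nested element inside the target
        elif tag == self.target and self.text is None:
            self.depth = 1           # entered the first target element

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only text that sits directly under the target, not in descendants.
        if self.depth == 1 and self.text is None and data.strip():
            self.text = data.strip()


def first_own_text(markup, tag):
    finder = FirstOwnText(tag)
    finder.feed(markup)
    finder.close()
    return finder.text


sample = (
    '<h1 class="nik_product_title" style="color: #000;"> DSLR D7100 '
    '<span class="new_big_parent"> <span class="new_big_child"> '
    '<span class="new_big_child1">new</span> </span> </span> </h1>'
)
print(first_own_text(sample, "h1"))
```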

parsing simple html for iphone

Question: I have a very simple HTML page to parse. The page will always remain simple, as simple as this:

    <html>
    <head><title>title</title></head>
    <body>some data here</body>
    </html>

I have fetched the HTML content of such a page and have it in an NSString. I want to get whatever data is in the body of the page. Please tell me how this can be done, and let me know if there is more than one possible way. I would prefer doing it in basic Objective-C if possible. Thanks. Answer 1: If
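The question targets Objective-C, and the answer is cut off in this excerpt. Purely to illustrate the logic (keep the text that appears between `<body>` and `</body>`), here is a small Python sketch with the standard-library parser. `BodyText` and `body_text` are hypothetical names; the sketch returns text only and drops any tags nested inside the body.

```python
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Accumulates the text found inside the <body> element."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.chunks.append(data)


def body_text(markup):
    parser = BodyText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks).strip()


page = "<html> <head><title>title</title></head> <body>some data here</body> </html>"
print(body_text(page))
```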

HTML find and replace href tags [duplicate]

Question: Possible Duplicate: What is the best way to parse html in C#? I am parsing an HTML file. I need to find all the href tags in the HTML and replace them with a text-friendly version. Here is an example. Original text: <a href="http://foo.bar">click here</a> Replacement value: click here <http://foo.bar> How do I achieve this? Answer 1: You could use the Html Agility Pack library, with code like this: HtmlDocument doc = new HtmlDocument();
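The answer's Html Agility Pack code is truncated here. As a language-neutral illustration of the transformation itself, the following Python sketch rewrites each `<a href="URL">text</a>` as `text <URL>` and passes other tags through unchanged. `LinkFlattener` and `flatten_links` are hypothetical names; note that with the standard parser's defaults, character references in text are decoded and comments are dropped, which a production version would need to handle.

```python
from html.parser import HTMLParser


class LinkFlattener(HTMLParser):
    """Rewrites <a href="URL">text</a> as 'text <URL>', keeping other tags."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None  # href of the currently open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
        else:
            # Re-emit the start tag exactly as it appeared in the source.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a":
            if self.href:
                self.out.append(" <" + self.href + ">")
            self.href = None
        else:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        self.out.append(data)


def flatten_links(markup):
    parser = LinkFlattener()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(flatten_links('<p>Please <a href="http://foo.bar">click here</a> now</p>'))
```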