html-parsing

R readHTMLTable() function error

心已入冬 submitted on 2019-12-25 04:07:38
Question: I'm running into a problem when trying to use the readHTMLTable function in the R package XML. When running

    library(XML)
    library(RCurl)  # provides getURL()
    baseurl <- "http://www.pro-football-reference.com/teams/"
    team <- "nwe"
    year <- 2011
    theurl <- paste(baseurl, team, "/", year, ".htm", sep = "")
    readurl <- getURL(theurl)
    readtable <- readHTMLTable(readurl)

I get the error message:

    Error in names(ans) = header : 'names' attribute [27] must be the same length as the vector [21]

I'm running 64-bit R 2.15.1 through RStudio 0.96.330.

BeautifulSoup with an invalid HTML document

痞子三分冷 submitted on 2019-12-25 02:27:13
Question: I am trying to parse the document http://www.consilium.europa.eu/uedocs/cms_data/docs/pressdata/en/ecofin/5923en8.htm. I want to extract everything before "Commission:". (I need BeautifulSoup because the second step is to extract countries and person names.) If I do:

    import urllib
    import re
    from bs4 import BeautifulSoup
    url = "http://www.consilium.europa.eu/uedocs/cms_data/docs/pressdata/en/ecofin/5923en8.htm"
    soup = BeautifulSoup(urllib.urlopen(url))
    print soup.find_all(text=re.compile(

Search block of text, return MP3 links using PHP

自作多情 submitted on 2019-12-25 01:50:46
Question: I've just run into a bit of trouble with some PHP on my latest project. Basically, I have a block of text ($text) and I would like to search through that text and return all of the MP3 links. I know it has something to do with regular expressions, but I just cannot get it working. Here's my current code:

    if (preg_match_all(".mp3", $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            echo $match[2];
            echo $text;
        }
    }

Answer 1: Once again, regex is extremely poor at parsing HTML.
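The answer's point, that a parser is a better fit than a regular expression here, can be illustrated with a minimal sketch. It is written in Python rather than the asker's PHP and uses only the standard library's `html.parser`. The names `Mp3LinkCollector` and `find_mp3_links` are hypothetical, and the sketch assumes that an "MP3 link" means an `href` or `src` attribute whose value ends in `.mp3`.

```python
from html.parser import HTMLParser


class Mp3LinkCollector(HTMLParser):
    """Collects href/src attribute values that end in .mp3 (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("href", "src") and value and value.lower().endswith(".mp3"):
                self.links.append(value)


def find_mp3_links(text):
    """Return every .mp3 URL found in href/src attributes of the given HTML text."""
    collector = Mp3LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.links


sample = '<p><a href="http://example.com/a.mp3">A</a> <a href="/page.html">B</a></p>'
print(find_mp3_links(sample))  # prints ['http://example.com/a.mp3']
```

URLs that carry a query string after `.mp3` would not match this simple suffix test. Handling them would mean parsing the URL first, which is left out of the sketch.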

Parse an HTML combobox in C#

纵饮孤独 submitted on 2019-12-25 01:46:11
Question: I need to parse a select value in an HTML file. I have this HTML file:

    <html>
    <head></head>
    <body>
    <select id="region" name="region">
      <option value="0" selected>Všetky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1">Banskobystrický kraj</option>
      <option value="1">Banská Bystrica</option>
      <option value="3">Banská Štiavnica</option>
      <option value="18">Brezno</option>
      <option value="22">Detva</option>
      <option value="58">Dudince</option>
    </select>
    </body>
    </html>

I need

php DOMDocument class: node tree

不问归期 submitted on 2019-12-25 01:37:37
Question: I want to convert HTML syntax into a node tree (a <ul> structure). How do I do this using the DOMDocument class?

    $html = ' <div> <p> <a> </p> </div> ';

Result:

    <ul>
      <li> div
        <ul>
          <li> p
            <ul>
              <li>a</li>
            </ul>
          </li>
        </ul>
      </li>
    </ul>

Answer 1:

    <?php
    $xml = '<div> <p> <a>#</a> </p> </div> ';
    function xml2array($xml, &$result = '') {
        foreach ($xml->children() as $name => $xmlchild) {
            xml2array($xmlchild, $result);
        }
        $result = "<ul><li>" . $xml->getName() . $result . "</li></ul>";
    }
    $result = '';
    $dd = xml2array
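As a rough illustration of the transformation the question asks for, here is a minimal sketch in Python, not the PHP DOMDocument or SimpleXML code from the thread. It builds a small element tree with the standard library's `html.parser` and renders it as nested lists. The names `TreeBuilder`, `render` and `html_to_ul` are hypothetical. The sketch assumes that only element names matter (text and attributes are ignored) and that an end tag implicitly closes any still-open descendants, which is how the unclosed `<a>` in the question's input is handled.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a nested dict tree of element names (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = {"name": None, "children": []}
        self.stack = [self.root]  # path from the root to the currently open element

    def handle_starttag(self, tag, attrs):
        node = {"name": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name; any unclosed
        # descendants above it on the stack are closed at the same time.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["name"] == tag:
                del self.stack[i:]
                break


def render(nodes):
    """Render a list of nodes as a <ul> with one <li> per node."""
    if not nodes:
        return ""
    items = "".join(
        "<li>" + node["name"] + render(node["children"]) + "</li>" for node in nodes
    )
    return "<ul>" + items + "</ul>"


def html_to_ul(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return render(builder.root["children"])


print(html_to_ul("<div> <p> <a> </p> </div>"))
# prints <ul><li>div<ul><li>p<ul><li>a</li></ul></li></ul></li></ul>
```

Void elements such as `<br>` written without a closing slash would stay open until an ancestor closes, so real-world input would likely need a list of void tag names. That is outside the scope of this sketch.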

HTML special character parsing

筅森魡賤 submitted on 2019-12-25 01:30:05
Question: I'm looking for a Java class to parse all HTML special characters. I guess it's a common problem, but I cannot find a fast solution right now. What I want to get is:

    input: th&egrave; --> output: thè
    input: &raquo;
    input: &lraquo;
    ...

Do you know anything useful for me?

Answer 1: Try the StringEscapeUtils utility class. Check the docs for the StringEscapeUtils.unescapeHtml() method. Docs here: http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html Download here: http:/
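To show what entity unescaping does, independent of the Java library recommended in the answer, here is a minimal sketch using Python's standard `html.unescape`. The wrapper name `decode_entities` is hypothetical, and the sample inputs `th&egrave;` and `&raquo;` are assumptions about what the asker's inputs looked like before the page rendered them.

```python
import html


def decode_entities(text):
    """Replace named and numeric HTML character references with their characters."""
    return html.unescape(text)


print(decode_entities("th&egrave;"))  # prints thè
print(decode_entities("&raquo;"))     # prints »
print(decode_entities("&#232;"))      # prints è
```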

Can we create an image from an HTML string in Ruby on Rails?

不想你离开。 submitted on 2019-12-25 00:27:53
Question: I would like to know whether there is any way to create an image from an HTML string that contains HTML tags along with their formatting, or from HTML content coming from web editors such as CKEditor or TinyMCE, in Ruby on Rails. Thanks, Nishant

Answer 1: Are you referring to achieving:

    <%= "<img src='http://domain.tld/some_image.png' />".html_safe %>

You can also interpolate any strings by doing <%= "#{url_string}".html_safe %> where url_string = "<img src='http://domain.tld/some_image

HTML DOM parser: get first element

房东的猫 submitted on 2019-12-24 22:01:26
Question: Hi, I'm using the simple_html_dom PHP library to get content from another website. I have the HTML structure below:

    <h1 class="nik_product_title" style="color: #000;">
      DSLR D7100
      <span class="new_big_parent">
        <span class="new_big_child">
          <span class="new_big_child1">new</span>
        </span>
      </span>
    </h1>

I am using this:

    @$html->find('div[class=nik_block_product_main_info_component_inner] h1', 0)->plaintext;

But I'm getting the output DSLR+D7100new. How do I get only the first plain text, i.e. I need to fetch only DSLR
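The underlying idea, keeping only the text that sits directly inside the `<h1>` and skipping text inside nested elements, can be sketched as follows. This is Python with the standard `html.parser`, not simple_html_dom. The names `DirectTextExtractor` and `direct_text` are hypothetical. The sketch assumes properly nested tags with no void elements inside the target, and it looks only at the first matching element.

```python
from html.parser import HTMLParser


class DirectTextExtractor(HTMLParser):
    """Collects text that is a direct child of the first <target> element."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0      # 0 = outside target, 1 = directly inside, >1 = in a descendant
        self.done = False   # set once the first target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            self.depth += 1
        elif tag == self.target:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.done or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth == 1:
            self.chunks.append(data)


def direct_text(html, target="h1"):
    parser = DirectTextExtractor(target)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup's indentation.
    return " ".join("".join(parser.chunks).split())


sample = """<h1 class="nik_product_title" style="color: #000;"> DSLR D7100
<span class="new_big_parent"> <span class="new_big_child">
<span class="new_big_child1">new</span> </span> </span> </h1>"""
print(direct_text(sample))  # prints DSLR D7100
```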

Parsing simple HTML for iPhone

ぐ巨炮叔叔 submitted on 2019-12-24 19:55:22
Question: I have a very simple HTML page to parse. The page will always remain simple, as simple as this:

    <html>
    <head><title>title</title></head>
    <body>some data here</body>
    </html>

I have fetched the HTML content of such a page and have it in an NSString. I want to get whatever data is in the body of the page. Please tell me how this can be done, and let me know if there is more than one possible way. I would prefer doing it using basic Objective-C if possible. Thanks

Answer 1: If

HTML find and replace href tags [duplicate]

左心房为你撑大大i submitted on 2019-12-24 18:20:26
Question: This question already has answers here. Closed 7 years ago. Possible duplicate: What is the best way to parse html in C#?

I am parsing an HTML file. I need to find all the href tags in the HTML and replace them with a text-friendly version. Here is an example.

    Original text: <a href="http://foo.bar">click here</a>
    Replacement value: click here <http://foo.bar>

How do I achieve this?

Answer 1: You could use the Html Agility Pack library, with code like this:

    HtmlDocument doc = new HtmlDocument();
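The replacement described in the question can be sketched without Html Agility Pack. What follows is a Python illustration using the standard `html.parser`, not the C# code the answer goes on to give. The names `LinkFlattener` and `flatten_links` are hypothetical. The sketch assumes that every other tag should pass through unchanged, and that losing comments, doctype declarations and entity escaping in the output is acceptable for a "text friendly" result.

```python
from html.parser import HTMLParser


class LinkFlattener(HTMLParser):
    """Rewrites <a href="URL">text</a> as 'text <URL>', keeping other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.href = None  # href of the anchor currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
        else:
            # Re-emit the start tag exactly as it appeared in the source.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a":
            if self.href:
                self.out.append(" <" + self.href + ">")
            self.href = None
        else:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        self.out.append(data)


def flatten_links(html):
    parser = LinkFlattener()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(flatten_links('<a href="http://foo.bar">click here</a>'))
# prints click here <http://foo.bar>
```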