html-parsing

POST website form data and retrieve results

℡╲_俬逩灬. submitted on 2020-01-20 07:48:25
Question: I have been trying to write VBA code to copy the three tables shown in the web page source below. The tables show monthly weather data. Could someone please help me write code to copy this data and paste it into an Excel sheet? I have written VBA code to access the page, but I could not copy and paste the data. Thank you very much in advance. The web page source: </table> </div> <hr><big><b><i>Parameters for Sizing and Pointing of Solar Panels and for Solar Thermal Applications:<
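The question asks for VBA, but the core step, reading each `<table>` in the returned HTML into rows of cell text, can be sketched with Python's standard-library `html.parser`. This is a minimal sketch, not a drop-in answer. It assumes simple, non-nested tables built from `<tr>` rows and `<td>`/`<th>` cells. The `TableExtractor` class and the sample markup are hypothetical, and writing the rows into an Excel sheet (for example through a CSV file) is not shown.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect every <table> as a list of rows of cell text.
    Assumes simple, non-nested tables."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so each cell is one clean string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. indentation between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


sample = """
<table>
  <tr><th>Month</th><th>Temp</th></tr>
  <tr><td>Jan</td><td>-3.1</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(sample)
parser.close()
print(parser.tables)  # [[['Month', 'Temp'], ['Jan', '-3.1']]]
```

Each inner list is one table row, so the result maps directly onto spreadsheet rows.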

Parsing an html document using an XML-parser

浪尽此生 submitted on 2020-01-20 05:48:08
Question: Can I parse an HTML file using an XML parser? Why or why not? I know that XML is used to store data and HTML is used to display data, but syntactically they are almost identical. The intended use is to build an HTML parser as part of a web crawler application. Answer 1: You can try parsing an HTML file with an XML parser, but it is likely to fail. The reason is that HTML documents can contain the following features that XML parsers don't understand: elements that never have end
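The end-tag problem the answer starts to describe is easy to reproduce. The sketch below feeds the same hypothetical snippet to Python's `xml.etree.ElementTree` (an XML parser) and to `html.parser` (an HTML parser). `<br>` is a void element with no closing tag, which is valid HTML but not well-formed XML, so only the HTML parser accepts it unless the markup is rewritten XHTML-style as `<br/>`.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Valid HTML, but not well-formed XML: <br> is a void element with no end tag.
html_snippet = "<p>first line<br>second line</p>"
xhtml_snippet = "<p>first line<br/>second line</p>"


def parses_as_xml(markup):
    """Return True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record start tags; an HTML parser does not expect void elements to close."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(html_snippet)
collector.close()

print(parses_as_xml(html_snippet))   # False: the XML parser reports a mismatched tag
print(parses_as_xml(xhtml_snippet))  # True
print(collector.tags)                # ['p', 'br']
```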

Get generated HTML after JS manipulates the DOM and pass request headers

老子叫甜甜 submitted on 2020-01-17 09:01:10
Question: I need to get the generated HTML source of a page after all JS DOM manipulation has finished. I was using Phantomas (https://github.com/macbre/phantomas) for this, but unfortunately it does not provide a way to pass request headers. Is there a library that lets me pass request headers and then get the generated HTML source? Any pointers would be very helpful. Answer 1: You can use PhantomJS, the scriptable headless WebKit browser. Specify customHeaders and read page.content: var

How do you parse and process HTML/XML in PHP?

元气小坏坏 submitted on 2020-01-17 04:13:51
Question: How can one parse HTML/XML and extract information from it? Answer 1: Native XML extensions. I prefer using one of the native XML extensions, since they come bundled with PHP, are usually faster than all the third-party libraries, and give me all the control I need over the markup. DOM: The DOM extension allows you to operate on XML documents through the DOM API with PHP 5. It is an implementation of the W3C's Document Object Model Core Level 3, a platform- and language-neutral interface that allows programs
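Because PHP's DOM extension follows the W3C DOM interface, the same method names appear in other languages' DOM implementations. As an illustration only (this is Python's `xml.dom.minidom`, a minimal DOM implementation, not PHP's extension, and the sample document is hypothetical), here are `getElementsByTagName` and `getAttribute` used to pull data out of XML. Note that `minidom` only accepts well-formed XML, whereas PHP's `DOMDocument` also offers `loadHTML()` for real-world HTML.

```python
from xml.dom import minidom

# Hypothetical XML document used only for illustration.
source = """<library>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</library>"""

doc = minidom.parseString(source)

# getElementsByTagName and getAttribute are DOM Core methods, which PHP's
# DOMDocument/DOMElement classes expose under the same names.
books = [
    (book.getAttribute("id"),
     book.getElementsByTagName("title")[0].firstChild.data)
    for book in doc.getElementsByTagName("book")
]
print(books)  # [('b1', 'First'), ('b2', 'Second')]
```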

Access documents on secure web server

时间秒杀一切 submitted on 2020-01-17 03:36:31
Question: I'm trying to build an iPad app to download and display documents (pdf, ppt, doc, etc.) from a web server. Currently it does this by parsing the HTML structure on the server (using hpple). For example, the files are held at: Http://myserver.com/myFolders/myFiles/ The app goes to this location and traverses the tree using an XPath query, e.g. "/html/body/ul/li/a". It then downloads whatever documents it finds to the iPad for display. So far this works quite well, but the server is publicly
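The traversal the app performs can be sketched as follows, assuming (hypothetically) that the server returns a well-formed listing page with the structure the XPath implies. Python's `xml.etree.ElementTree` supports only a subset of XPath, and paths are evaluated relative to the root element, so `/html/body/ul/li/a` becomes `./body/ul/li/a`. The listing and file names are invented for illustration; the base URL is the question's example written with a lowercase scheme. A real server page that is not well-formed XML would need an HTML-tolerant parser instead.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

BASE_URL = "http://myserver.com/myFolders/myFiles/"

# Hypothetical listing page; assumes the server's HTML is well-formed XML.
listing = """<html><body><ul>
  <li><a href="report.pdf">report.pdf</a></li>
  <li><a href="slides.ppt">slides.ppt</a></li>
  <li><a href="index.html">index.html</a></li>
</ul></body></html>"""

root = ET.fromstring(listing)  # root is the <html> element

# "/html/body/ul/li/a" rewritten relative to <html> for ElementTree's XPath subset.
hrefs = [a.get("href") for a in root.findall("./body/ul/li/a")]

# Keep only document types the app displays and build absolute download URLs.
wanted = (".pdf", ".ppt", ".doc")
urls = [urljoin(BASE_URL, h) for h in hrefs if h.lower().endswith(wanted)]
print(urls)
# ['http://myserver.com/myFolders/myFiles/report.pdf',
#  'http://myserver.com/myFolders/myFiles/slides.ppt']
```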

jsoup times out, xml gets white space error, basic traversing through page is time consuming

五迷三道 submitted on 2020-01-16 08:40:48
Question: I would like to write a program that parses an HTML page, selects the useful information, and displays it. I did this by opening a stream and searching it line by line for the relevant content, but that is a time-consuming process. So I decided to treat the page as XML and use XPath. I did this by creating an XML file on my system and loading the contents from the stream into it, and I got a white space error. Then I decided to open the document directly, as doc = (Document) builder.parse

HTML parsing error in IE8 (KB927917)

大城市里の小女人 submitted on 2020-01-15 07:27:12
Question: Webpage error details:
User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727)
Timestamp: Wed, 18 Jan 2012 05:02:49 UTC
Message: HTML Parsing Error: Unable to modify the parent container element before the child element is closed (KB927917)
Line: 0
Char: 0
Code: 0
URI: http://collaborize.collaborizeclassroom.com/portal/portal/collaborize/site/window?actionEvent=homePage&action=2&fpg=1&unId=umb8N95lhIoXOVKzTTrtcPoCrixd4wMdScQv8mEwqFT962zy3VSh4mzQNeugOWVV

meta tag parsing in Rails [closed]

自作多情 submitted on 2020-01-15 05:47:33
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 3 years ago. I was looking for something to help me parse general meta tags from websites, similar to this GitHub project I found for Open Graph data. Here's a demo app. Basically, I'd like a user to be able to input a URL from a news site and have the app retrieve the title, description, etc. from it, leaving as little work
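The extraction itself is small. The question concerns Rails, but as a language-neutral sketch, here is the same idea with Python's `html.parser`: collect the `<title>` text plus every `<meta>` tag's `name` or `property` attribute mapped to its `content`, then prefer Open Graph values (`og:title`, `og:description`) when present. The `MetaExtractor` class and the sample page are hypothetical, and fetching the URL is left out.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Hypothetical helper: collect <title> text and <meta> key/content pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Plain meta tags use name="..."; Open Graph tags use property="og:...".
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


page = """<html><head>
<title>Example headline</title>
<meta name="description" content="A short summary.">
<meta property="og:title" content="Example headline (OG)">
</head><body></body></html>"""

extractor = MetaExtractor()
extractor.feed(page)
extractor.close()

# Prefer Open Graph values, falling back to the plain tags.
summary = {
    "title": extractor.meta.get("og:title") or extractor.title.strip(),
    "description": extractor.meta.get("og:description")
    or extractor.meta.get("description"),
}
print(summary)
# {'title': 'Example headline (OG)', 'description': 'A short summary.'}
```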

Why does BeautifulSoup .children contain nameless elements as well as the expected tag(s)

折月煮酒 submitted on 2020-01-15 03:16:29
Question: Code:

    #!/usr/bin/env python3
    from bs4 import BeautifulSoup

    test = """<!DOCTYPE html>
    <html>
      <head>
        <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
        <title>Test</title>
      </head>
      <body>
        <table>
          <tbody>
            <tr>
              <td>
                <div>
                  <b>
                    Icon
                  </b>
                </div>
              </td>
            </tr>
          </tbody>
        </table>
      </body>
    </html>"""

    # Parse `test` (the original passed an undefined name, test2) with an explicit parser.
    soup = BeautifulSoup(test, "html.parser")
    rows = soup.find_all('tr')
    for r in rows:
        print(r.name)
        for c in r.children:
            print('>', c.name)

Output:

    tr
    > None
    > td
    > None

Why are there nameless elements in the list
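BeautifulSoup mirrors the document tree, and in that tree the line breaks and indentation between `<tr>` and `<td>` are text nodes (NavigableString objects in bs4). Text nodes have no tag name, which is why `c.name` prints `None`. The same behavior appears in the standard library's DOM implementation. The sketch below uses `xml.dom.minidom` rather than BeautifulSoup so it runs without third-party packages (the row markup is hypothetical) and keeps only element children. In bs4 itself, the equivalent filters are keeping only `bs4.element.Tag` instances or calling `r.find_all(recursive=False)`.

```python
from xml.dom import minidom

# Hypothetical row markup, laid out with line breaks like the question's HTML.
row_markup = """<tr>
  <td><div><b>Icon</b></div></td>
</tr>"""

row = minidom.parseString(row_markup).documentElement

# Every child node, including the whitespace between tags (named "#text").
all_children = [child.nodeName for child in row.childNodes]
print(all_children)  # ['#text', 'td', '#text']

# Element nodes only: the analogue of skipping bs4's NavigableString children.
element_children = [
    child.nodeName
    for child in row.childNodes
    if child.nodeType == child.ELEMENT_NODE
]
print(element_children)  # ['td']
```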