html-parsing | 易学教程

POST website form data and retrieve results

Question: I have been trying to write VBA code to copy the three tables shown in the web page source below. The tables contain monthly weather data. Could someone please help me write code that copies this data and pastes it into an Excel sheet? I have written VBA code that accesses the data, but I could not copy and paste it. Thank you very much in advance. The webpage source code: </table> </div> <hr><big><b><i>Parameters for Sizing and Pointing of Solar Panels and for Solar Thermal Applications:<
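The question asks for VBA, and the excerpt does not include an answer. As a language-neutral illustration of the table-copying step only, here is a minimal Python sketch that collects the cell text of every table in a page and writes it as CSV, which Excel can open. The names `TableRows`, `extract_tables`, `tables_to_csv`, and the sample markup and values are all hypothetical; the sketch assumes explicit closing tags and no nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables, each a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.tables


def tables_to_csv(html):
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for table in extract_tables(html):
        writer.writerows(table)
        writer.writerow([])  # blank line between tables
    return out.getvalue()


# Hypothetical sample markup with placeholder values.
SAMPLE = """<table><tr><th>Month</th><th>Value</th></tr>
<tr><td>Jan</td><td>1.5</td></tr></table>
<table><tr><td>Feb</td><td>2.5</td></tr></table>"""

if __name__ == "__main__":
    print(tables_to_csv(SAMPLE))
```

Saving the returned string to a `.csv` file gives something Excel can open directly. Real pages with unclosed cells or nested tables would need a more forgiving parser.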

Parsing an html document using an XML-parser

Question: Can I parse an HTML file using an XML parser? Why can, or can't, I do this? I know that XML is used to store data and that HTML is used to display data, but syntactically they are almost identical. The intended use is an HTML parser that is part of a web crawler application. Answer 1: You can try parsing an HTML file with an XML parser, but it's likely to fail. The reason is that HTML documents can have the following HTML features that XML parsers don't understand. elements that never have end
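The failure the answer describes can be reproduced with Python's standard library. In this minimal sketch the markup in `HTML_SNIPPET` and the helper names `parses_as_xml` and `html_start_tags` are hypothetical, chosen only for illustration: an XML parser rejects HTML that leaves `<br>` unclosed, while an HTML-aware parser reads the same text without complaint.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Acceptable HTML, but not well-formed XML: <br> and <input> have no end tag,
# and the "disabled" attribute has no value.
HTML_SNIPPET = (
    "<html><body><p>first line<br>second line</p>"
    "<input disabled></body></html>"
)


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record the name of every start tag the HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print("accepted as XML:", parses_as_xml(HTML_SNIPPET))
    print("tags seen by HTML parser:", html_start_tags(HTML_SNIPPET))
```

The XML route only works when the page happens to be well-formed, for example XHTML with `<br/>`. A crawler cannot assume that of arbitrary pages.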

Get generated HTML after JS manipulates the DOM and pass request headers

Question: I need to get the generated HTML source of a page after all JS DOM manipulation has finished. I was using Phantomas (https://github.com/macbre/phantomas) for this purpose, but unfortunately it does not provide a way to pass in request headers. Is there a library that allows me to pass request headers and then get the generated HTML source? Any pointers would be greatly appreciated. Answer 1: You can use "PhantomJS WebKit scriptable". Specify customHeaders and get the page.content: var

How do you parse and process HTML/XML in PHP?

Question: How can one parse HTML/XML and extract information from it?
Answer 1:
Native XML Extensions
I prefer using one of the native XML extensions since they come bundled with PHP, are usually faster than all the 3rd party libs and give me all the control I need over the markup.
DOM
The DOM extension allows you to operate on XML documents through the DOM API with PHP 5. It is an implementation of the W3C's Document Object Model Core Level 3, a platform- and language-neutral interface that allows programs

Access documents on secure web server

Question: I'm trying to build an iPad app to download and display documents (pdf, ppt, doc, etc.) from a web server. Currently it does this by parsing the HTML structure on the server (using hpple). For example, the files are held at: Http://myserver.com/myFolders/myFiles/ The app goes to this location and traverses the tree using an XPath query, e.g. "/html/body/ul/li/a". It then downloads whatever documents it finds to the iPad for display. So far this works quite well but the server is publicly
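The traversal step described in the question, selecting every link under `/html/body/ul/li/a`, can be sketched in Python with the standard library. This is only an illustration of the XPath idea, not the hpple API: the `LISTING` markup and the `document_links` helper are hypothetical, and the sketch assumes the directory listing is well-formed enough to parse as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory listing, assumed to be well-formed markup.
LISTING = """<html><body><ul>
  <li><a href="report.pdf">report.pdf</a></li>
  <li><a href="slides.ppt">slides.ppt</a></li>
  <li><a href="notes.doc">notes.doc</a></li>
</ul></body></html>"""


def document_links(markup):
    """Return the href of every <a> found at html/body/ul/li/a."""
    root = ET.fromstring(markup)  # root is the <html> element
    # ElementTree paths are relative to the element they are called on,
    # so /html/body/ul/li/a becomes ./body/ul/li/a from the root.
    return [a.get("href") for a in root.findall("./body/ul/li/a")]


if __name__ == "__main__":
    for href in document_links(LISTING):
        print(href)
```

ElementTree supports only a subset of XPath, which is enough for a fixed path like this one.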

jsoup times out, xml gets white space error, basic traversing through page is time consuming

Question: I would like to write a program that parses an HTML page, selects useful information, and displays it. I first did this by opening a stream and searching line by line for the relevant content, but that is time-consuming. I then decided to treat the page as XML and use XPath. I created an XML file on my system and loaded the contents from the stream, but I got a white space error. I then decided to open the document directly with doc = (Document) builder.parse

HTML parsing error in IE8(KB927917)

Question: Webpage error details
User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727)
Timestamp: Wed, 18 Jan 2012 05:02:49 UTC
Message: HTML Parsing Error: Unable to modify the parent container element before the child element is closed (KB927917)
Line: 0
Char: 0
Code: 0
URI: http://collaborize.collaborizeclassroom.com/portal/portal/collaborize/site/window?actionEvent=homePage&action=2&fpg=1&unId=umb8N95lhIoXOVKzTTrtcPoCrixd4wMdScQv8mEwqFT962zy3VSh4mzQNeugOWVV

meta tag parsing in Rails [closed]

Question: I was looking for something to help me parse general meta tags from websites, similar to this GitHub project I found for Open Graph data. Here's a demo app. Basically, I'd like to be able to have a user input a URL from a news site and have it retrieve from that the Title, Desc, etc., leaving as little work
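The extraction the question describes, pulling the title and description out of a page's head, is small enough to sketch. The question concerns Rails; the following is a Python stand-in using only the standard library, with hypothetical names (`MetaExtractor`, `extract_meta`) and a made-up sample page. Fetching the URL is left out so the sketch stays self-contained.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the content of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}  # lower-cased name/property -> content
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Plain meta tags use name=..., Open Graph tags use property=...
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta(html):
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    description = parser.meta.get("description") or parser.meta.get("og:description", "")
    return {"title": parser.title.strip(), "description": description, "meta": parser.meta}


# Hypothetical news page used only for illustration.
SAMPLE_PAGE = """<html><head>
<title>Example headline</title>
<meta name="description" content="A short summary of the story.">
<meta property="og:image" content="https://example.com/lead.jpg">
</head><body><p>Body text</p></body></html>"""

if __name__ == "__main__":
    print(extract_meta(SAMPLE_PAGE))
```

Falling back from `description` to `og:description` is one possible design choice; sites vary in which tags they fill in.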

Why does BeautifulSoup .children contain nameless elements as well as the expected tag(s)

Question:

Code

    #!/usr/bin/env python3
    from bs4 import BeautifulSoup

    test = """<!DOCTYPE html>
    <html>
    <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
    <title>Test</title>
    </head>
    <body>
    <table>
    <tbody>
    <tr>
    <td>
    <div>
    <b>
    Icon
    </b>
    </div>
    </td>
    </tr>
    </tbody>
    </table>
    </body>
    </html>"""

    soup = BeautifulSoup(test)
    rows = soup.findAll('tr')
    for r in rows:
        print(r.name)
        for c in r.children:
            print('>', c.name)

Output

    tr
    > None
    > td
    > None

Why are there nameless elements in the list
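The excerpt ends before any answer. The usual explanation is that the whitespace between tags is itself part of the parsed tree: BeautifulSoup stores it as `NavigableString` children, and their `.name` is `None`. The same effect can be shown without BeautifulSoup. The sketch below uses `xml.dom.minidom` from the standard library as a stand-in, where such nodes are reported as `#text`. The `ROW` markup and the helper names are hypothetical.

```python
from xml.dom import Node, minidom

# A row with a newline before and after the <td>, as in the question.
ROW = """<tr>
<td><div><b>Icon</b></div></td>
</tr>"""


def child_kinds(markup):
    """Names of all direct children, whitespace text nodes included."""
    row = minidom.parseString(markup).documentElement
    return [child.nodeName for child in row.childNodes]


def element_children(markup):
    """Names of the direct children that are real elements."""
    row = minidom.parseString(markup).documentElement
    return [
        child.nodeName
        for child in row.childNodes
        if child.nodeType == Node.ELEMENT_NODE
    ]


if __name__ == "__main__":
    print(child_kinds(ROW))       # text node, td, text node
    print(element_children(ROW))  # only the td element
```

In BeautifulSoup itself, one way to get only the tag children is `r.find_all(True, recursive=False)`, or to skip children whose `name` is `None` inside the loop.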