html-parsing

DOMDocument remove script tags from HTML source

吃可爱长大的小学妹 submitted on 2020-02-01 14:35:29
Question: I used @Alex's approach here to remove script tags from an HTML document using the built-in DOMDocument. The problem is that if I have a script tag with JavaScript content followed by another script tag that links to an external JavaScript source file, not all script tags are removed from the HTML.
$result = '
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title> hey </title>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script>
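The excerpt is cut off before any answer, so the following is not the DOMDocument fix. It is a minimal Python sketch of the same goal, dropping every script element whether inline or external, using the standard-library `html.parser`. The names `ScriptStripper` and `strip_scripts` are hypothetical, and the sample input is shortened from the question.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit the HTML it is fed while dropping every <script> element."""

    def __init__(self):
        # convert_charrefs=False so entities are passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; a self-closing script is dropped.
        if tag != "script":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def strip_scripts(html):
    """Hypothetical helper: return html with all script elements removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = (
    '<!doctype html><html><head><meta charset="utf-8"><title> hey </title>'
    '<script type="text/javascript" '
    'src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>'
    '<script>alert("inline");</script>'
    "</head><body><p>kept</p></body></html>"
)
cleaned = strip_scripts(sample)
print(cleaned)
```

Because this approach filters a token stream instead of deleting nodes from a tree, removing one script cannot affect whether the next one is visited.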

Error using XML package in R

泪湿孤枕 submitted on 2020-01-25 20:34:26
Question: I am gathering data about different universities and have a question about the error below, which appears after executing the following code. The problem occurs when calling htmlParse().
Code:
url1 <- "http://nces.ed.gov/collegenavigator/?id=165015"
webpage1 <- getURL(url1)
doc1 <- htmlParse(webpage1)
Output:
Error in htmlParse(webpage1) : File !DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" html xmlns="http://www.w3.org/1999/xhtml" head id=

Parsing specific HTML tags in Javascript

丶灬走出姿态 submitted on 2020-01-24 17:12:06
Question: I'm looking for JavaScript to parse the following HTML:
<p>random text random text random text random text</p>
<kbd><h2>Heading One</h2>Body text Body text Body text Body text</kbd>
<p>random text random text random text random text</p>
... and return just: Heading One
In other words, I'd like to strip all tags and the body text from within the <kbd> tags. Any ideas would be greatly appreciated!
Answer 1:
var input = /* that HTML string here */;
var div = document.createElement('div');
div
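The answer above is truncated, so for comparison here is a minimal sketch of the same extraction outside the browser, in Python with the standard-library `html.parser`. It assumes the goal is the text of each h2 nested inside a kbd element; the names `KbdHeadingExtractor` and `extract_kbd_headings` are hypothetical.

```python
from html.parser import HTMLParser


class KbdHeadingExtractor(HTMLParser):
    """Collect the text of every <h2> that appears inside a <kbd> element."""

    def __init__(self):
        super().__init__()
        self.kbd_depth = 0       # how many <kbd> elements are currently open
        self.in_heading = False  # True while inside an <h2> within a <kbd>
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "kbd":
            self.kbd_depth += 1
        elif tag == "h2" and self.kbd_depth > 0:
            self.in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "kbd" and self.kbd_depth > 0:
            self.kbd_depth -= 1
        elif tag == "h2":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.headings[-1] += data


def extract_kbd_headings(html):
    """Hypothetical helper: return the list of heading strings."""
    parser = KbdHeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.headings


html = (
    "<p>random text random text random text random text</p>"
    "<kbd><h2>Heading One</h2>Body text Body text Body text Body text</kbd>"
    "<p>random text random text random text random text</p>"
)
print(extract_kbd_headings(html))  # ['Heading One']
```

The body text after the heading is ignored because data is only recorded while an h2 is open.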

how to filter cheerio objects in `each` with selector?

生来就可爱ヽ(ⅴ<●) submitted on 2020-01-24 13:20:26
Question: I'm parsing a simple webpage using Cheerio and I was wondering whether the following is possible. With HTML of this structure:
<tr class="human"> <td class="event"><a>event1</a></td> <td class="name">name1</td> <td class="surname"><a>surname1</a></td> <td class="date">2011</td> </tr>
<tr class="human"> <td class="event"><a>event2</a></td> <td class="name">name2</td> <td class="surname"><a>surname2</a></td> <td class="date">2012</td> </tr>
<tr class="human"> <td class="event"><a>event3</a></td> <td
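The question is cut off before it states the desired output, so the following rests on an assumption: that the aim is to pick out each cell of a row by its class while iterating over the rows. It is a Python sketch using the standard-library `html.parser`, not Cheerio, and the names `RowCollector` and `collect_rows` are hypothetical.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Turn each <tr class="human"> into a dict keyed by the class of its <td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.current_row = None   # dict for the row being read, or None
        self.current_cell = None  # class of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and attrs.get("class") == "human":
            self.current_row = {}
        elif tag == "td" and self.current_row is not None:
            self.current_cell = attrs.get("class")
            if self.current_cell is not None:
                self.current_row[self.current_cell] = ""

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_cell = None
        elif tag == "tr" and self.current_row is not None:
            self.rows.append(self.current_row)
            self.current_row = None

    def handle_data(self, data):
        # Text inside nested tags such as <a> still belongs to the open cell.
        if self.current_row is not None and self.current_cell is not None:
            self.current_row[self.current_cell] += data


def collect_rows(html):
    """Hypothetical helper: return one dict per matching table row."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


table = (
    '<tr class="human"> <td class="event"><a>event1</a></td> '
    '<td class="name">name1</td> <td class="surname"><a>surname1</a></td> '
    '<td class="date">2011</td> </tr> '
    '<tr class="human"> <td class="event"><a>event2</a></td> '
    '<td class="name">name2</td> <td class="surname"><a>surname2</a></td> '
    '<td class="date">2012</td> </tr>'
)
rows = collect_rows(table)
for row in rows:
    print(row["event"], row["date"])
```

Keying each row by cell class plays the role that a per-row selector lookup would play in a DOM-style library.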

Convert HTML table rows into PHP array and save it to database? [closed]

血红的双手。 submitted on 2020-01-23 01:33:27
Question: As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 6 years ago.
I'm trying to save HTML table rows into a PHP array and then save the array in a database.
<form action="" method="post"> <table class=

Using HTMLParser in Python 3.2

烂漫一生 submitted on 2020-01-22 05:48:44
Question: I have been using HTMLParser to scrape data from websites, stripping the HTML markup while doing so. I'm aware of various modules such as Beautiful Soup, but decided to go down the path of not depending on "outside" modules. There is code supplied by Eloff (Strip HTML from strings in Python):
from HTMLParser import HTMLParser
class MLStripper(HTMLParser):
    def __init__(self):
        self.reset()
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join
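The quoted code imports from the Python 2 module name `HTMLParser`; in Python 3 the class lives in `html.parser`. The excerpt is truncated, so the following is only a sketch of how the same stripper might look on Python 3. The `strip_tags` helper and the completed `get_data` body are assumptions, not text from the excerpt.

```python
from html.parser import HTMLParser  # Python 3 location of the class


class MLStripper(HTMLParser):
    """Accumulate only the text content of the HTML that is fed in."""

    def __init__(self):
        # Initialise the base parser first; its __init__ also calls reset().
        super().__init__()
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return "".join(self.fed)


def strip_tags(html):
    """Hypothetical helper: return html with all tags removed."""
    stripper = MLStripper()
    stripper.feed(html)
    stripper.close()
    return stripper.get_data()


print(strip_tags("<p>Hello <b>world</b></p>"))  # Hello world
```

Calling the base `__init__` instead of only `self.reset()` matters on Python 3, where the base class sets up attributes that the parser relies on later.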

Parsing web pages

耗尽温柔 submitted on 2020-01-21 05:50:47
Question: I have a question about parsing HTML pages, specifically forums. I want to parse a forum or thread for posts that meet certain criteria. I haven't defined the algorithm yet, since I have only parsed structured text formats before. A use case might be to copy and paste each thread into the program by hand, or to insert a URL like http://www.forums.com/forum/showthread.php?t=46875&page=3 and let the program parse the pages. Given all this, I would like to know: Is it possible to parse a forum thread on a HTML

POST website form data and retrieve results

寵の児 submitted on 2020-01-20 07:50:55
Question: I have been trying to write VBA code to copy the three tables shown in the web page source below. These tables show monthly weather data. Could someone please help me write code to copy this data and paste it into an Excel sheet? I have written VBA code that accesses the data, but I could not copy and paste it. Thank you very much in advance.
The webpage source code:
</table> </div> <hr><big><b><i>Parameters for Sizing and Pointing of Solar Panels and for Solar Thermal Applications:<