html-parsing

Extract JSON object from html using PHP regex

ⅰ亾dé卋堺 submitted on 2021-02-07 18:11:28
Question: After reading all the related threads, I cannot find anything that shows a regex capable of extracting a full JSON object from within HTML content, so I'm hoping someone can help me get the right regex to resolve the issue. For example, the JSON I'm looking to extract looks like this: "taxonomy": {"page":"/products/1/","price":"350.00","country_code":"gb","brand":"apple"}, I'm trying to extract the entire "taxonomy" object, which is inside a JavaScript function within the HTML. I have tried preg…
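Instead of a regex that tries to match the whole object, one option is to locate the "taxonomy": key and let a JSON decoder consume exactly one object from that point. The sketch below is in Python rather than PHP, uses the standard library's json.JSONDecoder.raw_decode, and assumes the embedded object is valid JSON (double-quoted keys and strings). The function name extract_json_object and the sample HTML are hypothetical.

```python
import json
import re


def extract_json_object(html, key):
    """Return the JSON object that follows "key": in html, or None if absent.

    Assumes the object is valid JSON; raises json.JSONDecodeError otherwise.
    """
    # Find the quoted key, a colon, and any whitespace before the value.
    match = re.search('"' + re.escape(key) + r'"\s*:\s*', html)
    if match is None:
        return None
    start = match.end()
    if start >= len(html) or html[start] != "{":
        return None
    # raw_decode parses one JSON value starting at index `start` and ignores
    # whatever follows it (a trailing comma, more JavaScript, closing tags).
    obj, _end = json.JSONDecoder().raw_decode(html, start)
    return obj
```

Because the decoder tracks nesting and quoted strings, a nested object or a value such as "a}b" does not cut the result short, which is where brace-matching regexes tend to break.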

How to extract info from HTML with Java's own Parser?

生来就可爱ヽ(ⅴ<●) submitted on 2021-02-07 09:57:57
Question: I don't want to download any other libraries; I'm talking about this one: javax.swing.text.html.HTMLEditorKit.Parser. How can I extract repeated information within a page using this parser? Say, for example, I have this code repeated in a page: <tr> <td class="info1">get this info</td> <td class="info2">get this info</td> <td class="info3">get this info</td> </tr> Could I have some example code, please? Thanks in advance. Answer 1: It's a stream parser, so as it parses it tells you what it hits. You…
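The answer describes a stream (callback) parser. As a rough analog, here is a Python sketch using the standard library's html.parser.HTMLParser, which works the same way: it calls handle_starttag, handle_data and handle_endtag as it meets each token. This is not the Java HTMLEditorKit.Parser API; the class name CellCollector is hypothetical, and matching cells whose class starts with "info" is an assumption based on the sample markup.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect text from <td class="info..."> cells, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell texts per <tr>
        self._in_cell = False   # True while inside a matching <td>
        self._buf = []          # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            cls = dict(attrs).get("class") or ""
            # Only cells inside a row whose class starts with "info".
            if cls.startswith("info") and self.rows:
                self._in_cell = True
                self._buf = []

    def handle_data(self, data):
        if self._in_cell:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self.rows[-1].append("".join(self._buf).strip())
            self._in_cell = False
```

Each repeated <tr> block becomes one list in parser.rows after calling feed() with the page and then close().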

Parsing HTML does not output the desired data (tracking info for FedEx)

梦想的初衷 submitted on 2021-02-07 09:00:34
Question: I'm trying to make a script that grabs tracking information from the FedEx website. I figured that if I just go to the URL 'https://www.fedex.com/fedextrack/?tracknumbers=' and paste the tracking number at the end of it, it brings me to the tracking page, which has the information I need. I tried to feed the URL the tracking number and parse the HTML from the response. This is what I tried: import urllib url_prefix= 'https://www.fedex.com/fedextrack/?tracknumbers=' tracking_number = '570573906561'
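The question's code is cut off, so here is a Python 3 sketch of the same idea: append the tracking number to the URL prefix from the question and fetch the raw HTML with urllib.request. The helper names and the User-Agent header are assumptions. One possible reason the data does not appear (an assumption, not confirmed in the question) is that the tracking page fills in its details with JavaScript after it loads, in which case the raw HTML returned here would not contain them.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# URL prefix taken from the question.
URL_PREFIX = "https://www.fedex.com/fedextrack/?tracknumbers="


def build_tracking_url(tracking_number):
    """Append the URL-escaped tracking number to the prefix."""
    return URL_PREFIX + quote(tracking_number)


def fetch_html(url, timeout=10.0):
    """Download a page and decode it as text (requires network access)."""
    # Assumption: a browser-like User-Agent; some sites reject urllib's default.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

A quick check is to search the result of fetch_html(build_tracking_url("570573906561")) for a piece of text you can see in the browser; if it is missing, the content is likely generated client-side and plain HTML parsing will not find it.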

How do I parse an HTML website using Perl? [closed]

谁说我不能喝 submitted on 2021-02-04 16:20:07
Question: Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers. Closed 7 years ago. Could you please give me some suggestions on how to parse HTML in Perl? I plan to parse the keywords (including URL links) and save them to a MySQL database. I am using Windows XP. Also, do I first need to download some website pages to the local hard drive with…
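The question asks about Perl; to keep one language across these notes, here is a Python sketch of the same pipeline: pull the links (href plus anchor text) out of a page with the standard html.parser module and store them in a database. sqlite3 stands in for MySQL, and the links table and class name are hypothetical. The parser takes a string, so the HTML does not have to be saved to a file first.

```python
import sqlite3
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def save_links(conn, links):
    """Store links in a hypothetical 'links' table (sqlite3 standing in for MySQL)."""
    conn.execute("CREATE TABLE IF NOT EXISTS links (url TEXT, text TEXT)")
    conn.executemany("INSERT INTO links (url, text) VALUES (?, ?)", links)
    conn.commit()
```

With MySQL, the same two statements would go through whichever MySQL driver you use; only the connection object and placeholder style would change.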

Fastest, easiest, and best way to parse an HTML table?

╄→гoц情女王★ submitted on 2021-02-04 14:53:09
Question: I'm trying to get this table http://www.datamystic.com/timezone/time_zones.html into array format so I can do whatever I want with it, preferably in PHP, Python, or JavaScript. This kind of problem comes up a lot, so rather than looking for help with this specific problem, I'm looking for ideas on how to solve all similar problems. BeautifulSoup is the first thing that comes to mind. Another possibility is copying and pasting it into TextMate and then running regular expressions. What do…
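A general approach that avoids regular expressions is to walk the table's tags and build one list per row. The sketch below uses only Python's standard html.parser (not BeautifulSoup); it makes no assumptions about the structure of the linked page, tolerates omitted </td> and </tr> end tags, and does not handle rowspan or colspan. The names TableParser and table_to_rows are hypothetical.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Turn every <tr> into a list of its <td>/<th> cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None   # text fragments of the current cell, or None

    def _flush(self):
        # Finish the current cell (also covers omitted </td> end tags).
        if self._cell is not None and self.rows:
            self.rows[-1].append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush()
            self.rows.append([])
        elif tag in ("td", "th"):
            self._flush()
            if self.rows:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._flush()


def table_to_rows(html):
    """Return the table rows as lists of strings, skipping empty rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]
```

The result is a plain list of lists, which can be written out with the csv module or turned into dictionaries keyed by the header row.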

Exclude unwanted HTML from Simple HTML DOM - PHP

笑着哭i submitted on 2021-01-29 13:25:26
Question: I am using Simple HTML DOM Parser with PHP to get the title, description, and images from a website. The issue I am facing is that I am getting HTML I don't want, and I don't know how to exclude those HTML tags. Here is a sample of the HTML structure being parsed: <div id="product_description"> <p> Some text</p> <ul> <li>value 1</li> <li>value 2</li> <li>value 3</li> </ul> // the div I don't want <div id="comments"> <h1> Some Text </h1> </div> </div> I am using the PHP script below to…
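The underlying idea is independent of the library: while walking the document, keep track of which elements you are inside, and skip text that sits under the unwanted div. The sketch below shows this in Python with the standard html.parser rather than PHP's Simple HTML DOM (whose API is not reproduced here). It assumes the markup has balanced tags; the class name DescriptionText and its parameters are hypothetical, and the ids come from the sample above.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DescriptionText(HTMLParser):
    """Collect text inside #keep_id while skipping anything inside #skip_id."""

    def __init__(self, keep_id="product_description", skip_id="comments"):
        super().__init__()
        self.keep_id, self.skip_id = keep_id, skip_id
        self.stack = []   # id attribute (or None) of each open element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep text only when inside the wanted div and outside the unwanted one.
        if self.keep_id in self.stack and self.skip_id not in self.stack:
            text = data.strip()
            if text:
                self.parts.append(text)
```

In a DOM-based library the equivalent step is usually to find the unwanted child element and remove it (or clear its contents) before reading the parent's text.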

Beautiful Soup returns 'None'

怎甘沉沦 submitted on 2021-01-29 11:24:53
Question: I am using the following code to extract data with Beautiful Soup: import requests import bs4 res = requests.get('https://www.jmu.edu/cgi-bin/parking_sign_data.cgi?hash=53616c7465645f5f5c0bbd0eccccb6fe8dd7ed9a0445247e3c7dcb4f91927f7ccc933be780c6e558afb8ebf73620c3e5e3b2c68cd3c138519068eac99d9bf30e1e67ce894deb3a054f95f882da2ea2f0|869835tg89dhkdnbnsv5sg5wg0vmcf4mfcfc2qwm5968unmeh5') soup = bs4.BeautifulSoup(res.text, 'xml') soup.find_all("span", class_="text") I've tried different variations of…
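Two things are worth checking here. In Beautiful Soup, find_all returns a list (empty when nothing matches), while find returns None when nothing matches. And the 'xml' argument selects an XML parser; if the response is HTML, 'html.parser' is the more usual choice. If the span still cannot be found, one possible cause (an assumption) is that it is added by JavaScript and is not in the downloaded HTML at all. To show what the class_="text" match does, here is a standard-library Python sketch that collects the text of every span whose class list contains "text"; the class name ClassTextCollector is hypothetical.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every <tag> whose class list contains `cls`."""

    def __init__(self, tag, cls):
        super().__init__()
        self.tag, self.cls = tag, cls
        self.results = []
        self._depth = 0   # nesting depth of `tag` inside a matched element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self._depth:
            # Same tag nested inside a match: track it so its end tag
            # does not end the match early.
            self._depth += 1
        elif self.cls in (dict(attrs).get("class") or "").split():
            # class="big text" matches "text", like Beautiful Soup's class_.
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag == self.tag and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)
```

An empty results list on the real response means no matching span is present in the HTML that was downloaded, regardless of which parser is used.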

BeautifulSoup not extracting div properly

安稳与你 submitted on 2021-01-29 06:44:11
Question: BeautifulSoup is not extracting the div I want properly, and I am not sure what I am doing wrong. Here is the HTML: <div id='display'> <div class='result'> <div>text0 </p></div> <div>text1</div> <div>text2</div> </div> </div> And here is my code: div = soup.find("div", {"class": "result"}) print(div) I am seeing this: <div class="result"> <div>text0 </div></div> What I am expecting is this: <div class="result"> <div>text0</div> <div>text1</div> <div>text2</div> </div> This works as expected if I…
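The first inner div contains a stray </p> with no matching <p>. That markup is invalid, and each HTML parser recovers from it in its own way, which is a plausible explanation for the truncated output (the code that builds the soup, and therefore the parser used, is not shown). Fixing the markup or trying a different parser is the usual first step. As an illustration of one recovery strategy, and not of Beautiful Soup's own behavior, the Python standard-library sketch below ignores any end tag that has no matching open element and returns the text of each direct child div of div.result. The names ResultChildren and result_child_texts are hypothetical, and the exact class="result" match is an assumption based on the sample.

```python
from html.parser import HTMLParser


class ResultChildren(HTMLParser):
    """Text of each direct child <div> of <div class="result">, ignoring stray end tags."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open elements as (tag, class attribute)
        self.children = []

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs).get("class")))
        # A direct child div: its parent (now stack[-2]) is div.result.
        if tag == "div" and len(self.stack) >= 2 and self.stack[-2] == ("div", "result"):
            self.children.append("")

    def handle_endtag(self, tag):
        # Ignore end tags with no matching open element, such as the stray </p>.
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return
        # Otherwise close elements until the matching one is popped.
        while self.stack:
            open_tag, _ = self.stack.pop()
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Add text to the current child when a div sits directly under div.result.
        for i, entry in enumerate(self.stack):
            if entry == ("div", "result"):
                if i + 1 < len(self.stack) and self.stack[i + 1][0] == "div" and self.children:
                    self.children[-1] += data
                break


def result_child_texts(html):
    parser = ResultChildren()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.children]
```

Dropping unmatched end tags is only one possible policy; browsers and other parsers apply different recovery rules, which is why the same invalid document can produce different trees.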