html-parsing

Parse HTML with SwiftSoup (Swift)?

蓝咒 submitted on 2021-02-10 06:41:40
Question: I'm trying to parse some websites with SwiftSoup; say one of them is a Medium article. How can I extract the body of the page and load it into another UIViewController, the way Instapaper does? Here is the code I use to extract the title: import SwiftSoup class WebViewController: UIViewController, UIWebViewDelegate { ... override func viewDidLoad() { super.viewDidLoad() let url = URL(string: "https://medium.com/@timjwise/stop-lying-to-yourself-when-you-snub-panhandlers-its

Java program to download images from a website and display the file sizes

北城以北 submitted on 2021-02-09 11:12:15
Question: I'm creating a Java program that reads an HTML document from a URL and displays the sizes of the images referenced in it. I'm not sure how to go about this. I don't need to download and save the images; I just need their sizes and the order in which they appear on the page. For example, a webpage has 3 images: <img src="dog.jpg" /> //which is 54kb <img src="cat.jpg" /> //which is 75kb <img src="horse.jpg"/> //which is 80kb I would need the output of my Java program
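One way to approach this, sketched here in Python rather than Java and using only the standard library: collect the img src values in document order, resolve them against the page URL, and ask the server for each image's Content-Length with a HEAD request so that nothing is downloaded. The names ImgSrcCollector, image_urls, and content_length are hypothetical, and the sketch assumes the server answers HEAD requests with a Content-Length header, which not every server does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_urls(html, base_url):
    """Return absolute image URLs in the order they appear in the page."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(base_url, src) for src in collector.sources]


def content_length(url, timeout=10):
    """Return the size in bytes reported by the server, or None if absent.

    Assumption: the server supports HEAD and sends Content-Length.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None
```

Calling content_length on each URL returned by image_urls would give the sizes in page order; when the header is missing, the fallback would be to download the image and measure it.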

Java program to download images from a website and display the file sizes

喜夏-厌秋 submitted on 2021-02-09 11:08:28
Question: I'm creating a Java program that reads an HTML document from a URL and displays the sizes of the images referenced in it. I'm not sure how to go about this. I don't need to download and save the images; I just need their sizes and the order in which they appear on the page. For example, a webpage has 3 images: <img src="dog.jpg" /> //which is 54kb <img src="cat.jpg" /> //which is 75kb <img src="horse.jpg"/> //which is 80kb I would need the output of my Java program

How to Get Script Tag Variables From a Website using Python

谁说胖子不能爱 submitted on 2021-02-08 10:03:55
Question: I am trying to pull a variable called meta out of a script tag using Python. I have used Selenium to do this before, but Selenium is too slow for what I am trying to accomplish. Is there any other way of doing this? I have tried using BeautifulSoup, but I'm stuck; my code is below. Here is the script tag I'm trying to get the meta variable from: <script>window.ShopifyAnalytics = window.ShopifyAnalytics || {}; window.ShopifyAnalytics.meta = window.ShopifyAnalytics.meta || {}; window.ShopifyAnalytics
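A minimal sketch of one browser-free approach, using only the Python standard library: find the assignment with a regular expression, then let json.JSONDecoder.raw_decode parse the object literal that follows it. The script tag in the question is cut off, so SAMPLE_HTML below is made up for illustration, and the sketch assumes the variable is assigned as `var meta = {...}` with a value that is valid JSON. The helper name extract_js_object is hypothetical.

```python
import json
import re

# Made-up page fragment; the real script tag in the question is truncated.
SAMPLE_HTML = """
<html><head><script>
window.ShopifyAnalytics = window.ShopifyAnalytics || {};
window.ShopifyAnalytics.meta = window.ShopifyAnalytics.meta || {};
var meta = {"product": {"id": 123, "vendor": "Example"}, "page": {"pageType": "product"}};
</script></head></html>
"""


def extract_js_object(html, name):
    """Return the JSON value assigned to `var <name> = ...`, or None.

    Assumption: the right-hand side is a valid JSON literal. A JavaScript
    object with unquoted keys or trailing commas would raise
    json.JSONDecodeError here.
    """
    match = re.search(r"\bvar\s+" + re.escape(name) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing ';' and the rest of the page).
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


meta = extract_js_object(SAMPLE_HTML, "meta")
print(meta["product"]["id"])
```

The page source itself could be fetched with any HTTP client; only the parsing step is shown here.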

Use Pandas to Get Multiple Tables From Webpage

柔情痞子 submitted on 2021-02-08 09:57:32
Question: I am using pandas to parse the data from the following page: http://kenpom.com/index.php?y=2014 To get the data, I am writing: dfs = pd.read_html(url) The data looks great and is parsed correctly, except that it only includes the first 40 rows. The problem seems to be the way the tables are separated, which keeps pandas from getting all the information. How do you get pandas to read the data from all the tables on that page? Answer 1: The HTML of the page you have posted has

Use Pandas to Get Multiple Tables From Webpage

安稳与你 submitted on 2021-02-08 09:56:49
Question: I am using pandas to parse the data from the following page: http://kenpom.com/index.php?y=2014 To get the data, I am writing: dfs = pd.read_html(url) The data looks great and is parsed correctly, except that it only includes the first 40 rows. The problem seems to be the way the tables are separated, which keeps pandas from getting all the information. How do you get pandas to read the data from all the tables on that page? Answer 1: The HTML of the page you have posted has

php extract body tag content

删除回忆录丶 submitted on 2021-02-08 03:33:25
Question: I'm trying to do something that should be very easy, but I can't get it to work, which makes me wonder whether I'm using the right workflow. I have a simple HTML page that I load in my desktop application as a help file. This page has no menu, just the content. On my website I want a more sophisticated help system, so I want to use a PHP file that shows a menu, breadcrumbs, a header, and a footer. To avoid duplicating my help content, I want to load the original HTML help file and add its body content to

How do I get all text from within this tag?

我是研究僧i submitted on 2021-02-07 22:39:18
Question: I'm trying to get all the text from within this HTML tag, which I store in the variable tag: <td rowspan="2" style="text-align: center;"><a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a> & His Orchestra</td> The result should be "Glenn Miller & His Orchestra". But printing tag.find(text=True) returns this: "Glenn Miller". How do I get the rest of the text within the td element? Answer 1: tag.find(text=True) returns only the first matching text node. Use .get_text() instead: >>> from
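The difference between "the first text node" and "all text inside the element" can be illustrated without BeautifulSoup. The following is a stdlib-only sketch of the idea behind .get_text(), not the answer's own code (which is cut off above): it gathers every text node the parser reports and joins them. The names TextCollector and all_text are hypothetical, and the ampersand is written as &amp; as it would be in well-formed markup.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every text node; character references are decoded by default."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def all_text(html):
    """Concatenate all text nodes in the fragment, in document order."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks)


cell = (
    '<td rowspan="2" style="text-align: center;">'
    '<a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a>'
    " &amp; His Orchestra</td>"
)
print(all_text(cell))  # Glenn Miller & His Orchestra
```

The first collected chunk is "Glenn Miller" alone, which matches what tag.find(text=True) returned in the question; the rest of the cell's text lives in a second text node after the closing a tag.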

How do I get all text from within this tag?

我的未来我决定 submitted on 2021-02-07 22:37:11
Question: I'm trying to get all the text from within this HTML tag, which I store in the variable tag: <td rowspan="2" style="text-align: center;"><a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a> & His Orchestra</td> The result should be "Glenn Miller & His Orchestra". But printing tag.find(text=True) returns this: "Glenn Miller". How do I get the rest of the text within the td element? Answer 1: tag.find(text=True) returns only the first matching text node. Use .get_text() instead: >>> from

Extract JSON object from html using PHP regex

耗尽温柔 submitted on 2021-02-07 18:12:51
Question: After reading all the related threads, I cannot find a regex that can extract a full JSON object from within HTML content, so I'm hoping someone can help me get the right regex. For example, the JSON I'm looking to extract looks like this: "taxonomy": {"page":"/products/1/","price":"350.00","country_code":"gb","brand":"apple"}, I'm trying to extract the entire "taxonomy" object, which is inside a JavaScript function within the HTML. I have tried preg
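A regex alone struggles to find the matching closing brace once objects nest, so one alternative is to use the regex only to locate the key and hand the rest to a JSON parser. The sketch below shows that idea in Python rather than PHP, with the standard library only. The function name extract_named_object and the surrounding page string are hypothetical; the "taxonomy" object is the one quoted in the question, and the sketch assumes it is valid JSON.

```python
import json
import re


def extract_named_object(html, key):
    """Return the JSON object that follows `"<key>":` in the text, or None."""
    # The lookahead makes the match end exactly at the opening brace.
    pattern = re.compile(r'"' + re.escape(key) + r'"\s*:\s*(?=\{)')
    match = pattern.search(html)
    if match is None:
        return None
    # raw_decode parses one complete JSON value starting at that index,
    # so nested braces and braces inside strings are handled by the parser.
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


# Hypothetical page fragment wrapped around the object from the question.
page = (
    "<script>function track() { send({\"id\": 7, "
    "\"taxonomy\": {\"page\":\"/products/1/\",\"price\":\"350.00\","
    "\"country_code\":\"gb\",\"brand\":\"apple\"}, \"v\": 2}); }</script>"
)
taxonomy = extract_named_object(page, "taxonomy")
print(taxonomy["brand"])
```

If the key appears more than once in the page, this returns only the first occurrence.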