beautifulsoup

How to add proxies to BeautifulSoup crawler

痴心易碎 submitted on 2020-03-17 12:07:52
Question: These are the imports in the Python crawler:

from __future__ import with_statement
from eventlet.green import urllib2
import eventlet
import re
import urlparse
from bs4 import BeautifulSoup, SoupStrainer
import sqlite3
import datetime

How do I add a rotating proxy (one proxy per open thread) to a recursive crawler built on BeautifulSoup? I know how to add proxies if I were using mechanize's Browser:

br = Browser()
br.set_proxies({'http':'http://username:password@proxy:port', 'https':

How to extract the Coronavirus cases from a website?

耗尽温柔 submitted on 2020-03-15 09:38:42
Question: I'm trying to extract the coronavirus case counts from a website (https://www.trackcorona.live), but I get an error. This is my code:

response = requests.get('https://www.trackcorona.live')
data = BeautifulSoup(response.text, 'html.parser')
li = data.find_all(class_='numbers')
confirmed = int(li[0].get_text())
print('Confirmed Cases:', confirmed)

It raises the following error (although it worked a few days ago) because find_all returns an empty list (li): IndexError Traceback (most recent call last)
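The IndexError means find_all(class_='numbers') matched nothing, so li[0] has nothing to index. A defensive sketch that checks for the empty list before indexing and strips thousands separators before int(); parse_confirmed is a hypothetical helper that takes an HTML string so the logic can be tested offline. It assumes, as the question does, that the count sits in an element with class "numbers". Why the list is empty (renamed classes, or values injected by JavaScript after page load) cannot be told from the question, so this only makes the failure explicit instead of crashing.

```python
from bs4 import BeautifulSoup


def parse_confirmed(html):
    """Return the first '.numbers' value as an int, or None if it is missing.

    Assumes the count is the text of the first element with class "numbers";
    the live site's markup may no longer match this.
    """
    soup = BeautifulSoup(html, "html.parser")
    numbers = soup.find_all(class_="numbers")
    if not numbers:
        # Empty list: markup changed, or the value is filled in by JavaScript.
        return None
    text = numbers[0].get_text(strip=True).replace(",", "")
    return int(text) if text.isdigit() else None
```

Call it as parse_confirmed(requests.get('https://www.trackcorona.live').text) and handle None. If it always returns None while the browser shows numbers, look at the raw response.text to see whether the values are present in the served HTML at all.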

Get immediate parent tag with BeautifulSoup in Python

妖精的绣舞 submitted on 2020-03-13 04:21:24
Question: I've researched this question but haven't found an actual solution. I'm using BeautifulSoup with Python, and I want to get all image tags from a page, loop through them, and check whether each one's immediate parent is an anchor tag. Here's some pseudo code:

html = BeautifulSoup(responseHtml)
for image in html.findAll('img'):
    if (image.parent.name == 'a'):
        image.hasParent = image.parent.link

Any ideas on this?

Answer 1: You need to check the parent's name: for img in soup
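Following the answer's hint to compare the parent's name, a minimal sketch; images_in_links is a hypothetical helper, and instead of setting an attribute on the tag as the pseudo code does, it returns (src, href) pairs. Because img.parent is the immediate parent only, an <img> wrapped in <a><span>...</span></a> is deliberately not matched.

```python
from bs4 import BeautifulSoup


def images_in_links(html):
    """Return (img src, parent href) for every <img> whose immediate parent is <a>."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for img in soup.find_all("img"):
        parent = img.parent  # the immediate parent Tag
        if parent is not None and parent.name == "a":
            pairs.append((img.get("src"), parent.get("href")))
    return pairs
```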

Python resources

守給你的承諾、 submitted on 2020-03-12 16:45:18
Python 3: https://www.liaoxuefeng.com/wiki/1016959663602400
Flask: https://dormousehole.readthedocs.io/en/latest/
Flask source code (includes examples): https://github.com/pallets/flask
Flask-SQLAlchemy: http://www.pythondoc.com/flask-sqlalchemy/
Beautiful Soup: https://beautifulsoup.readthedocs.io/zh_CN/latest/
Source: oschina. Link: https://my.oschina.net/u/2400070/blog/3192594

How to scrape a phone number with Python when it is only shown after a click

烂漫一生 submitted on 2020-03-12 06:46:08
Question: I want to scrape a phone number, but the number is only displayed after clicking. Is it possible to scrape it directly with Python? My code scrapes the phone number, but it comes back masked with asterisks (***). Here is the page I want to scrape the phone number from: https://hipages.com.au/connect/abcelectricservicespl/service/126298

Please guide me! Here is my code:

import requests
from bs4 import BeautifulSoup

def get_page(url):
    response = requests.get(url)
    if not response.ok:
        print('server responded:', response.status_code)

BeautifulSoup: how to get an image's size from its URL

ε祈祈猫儿з submitted on 2020-03-04 17:39:52
Question: I need to get the width and height of the image at the extracted URL. I used get('width'), but it doesn't seem to work:

description = soup.find("div", id="module_product_detail")
img = description.find("img")
print(img.get('width'))

The output is None. The tag looks like this:

<img alt="image" src="https://bos1.lightake.net:20011/UploadFiles/ShopSkus/1000x1000/Y2463/Y246302/sku_Y246302_1.jpg"/>

Answer 1: Since there is neither a width nor a height attribute, the only way to access the METADATA of the image is by
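When the <img> tag has no width or height attributes, the dimensions have to come from the image file itself. A minimal standard-library sketch for JPEG (the example URL ends in .jpg), assuming a well-formed baseline or progressive file: it walks the marker segments until the first SOF (start-of-frame) header, which stores height and then width as big-endian 16-bit values. jpeg_size is a hypothetical name; PNG, WebP and other formats would need their own parsing, and an imaging library such as Pillow (Image.open(...).size) covers all of them if adding a dependency is acceptable.

```python
import struct

# SOF markers that carry frame dimensions (0xC4 DHT, 0xC8 JPG, 0xCC DAC are not frames).
_SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
                0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_size(data):
    """Return (width, height) from JPEG bytes, or None if no SOF header is found.

    Simplified: assumes every marker before the SOF has a 2-byte length field.
    """
    if data[:2] != b"\xff\xd8":
        return None  # not a JPEG: missing SOI marker
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return None  # lost sync with the marker stream
        marker = data[i + 1]
        if marker == 0xFF:  # padding byte between markers
            i += 1
            continue
        (seg_len,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker in _SOF_MARKERS:
            if i + 9 > len(data):
                return None  # header truncated
            # Layout after the length: precision (1 byte), height (2), width (2).
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + seg_len  # skip marker bytes plus the segment body
    return None
```

Fetch the bytes of img['src'] (for example requests.get(img['src']).content) and pass them in. The SOF header is usually near the start of the file, so reading only the first few tens of kilobytes is often enough, but that is not guaranteed (large embedded EXIF thumbnails can push it further).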

Can't scrape names from next pages using requests

你离开我真会死。 submitted on 2020-03-03 11:48:52
Question: I'm trying to parse names across multiple pages of a website with a Python script. My current attempt gets the names from its landing page, but I can't figure out how to fetch the names from the following pages as well using requests and BeautifulSoup. Website link.

My attempt so far:

import requests
from bs4 import BeautifulSoup

url = "https://proximity.niceic.com/mainform.aspx?PostCode=YO95"

with requests.Session() as s:
    r = s.get(url)
    soup = BeautifulSoup(r.text, "lxml")
    for
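The mainform.aspx URL suggests an ASP.NET WebForms page. On such pages, "next page" links are often __doPostBack(...) calls that POST the form back together with hidden state fields such as __VIEWSTATE, rather than plain links that requests can simply follow. Assuming that is the case here (not verified against the live site), a sketch of building the POST body from the current page. build_postback_payload is hypothetical, and the real event_target and event_argument values have to be read from the pager link's href in the page source.

```python
from bs4 import BeautifulSoup


def build_postback_payload(html, event_target, event_argument=""):
    """Collect the page's hidden form fields and set the postback target.

    Assumes an ASP.NET WebForms page whose pager links call
    __doPostBack(event_target, event_argument).
    """
    soup = BeautifulSoup(html, "html.parser")
    payload = {
        field["name"]: field.get("value", "")
        for field in soup.find_all("input", type="hidden")
        if field.get("name")  # skip unnamed inputs
    }
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    return payload
```

Inside the question's requests.Session block, something like r = s.post(url, data=build_postback_payload(r.text, target, arg)) would request the next page; repeating this with each new r.text keeps the hidden state fields current from page to page.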