screen-scraping

Scraping YouTube links from a webpage

耗尽温柔 submitted on 2020-08-20 07:46:33
Question: I've been trying to scrape YouTube links from a webpage, but nothing has worked. This is a picture of what I've been trying to scrape. This is the code I tried most recently:

youtube_link = soup.find("a", class_="ytp-title-link yt-uix-sessionlink")

And this is the link to the website the YouTube link is on: https://www.electronic-festivals.com/event/i-am-hardstyle-germany

Answer 1: Most of the YouTube links are inside an iframe, and JavaScript also needs to run, so try using Selenium. The following …
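A minimal sketch of the Selenium route the answer points to, assuming selenium and a local Chrome driver are installed; the event URL comes from the question, while filtering iframes by their src attribute is an illustrative choice rather than the answer's exact code:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(10)  # let element lookups wait for the embedded players to appear
driver.get("https://www.electronic-festivals.com/event/i-am-hardstyle-germany")

# The YouTube players sit inside iframes, so collect the iframe src attributes
# once the page's JavaScript has had a chance to run.
youtube_links = []
for frame in driver.find_elements(By.TAG_NAME, "iframe"):
    src = frame.get_attribute("src") or ""
    if "youtube.com" in src or "youtu.be" in src:
        youtube_links.append(src)

print(youtube_links)
driver.quit()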

How to scrape charts from a website with python?

折月煮酒 submitted on 2020-07-29 07:44:52
Question: EDIT: I have saved the script code below to a text file, but using re to extract the data still doesn't return anything. My code is:

file_object = open('source_test_script.txt', mode="r")
soup = BeautifulSoup(file_object, "html.parser")
pattern = re.compile(r"^var (chart[0-9]+) = new Highcharts.Chart\(({.*?})\);$", re.MULTILINE | re.DOTALL)
scripts = soup.find("script", text=pattern)
profile_text = pattern.search(scripts.text).group(1)
profile = json.loads(profile_text)
print profile[ …
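Two things commonly trip this approach up: the chart configuration is captured by group(2) of that pattern (group(1) is only the variable name), and a Highcharts config is a JavaScript object that json.loads accepts only if it happens to be valid JSON. A hedged sketch along those lines, reusing the file name from the question:

import json
import re
from bs4 import BeautifulSoup

with open("source_test_script.txt", mode="r") as file_object:
    soup = BeautifulSoup(file_object, "html.parser")

pattern = re.compile(
    r"var (chart[0-9]+) = new Highcharts\.Chart\((\{.*?\})\);",
    re.MULTILINE | re.DOTALL,
)

# Search every script tag instead of relying on find() matching the pattern
# against the tag's string directly.
for script in soup.find_all("script"):
    match = pattern.search(script.text)
    if not match:
        continue
    chart_name, config_text = match.group(1), match.group(2)
    try:
        # Only works if the config uses quoted keys and contains no
        # JavaScript functions; otherwise a JS-tolerant parser is needed.
        profile = json.loads(config_text)
        print(chart_name, profile)
    except ValueError:
        print(chart_name, "config is not plain JSON")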

scrapy: request url must be str or unicode, got Selector

老子叫甜甜 submitted on 2020-07-19 08:54:16
Question: I am writing a spider using Scrapy to scrape user details from Pinterest. I am trying to get the details of a user and their followers (and so on, down to the last node). Below is the spider code:

from scrapy.spider import BaseSpider
import scrapy
from pinners.items import PinterestItem
from scrapy.http import FormRequest
from urlparse import urlparse

class Sample(BaseSpider):
    name = 'sample'
    allowed_domains = ['pinterest.com']
    start_urls = ['https://www.pinterest.com/banka/followers', ]

    def parse …
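That error usually means a Selector object was passed as the url of a Request; extracting the string first fixes it. A minimal sketch, assuming a current Scrapy (scrapy.Spider, response.urljoin) and a placeholder XPath rather than the spider's real follower selector:

import scrapy

class SampleSpider(scrapy.Spider):
    name = "sample"
    allowed_domains = ["pinterest.com"]
    start_urls = ["https://www.pinterest.com/banka/followers"]

    def parse(self, response):
        # response.xpath(...) returns Selector objects; building a Request
        # straight from one raises "request url must be str or unicode,
        # got Selector". Calling .getall() (or .extract_first()) returns
        # plain strings instead.
        for href in response.xpath("//a/@href").getall():
            yield scrapy.Request(response.urljoin(href), callback=self.parse_follower)

    def parse_follower(self, response):
        yield {"url": response.url}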

Use Beautiful Soup in scraping multiple websites

我与影子孤独终老i submitted on 2020-07-19 06:19:30
Question: I want to know why the lists all_links and all_titles don't receive any records from the lists titles and links. I have also tried the .extend() method and it didn't help.

import requests
from bs4 import BeautifulSoup

all_links = []
all_titles = []

def title_link(page_num):
    page = requests.get(
        'https://www.gumtree.pl/s-mieszkania-i-domy-sprzedam-i-kupie/warszawa/page-%d/v%dc9073l3200008p%d' % (page_num, page_num, page_num))
    soup = BeautifulSoup(page.content, 'html.parser')
    links = ['https://www …
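A likely cause is that links and titles are built inside title_link() but never handed back to the module-level lists. A minimal sketch under that assumption, keeping the URL from the question; the a.href-link selector is a guess and would need to match Gumtree's actual markup:

import requests
from bs4 import BeautifulSoup

all_links = []
all_titles = []

def title_link(page_num):
    page = requests.get(
        'https://www.gumtree.pl/s-mieszkania-i-domy-sprzedam-i-kupie/warszawa/page-%d/v%dc9073l3200008p%d'
        % (page_num, page_num, page_num))
    soup = BeautifulSoup(page.content, 'html.parser')
    anchors = [a for a in soup.select('a.href-link') if a.get('href')]  # selector is a guess
    links = ['https://www.gumtree.pl' + a['href'] for a in anchors]
    titles = [a.get_text(strip=True) for a in anchors]
    return links, titles

for page_num in range(1, 3):
    links, titles = title_link(page_num)
    all_links.extend(links)    # extend the outer lists with what the function returns
    all_titles.extend(titles)

print(len(all_links), len(all_titles))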