beautifulsoup

How to separate columns and format dates when web scraping with Python?

Submitted by 送分小仙女□ on 2020-06-28 03:58:11
Question: I am trying to use Python 3 to scrape a chart from this website into a .csv file: 2013-14 NBA National TV Schedule. The chart starts out like:

Game/Time | Network | Matchup
Oct. 29, 8 p.m. ET | TNT | Chicago vs. Miami
Oct. 29, 10:30 p.m. ET | TNT | LA Clippers vs. LA Lakers

I am using these packages:

import re
import requests
import pandas as pd
from bs4 import BeautifulSoup
from itertools import groupby

I imported the data with: pd.read_html("https://www.sbnation.com/2013/8/6/4595688/2013-14-nba
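The splitting and date formatting asked about here reduces to parsing cells like "Oct. 29, 8 p.m. ET". A minimal sketch, assuming the season runs from October into the following calendar year; the helper name and the ISO output format are choices for illustration, not from the original post:

```python
import re
from datetime import datetime

def split_game_time(cell, season_start_year=2013):
    """Split a 'Game/Time' cell like 'Oct. 29, 8 p.m. ET' into an
    ISO date and a time string. The year is inferred from the month:
    Oct-Dec belong to the season's first year, Jan onward to the next."""
    date_part, time_part = cell.split(",", 1)
    month_abbr = date_part.replace(".", "").split()[0]   # e.g. 'Oct'
    month = datetime.strptime(month_abbr, "%b").month
    day = int(re.search(r"\d+", date_part).group())
    year = season_start_year if month >= 10 else season_start_year + 1
    return f"{year:04d}-{month:02d}-{day:02d}", time_part.strip()
```

Applied to the frame from pd.read_html, something like df[["Date", "Time"]] = df["Game/Time"].apply(lambda c: pd.Series(split_game_time(c))) would then yield the separated, formatted columns ready for to_csv.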

Scraping text from Kickstarter projects return nothing

Submitted by [亡魂溺海] on 2020-06-28 03:55:27
Question: I am trying to scrape the main text of a project from its Kickstarter project page. The following code works for the first URL but not for the second and third. Is there an easy fix that does not require other packages?

url = "https://www.kickstarter.com/projects/1365297844/kuhkubus-3d-escher-figures?ref=discovery_staff_picks_category_newest"
#url = "https://www.kickstarter.com/projects/clarissaredwine/swingby-a-voyager-gravity
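A likely reason only one URL works is that Kickstarter has used several different markups for the project body over time (and some pages render it client-side, which no requests-only fix can reach). Without adding packages, one low-effort approach is to try several candidate selectors in order; the class names below are assumptions to adapt after inspecting the failing pages in the browser:

```python
from bs4 import BeautifulSoup

def first_matching_text(html, selectors):
    """Return the text of the first CSS selector that matches the
    document, or None if none match. The caller passes candidate
    selectors in priority order."""
    soup = BeautifulSoup(html, "html.parser")
    for sel in selectors:
        node = soup.select_one(sel)
        if node is not None:
            return node.get_text(strip=True)
    return None

# Hypothetical usage -- the selector list must be adapted per page:
# text = first_matching_text(requests.get(url).text,
#                            [".full-description", ".rte__content"])
```

If all selectors miss on a page, the description is probably injected by JavaScript and would require a rendering tool rather than requests alone.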

Scraping table data from multiple links and combine this together in one excel file

Submitted by 妖精的绣舞 on 2020-06-28 03:46:12
Question: I have a link, and within that link there are several products. Each product has a table of specifications, laid out so that the first column holds the headers and the second column the corresponding data. The first column differs from product to product, with some overlapping categories. I want one big table that has all these categories as columns and the products as rows. I am able to get the data for one table (one product) as follows: import requests
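Once each product's table is in a DataFrame (e.g. via pd.read_html), the combining step can be sketched as pivoting each two-column table into a one-row frame and letting pandas align the overlapping categories. This assumes the shape described in the question: category in column 0, value in column 1:

```python
import pandas as pd

def specs_to_row(table, product_name):
    """Pivot a two-column spec table (category, value) into a single
    row indexed by the product name, so rows from different products
    can be concatenated with their shared categories aligned."""
    specs = table.set_index(table.columns[0])[table.columns[1]]
    specs.name = product_name
    return specs.to_frame().T
```

Then combined = pd.concat([specs_to_row(t, name) for name, t in product_tables]) followed by combined.to_excel("products.xlsx") gives one sheet with a column per category and NaN wherever a product lacks that spec (names here are placeholders for the per-product loop).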

scrape YouTube video from a specific channel and search?

Submitted by 柔情痞子 on 2020-06-27 12:11:14
Question: I am using this code to get the URLs of a YouTube channel's videos, and it works fine, but I would like to add an option to search the channel for a video with a specific title and get the URL of the first video that matches the search phrase.

from bs4 import BeautifulSoup
import requests

url = "https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")
for entry in soup.find_all("entry"):
    for link in entry.find_all("link"):
        print(link

Web scraping Google search results [closed]

Submitted by 泄露秘密 on 2020-06-27 06:00:06
Question (closed as needing more focus): I am web scraping Google Scholar search results page by page. After a certain number of pages, a captcha pops up and interrupts my code. I read that Google limits the requests I can make per hour. Is there any way around this limit? I read something
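There is no reliable way around the captcha itself, and automated scraping may violate Google's terms; the usual mitigation is simply to slow down and randomize the request rate so the traffic looks less like a burst. A minimal sketch; the delay values are guesses to tune, since Google publishes no official quota:

```python
import random
import time

def next_delay(base=20.0, jitter=10.0):
    """Pick a randomized delay in seconds between page requests; a
    fixed interval is easier for rate limiters to recognize than a
    jittered one."""
    return base + random.uniform(0, jitter)

def fetch_throttled(urls, fetch):
    """Call fetch(url) for each URL, sleeping a randomized interval
    between requests."""
    for url in urls:
        yield fetch(url)
        time.sleep(next_delay())
```

Even throttled, expect the captcha eventually; the only robust options are to resume after solving it manually or to use an official API where one exists.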

How to simulate a button click in a request?

Submitted by 僤鯓⒐⒋嵵緔 on 2020-06-27 04:41:48
Question: Please do not close this question; it is not a duplicate. I need to click the button using Python requests, not Selenium. I am trying to scrape Reverso Context's translation-examples page, and I have a problem: I can get only 20 examples, and then I need to click the "Display more examples" button many times, for as long as it exists on the page, to get the full results list. That is easy in a web browser, but how can I do it with the Python Requests library? I looked at the button's
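The requests-only answer to "clicking" such a button is that the button does not navigate anywhere: it fires a background request for the next page of results, which can be found in the browser DevTools Network tab and replayed in a loop until the server returns nothing more. A generic sketch of that loop; fetch_page stands in for the replayed request (for Reverso that would be the POST with the JSON payload DevTools shows, sent via requests.Session with a browser-like User-Agent; the exact endpoint and payload are not reproduced here):

```python
def fetch_all_examples(fetch_page, max_pages=50):
    """Accumulate results page by page, mimicking repeated clicks on a
    'Display more' button. fetch_page(n) must return a list of results
    for page n (empty when the results are exhausted)."""
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        results.extend(batch)
    return results
```

The max_pages cap is a safety net against a server that never returns an empty page.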

Beautiful Soup not waiting until page is fully loaded

Submitted by 末鹿安然 on 2020-06-27 04:14:47
Question: With the code below I want to open an apartment website URL and scrape the page. The only issue is that Beautiful Soup isn't waiting until the entire page has rendered: the apartments don't appear in the HTML until JavaScript loads them, which takes a few seconds. How do I fix this?

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://xxxxx.com/properties/?sort=latest'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close
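Beautiful Soup cannot wait: it only parses the HTML string it is handed, and urlopen returns the page before any JavaScript runs. The usual fix is to let a real browser render the page first, e.g. with Selenium. A sketch; 'property-card' is a placeholder class name to replace after inspecting the real listings markup:

```python
def rendered_html(url, css_class="property-card", timeout=10):
    """Load `url` in a real browser and return the HTML after the
    listings have appeared. Selenium is imported lazily because it is
    an optional, separately installed dependency (pip install selenium).
    `css_class` should name an element that only exists once the
    apartments have finished loading."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, css_class))
        )
        return driver.page_source
    finally:
        driver.quit()
```

The returned page source can then be fed to BeautifulSoup exactly as before, e.g. soup(rendered_html(my_url), "html.parser").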

web-scraping with python 3.6 and beautifulsoup - getting Invalid URL

Submitted by *爱你&永不变心* on 2020-06-26 05:54:22
Question: I want to work with this page in Python: http://www.sothebys.com/en/search-results.html?keyword=degas%27 This is my code:

from bs4 import BeautifulSoup
import requests

page = requests.get('http://www.sothebys.com/en/search-results.html?keyword=degas%27')
soup = BeautifulSoup(page.content, "lxml")
print(soup)

I'm getting the following output:

<html><head>
<title>Invalid URL</title>
</head><body>
<h1>Invalid URL</h1>
The requested URL "[no URL]", is invalid.<p>
Reference #9.8f4f1502.1494363829
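An "Invalid URL ... [no URL]" page like this typically comes from the site's CDN/edge server rejecting the request rather than from the URL itself, and the default python-requests User-Agent is a common trigger. A sketch of the usual workaround; that this is the site's filtering behavior is an assumption based on the error text, not documented anywhere:

```python
import requests
from bs4 import BeautifulSoup

def browser_headers():
    """A minimal browser-like User-Agent; any current browser string
    works, this one is just an example."""
    return {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def fetch_search_page(url):
    """Fetch `url` with browser-like headers and parse it."""
    page = requests.get(url, headers=browser_headers())
    page.raise_for_status()
    return BeautifulSoup(page.content, "html.parser")

# soup = fetch_search_page(
#     "http://www.sothebys.com/en/search-results.html?keyword=degas%27")
```

If the block persists with a browser User-Agent, the results are likely rendered client-side and would need a browser-automation tool instead.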