beautifulsoup

How to separate columns and format dates when web scraping with Python?

Submitted by 送分小仙女□ on 2020-06-28 03:58:11
Question: I am trying to use Python 3 to scrape a chart from this website into a .csv file: 2013-14 NBA National TV Schedule. The chart starts out like:

Game/Time | Network | Matchup
Oct. 29, 8 p.m. ET | TNT | Chicago vs. Miami
Oct. 29, 10:30 p.m. ET | TNT | LA Clippers vs. LA Lakers

I am using these packages:

import re
import requests
import pandas as pd
from bs4 import BeautifulSoup
from itertools import groupby

I imported the data with: pd.read_html("https://www.sbnation.com/2013/8/6/4595688/2013-14-nba
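The splitting and date formatting asked about here reduces to parsing cells like "Oct. 29, 8 p.m. ET". A minimal sketch, assuming the season runs from October into the following calendar year; the helper name and the ISO output format are choices for illustration, not from the original post:

```python
import re
from datetime import datetime

def split_game_time(cell, season_start_year=2013):
    """Split a 'Game/Time' cell like 'Oct. 29, 8 p.m. ET' into an
    ISO date and a time string. The year is inferred from the month:
    Oct-Dec belong to the season's first year, Jan onward to the next."""
    date_part, time_part = cell.split(",", 1)
    month_abbr = date_part.replace(".", "").split()[0]   # e.g. 'Oct'
    month = datetime.strptime(month_abbr, "%b").month
    day = int(re.search(r"\d+", date_part).group())
    year = season_start_year if month >= 10 else season_start_year + 1
    return f"{year:04d}-{month:02d}-{day:02d}", time_part.strip()
```

Applied to the frame from pd.read_html, something like df[["Date", "Time"]] = df["Game/Time"].apply(lambda c: pd.Series(split_game_time(c))) would then yield the separated, formatted columns ready for to_csv.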

Scraping text from Kickstarter projects return nothing

Submitted by [亡魂溺海] on 2020-06-28 03:55:27
Question: I am trying to scrape the main text of a project from its Kickstarter project page. The following code works for the first URL but not for the second and third. Is there an easy fix that does not require other packages?

url = "https://www.kickstarter.com/projects/1365297844/kuhkubus-3d-escher-figures?ref=discovery_staff_picks_category_newest"
#url = "https://www.kickstarter.com/projects/clarissaredwine/swingby-a-voyager-gravity
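A likely reason only one URL works is that Kickstarter has used several different markups for the project body over time (and some pages render it client-side, which no requests-only fix can reach). Without adding packages, one low-effort approach is to try several candidate selectors in order; the class names below are assumptions to adapt after inspecting the failing pages in the browser:

```python
from bs4 import BeautifulSoup

def first_matching_text(html, selectors):
    """Return the text of the first CSS selector that matches the
    document, or None if none match. The caller passes candidate
    selectors in priority order."""
    soup = BeautifulSoup(html, "html.parser")
    for sel in selectors:
        node = soup.select_one(sel)
        if node is not None:
            return node.get_text(strip=True)
    return None

# Hypothetical usage -- the selector list must be adapted per page:
# text = first_matching_text(requests.get(url).text,
#                            [".full-description", ".rte__content"])
```

If all selectors miss on a page, the description is probably injected by JavaScript and would require a rendering tool rather than requests alone.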

Scraping table data from multiple links and combine this together in one excel file

Submitted by 妖精的绣舞 on 2020-06-28 03:46:12
Question: I have a link, and within that link there are several products. Each product has a table of specifications, laid out so that the first column holds the headers and the second column the corresponding data. The first column differs from product to product, with some overlapping categories. I want one big table that has all these categories as columns and the products as rows. I am able to get the data for one table (one product) as follows: import requests
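Once each product's table is in a DataFrame (e.g. via pd.read_html), the combining step can be sketched as pivoting each two-column table into a one-row frame and letting pandas align the overlapping categories. This assumes the shape described in the question: category in column 0, value in column 1:

```python
import pandas as pd

def specs_to_row(table, product_name):
    """Pivot a two-column spec table (category, value) into a single
    row indexed by the product name, so rows from different products
    can be concatenated with their shared categories aligned."""
    specs = table.set_index(table.columns[0])[table.columns[1]]
    specs.name = product_name
    return specs.to_frame().T
```

Then combined = pd.concat([specs_to_row(t, name) for name, t in product_tables]) followed by combined.to_excel("products.xlsx") gives one sheet with a column per category and NaN wherever a product lacks that spec (names here are placeholders for the per-product loop).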

scrape YouTube video from a specific channel and search?

Submitted by 柔情痞子 on 2020-06-27 12:11:14
Question: I am using this code to get the URLs of a YouTube channel's videos, and it works fine, but I would like to add an option to search the channel for a video with a specific title and get the URL of the first video that matches the search phrase.

from bs4 import BeautifulSoup
import requests

url = "https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")
for entry in soup.find_all("entry"):
    for link in entry.find_all("link"):
        print(link

Web scraping Google search results [closed]

Submitted by 泄露秘密 on 2020-06-27 06:00:06
Question (closed as needing more focus): I am web scraping Google Scholar search results page by page. After a certain number of pages, a captcha pops up and interrupts my code. I read that Google limits the requests I can make per hour. Is there any way around this limit? I read something
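There is no reliable way around the captcha itself, and automated scraping may violate Google's terms; the usual mitigation is simply to slow down and randomize the request rate so the traffic looks less like a burst. A minimal sketch; the delay values are guesses to tune, since Google publishes no official quota:

```python
import random
import time

def next_delay(base=20.0, jitter=10.0):
    """Pick a randomized delay in seconds between page requests; a
    fixed interval is easier for rate limiters to recognize than a
    jittered one."""
    return base + random.uniform(0, jitter)

def fetch_throttled(urls, fetch):
    """Call fetch(url) for each URL, sleeping a randomized interval
    between requests."""
    for url in urls:
        yield fetch(url)
        time.sleep(next_delay())
```

Even throttled, expect the captcha eventually; the only robust options are to resume after solving it manually or to use an official API where one exists.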

How to simulate a button click in a request?

Submitted by 僤鯓⒐⒋嵵緔 on 2020-06-27 04:41:48
Question: Please do not close this question; it is not a duplicate. I need to click the button using Python requests, not Selenium. I am trying to scrape Reverso Context's translation-examples page, and I have a problem: I can get only 20 examples, and then I need to click the "Display more examples" button many times, for as long as it exists on the page, to get the full results list. That is easy in a web browser, but how can I do it with the Python Requests library? I looked at the button's
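The requests-only answer to "clicking" such a button is that the button does not navigate anywhere: it fires a background request for the next page of results, which can be found in the browser DevTools Network tab and replayed in a loop until the server returns nothing more. A generic sketch of that loop; fetch_page stands in for the replayed request (for Reverso that would be the POST with the JSON payload DevTools shows, sent via requests.Session with a browser-like User-Agent; the exact endpoint and payload are not reproduced here):

```python
def fetch_all_examples(fetch_page, max_pages=50):
    """Accumulate results page by page, mimicking repeated clicks on a
    'Display more' button. fetch_page(n) must return a list of results
    for page n (empty when the results are exhausted)."""
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        results.extend(batch)
    return results
```

The max_pages cap is a safety net against a server that never returns an empty page.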

Beautiful Soup not waiting until page is fully loaded

Submitted by 末鹿安然 on 2020-06-27 04:14:47
Question: With the code below I want to open an apartment website URL and scrape the page. The only issue is that Beautiful Soup isn't waiting until the entire page has rendered: the apartments don't appear in the HTML until JavaScript loads them, which takes a few seconds. How do I fix this?

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://xxxxx.com/properties/?sort=latest'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close
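Beautiful Soup cannot wait: it only parses the HTML string it is handed, and urlopen returns the page before any JavaScript runs. The usual fix is to let a real browser render the page first, e.g. with Selenium. A sketch; 'property-card' is a placeholder class name to replace after inspecting the real listings markup:

```python
def rendered_html(url, css_class="property-card", timeout=10):
    """Load `url` in a real browser and return the HTML after the
    listings have appeared. Selenium is imported lazily because it is
    an optional, separately installed dependency (pip install selenium).
    `css_class` should name an element that only exists once the
    apartments have finished loading."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, css_class))
        )
        return driver.page_source
    finally:
        driver.quit()
```

The returned page source can then be fed to BeautifulSoup exactly as before, e.g. soup(rendered_html(my_url), "html.parser").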

web-scraping with python 3.6 and beautifulsoup - getting Invalid URL

Submitted by *爱你&永不变心* on 2020-06-26 05:54:22
Question: I want to work with this page in Python: http://www.sothebys.com/en/search-results.html?keyword=degas%27 This is my code:

from bs4 import BeautifulSoup
import requests

page = requests.get('http://www.sothebys.com/en/search-results.html?keyword=degas%27')
soup = BeautifulSoup(page.content, "lxml")
print(soup)

I'm getting the following output:

<html><head>
<title>Invalid URL</title>
</head><body>
<h1>Invalid URL</h1>
The requested URL "[no URL]", is invalid.<p>
Reference #9.8f4f1502.1494363829
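An "Invalid URL ... [no URL]" page like this typically comes from the site's CDN/edge server rejecting the request rather than from the URL itself, and the default python-requests User-Agent is a common trigger. A sketch of the usual workaround; that this is the site's filtering behavior is an assumption based on the error text, not documented anywhere:

```python
import requests
from bs4 import BeautifulSoup

def browser_headers():
    """A minimal browser-like User-Agent; any current browser string
    works, this one is just an example."""
    return {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def fetch_search_page(url):
    """Fetch `url` with browser-like headers and parse it."""
    page = requests.get(url, headers=browser_headers())
    page.raise_for_status()
    return BeautifulSoup(page.content, "html.parser")

# soup = fetch_search_page(
#     "http://www.sothebys.com/en/search-results.html?keyword=degas%27")
```

If the block persists with a browser User-Agent, the results are likely rendered client-side and would need a browser-automation tool instead.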