beautifulsoup

BeautifulSoup does not read 'full' HTML obtained by requests

只谈情不闲聊 submitted on 2021-02-11 02:54:25
Question: I am trying to scrape URLs from a website's HTML using the BeautifulSoup and requests libraries, both running on Python 3.5. It seems I am successfully getting the HTML from requests, because when I display r.content the full HTML of the site I am trying to scrape is shown. However, when I pass this to BeautifulSoup, BeautifulSoup drops the bulk of the HTML, including the URL I am trying to scrape. from bs4 import BeautifulSoup import requests page = requests
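The excerpt above is truncated, but the usual cause of "disappearing" HTML is the parser: BeautifulSoup's underlying parsers repair malformed markup differently, and a lenient one may silently drop the branch that holds the link. A minimal sketch for checking what a given parser keeps (`extract_links` and `sample` are illustrative names, not from the question):

```python
from bs4 import BeautifulSoup

def extract_links(html, parser="html.parser"):
    """Return every href that the chosen parser kept in the tree."""
    soup = BeautifulSoup(html, parser)
    return [a["href"] for a in soup.find_all("a", href=True)]

# Parsers repair malformed HTML differently; if a link vanishes under one
# parser, re-parse with another ("lxml", "html5lib") and compare the output.
sample = "<html><body><p>intro<a href='https://example.com/a'>A</a></p></body></html>"
print(extract_links(sample))
```

Running the same `r.content` through `"lxml"` or `"html5lib"` (each needs a separate `pip install`) quickly shows whether the parser, rather than requests, is dropping the markup.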

Scraping facebook likes, comments and shares with Beautiful Soup

半世苍凉 submitted on 2021-02-10 20:38:45
Question: I want to scrape the number of likes, comments and shares with Beautiful Soup and Python. I have written some code, but it returns an empty list and I do not know why. This is the code: from bs4 import BeautifulSoup import requests website = "https://www.facebook.com/nike" soup = requests.get(website).text my_html = BeautifulSoup(soup, 'lxml') list_of_likes = my_html.find_all('span', class_='_81hb') print(list_of_likes) for i in list_of_likes: print(i) The same happens with comments and shares. What should
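A likely reason for the empty list (a sketch, not a verified diagnosis of facebook.com): the page served to a plain requests client is rendered by JavaScript, so a class visible in the browser's DevTools, such as `_81hb`, may not exist in the static HTML at all, and `find_all` then correctly returns `[]`:

```python
from bs4 import BeautifulSoup

# Static HTML standing in for what requests receives before any JavaScript
# runs -- the span with class "_81hb" only exists in the browser-rendered DOM.
static_html = "<html><body><div id='root'></div></body></html>"
soup = BeautifulSoup(static_html, "html.parser")

list_of_likes = soup.find_all("span", class_="_81hb")
print(list_of_likes)  # -> []
```

A quick check is whether the class name appears anywhere in `requests.get(website).text`; if it does not, a browser-automation tool or the site's official API is needed rather than BeautifulSoup alone.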

Decode a web page using request and BeautifulSoup package

不问归期 submitted on 2021-02-10 20:20:55
Question: I am attempting a Python practice question: "Use the BeautifulSoup and requests Python packages to print out a list of all the article titles on the New York Times homepage." Below is my solution, but it gives no output. I am using Jupyter Notebook, and when I run the code below it does nothing. My kernel is working properly, which means the problem is in my code. import requests from bs4 import BeautifulSoup from urllib.request import urlopen base_url= 'https:/
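The snippet is cut off before the parsing step, so the following is only a sketch of the title-extraction part; the `<h3>` tag is an assumption about the NYT homepage markup (which changes over time), and `article_titles`/`sample` are illustrative names:

```python
from bs4 import BeautifulSoup

def article_titles(html):
    """Collect headline text; <h3> is an assumed selector -- confirm it
    against the live page's markup before relying on it."""
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.find_all("h3")]

sample = "<h3>Story One</h3><div><h3> Story Two </h3></div>"
print(article_titles(sample))
```

If this prints an empty list against the real page, inspect the served HTML first: no output from a loop over an empty result list is exactly the "does nothing" symptom described above.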

How to retrieve the list of values from a drop down list

半腔热情 submitted on 2021-02-10 19:55:32
Question: I am trying to retrieve the list of available option expiries for a given ticker on Yahoo Finance, for instance using SPY as the ticker on https://finance.yahoo.com/quote/SPY/options The expiries are in this drop-down list: <div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4"> <select class="Fz(s)" data-reactid="5"> <option selected="" value="1576627200" data-reactid="6">December 18, 2019</option> <option value="1576800000" data-reactid="7">December
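Given the `<select>` markup quoted above, the label/`value` pairs can be read directly from the `<option>` elements. A minimal sketch on a trimmed copy of that HTML (the second option's label, cut off in the excerpt, is completed here for illustration):

```python
from bs4 import BeautifulSoup

html = """
<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector">
  <select class="Fz(s)">
    <option selected="" value="1576627200">December 18, 2019</option>
    <option value="1576800000">December 20, 2019</option>
  </select>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Map each visible expiry label to its Unix-timestamp value attribute.
expiries = {opt.get_text(strip=True): opt["value"]
            for opt in soup.select("select option")}
print(expiries)
```

The `value` attributes are Unix timestamps, so they can also be fed back into Yahoo's options URL as the `date` query parameter.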

Paginate with network requests scraper

跟風遠走 submitted on 2021-02-10 19:05:03
Question: I am trying to scrape Naukri job postings. Parsing the rendered pages was too time-consuming, so I switched to network requests. I believe I worked out the request pattern for pagination by editing the URL directly (rather than clicking the "next" tab). Example URLs: https://www.naukri.com/maintenance-jobs?xt=catsrch&qf%5B%5D=19 https://www.naukri.com/maintenance-jobs-2?xt=catsrch&qf%5B%5D=19 https://www.naukri.com/maintenance-jobs-3?xt=catsrch&qf%5B%5D=19 https://www.naukri.com/maintenance-jobs-4?xt=catsrch&qf%5B%5D=19 The
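The URL pattern shown above (no suffix for page 1, then `-<n>` appended to the slug) can be generated programmatically; `page_url` is an illustrative helper built from those example URLs, not Naukri's documented API:

```python
def page_url(slug, page, query="xt=catsrch&qf%5B%5D=19"):
    """Build the paginated listing URL: page 1 has no numeric suffix,
    later pages append "-<n>" to the slug."""
    suffix = "" if page == 1 else f"-{page}"
    return f"https://www.naukri.com/{slug}{suffix}?{query}"

urls = [page_url("maintenance-jobs", n) for n in range(1, 5)]
for u in urls:
    print(u)
```

Each generated URL can then be fetched with requests inside the loop, with a polite delay between pages.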

Find data within HTML tags using Python

自闭症网瘾萝莉.ら submitted on 2021-02-10 18:44:29
Question: I have the following HTML that I am trying to scrape from a website: <td>Net Taxes Due<td> <td class="value-column">$2,370.00</td> <td class="value-column">$2,408.00</td> What I am trying to accomplish is to search the page for the text "Net Taxes Due" within its tag, find the siblings of that tag, and send the results into a Pandas data frame. I have the following code: soup = BeautifulSoup(url, "html.parser") table = soup.select('#Net Taxes Due') cells = table.find_next_siblings('td')
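The snippet above has two separable problems: `soup.select('#Net Taxes Due')` is a CSS *id* selector, so it never matches cell text, and it returns a list, which has no `find_next_siblings` method. A sketch of text-based matching on a well-formed copy of the quoted cells (the excerpt's unclosed `<td>` is fixed here):

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
  <td>Net Taxes Due</td>
  <td class="value-column">$2,370.00</td>
  <td class="value-column">$2,408.00</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
# Locate the label cell by its exact text, then walk its <td> siblings.
label_cell = soup.find("td", string="Net Taxes Due")
values = [td.get_text(strip=True) for td in label_cell.find_next_siblings("td")]
print(values)
```

The resulting `values` list (plus the label) can then be passed straight to `pandas.DataFrame`.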