beautifulsoup

BeautifulSoup does not read 'full' HTML obtained by requests

只谈情不闲聊 submitted on 2021-02-11 02:54:25
Question: I am trying to scrape URLs from a website's HTML using the BeautifulSoup and requests libraries, both running on Python 3.5. It seems I am successfully getting the HTML from requests, because when I display r.content the full HTML of the site I am trying to scrape is shown. However, when I pass this to BeautifulSoup, BeautifulSoup drops the bulk of the HTML, including the URL I am trying to scrape. from bs4 import BeautifulSoup import requests page = requests
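The excerpt above is truncated, but the usual cause of "disappearing" HTML is the parser: BeautifulSoup's underlying parsers repair malformed markup differently, and a lenient one may silently drop the branch that holds the link. A minimal sketch for checking what a given parser keeps (`extract_links` and `sample` are illustrative names, not from the question):

```python
from bs4 import BeautifulSoup

def extract_links(html, parser="html.parser"):
    """Return every href that the chosen parser kept in the tree."""
    soup = BeautifulSoup(html, parser)
    return [a["href"] for a in soup.find_all("a", href=True)]

# Parsers repair malformed HTML differently; if a link vanishes under one
# parser, re-parse with another ("lxml", "html5lib") and compare the output.
sample = "<html><body><p>intro<a href='https://example.com/a'>A</a></p></body></html>"
print(extract_links(sample))
```

Running the same `r.content` through `"lxml"` or `"html5lib"` (each needs a separate `pip install`) quickly shows whether the parser, rather than requests, is dropping the markup.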

Scraping facebook likes, comments and shares with Beautiful Soup

半世苍凉 submitted on 2021-02-10 20:38:45
Question: I want to scrape the number of likes, comments and shares with Beautiful Soup and Python. I have written some code, but it returns an empty list and I do not know why. This is the code: from bs4 import BeautifulSoup import requests website = "https://www.facebook.com/nike" soup = requests.get(website).text my_html = BeautifulSoup(soup, 'lxml') list_of_likes = my_html.find_all('span', class_='_81hb') print(list_of_likes) for i in list_of_likes: print(i) The same happens with comments and shares. What should
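A likely reason for the empty list (a sketch, not a verified diagnosis of facebook.com): the page served to a plain requests client is rendered by JavaScript, so a class visible in the browser's DevTools, such as `_81hb`, may not exist in the static HTML at all, and `find_all` then correctly returns `[]`:

```python
from bs4 import BeautifulSoup

# Static HTML standing in for what requests receives before any JavaScript
# runs -- the span with class "_81hb" only exists in the browser-rendered DOM.
static_html = "<html><body><div id='root'></div></body></html>"
soup = BeautifulSoup(static_html, "html.parser")

list_of_likes = soup.find_all("span", class_="_81hb")
print(list_of_likes)  # -> []
```

A quick check is whether the class name appears anywhere in `requests.get(website).text`; if it does not, a browser-automation tool or the site's official API is needed rather than BeautifulSoup alone.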

Decode a web page using request and BeautifulSoup package

不问归期 submitted on 2021-02-10 20:20:55
Question: I am attempting a Python practice question: "Use the BeautifulSoup and requests Python packages to print out a list of all the article titles on the New York Times homepage." Below is my solution, but it gives no output. I am using Jupyter Notebook, and when I run the code below it does nothing. My kernel is working properly, which means the problem is in my code. import requests from bs4 import BeautifulSoup from urllib.request import urlopen base_url= 'https:/
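The snippet is cut off before the parsing step, so the following is only a sketch of the title-extraction part; the `<h3>` tag is an assumption about the NYT homepage markup (which changes over time), and `article_titles`/`sample` are illustrative names:

```python
from bs4 import BeautifulSoup

def article_titles(html):
    """Collect headline text; <h3> is an assumed selector -- confirm it
    against the live page's markup before relying on it."""
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.find_all("h3")]

sample = "<h3>Story One</h3><div><h3> Story Two </h3></div>"
print(article_titles(sample))
```

If this prints an empty list against the real page, inspect the served HTML first: no output from a loop over an empty result list is exactly the "does nothing" symptom described above.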

How to retrieve the list of values from a drop down list

半腔热情 submitted on 2021-02-10 19:55:32
Question: I am trying to retrieve the list of available option expiries for a given ticker on Yahoo Finance, for instance using SPY as the ticker on https://finance.yahoo.com/quote/SPY/options The expiries are in this drop-down list: <div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4"> <select class="Fz(s)" data-reactid="5"> <option selected="" value="1576627200" data-reactid="6">December 18, 2019</option> <option value="1576800000" data-reactid="7">December
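Given the `<select>` markup quoted above, the label/`value` pairs can be read directly from the `<option>` elements. A minimal sketch on a trimmed copy of that HTML (the second option's label, cut off in the excerpt, is completed here for illustration):

```python
from bs4 import BeautifulSoup

html = """
<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector">
  <select class="Fz(s)">
    <option selected="" value="1576627200">December 18, 2019</option>
    <option value="1576800000">December 20, 2019</option>
  </select>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Map each visible expiry label to its Unix-timestamp value attribute.
expiries = {opt.get_text(strip=True): opt["value"]
            for opt in soup.select("select option")}
print(expiries)
```

The `value` attributes are Unix timestamps, so they can also be fed back into Yahoo's options URL as the `date` query parameter.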

Paginate with network requests scraper

跟風遠走 submitted on 2021-02-10 19:05:03
Question: I am trying to scrape Naukri job postings. Parsing the rendered pages was too time-consuming, so I switched to network requests. I believe I worked out the request pattern for pagination by editing the URL directly (rather than clicking the "next" tab). Example URLs: https://www.naukri.com/maintenance-jobs?xt=catsrch&qf%5B%5D=19 https://www.naukri.com/maintenance-jobs-2?xt=catsrch&qf%5B%5D=19 https://www.naukri.com/maintenance-jobs-3?xt=catsrch&qf%5B%5D=19 https://www.naukri.com/maintenance-jobs-4?xt=catsrch&qf%5B%5D=19 The
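The URL pattern shown above (no suffix for page 1, then `-<n>` appended to the slug) can be generated programmatically; `page_url` is an illustrative helper built from those example URLs, not Naukri's documented API:

```python
def page_url(slug, page, query="xt=catsrch&qf%5B%5D=19"):
    """Build the paginated listing URL: page 1 has no numeric suffix,
    later pages append "-<n>" to the slug."""
    suffix = "" if page == 1 else f"-{page}"
    return f"https://www.naukri.com/{slug}{suffix}?{query}"

urls = [page_url("maintenance-jobs", n) for n in range(1, 5)]
for u in urls:
    print(u)
```

Each generated URL can then be fetched with requests inside the loop, with a polite delay between pages.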

Find data within HTML tags using Python

自闭症网瘾萝莉.ら submitted on 2021-02-10 18:44:29
Question: I have the following HTML that I am trying to scrape from a website: <td>Net Taxes Due<td> <td class="value-column">$2,370.00</td> <td class="value-column">$2,408.00</td> What I am trying to accomplish is to search the page for the text "Net Taxes Due" within its tag, find the siblings of that tag, and send the results into a Pandas data frame. I have the following code: soup = BeautifulSoup(url, "html.parser") table = soup.select('#Net Taxes Due') cells = table.find_next_siblings('td')
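The snippet above has two separable problems: `soup.select('#Net Taxes Due')` is a CSS *id* selector, so it never matches cell text, and it returns a list, which has no `find_next_siblings` method. A sketch of text-based matching on a well-formed copy of the quoted cells (the excerpt's unclosed `<td>` is fixed here):

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
  <td>Net Taxes Due</td>
  <td class="value-column">$2,370.00</td>
  <td class="value-column">$2,408.00</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
# Locate the label cell by its exact text, then walk its <td> siblings.
label_cell = soup.find("td", string="Net Taxes Due")
values = [td.get_text(strip=True) for td in label_cell.find_next_siblings("td")]
print(values)
```

The resulting `values` list (plus the label) can then be passed straight to `pandas.DataFrame`.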