beautifulsoup

Find partial class names in spans with Beautiful Soup

Submitted by 梦想与她 on 2020-05-17 07:30:27
Question: This page https://www.kijiji.ca/v-1-bedroom-apartments-condos/ville-de-montreal/1-chambre-chauff-eau-chaude-incl-vsl-514-856-0038/1334431659 contains this span: <span class="currentPrice-3131760660"><span content="800.00">800,00 $</span></span>. I'm trying to automatically extract the price (800 $ in this case). Over time, however, the number after "currentPrice-" changes, and my Python script stops working. I am using this Beautiful Soup call: soup.find_all('span', {'class' : 'currentPrice…
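One common fix for a class name whose suffix changes is to match the class with a regular expression instead of an exact string. A minimal sketch against the snippet from the question:

```python
import re
from bs4 import BeautifulSoup

html = '<span class="currentPrice-3131760660"><span content="800.00">800,00 $</span></span>'
soup = BeautifulSoup(html, "html.parser")

# Match any class that starts with "currentPrice-", whatever the digits after it are.
price_span = soup.find("span", class_=re.compile(r"^currentPrice-"))

# The inner <span> carries the numeric price in its "content" attribute.
price = price_span.find("span")["content"]
print(price)  # 800.00
```

The same idea works with a CSS attribute selector: soup.select_one('span[class^="currentPrice-"]').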

Response object doesn't return the data I want to scrape from a URL

Submitted by 放肆的年华 on 2020-05-17 06:56:08
Question: I am trying to scrape the titles, descriptions, partners, etc. from this search result using requests and BeautifulSoup in Python, but the response object doesn't return the data I need, the data that is shown when I visit the URL in a browser. Here is what I have so far: import requests from bs4 import BeautifulSoup as bs url = 'https://partneredge.sap.com/content/partnerfinder/search.html#/search/results?itemsPerPage=10&sortBy=shortname&sortOrder=asc' count = 0 response = requests.get(url)…
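A detail worth noticing in this URL: the search parameters sit after a "#". Everything after "#" is a URL fragment, which the browser's JavaScript reads but which is never sent to the server, so requests.get() fetches only the bare page shell and the results arrive via later XHR calls that requests never makes. A quick sketch of why:

```python
from urllib.parse import urlsplit

url = ("https://partneredge.sap.com/content/partnerfinder/search.html"
       "#/search/results?itemsPerPage=10&sortBy=shortname&sortOrder=asc")
parts = urlsplit(url)

# The fragment is purely client-side: the server never sees it, so the
# HTML that requests receives cannot depend on it.
print(parts.fragment)
```

The usual options are to find the underlying JSON API in the browser's network tab and call that directly, or to render the page with a real browser (Selenium, Playwright).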

BeautifulSoup findAll is returning an empty list (Python)

Submitted by 人盡茶涼 on 2020-05-17 05:47:58
Question: I am trying to get data from this web page https://playruneterra.com/es-es/news, and the part I am trying to get is this: I am using BeautifulSoup to get the HTML and search in it, but when I use the findAll method to get that line, it returns an empty list. I tried the same on other pages and it works fine. What is happening? This is my code: This is an example that is working: Thanks all. Answer 1: You can use PyQt to build a headless browser and then scrape the data from the website. Here…
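findAll returns an empty list when the tag simply isn't in the HTML you parsed. Pages like this one inject their news items with JavaScript, so the HTML that requests downloads is only an empty application shell. A sketch with stand-in HTML (the markup below is illustrative, not the site's real markup):

```python
from bs4 import BeautifulSoup

shell = '<html><body><div id="root"></div></body></html>'              # what requests sees
rendered = '<div id="root"><h2 class="title">Patch notes</h2></div>'   # what the browser sees

# Same query, two very different results: the parser is fine, the input differs.
empty_hits = BeautifulSoup(shell, "html.parser").find_all("h2", class_="title")
rendered_hits = BeautifulSoup(rendered, "html.parser").find_all("h2", class_="title")

print(empty_hits)         # []
print(len(rendered_hits)) # 1
```

So the first debugging step is to print response.text and check whether the element you're searching for is actually there; if not, a headless browser (as the answer suggests) or the site's JSON API is needed.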

Creating new columns by scraping information

Submitted by 大憨熊 on 2020-05-16 02:22:17
Question: I am trying to add information scraped from a website into columns. I have a dataset that looks like: COL1 COL2 COL3 ... ... bbc.co.uk and I would like to have a dataset which includes new columns: COL1 COL2 COL3 Website Address Last Analysis Blacklist Status ... ... bbc.co.uk IP Address Server Location City Region These new columns come from this website: https://www.urlvoid.com/scan/bbc.co.uk. I would need to fill each column with its related information. For example: COL1 COL2 COL3…
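A minimal sketch of the scrape-into-columns step. The table markup below is an assumption modeled on the urlvoid.com report page (label in the first cell, value in the second); in the real script it would come from requests.get("https://www.urlvoid.com/scan/" + domain).text, once per domain in the dataset:

```python
import pandas as pd
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td>Website Address</td><td>bbc.co.uk</td></tr>
  <tr><td>IP Address</td><td>151.101.0.81</td></tr>
  <tr><td>Server Location</td><td>(GB) United Kingdom</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Turn each label/value row into one dict entry: labels become column names.
record = {row.find_all("td")[0].text: row.find_all("td")[1].text
          for row in soup.find_all("tr")}

df = pd.DataFrame([record])   # one row per scanned domain
print(df.columns.tolist())
```

Building one dict per domain and concatenating them keeps labels and values aligned even if the report tables list rows in different orders.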

How to select a specific row from a table using BeautifulSoup?

Submitted by 我是研究僧i on 2020-05-15 09:31:09
Question: So I have a question related to a previous question, but I realized I needed to go one level deeper to get an 11-digit NDC code instead of a 10-digit one. Rather than converting them later, I thought I could just grab them initially. Here is the link to the previous question: Is there a way to parse data from multiple pages from a parent webpage? What I want to do is click on the links here (which is the 2nd level, by the way) and then grab the 11-digit NDC codes that result on the following…
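For the "select a specific row" part, find_all("tr") returns an ordinary list that can be indexed or filtered. A sketch against a hypothetical table (the markup and code values below are invented for illustration, not taken from the site in the question):

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><th>Product</th><th>NDC Code</th></tr>
  <tr><td>Drug A</td><td>12345-6789-01</td></tr>
  <tr><td>Drug B</td><td>98765-4321-02</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

# Skip the header row, then pick the row whose first cell names the product we want.
target = next(r for r in rows[1:] if r.find_all("td")[0].text == "Drug B")
code = target.find_all("td")[1].text
print(code)  # 98765-4321-02
```

Positional selection also works when the row index is known up front: rows[2] would select the same row here.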

Unable to scrape the name from the inner page of each result using requests

Submitted by ぃ、小莉子 on 2020-05-15 08:10:42
Question: I've created a script in Python that makes use of POST HTTP requests to get the search results from a webpage. To populate the results, it is necessary to click on the fields sequentially, as shown here. A new page then appears, and that is how the results are populated. There are ten results on the first page, and the following script can parse them flawlessly. What I wish to do now is use those results to reach their inner pages in order to parse the Sole Proprietorship Name (English) from there.…
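The second step is usually to collect each result's detail link from the listing HTML and fetch those URLs with the same requests.Session that performed the POST (so cookies carry over). A sketch of the link-collection step, with placeholder markup and class names rather than the real site's:

```python
from bs4 import BeautifulSoup

listing_html = """
<div class="result"><a class="result-link" href="/detail/1">Result 1</a></div>
<div class="result"><a class="result-link" href="/detail/2">Result 2</a></div>
"""

soup = BeautifulSoup(listing_html, "html.parser")

# Collect every detail-page URL; each would then be fetched with session.get()
# and parsed for the Sole Proprietorship Name field.
detail_urls = [a["href"] for a in soup.select("a.result-link")]
print(detail_urls)  # ['/detail/1', '/detail/2']
```

The real href values are often relative, as here, so urllib.parse.urljoin() against the search URL is needed before fetching them.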

Append markup string to a tag in BeautifulSoup

Submitted by 孤者浪人 on 2020-05-15 03:51:29
Question: Is it possible to set markup as tag content (akin to setting innerHTML in JavaScript)? For the sake of example, let's say I want to add 10 <a> elements to a <div>, but have them separated with a comma: soup = BeautifulSoup(<<some document here>>) a_tags = ["<a>1</a>", "<a>2</a>", ...] # list of strings div = soup.new_tag("div") a_str = ",".join(a_tags) Using div.append(a_str) escapes < and > into &lt; and &gt;, so I end up with <div>&lt;a&gt;1&lt;/a&gt; ... </div>. BeautifulSoup(a_str) wraps this in <html…
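One way this can be done: parse the markup string into a fragment with BeautifulSoup, then move the parsed nodes into the target tag. Appending Tag objects (rather than a raw string) inserts real elements, so nothing gets escaped. A minimal sketch:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<body></body>", "html.parser")
div = soup.new_tag("div")
soup.body.append(div)

a_str = ",".join(f"<a>{i}</a>" for i in range(1, 4))
fragment = BeautifulSoup(a_str, "html.parser")  # html.parser adds no <html> wrapper

for node in list(fragment.contents):  # copy the list: appending moves nodes out of it
    div.append(node)                  # append(Tag) moves the node, no escaping

print(div)  # <div><a>1</a>,<a>2</a>,<a>3</a></div>
```

Note the html.parser backend is used deliberately: unlike lxml or html5lib, it does not wrap the fragment in <html><body> tags.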

Scraping Chinese characters in Python

Submitted by 血红的双手。 on 2020-05-14 10:18:20
Question: I learnt how to scrape websites from https://automatetheboringstuff.com. I wanted to scrape http://www.piaotian.net/html/3/3028/1473227.html, whose contents are in Chinese, and write its contents to a .txt file. However, the .txt file contains random symbols, which I assume is an encoding/decoding problem. I've read the thread "how to decode and encode web page with python?" and figured the encodings for my site are "gb2312" and "windows-1252". I tried decoding with those two encodings…
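A sketch of the decode/write round trip. The bytes below simulate what the server sends, Chinese text encoded as gb2312 (the sample string is invented); gbk is a compatible superset of gb2312 and is usually the safer decoder:

```python
raw = "飘天文学".encode("gb2312")   # stands in for response.content from the page

# Decode the raw bytes with the page's real encoding, not requests' guess.
text = raw.decode("gbk")
print(text)

# Then write the .txt file explicitly as UTF-8 to avoid garbled symbols:
# with open("novel.txt", "w", encoding="utf-8") as f:
#     f.write(text)
```

With requests, the equivalent is setting response.encoding = "gbk" before reading response.text; the garbled symbols typically mean the bytes were decoded with the wrong codec (e.g. windows-1252) before being written out.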

Change the text of the inner tag using beautifulsoup python

Submitted by 牧云@^-^@ on 2020-05-14 08:43:53
Question: I would like to change the inner text of a tag in HTML obtained using BeautifulSoup. Example: <a href="index.html" id="websiteName">Foo</a> turns into: <a href="index.html" id="websiteName">Bar</a> I have managed to get the tag by its id: HTMLDocument.find(id='websiteName') But I'm not able to change the inner text of the tag: print HTMLDocument.find(id='websiteName') a = HTMLDocument.find(id='websiteName') a = a.replaceWith('<a href="index.html" id="websiteName">Bar</a>') // I…
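replaceWith swaps out the whole tag (and escapes a string argument), which is why it doesn't do what the question wants. Assigning to the tag's .string replaces only the inner text, in place. A minimal sketch:

```python
from bs4 import BeautifulSoup

html = '<a href="index.html" id="websiteName">Foo</a>'
soup = BeautifulSoup(html, "html.parser")

a = soup.find(id="websiteName")
a.string = "Bar"   # replaces the tag's inner text; attributes are untouched

print(soup)  # <a href="index.html" id="websiteName">Bar</a>
```

a.string.replace_with("Bar") is an equivalent spelling when the tag has a single text child.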