beautifulsoup

Find partial class names in spans with Beautiful Soup

Submitted by 梦想与她 on 2020-05-17 07:30:27
Question: This page https://www.kijiji.ca/v-1-bedroom-apartments-condos/ville-de-montreal/1-chambre-chauff-eau-chaude-incl-vsl-514-856-0038/1334431659 contains this span: <span class="currentPrice-3131760660"><span content="800.00">800,00 $</span></span>. I'm trying to automatically extract the price (800 $ in this case). Over time, however, the number after "currentPrice-" changes, and my Python script stops working. I am using this Beautiful Soup call: soup.find_all('span', {'class' : 'currentPrice…
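One common fix for a class name whose suffix changes is to match the class with a regular expression instead of an exact string. A minimal sketch against the snippet from the question:

```python
import re
from bs4 import BeautifulSoup

html = '<span class="currentPrice-3131760660"><span content="800.00">800,00 $</span></span>'
soup = BeautifulSoup(html, "html.parser")

# Match any class that starts with "currentPrice-", whatever the digits after it are.
price_span = soup.find("span", class_=re.compile(r"^currentPrice-"))

# The inner <span> carries the numeric price in its "content" attribute.
price = price_span.find("span")["content"]
print(price)  # 800.00
```

The same idea works with a CSS attribute selector: soup.select_one('span[class^="currentPrice-"]').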

Response object doesn't return the data I want to scrape from a URL

Submitted by 放肆的年华 on 2020-05-17 06:56:08
Question: I am trying to scrape the titles, descriptions, partners, etc. from this search result using requests and BeautifulSoup in Python, but the response object doesn't return the data I need, the data that is shown when I visit the URL in a browser. Here is what I have so far: import requests from bs4 import BeautifulSoup as bs url = 'https://partneredge.sap.com/content/partnerfinder/search.html#/search/results?itemsPerPage=10&sortBy=shortname&sortOrder=asc' count = 0 response = requests.get(url)…
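A detail worth noticing in this URL: the search parameters sit after a "#". Everything after "#" is a URL fragment, which the browser's JavaScript reads but which is never sent to the server, so requests.get() fetches only the bare page shell and the results arrive via later XHR calls that requests never makes. A quick sketch of why:

```python
from urllib.parse import urlsplit

url = ("https://partneredge.sap.com/content/partnerfinder/search.html"
       "#/search/results?itemsPerPage=10&sortBy=shortname&sortOrder=asc")
parts = urlsplit(url)

# The fragment is purely client-side: the server never sees it, so the
# HTML that requests receives cannot depend on it.
print(parts.fragment)
```

The usual options are to find the underlying JSON API in the browser's network tab and call that directly, or to render the page with a real browser (Selenium, Playwright).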

BeautifulSoup findAll is returning an empty list (Python)

Submitted by 人盡茶涼 on 2020-05-17 05:47:58
Question: I am trying to get data from this web page https://playruneterra.com/es-es/news, and the part I am trying to get is this: I am using BeautifulSoup to get the HTML and search in it, but when I use the findAll method to get that line, it returns an empty list. I tried the same on other pages and it works fine. What is happening? This is my code: This is an example that is working: Thanks all. Answer 1: You can use PyQt to build a headless browser and then scrape the data from the website. Here…
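findAll returns an empty list when the tag simply isn't in the HTML you parsed. Pages like this one inject their news items with JavaScript, so the HTML that requests downloads is only an empty application shell. A sketch with stand-in HTML (the markup below is illustrative, not the site's real markup):

```python
from bs4 import BeautifulSoup

shell = '<html><body><div id="root"></div></body></html>'              # what requests sees
rendered = '<div id="root"><h2 class="title">Patch notes</h2></div>'   # what the browser sees

# Same query, two very different results: the parser is fine, the input differs.
empty_hits = BeautifulSoup(shell, "html.parser").find_all("h2", class_="title")
rendered_hits = BeautifulSoup(rendered, "html.parser").find_all("h2", class_="title")

print(empty_hits)         # []
print(len(rendered_hits)) # 1
```

So the first debugging step is to print response.text and check whether the element you're searching for is actually there; if not, a headless browser (as the answer suggests) or the site's JSON API is needed.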

Creating new columns by scraping information

Submitted by 大憨熊 on 2020-05-16 02:22:17
Question: I am trying to add information scraped from a website into columns. I have a dataset that looks like: COL1 COL2 COL3 ... ... bbc.co.uk and I would like to have a dataset which includes new columns: COL1 COL2 COL3 Website Address Last Analysis Blacklist Status ... ... bbc.co.uk IP Address Server Location City Region These new columns come from this website: https://www.urlvoid.com/scan/bbc.co.uk. I would need to fill each column with its related information. For example: COL1 COL2 COL3…
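A minimal sketch of the scrape-into-columns step. The table markup below is an assumption modeled on the urlvoid.com report page (label in the first cell, value in the second); in the real script it would come from requests.get("https://www.urlvoid.com/scan/" + domain).text, once per domain in the dataset:

```python
import pandas as pd
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td>Website Address</td><td>bbc.co.uk</td></tr>
  <tr><td>IP Address</td><td>151.101.0.81</td></tr>
  <tr><td>Server Location</td><td>(GB) United Kingdom</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Turn each label/value row into one dict entry: labels become column names.
record = {row.find_all("td")[0].text: row.find_all("td")[1].text
          for row in soup.find_all("tr")}

df = pd.DataFrame([record])   # one row per scanned domain
print(df.columns.tolist())
```

Building one dict per domain and concatenating them keeps labels and values aligned even if the report tables list rows in different orders.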

How to select a specific row from a table using BeautifulSoup?

Submitted by 我是研究僧i on 2020-05-15 09:31:09
Question: So I have a question related to a previous question, but I realized I needed to go one level deeper to get an 11-digit NDC code instead of a 10-digit one. Rather than converting them later, I thought I could just grab them initially. Here is the link to the previous question: Is there a way to parse data from multiple pages from a parent webpage? What I want to do is click on the links here (which is the 2nd level, by the way) and then grab the 11-digit NDC codes that result on the following…
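For the "select a specific row" part, find_all("tr") returns an ordinary list that can be indexed or filtered. A sketch against a hypothetical table (the markup and code values below are invented for illustration, not taken from the site in the question):

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><th>Product</th><th>NDC Code</th></tr>
  <tr><td>Drug A</td><td>12345-6789-01</td></tr>
  <tr><td>Drug B</td><td>98765-4321-02</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

# Skip the header row, then pick the row whose first cell names the product we want.
target = next(r for r in rows[1:] if r.find_all("td")[0].text == "Drug B")
code = target.find_all("td")[1].text
print(code)  # 98765-4321-02
```

Positional selection also works when the row index is known up front: rows[2] would select the same row here.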

Unable to scrape the name from the inner page of each result using requests

Submitted by ぃ、小莉子 on 2020-05-15 08:10:42
Question: I've created a script in Python that makes use of POST HTTP requests to get the search results from a webpage. To populate the results, it is necessary to click on the fields sequentially, as shown here. A new page then appears, and that is how the results are populated. There are ten results on the first page, and the following script can parse them flawlessly. What I wish to do now is use those results to reach their inner pages in order to parse the Sole Proprietorship Name (English) from there.…
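The second step is usually to collect each result's detail link from the listing HTML and fetch those URLs with the same requests.Session that performed the POST (so cookies carry over). A sketch of the link-collection step, with placeholder markup and class names rather than the real site's:

```python
from bs4 import BeautifulSoup

listing_html = """
<div class="result"><a class="result-link" href="/detail/1">Result 1</a></div>
<div class="result"><a class="result-link" href="/detail/2">Result 2</a></div>
"""

soup = BeautifulSoup(listing_html, "html.parser")

# Collect every detail-page URL; each would then be fetched with session.get()
# and parsed for the Sole Proprietorship Name field.
detail_urls = [a["href"] for a in soup.select("a.result-link")]
print(detail_urls)  # ['/detail/1', '/detail/2']
```

The real href values are often relative, as here, so urllib.parse.urljoin() against the search URL is needed before fetching them.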

Append markup string to a tag in BeautifulSoup

Submitted by 孤者浪人 on 2020-05-15 03:51:29
Question: Is it possible to set markup as tag content (akin to setting innerHTML in JavaScript)? For the sake of example, let's say I want to add 10 <a> elements to a <div>, but have them separated with a comma: soup = BeautifulSoup(<<some document here>>) a_tags = ["<a>1</a>", "<a>2</a>", ...] # list of strings div = soup.new_tag("div") a_str = ",".join(a_tags) Using div.append(a_str) escapes < and > into &lt; and &gt;, so I end up with <div>&lt;a&gt;1&lt;/a&gt; ... </div>. BeautifulSoup(a_str) wraps this in <html…
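One way this can be done: parse the markup string into a fragment with BeautifulSoup, then move the parsed nodes into the target tag. Appending Tag objects (rather than a raw string) inserts real elements, so nothing gets escaped. A minimal sketch:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<body></body>", "html.parser")
div = soup.new_tag("div")
soup.body.append(div)

a_str = ",".join(f"<a>{i}</a>" for i in range(1, 4))
fragment = BeautifulSoup(a_str, "html.parser")  # html.parser adds no <html> wrapper

for node in list(fragment.contents):  # copy the list: appending moves nodes out of it
    div.append(node)                  # append(Tag) moves the node, no escaping

print(div)  # <div><a>1</a>,<a>2</a>,<a>3</a></div>
```

Note the html.parser backend is used deliberately: unlike lxml or html5lib, it does not wrap the fragment in <html><body> tags.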

Scraping Chinese characters in Python

Submitted by 血红的双手。 on 2020-05-14 10:18:20
Question: I learnt how to scrape websites from https://automatetheboringstuff.com. I wanted to scrape http://www.piaotian.net/html/3/3028/1473227.html, whose contents are in Chinese, and write its contents to a .txt file. However, the .txt file contains random symbols, which I assume is an encoding/decoding problem. I've read the thread "how to decode and encode web page with python?" and figured the encodings for my site are "gb2312" and "windows-1252". I tried decoding with those two encodings…
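A sketch of the decode/write round trip. The bytes below simulate what the server sends, Chinese text encoded as gb2312 (the sample string is invented); gbk is a compatible superset of gb2312 and is usually the safer decoder:

```python
raw = "飘天文学".encode("gb2312")   # stands in for response.content from the page

# Decode the raw bytes with the page's real encoding, not requests' guess.
text = raw.decode("gbk")
print(text)

# Then write the .txt file explicitly as UTF-8 to avoid garbled symbols:
# with open("novel.txt", "w", encoding="utf-8") as f:
#     f.write(text)
```

With requests, the equivalent is setting response.encoding = "gbk" before reading response.text; the garbled symbols typically mean the bytes were decoded with the wrong codec (e.g. windows-1252) before being written out.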

Change the text of the inner tag using beautifulsoup python

Submitted by 牧云@^-^@ on 2020-05-14 08:43:53
Question: I would like to change the inner text of a tag in HTML obtained using BeautifulSoup. Example: <a href="index.html" id="websiteName">Foo</a> turns into: <a href="index.html" id="websiteName">Bar</a> I have managed to get the tag by its id: HTMLDocument.find(id='websiteName') But I'm not able to change the inner text of the tag: print HTMLDocument.find(id='websiteName') a = HTMLDocument.find(id='websiteName') a = a.replaceWith('<a href="index.html" id="websiteName">Bar</a>') // I…
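replaceWith swaps out the whole tag (and escapes a string argument), which is why it doesn't do what the question wants. Assigning to the tag's .string replaces only the inner text, in place. A minimal sketch:

```python
from bs4 import BeautifulSoup

html = '<a href="index.html" id="websiteName">Foo</a>'
soup = BeautifulSoup(html, "html.parser")

a = soup.find(id="websiteName")
a.string = "Bar"   # replaces the tag's inner text; attributes are untouched

print(soup)  # <a href="index.html" id="websiteName">Bar</a>
```

a.string.replace_with("Bar") is an equivalent spelling when the tag has a single text child.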