beautifulsoup

How to fetch data from a JavaScript-loaded site using BeautifulSoup

拥有回忆 submitted on 2021-01-28 08:02:41
Question: I am trying to fetch some data from this website: https://www.walmart.com/store/2141-philadelphia-pa/search?query=ice%20cream
I have been using this method to fetch JavaScript-loaded sites:
    def getLocalStoreProducts():
        session = requests.Session()
        localStoreUrl = 'https://www.walmart.com/store/2141-philadelphia-pa/search?query='
        searchWord = "ice cream"
        searchWord1 = checkForSpace(searchWord)
        wordUrl = localStoreUrl+searchWord1
        print(wordUrl)
        # try:
        categorySoup = BeautifulSoup(session.get

I get InvalidURL: URL can't contain control characters when I try to send a request using urllib

隐身守侯 submitted on 2021-01-28 07:48:06
Question: I am trying to get a JSON response from the link passed as a parameter to the urllib request, but it gives me an error saying the URL can't contain control characters. How can I solve this?
    start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project
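
Note: the visible part of the URL above contains literal spaces (for example after `bundle_fq` and after `sm_vid_Sectors_fq=`), and whitespace in the request target is one likely cause of this error. The sketch below is not the asker's code; it assumes the spaces are accidental and rebuilds the query string with `urllib.parse.urlencode`. The helper name `build_url` and the two-parameter subset are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode

# Base path copied from the visible part of the question.
BASE_URL = (
    "https://devbusiness.un.org/solr-sitesearch-output/10//0/"
    "ds_field_last_updated/desc"
)


def build_url(base, params):
    """Return base + '?' + a percent-encoded query string (hypothetical helper)."""
    # Assumption: surrounding whitespace in keys and values is accidental,
    # so it is stripped before encoding.
    cleaned = {key.strip(): value.strip() for key, value in params.items()}
    return base + "?" + urlencode(cleaned)


# Illustrative subset of the parameters; the full list is truncated in the question.
params = {
    "bundle_fq ": "procurement_notice",  # stray trailing space, as in the question
    "sm_vid_Languages_fq": "English",
}

url = build_url(BASE_URL, params)
print(url)
```

The resulting string contains no whitespace, so it can be passed to `urllib.request.urlopen` without tripping the control-character check.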

Extract link from url using Beautifulsoup

杀马特。学长 韩版系。学妹 submitted on 2021-01-28 07:31:24
Question: I am trying to get the web link from the following markup, using BeautifulSoup:
    <div class="alignright single">
    <a href="http://www.dailyhadithonline.com/2013/07/21/hadith-on-clothing-women-should-lower-their-garments-to-cover-their-feet/" rel="next">Hadith on Clothing: Women should lower their garments to cover their feet</a> »
    </div>
    </div>
My code is as follows:
    from bs4 import BeautifulSoup
    import urllib2
    url1 = "http://www.dailyhadithonline.com/2013/07/21/hadith-on-clothing-the-lower-garment-should
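
Note: one way to pull out the link in the markup above is to match the anchor on its `rel="next"` attribute and read `href` from it. The sketch below is a minimal illustration, not the asker's code: it parses the snippet from a string instead of downloading the page, and it assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# The markup quoted in the question, embedded as a string for illustration.
html = """
<div class="alignright single">
<a href="http://www.dailyhadithonline.com/2013/07/21/hadith-on-clothing-women-should-lower-their-garments-to-cover-their-feet/" rel="next">Hadith on Clothing: Women should lower their garments to cover their feet</a> »
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the first <a> whose rel attribute includes "next".
link = soup.find("a", rel="next")

# Guard against a missing element instead of indexing into None.
next_url = link["href"] if link is not None else None
print(next_url)
```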

Extract link from url using Beautifulsoup

坚强是说给别人听的谎言 submitted on 2021-01-28 07:18:05
Question: I am trying to get the web link from the following markup, using BeautifulSoup:
    <div class="alignright single">
    <a href="http://www.dailyhadithonline.com/2013/07/21/hadith-on-clothing-women-should-lower-their-garments-to-cover-their-feet/" rel="next">Hadith on Clothing: Women should lower their garments to cover their feet</a> »
    </div>
    </div>
My code is as follows:
    from bs4 import BeautifulSoup
    import urllib2
    url1 = "http://www.dailyhadithonline.com/2013/07/21/hadith-on-clothing-the-lower-garment-should

Loop pages and save contents in Excel file from website in Python

…衆ロ難τιáo~ submitted on 2021-01-28 06:14:27
Question: I'm trying to loop over the pages from this link and extract the interesting part. Please see the contents in the red circle in the image below. Here's what I've tried:
    url = 'http://so.eastmoney.com/Ann/s?keyword=购买物业&pageindex={}'
    for page in range(10):
        r = requests.get(url.format(page))
        soup = BeautifulSoup(r.content, "html.parser")
        print(soup)
XPath for each element (might be helpful for those who don't read Chinese):
    /html/body/div[3]/div/div[2]/div[2]/div[3]/h3/span --> 【润华物业】
    /html/body/div[3]

Multiple conditions in BeautifulSoup: Text=True & IMG Alt=True

萝らか妹 submitted on 2021-01-28 05:49:30
Question: Is there a way to use multiple conditions in BeautifulSoup? These are the two conditions I would like to use together:
    Get text: soup.find_all(text=True)
    Get img alt: soup.find_all('img', title=True):
I know I can do it separately, but I would like to get both together to keep the flow of the HTML. The reason I'm doing this is that only BeautifulSoup extracts text hidden by CSS (display: none). When you use driver.find_element_by_tag_name('body').text you get the img alt attribute, but unfortunately not
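
Note: one possible way to collect both kinds of content in document order is to walk `soup.descendants` once and pick out text nodes and `<img>` tags as they appear. The sketch below is an illustration under assumptions: the sample HTML and the helper name `text_and_alts` are made up, and it reads the `alt` attribute (the question's snippet filters on `title`, so adjust the attribute name as needed).

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Made-up sample markup: visible text, an image with alt text, and hidden text.
html = (
    '<p>Before <img src="a.png" alt="A logo"> after '
    '<span style="display:none">hidden</span></p>'
)


def text_and_alts(soup):
    """Return text nodes and img alt values in document order (hypothetical helper)."""
    parts = []
    for node in soup.descendants:
        if isinstance(node, NavigableString):
            text = node.strip()
            if text:  # skip whitespace-only strings
                parts.append(text)
        elif isinstance(node, Tag) and node.name == "img" and node.get("alt"):
            parts.append(node["alt"])
    return parts


soup = BeautifulSoup(html, "html.parser")
parts = text_and_alts(soup)
print(parts)
```

Because the walk follows the parse tree, text inside elements hidden with CSS is included as well, which matches the behaviour the question relies on.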

Discord does not embed link when sent by my bot

旧时模样 submitted on 2021-01-28 05:36:51
Question: My code works fine and the bot sends the link, but Discord does not recognize it as a link and does not embed it. When I copy and paste it myself, Discord recognizes it as a link and embeds the image. Here is my code:
    import requests
    from bs4 import BeautifulSoup
    if message.content.startswith(".dog"):
        response = requests.get("https://dog.ceo/api/breeds/image/random")
        soupRaw = BeautifulSoup(response.text, 'lxml')
        soupBackend = str(soupRaw).split("message")
        soup2 = soupBackend[1]
        soup3 = soup2[3:]
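
Note: the endpoint in the question returns JSON, so slicing the text by hand can leave escape characters in the link. The sketch below is a guess at the cause, not a confirmed diagnosis: it assumes the body escapes forward slashes as `\/` (some JSON encoders do), and the sample payload and the helper name `image_url_from_body` are made up for illustration. Parsing with the standard `json` module decodes those escapes.

```python
import json

# Made-up sample body; the real image path will differ.
# Assumption: slashes arrive escaped as "\/" in the raw text.
sample_body = (
    r'{"message": "https:\/\/images.dog.ceo\/breeds\/example\/example.jpg", '
    r'"status": "success"}'
)


def image_url_from_body(body):
    """Parse the JSON body and return the value of its "message" field."""
    data = json.loads(body)
    return data["message"]


image_url = image_url_from_body(sample_body)
print(image_url)
```

With a live response from `requests`, `response.json()["message"]` performs the same decoding without BeautifulSoup.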

Get value of attribute using CSS Selectors with BeautifulSoup

喜你入骨 submitted on 2021-01-28 03:54:34
Question: I am web-scraping with Python using the BeautifulSoup library. I have HTML markup like this:
    <tr class="deals" data-url="www.example2.com">
    <span class="hotel-name"> <a href="www.example2.com"></a> </span>
    </tr>
    <tr class="deals" data-url="www.example3.com">
    <span class="hotel-name"> <a href="www.example3.com"></a> </span>
    </tr>
I want to get the data-url or the href value in all <tr>s. Ideally, I would get the href value. Here is a little snippet of my relevant code:
    main_url = "http://localhost
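
Note: with CSS selectors, `soup.select()` returns tags, and attribute values are then read with dictionary-style access. The sketch below is a minimal illustration on the markup quoted above, not the asker's code. It uses the built-in `html.parser` because that parser keeps the bare `<tr>` elements as written; other parsers may restructure table rows that appear outside a `<table>`.

```python
from bs4 import BeautifulSoup

# The markup from the question, embedded as a string for illustration.
html = """
<tr class="deals" data-url="www.example2.com">
  <span class="hotel-name"><a href="www.example2.com"></a></span>
</tr>
<tr class="deals" data-url="www.example3.com">
  <span class="hotel-name"><a href="www.example3.com"></a></span>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# data-url from every <tr class="deals"> that carries the attribute.
data_urls = [tr["data-url"] for tr in soup.select("tr.deals[data-url]")]

# href from the anchor nested inside each row's hotel-name span.
hrefs = [a["href"] for a in soup.select("tr.deals span.hotel-name a[href]")]

print(data_urls)
print(hrefs)
```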

Maintaining the indentation of an XML file when parsed with Beautifulsoup

こ雲淡風輕ζ submitted on 2021-01-28 03:32:30
Question: I am using BS4 to parse an XML file and trying to write it back to a new XML file.
Input file:
    <tag1> <tag2 attr1="a1"> example text </tag2> <tag3> <tag4 attr2="a2"> example text </tag4> <tag5> <tag6 attr3="a3"> example text </tag6> </tag5> </tag3> </tag1>
Script:
    soup = BeautifulSoup(open("input.xml"), "xml")
    f = open("output.xml", "w")
    f.write(soup.encode(formatter='minimal'))
    f.close()
Output:
    <tag1> <tag2 attr1="a1"> example text </tag2> <tag3> <tag4 attr2="a2"> example text </tag4> <tag5

Beautiful Soup returns None on existing element

纵然是瞬间 submitted on 2021-01-28 02:21:41
Question: I'm trying to scrape the price of a product. Here's my code:
    from bs4 import BeautifulSoup as soup
    import requests
    page_url = "https://www.falabella.com/falabella-cl/product/5311682/Smartphone-iPhone-7-PLUS-32GB/5311682/"
    headers={ 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36' }
    uClient = requests.get(page_url, headers=headers)
    print(uClient)
    page_soup = soup(uClient.content, "html.parser") #requests