How can I get the first string from a div that has a div embedded beautifulsoup4

巧了我就是萌 submitted on 2020-02-02 13:02:31

Question


I'm trying to extract prices from a website.

The code I've written can do that, but when a listing also shows the old price, it returns None instead of the price string.

This is an example of the HTML without the old price (my code returns this one as a string):

<div class="xl-price rangePrice">
                            535.000 €  
                        </div>

This is an example of the HTML WITH the old price (my code returns None for this one):

<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 € <br></span>
</div>

The page I'm trying to extract the prices from: pagelink

My code:

prices = []
for items in soup.find_all("div", {"class": "xl-price rangePrice"}):
    prices.append(items.string)

print(prices)
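The None comes from how BeautifulSoup's `.string` works: it returns the text only when a tag has exactly one child. The div with the old price has several children (the price text, the `<span>`, and surrounding whitespace), so `.string` gives up and returns None. A minimal offline sketch, using the question's two snippets as inline sample HTML (whitespace shortened) rather than the live page:

```python
from bs4 import BeautifulSoup

# The two HTML shapes from the question, with whitespace shortened.
plain_html = '<div class="xl-price rangePrice"> 535.000 € </div>'
nested_html = ('<div class="xl-price rangePrice"> 487.000 € '
               '<span class="old-price"> 497.000 € <br></span> </div>')

plain_div = BeautifulSoup(plain_html, 'html.parser').div
nested_div = BeautifulSoup(nested_html, 'html.parser').div

# Exactly one text child: .string returns that text node.
print(plain_div.string.strip())   # 535.000 €

# Several children (text, <span>, text): .string is None.
print(nested_div.string)          # None

# The children that make .string ambiguous.
print([type(child).__name__ for child in nested_div.children])
```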

Another issue I'm having is that it returns the values like this:

'\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t', '\r\n\t\t\t\t\t\t\t\t145.000 € \r\n\t\t\t\t\t\t\t'

But I only want the numbers.
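The extra characters are just whitespace around the text, so `str.strip()` removes them; a regex can then pull out the digits. Below is a sketch with a hypothetical helper `parse_price`. It assumes the `.` is a thousands separator (European format, as in the question's `535.000 €`) and that prices have no decimal part. Both are assumptions about the site, not something the question confirms.

```python
import re


def parse_price(raw):
    """Hypothetical helper: turn '\r\n\t\t298.000 € \r\n\t' into 298000.

    Assumes '.' separates thousands and there is no decimal part.
    Returns None if no number is found.
    """
    match = re.search(r'\d+(?:\.\d{3})*', raw)
    if match is None:
        return None
    return int(match.group().replace('.', ''))


raw_values = ['\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t',
              '\r\n\t\t\t\t\t\t\t\t145.000 € \r\n\t\t\t\t\t\t\t']

# strip() alone removes the \r, \n and \t padding.
print([v.strip() for v in raw_values])       # ['298.000 €', '145.000 €']
# parse_price() goes one step further and returns plain integers.
print([parse_price(v) for v in raw_values])  # [298000, 145000]
```

If you only need the text, `strip()` is enough; converting to `int` makes sorting and comparing prices straightforward.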

Would appreciate the help!


Answer 1:


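import requests
from bs4 import BeautifulSoup

r = requests.get(
    'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000')
soup = BeautifulSoup(r.text, 'html.parser')

for div in soup.find_all('div', attrs={'class': 'xl-price rangePrice'}):
    # The first child is the text node with the current price;
    # any <span class="old-price"> comes after it.
    price = div.contents[0]
    # Strip the surrounding whitespace, then drop the trailing "€".
    print(price.strip()[0:-1])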

Output:

298.000 
145.000 
275.000 
535.000 
487.000 
159.000 
325.000 
189.000 
139.000 
499.000 
520.000 
249.500 
448.000 
215.000 
225.000 
210.000 
215.000 
218.000 
232.000 
689.000 
228.000 
299.500 
169.000 
135.000 
549.000 
125.000 
160.000 
395.000 
430.000 
210.000 
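A variant that does not rely on the price being the div's very first child is to take the first non-blank string from `stripped_strings`, which yields the text nodes in document order with whitespace already removed. This sketch runs against inline sample HTML modeled on the question's snippets, not the live page, whose markup may well have changed since:

```python
from bs4 import BeautifulSoup

# Sample markup standing in for the live page (assumed structure,
# based on the two examples in the question).
html = '''
<div class="xl-price rangePrice">
    535.000 €
</div>
<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 € <br></span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
prices = []
for div in soup.find_all('div', class_='xl-price rangePrice'):
    # stripped_strings yields every non-blank string in document order;
    # the current price appears before the old-price <span>.
    current = next(iter(div.stripped_strings), None)
    if current is not None:
        # Drop the euro sign and any space left before it.
        prices.append(current.rstrip('€').strip())

print(prices)  # ['535.000', '487.000']
```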



Answer 2:


Here is the sample code for your question.

import re
import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000")
soup = BeautifulSoup(page.content, 'html.parser')

prices = []
for items in soup.find_all("div", {"class": "xl-price rangePrice"}):
    if items.string:
        # No nested tags, so .string is the whole (whitespace-padded) text
        result = re.findall(r'\d+\.\d+', items.string)
        prices.append(result[0])
    else:
        # The div also contains <span class="old-price">, so .string is None.
        # Check its direct children and keep the first one holding a price.
        for item in items.children:
            if item.string:
                result = re.findall(r'\d+\.\d+', item.string)
                if result:
                    prices.append(result[0])
                    break

print(prices)
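Whether the dot in a pattern like `\d+.\d+` is escaped matters: an unescaped `.` matches any character, so it can glue together digits separated by a space or comma, while `\.` matches only a literal dot. A small illustration:

```python
import re

text = ' 487.000 € '

# Unescaped '.' matches any character, including a space.
print(re.findall(r'\d+.\d+', '12 345'))   # ['12 345']
# Escaped '\.' only matches a literal dot.
print(re.findall(r'\d+\.\d+', '12 345'))  # []
print(re.findall(r'\d+\.\d+', text))      # ['487.000']
```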



Answer 3:


I don’t have access to a computer right now, so consider this quasi-pseudocode:

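# div_elem is one <div class="xl-price rangePrice"> element, e.g. from soup.find_all(...)
# recursive=False only looks at text directly inside the div, skipping the <span>
new_price = div_elem.find(string=True, recursive=False).strip()

# The old price, if there is one, sits in <span class="old-price">
old_price = None
find_res = div_elem.find('span', attrs={'class': 'old-price'})
if find_res:
    old_price = find_res.get_text(strip=True)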

I tried to keep things as easy to understand as possible.

Let me know if you have any questions :)
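The same idea can be wrapped in a runnable function. `split_prices` is a hypothetical name, and the inline HTML is sample markup modeled on the question rather than the live page. The function returns both prices as a tuple, with `None` for the old price when the `<span>` is absent:

```python
from bs4 import BeautifulSoup


def split_prices(div_elem):
    """Hypothetical helper: return (current_price, old_price) for one price div."""
    # recursive=False: only text nodes directly inside the div,
    # so the text inside <span class="old-price"> is skipped.
    own_text = div_elem.find(string=True, recursive=False)
    new_price = own_text.strip() if own_text else None

    old_span = div_elem.find('span', attrs={'class': 'old-price'})
    old_price = old_span.get_text(strip=True) if old_span else None
    return new_price, old_price


html = ('<div class="xl-price rangePrice"> 487.000 € '
        '<span class="old-price"> 497.000 € <br></span> </div>'
        '<div class="xl-price rangePrice"> 535.000 € </div>')
soup = BeautifulSoup(html, 'html.parser')

for div in soup.find_all('div', attrs={'class': 'xl-price rangePrice'}):
    print(split_prices(div))
# ('487.000 €', '497.000 €')
# ('535.000 €', None)
```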



Source: https://stackoverflow.com/questions/59123337/how-can-i-get-the-first-string-from-a-div-that-has-a-div-embedded-beautifulsoup4
