What would be the best way to extract square meters from a string that also mentions the amount of bedrooms?

跟風遠走 submitted on 2020-01-24 00:23:10

Question


I'm trying to extract:

<div class="xl-surface-ch"> 
                            &nbsp;84 m²  &nbsp;&nbsp;&nbsp;2 bed.  
                        </div>

from the linked page. The problem is that I only need the "84" from this string (the number can also be two or three digits long).

An added difficulty is that sometimes the square meters are not mentioned, in which case it looks like this:

<div class="xl-surface-ch"> 
                             &nbsp;&nbsp;&nbsp;2 bed.  
                        </div>

In that case I need to return 0.

My best attempt is:

sqm = []
for item in soup.findAll('div', attrs={'class': 'xl-surface-ch'}):
    item = item.contents[0].strip()[0:4]
    item_clean = re.findall("[0-9]{2,4}", item)
    sqm.append(item_clean)

print(sqm)

But this doesn't give me what I need for the end result described above. Here's the output I'm getting from my code:

[['84'], ['70'], ['80'], ['32'], ['149'], ['22'], ['75'], ['30'], ['23'], ['104'], [], ['95'], ['129'], ['26'], ['55'], ['26'], ['25'], ['28'], ['33'], ['210'], ['37'], ['69'], ['36'], ['19'], ['119'], ['20'], ['20'], ['129'], ['154'], ['25']]

I'd be really interested in what kinds of solutions you come up with, because I honestly don't see one, especially since some listings don't include the sqm at all... maybe with an if statement? I'm going to try that right now anyhow.

Thank you in advance!


Answer 1:


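import requests
from bs4 import BeautifulSoup

# Fetch the search results page and parse it.
r = requests.get(
    'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000')
soup = BeautifulSoup(r.text, 'html.parser')

for item in soup.find_all('div', attrs={'class': 'xl-surface-ch'}):
    # strip() also removes the non-breaking spaces around the text.
    item = item.text.strip()
    if 'm²' in item:
        # Keep everything before the first "m", e.g. "84 " from "84 m² ...".
        print(item[0:item.find('m')])
    else:
        # No surface area is listed for this property.
        print(0)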

Output:

84 
70 
80 
32 
149 
22 
75 
30 
23 
104 
0
95 
129 
26 
55 
26 
25 
28 
33 
210 
37 
69 
36 
19 
119 
20 
20 
129 
154 
25 
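If integers are needed rather than printed strings with a trailing space, a regular expression is one possible alternative to slicing on the first "m". The following is only a sketch and not part of the original answer: the helper name `extract_sqm` and the pattern `SQM_PATTERN` are made up for illustration, and the third sample string is invented to show a three-digit value.

```python
import re

# Hypothetical pattern: digits, optional whitespace, then "m²" (or "m2").
# In Python 3, \s also matches the non-breaking spaces produced by &nbsp;.
SQM_PATTERN = re.compile(r"(\d+)\s*m[²2]")


def extract_sqm(text):
    """Return the surface area as an int, or 0 if no 'm²' value is present."""
    match = SQM_PATTERN.search(text)
    return int(match.group(1)) if match else 0


# Sample strings shaped like the div contents in the question.
samples = [
    "\xa084 m²  \xa0\xa0\xa02 bed.",   # area and bedrooms
    "\xa0\xa0\xa02 bed.",              # bedrooms only, so the result is 0
    "\xa0149 m²  \xa0\xa0\xa03 bed.",  # invented three-digit example
]
print([extract_sqm(s) for s in samples])  # [84, 0, 149]
```

In the loop above, this would be called as `extract_sqm(item.text)`. Because the pattern requires the "m²" unit right after the digits, the bedroom count is never mistaken for the surface area.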


Source: https://stackoverflow.com/questions/59133349/what-would-be-the-best-way-to-extract-square-meters-from-a-string-that-also-ment
