Can't extract the text and find all by BeautifulSoup

Submitted by 末鹿安然 on 2020-01-14 04:37:18

Question


I want to extract all the available items listed under équipements (amenities), but I only get the first four items, followed by '+ Plus'.

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent so the server returns the regular page
headers = {'User-Agent':'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
url = 'https://www.airbnb.fr/rooms/8261637?s=bAMrFL5A'
# Download the raw HTML (no JavaScript is executed at this point)
response = requests.get(url, headers=headers)
bsobj = BeautifulSoup(response.text, 'lxml')
# Collect every <div class="row amenities"> found in the downloaded HTML
b = bsobj.find_all("div", {"class": "row amenities"})

The result b does not contain every item of the list inside the tag, and its last entry is '+ Plus'. The end of the output looks like this:

<span data-reactid=".mjeft4n4sg.0.0.0.0.1.8.1.0.0.$1.1.0.0">+ Plus</span></strong></a></div></div></div></div></div>]

Answer 1:


This is because the data is filled in by React (reactjs) after the page loads. If you download the page with requests, no JavaScript runs, so that data is not in the HTML you receive.
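
To illustrate the point, here is a minimal sketch built on a made-up HTML snippet (it is not Airbnb's real markup, and the item names are invented). The static markup holds one amenity plus the '+ Plus' link; the remaining item would only be added by the script inside a browser. BeautifulSoup parses the markup as text and never runs the script, so it finds only what was already in the downloaded HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical page source, invented for illustration only.
# In a browser the <script> would append another item; a parser never runs it.
RAW_HTML = """
<div class="row amenities">
  <span>Wifi</span>
  <a href="#"><strong><span>+ Plus</span></strong></a>
</div>
<script>
  var row = document.querySelector('.amenities');
  row.insertAdjacentHTML('beforeend', '<span>Parking</span>');
</script>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# Only the spans present in the static markup are found.
static_items = [
    span.get_text(strip=True)
    for span in soup.select("div.amenities span")
]
print(static_items)
```

The item that the script would add never shows up in static_items, which mirrors the truncated list seen in the question.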

Instead, you have to use the Selenium WebDriver: open the page in a browser and let it run all the JavaScript. Then you can get access to all the data you expect.
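
The Selenium approach could look roughly like the sketch below. It is an outline under assumptions, not a tested scraper for this site: the function names fetch_rendered_html and extract_amenities are hypothetical, Firefox with a matching driver is assumed to be installed, and the div.row.amenities selector is taken from the question and may not match the page's current markup.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_seconds=10):
    """Open the URL in a real browser and return the DOM after scripts have run."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        # Wait until the amenities container (selector from the question) exists.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div.row.amenities"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def extract_amenities(html):
    """Return the text of every <div class="row amenities"> block in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        div.get_text(" ", strip=True)
        for div in soup.select("div.row.amenities")
    ]
```

Calling extract_amenities(fetch_rendered_html(url)) with the url from the question would then parse the rendered page instead of the raw download. If the full list only appears after the '+ Plus' link is clicked, that click would have to be added as well; it is not shown here.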



Source: https://stackoverflow.com/questions/34363121/cant-extract-the-text-and-find-all-by-beautifulsoup
