How to extract <div data-v-xxxxxxxx> </div> from HTML using BeautifulSoup?

Posted by 随声附和 on 2021-01-29 09:42:36

Question


The website I'm scraping contains this HTML:

<div data-v-38788375 data-v-07b96579 class="rating score orange">9.3</div>

How could I extract the 9.3 value using BeautifulSoup?

Here is my code:

from bs4 import BeautifulSoup
import requests

page = requests.get('https://www.hostelworld.com/search?search_keywords=Phuket,%20Thailand&country=Thailand&city=Phuket&date_from=2019-10-14&date_to=2019-10-17&number_of_guests=2')

soup = BeautifulSoup(page.text,'lxml')
rating = soup.find('div', attrs={'class':'rating score orange'})
print(rating)

This prints None, and I don't know why.


Answer 1:


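BeautifulSoup won't help here: the ratings are loaded by an XMLHttpRequest after the page is served, so they are not in the HTML that requests downloads, and the bs4 parser never sees them.

Requesting the JSON endpoint directly will work for you (but keep in mind that you will have to manage the parameters in the request).

Python code: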

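import requests

# JSON endpoint that the search page calls through XHR
url = 'https://www.hostelworld.com/city/list-properties?cityId=150&language=en&arrival=2019-10-14&departure=2019-10-17&numberOfGuests=2&groupType=&ageRanges='
page = requests.get(url)
data = page.json()  # parse the JSON body of the response

for index, prop in enumerate(data['properties']):
    # Stops after the first 15 properties; remove this check to print them all
    if index == 15:
        break
    # averageRating is divided by 10 to get the score as displayed (e.g. 9.3)
    print(prop['name'] + '--->', prop['averageRating'] / 10)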

Output (longer than 15 entries, so it was evidently produced without the index check):

Lub d Phuket Patong---> 9.3
Slumber Party Hostel Phuket---> 9.2
The Neighbors Hostel---> 9.4
Bodega Phuket Party Hostel---> 9.0
Sleepy Station Phuket---> 9.2
Baan Baan Hostel---> 9.5
Kata Villa---> 8.8
White Wall Poshtel---> 9.3
BGW Phuket---> 8.8
Hip Hostel---> 9.1
Roost Glamping---> 9.7
Happy Fish Guesthouse---> 7.8
FIN Hostel Phuket Kata Beach---> 8.2
Vitamin Sea Hostel Phuket---> 9.2
Beehive Phuket Oldtown Hostel---> 8.7
The Camp Hostel Kata Beach Phuket---> 8.6
FIN Hostel Co: Working---> 8.0
Borbaboom Poshtel---> 9.5
Doolay Beachfront Hostel---> 9.0
Kata Station---> 8.3
Bearpacker Hostel---> 9.2
Stay at Kata Poshtel---> 10.0
L'atelier Poshtel Phuket---> 8.6
Ai Phuket Hostel---> 8.6
Glur Phuket Patong Beach---> 8.5
Aekkeko Hostel---> 9.0
Must Sea Hotel---> 9.7
Karon Living Room Hotel---> 9.3
Hugger Hostel---> 7.6
Enrico Hostel Patong---> 6.6
Baan Kamala Fantasea Hotel---> 8.4
Box Poshtel---> 9.4
Hubb Hostel Phuket Airport---> 8.7
Patong Terrace Boutique Hotel---> 9.7
The Luna Hostel Phuket Airport---> 9.4
Book a Bed Poshtel---> 9.2
Panda Hostel Phuket---> 7.4
Bloo Hostel Phuket---> 8.1
Bedbox Guesthouse & Hostel---> 9.2
The Snug Airportel---> 9.0
Chillhub Hostel---> 8.9
Nonnee---> 8.4
Airport Hostel Phuket---> 8.5
Sleep Box Patong Hostel---> 8.9
Take A Break @ Naiyang Beach - Phuket Airport---> 8.8
12 Month Hostel---> 8.2
Sleep Sheep Phuket Hostel & Cafe'---> 8.0
Breezotel---> 8.6
Chino Town Gallery Hostel---> 9.7
Villa Oxavia---> 0.0
Goodnight Hostel---> 9.4
Dfeel Hostel---> 8.3
Patong Backpacker Hostel---> 5.9
7Q Bangla Hotel---> 10.0
Feel Good Hostel---> 8.4
7Q Patong Beach Hotel---> 10.0
Sino Hostel @ Kata---> 8.7
iNest Poshtel---> 9.4
MEMO Residence---> 6.6
#Me Hostel---> 9.1
Paradise Beach Backpackers---> 7.4
Coconut Wells Phuket---> 10.0
99 Voyage Patong---> 8.6
Eco Hostel Phuket---> 8.9
Kokotel Phuket Patong---> 10.0
Ananas Phuket Central Hostel---> 9.4
TP Hostel Kata Beach Phuket---> 9.4
Lupta Hostel---> 8.4
The Room Patong Hotel---> 0.0
Pineapple Guesthouse---> 10.0
Southern Fried Rice---> 8.1
WIRE Hostel Patong---> 3.4
Fulfill Phuket Hostel---> 7.0
La Pianta Hostel---> 0.0
Pakta Phuket---> 0.0
Bella Guesthouse Patong---> 9.4
Feelgood@Journey Hostel---> 7.3
Jinta Andaman Kata Beach Phuket---> 7.4
Forty Winks Phuket Hotel---> 8.0
Phuket Backpacker---> 2.0
Phunara Residence---> 10.0
The Arbern Hostel x Bistro---> 9.3
The Artist House Patong---> 8.0
Pensiri House---> 9.6
Aquamarine Resort---> 7.7
Phuketnumnoi---> 0.0
H2b Hostel---> 0.0
MVC Patong House---> 7.7
Rattana Mansion---> 0.0
Phuket Capsule and Hidden Pool Bar---> 8.6
Jiraporn Hill Resort Patong Phuket---> 0.0
Squareone---> 10.0
Silver Resortel---> 8.3
99 Residence Patong---> 9.4
Simplitel Phuket---> 0.0
myPatong---> 0.0
Baan Suay Backpackers---> 7.4
Stanley's Guesthouse---> 0.0
Naiyang Park Resort---> 0.0
CC's Hideaway Hotel---> 0.0
Phuket Airport Hotel---> 10.0
JR Siam Kata Resort---> 8.3
Ananas Phuket Hostels---> 8.9
Airport Mansion & Restaurant---> 0.0
Loma Hostel @ Phuket Town---> 9.1
Sino Imperial Phuket Hotel---> 9.4
Lub Sbuy Hostel---> 7.4
Sound Gallery House---> 0.0
Deevana Plaza Phuket Patong---> 0.0
The Blue Pearl Kata Hotel---> 0.0
Phuket Marine Poshtel---> 10.0
365 Panwa Villas Resort---> 0.0
Hao Hostel---> 7.6
The Lantern Hostel---> 0.0
The Orchid House---> 0.0
Phuket Center Hotel---> 0.0
Phuket Blue Hostel---> 0.0
Recenta Express Phuket Town---> 7.1
Blue Sky Residence---> 0.0
Ekkamon Mansion---> 5.4
The Tint At Phuket Town---> 0.0
Beds Patong---> 0.0
Diamond Resort Phuket---> 0.0
Oasis Apartments, Guesthouse, Hostel & Bar---> 0.0
Princess Seaview Resort & Spa---> 0.0
Buasai Residence---> 0.0
Ozone Condotel---> 0.0
Tall Tree Poshtel Phuket---> 0.0
Oceanstone---> 0.0
Smile House and Pool---> 0.0
Red Planet Phuket Patong---> 0.0
Ban Patong Residence---> 0.0
Chilli Salza Patong---> 0.0
Phuket Oldtown hostel---> 0.0
Aloha Residens---> 0.0
Hostel Our Nomad---> 0.0
Bed Hostel---> 0.0
Chino Town Gallery Guesthouse---> 0.0
Andaman Seaside Resort---> 0.0
Nice Bangtao Beach---> 0.0
Lub Sbuy House Hotel---> 0.0
Garden Home Kata---> 7.7
The Little Moon Residence---> 0.0
Gotum Hostel & Restaurant 2---> 0.0
Eden Hostel---> 0.0
Baan SS Karon---> 0.0
Hangover Inn---> 0.0
The Lucky Kata Hostel---> 3.4
Deevana Patong Resort & Spa---> 0.0
Chanalai Hillside Resort---> 0.0
Na Siam Guest House & Cafe---> 0.0
Siam House---> 8.4
Som Guesthouse---> 0.0
The Memory at On On Hotel---> 0.0
Andaman Place Guesthouse---> 0.0
Lamai Guesthouse---> 0.0
Bangtao Kanita House---> 0.0
Sino House Hotel Apartment---> 0.0
Lamai Hotel---> 2.0
Best Western Premier Bangtao Beach Resort & Spa---> 0.0
Presley Guesthouse---> 0.0
Blu Monkey Bed n Breakfast Phuket---> 0.0
The Z Nite Hostel---> 0.0
Sino Inn---> 0.0
Samkong Place---> 0.0
Karon Clinic---> 8.7
SM Resort---> 0.0
Vipa House Phuket---> 0.0
The Blanket Hotel Phuket Town---> 0.0
Break Point Hotel---> 6.2
Kamala Beach Resort---> 0.0
C&N Hotel---> 0.0

This way you get all the data you need (images, prices, names, etc.) for every page, with no need to handle pagination.
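
Once the JSON is loaded, pulling out the fields is plain dictionary access. The following is a minimal sketch, not the site's documented schema: the sample payload is hypothetical and only mirrors the two keys used in the answer (name and averageRating), the extract_ratings helper is made up for illustration, and treating a value of 0 as "no rating yet" is an assumption that the listing above suggests but does not confirm.

```python
import json

# Hypothetical sample shaped like the response used in the answer; only the
# 'properties', 'name' and 'averageRating' keys come from the original code.
sample_text = """
{"properties": [
  {"name": "Lub d Phuket Patong", "averageRating": 93},
  {"name": "Villa Oxavia", "averageRating": 0}
]}
"""


def extract_ratings(payload, limit=None):
    """Return (name, rating) pairs; rating is None when averageRating is 0 or missing.

    Assumption: a raw value of 0 means the property has not been rated.
    """
    rows = []
    for prop in payload.get("properties", [])[:limit]:
        raw = prop.get("averageRating") or 0
        rows.append((prop.get("name"), raw / 10 if raw else None))
    return rows


payload = json.loads(sample_text)
ratings = extract_ratings(payload)
for name, rating in ratings:
    print(name, "--->", rating if rating is not None else "no rating")
```

Passing limit=15 would reproduce the early exit from the loop in the answer without an explicit index check.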





Answer 2:


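I just ran this:

from bs4 import BeautifulSoup
BeautifulSoup('<div data-v-38788375 data-v-07b96579 class="rating score orange">7.5</div>', 'html.parser').find('div', attrs={'class': 'rating score orange'}).text

and got the output 7.5, so the lookup itself works when the element is actually present in the HTML.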
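
The same point can be checked without any third-party package. The sketch below deliberately swaps BeautifulSoup for the standard library's html.parser so that it runs anywhere; the RatingParser class and its attribute names are made up for illustration. It suggests that the valueless data-v-... attributes are harmless: they simply arrive with a value of None, and the class attribute is matched as usual.

```python
from html.parser import HTMLParser


class RatingParser(HTMLParser):
    """Collect the text of <div> tags whose class list contains 'rating' and 'score'."""

    def __init__(self):
        super().__init__()
        self._in_rating = False
        self.ratings = []       # text found inside matching divs
        self.rating_attrs = []  # raw attribute pairs of the last matching div

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and {"rating", "score"} <= set(classes):
            self._in_rating = True
            self.rating_attrs = attrs

    def handle_data(self, data):
        text = data.strip()
        if self._in_rating and text:
            self.ratings.append(text)

    def handle_endtag(self, tag):
        # Simplification: nested divs inside a rating div are not tracked.
        if tag == "div":
            self._in_rating = False


html = '<div data-v-38788375 data-v-07b96579 class="rating score orange">9.3</div>'
parser = RatingParser()
parser.feed(html)
parser.close()
print(parser.ratings)
print(parser.rating_attrs)
```

Matching on a subset of the class names rather than the exact string "rating score orange" is a design choice here, on the guess that the colour class varies with the score.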




Answer 3:


I'm going to add something useful here. As you've already seen, the URL used in the answer above isn't your original URL but a different one that returns a JSON object.

How do you get the JSON URL? Follow these steps:

  1. Open the original URL in a browser (I use Firefox).
  2. Right-click and choose Inspect Element.
  3. Go to the Network tab (refresh the page if it is empty); you will see some entries of type json under XHR.
  4. Find the JSON object you want (usually the one with the largest size).
  5. Click Headers and copy the request URL (this will be your JSON URL).


Repeat the steps above for a different city and you will get a different cityId. I hope it helps.
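
Once a request URL has been copied from the Network tab, its parameters can be inspected and changed in code instead of being edited by hand. This is a minimal sketch using only urllib.parse from the standard library; the replace_query_param helper is made up for illustration, and 999 is a placeholder rather than a real cityId.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Request URL as copied from the browser's Network tab (taken from the answer above)
copied_url = (
    "https://www.hostelworld.com/city/list-properties"
    "?cityId=150&language=en&arrival=2019-10-14&departure=2019-10-17"
    "&numberOfGuests=2&groupType=&ageRanges="
)


def replace_query_param(url, name, value):
    """Return url with query parameter `name` set to `value`.

    Parameters that are not named `name` are kept as they are, including
    empty ones. If `name` is absent, the URL comes back unchanged.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    updated = [(k, str(value)) if k == name else (k, v) for k, v in pairs]
    return urlunsplit(parts._replace(query=urlencode(updated)))


# Read the current parameters
params = dict(parse_qsl(urlsplit(copied_url).query, keep_blank_values=True))
print(params["cityId"])

# Swap in another city (999 is a placeholder, not a known cityId)
other_city_url = replace_query_param(copied_url, "cityId", 999)
print(other_city_url)
```

The same helper would work for the arrival, departure, and numberOfGuests parameters.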



Source: https://stackoverflow.com/questions/58362598/how-to-extract-div-data-v-xxxxxxxx-div-from-html-using-beautifulsoup
