Python web scraping HTML with same class

Submitted by 假装没事ソ on 2021-01-29 22:33:13

Question


I would like to ask how I can extract the events' fees from this website using a Python library (BeautifulSoup) for web scraping.

However, the event's fee shares the same class as other properties of the event. Are there any suggestions for extracting only the fees? I have tried find_next, find_next_sibling and find_parent, but with no success. Below is the raw HTML where the price's class is located:

<div class="eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped">Free</div>
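BeautifulSoup treats `class` as a multi-valued attribute, so `find_all('div', class_='eds-event-card-content__sub')` matches every div that has that class among its others, while a CSS selector that chains classes (`div.a.b`) only matches tags carrying all of them. A minimal sketch on a hypothetical two-div fragment (the date text and the first div's class list are made up for illustration); whether the `--cropped` modifier is unique to the fee div on the real page is an assumption, not verified here:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: both divs share "eds-event-card-content__sub",
# only the second also carries the "--cropped" modifier class.
html = """
<div class="eds-event-card-content__sub eds-text-bm">Sat, Aug 15, 10:00 AM</div>
<div class="eds-event-card-content__sub eds-text-bm eds-event-card-content__sub--cropped">Free</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ with a single class name matches any tag whose class list contains it
shared = [d.get_text() for d in soup.find_all("div", class_="eds-event-card-content__sub")]
print(shared)  # ['Sat, Aug 15, 10:00 AM', 'Free']

# A CSS selector chaining several classes requires all of them on the same tag
cropped = [
    d.get_text()
    for d in soup.select("div.eds-event-card-content__sub.eds-event-card-content__sub--cropped")
]
print(cropped)  # ['Free']
```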

I would appreciate any help.

Below is the code that I have tried. I only get a list of tags in my array.

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.eventbrite.com/d/malaysia--kuala-lumpur--85675181/all-events/?page=1'

response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

#Finding common container for each event
containers = soup.find_all('article', class_ = 'eds-l-pad-all-4 eds-event-card-content eds-event-card-content--list eds-event-card-content--standard eds-event-card-content--fixed eds-l-pad-vert-3')

event_fees = []

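for container in containers:
    # Note: this searches `soup` (the whole page), not `container`, and
    # select() expects a CSS selector string rather than a class_ keyword
    fees = soup.select('div', class_ ='eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped')
    event_fees.append(fees.txt)  # select() returns a list of tags, which has no .txt attribute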
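The loop in the question searches `soup` on every iteration and reads `.txt`, which does not exist. A per-card version would search inside each `container` and call `.get_text()`. This is a sketch on a hypothetical two-card fragment, assuming the fee is the last `eds-event-card-content__sub` div in each card; that ordering is an assumption about the markup, not verified against the live page (which, according to the answer, loads prices from a separate URL):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two event cards, each with several divs sharing a class
html = """
<article class="eds-event-card-content">
  <div class="eds-event-card-content__sub">Sat, Aug 15</div>
  <div class="eds-event-card-content__sub">Kuala Lumpur</div>
  <div class="eds-event-card-content__sub eds-event-card-content__sub--cropped">Free</div>
</article>
<article class="eds-event-card-content">
  <div class="eds-event-card-content__sub">Sun, Aug 16</div>
  <div class="eds-event-card-content__sub">Online</div>
  <div class="eds-event-card-content__sub eds-event-card-content__sub--cropped">RM10.50</div>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

event_fees = []
for container in soup.select("article.eds-event-card-content"):
    # Search inside this card only, not the whole document
    subs = container.select("div.eds-event-card-content__sub")
    # Assumption: the fee is the last sub-div in the card
    fee = subs[-1].get_text(strip=True) if subs else None
    event_fees.append(fee)

print(event_fees)  # ['Free', 'RM10.50']
```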


Answer 1:


The price data is loaded from an external URL. You can use the requests and json modules to fetch it:

import re
import json
import requests


url = "https://www.eventbrite.com/d/malaysia--kuala-lumpur--85675181/all-events/?page=1"
events_url = 'https://www.eventbrite.com/api/v3/destination/events/?event_ids={event_ids}&expand=event_sales_status,primary_venue,image,saves,my_collections,ticket_availability&page_size=99999'
html_text = requests.get(url).text

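# The search results are embedded in the page's HTML as a JSON object assigned to window.__SERVER_DATA__
data1 = json.loads(re.search(r'window\.__SERVER_DATA__ = ({.*});', html_text).group(1))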

# uncomment this to print all data:
# print(json.dumps(data1, indent=4))

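# Collect the event IDs from the search results, then request those events' details
# (including ticket prices) from the events API in a single call
event_ids = ','.join(r['id'] for r in data1['search_data']['events']['results'])
data2 = requests.get(events_url.format(event_ids=event_ids)).json()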

# uncomment this to print all data:
# print(json.dumps(data2, indent=4))

for e in data2['events']:
    print(e['name'])
    print(e['ticket_availability']['minimum_ticket_price']['display'],'-',e['ticket_availability']['maximum_ticket_price']['display'])
    print('-' * 80)

Prints:

Mega Career Fair & Post Graduate Education Fair 2020 - Mid Valley KL
0.00 MYR - 0.00 MYR
--------------------------------------------------------------------------------
Post Graduate Education Fair 2020 - Mid Valley KL
0.00 MYR - 0.00 MYR
--------------------------------------------------------------------------------
Traders Fair 2021 - Malaysia (Financial Education Event)
0.00 USD - 199.00 USD
--------------------------------------------------------------------------------
THE FIT Malaysia
0.00 MYR - 0.00 MYR
--------------------------------------------------------------------------------
Walk-In Interview with Career Partners of HRDF
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Entrepreneurship for Beginners - Startup | Entrepreneur Hackathon Webinar
0.00 EUR - 0.00 EUR
--------------------------------------------------------------------------------
Good Shepherd Catholic Church  English Mass Registration- Scroll Down  pls
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
CGH 10:00am Assumption Mass Registration
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Kuala Lumpu Video Speed Dating - Filter Off
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Wiki Finance EXPO Kuala Lumpur 2021
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
English Sunday Service - 16 AUGUST
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Good Shepherd Catholic  Bahasa Malaysia Mass Registration. Pls scroll down
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
How To Improve Your Focus and Limit Distractions - Kuala Lumpur
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
ANNUAL GENERAL MEETING
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
ITS ALL ABOUT PORTRAIT
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
First service (English)
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
KL International Flea Market 2020 / Bazaar Antarabangsa Kuala Lumpur
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Branding Strategies For Startups
10.50 MYR - 31.50 MYR
--------------------------------------------------------------------------------
SHC 9.15am Sunday Mass Registration
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
SHC 9.15am Sunday Mass (Tamil) திருஇருதய ஆண்டவர் ஆலயத்தில்  காலை  9.15க்கு
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------


Source: https://stackoverflow.com/questions/63390671/python-web-scrapping-html-with-same-class
