Question
I am trying to get data from this web page https://playruneterra.com/es-es/news and the part I am trying to get is this:
I am using BeautifulSoup to fetch the HTML and search it, but when I use the findAll method to get that line, it returns an empty array. I tried the same thing on other pages and it works fine. What is happening?
This is my code:
This is an example that is working:
Thanks all.
Answer 1:
You can use PyQt to build a headless browser and then scrape the data from the website. Here's some demo code:
import sys

import bs4 as bs
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from PyQt5.QtWidgets import QApplication


class Page(QWebEnginePage):
    """Loads a URL in a headless Qt web engine and stores the rendered HTML."""

    def __init__(self, url):
        # A QApplication must exist before any Qt web engine object is created.
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.html = ''
        self.loadFinished.connect(self._on_load_finished)
        self.load(QUrl(url))
        # Block here until Callable() quits the event loop.
        self.app.exec_()

    def _on_load_finished(self):
        # toHtml() is asynchronous: it returns nothing and hands the HTML to the callback.
        self.toHtml(self.Callable)
        print('Load finished')

    def Callable(self, html_str):
        # Store the rendered HTML, then stop the event loop so __init__ can return.
        self.html = html_str
        self.app.quit()


def main():
    page = Page('https://playruneterra.com/es-es/news')
    soup = bs.BeautifulSoup(page.html, 'html.parser')
    js_test = soup.find('h2', class_='heading-03 src-component-content-NewsItem-___NewsItem-module__title___3OcDj')
    print(js_test.text)


if __name__ == '__main__':
    main()
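Answer 2:
Pass the second parameter to findAll as a dict of attributes rather than a string. BeautifulSoup treats a bare string in that position as a CSS class name to match.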
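To see how the second argument behaves, here is a minimal sketch run against an inline HTML snippet. The markup and class names are made up for illustration and are not taken from the page in the question. In bs4 a bare string in that position is matched against the CSS class, so all three forms below should select the same elements.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
SAMPLE_HTML = """
<div>
  <h2 class="heading-03 news-title">First headline</h2>
  <h2 class="heading-03 news-title">Second headline</h2>
  <p class="summary">Not a headline</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute filter passed as a dict.
by_dict = soup.find_all("h2", {"class": "news-title"})

# The same filter written with the class_ keyword argument.
by_keyword = soup.find_all("h2", class_="news-title")

# A bare string as the second argument is treated as a CSS class.
by_string = soup.find_all("h2", "news-title")

titles = [tag.get_text(strip=True) for tag in by_dict]
print(titles)
```

If every form still returns an empty list on the real page, the element is most likely not in the HTML that was downloaded, which points to JavaScript rendering rather than to the call syntax.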
Answer 3:
First, right-click the page, choose "View page source", and search for the keyword you're looking for. If you can find your content there, you can parse it with BeautifulSoup; otherwise the content is probably added by JavaScript after the page loads, and you can use Selenium instead.
If you go with BeautifulSoup, just wrap the class names in a dict:
title = soup.findAll('h2',{'class':'add your full classes here'})
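The "view page source" check can also be done from Python. This is a minimal sketch using only the standard library. The helper names fetch_raw_html and appears_in_static_html are hypothetical, and the two sample pages are invented so the example runs without network access.

```python
import urllib.request


def fetch_raw_html(url, timeout=10.0):
    """Download the HTML exactly as the server sends it (no JavaScript is run)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def appears_in_static_html(raw_html, keyword):
    """Return True if the keyword occurs in the unrendered page source."""
    return keyword.lower() in raw_html.lower()


# Invented pages: one ships its content in the HTML, the other only a JS entry point.
static_page = "<html><body><h2 class='title'>Patch notes</h2></body></html>"
js_page = "<html><body><div id='root'></div><script src='app.js'></script></body></html>"

print(appears_in_static_html(static_page, "Patch notes"))
print(appears_in_static_html(js_page, "Patch notes"))
```

For a live page, the same check would be appears_in_static_html(fetch_raw_html(url), keyword). A False result suggests the text is injected by JavaScript, so a browser-driven tool such as Selenium or the PyQt approach is needed.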
Source: https://stackoverflow.com/questions/61558995/beatifulsoup-findall-is-returning-an-empty-array-python