Question
I am trying to pull data from a local government's website using BeautifulSoup with Python, but the source code it downloads lacks the information I want. I know how to use BeautifulSoup, and I can pull down any part of the source code and use it in Python, but the data I want is not there. The page has all of the tags laid out with their appropriate ids, yet they contain no values. I see this every time I view the page source in Chrome. But every time I open the page with Inspect, the data is filled in exactly where you would expect it to be when the page renders. Some of the cells that are blank in the source but filled in on the Inspect view have no id on the <td> tag; they are plain, untouched <td> elements.
I know the website pulls the data from a database, because I know someone who helped create that database. I have talked to them, and they do not know how to get it. As the title says: how is the data being filled in, and how do I access it?
Answer 1:
As others have stated, you cannot see the data because it is generated by JavaScript after the page loads. To work around this, you need a tool such as Selenium or Splash that runs the JavaScript first and then gives you the resulting HTML.
I will provide an example using Selenium, since it is a bit more user-friendly. Here are some good resources for getting started:
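To see why BeautifulSoup comes back empty, consider a hypothetical page (the markup, the `id="owner"` cell and the `/api/records` endpoint below are made up for illustration). The server sends the table with empty cells plus a script that fills them in after load. Chrome's Inspect view shows the live DOM after that script has run, while "View page source" and BeautifulSoup only see the original HTML, because BeautifulSoup parses text and never executes scripts:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for what the server sends: the cells exist
# but are empty, and the <script> is what would fill them in a real browser.
RAW_HTML = """
<table>
  <tr><td id="owner"></td><td></td></tr>
</table>
<script>
  // hypothetical endpoint; the real page's script and URL will differ
  fetch("/api/records").then(r => r.json()).then(data => {
    document.getElementById("owner").textContent = data.owner;
  });
</script>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")
# BeautifulSoup only parses the text; the <script> above is never executed
cells = [td.get_text(strip=True) for td in soup.find_all("td")]
print(cells)  # ['', '']
```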
https://pythonspot.com/selenium-get-source/
https://selenium-python.readthedocs.io/installation.html
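from selenium import webdriver
from bs4 import BeautifulSoup
URL = "your url here"
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument("--test-type")
# Path to a Chromium binary on Linux; change or remove this line for your system
options.binary_location = "/usr/bin/chromium"
# Newer Selenium releases expect options=; chrome_options= was deprecated and later removed
driver = webdriver.Chrome(options=options)
try:
    driver.get(URL)
    # page_source returns the current DOM, including content added by JavaScript so far
    html = driver.page_source
finally:
    driver.quit()  # always close the browser, even if loading fails
# The markup comes first, then the parser name
soup = BeautifulSoup(html, 'html.parser')
"""
Do your desired parsing
"""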
Source: https://stackoverflow.com/questions/59205843/where-does-data-not-in-a-websites-source-code-come-from-and-how-do-i-get-it-usi