Question
I am just learning Python and want to try web scraping. In the tutorial I am following, the tutor's "Inspect" panel looks completely different from mine: he sees class="ProfileHeaderCard", while I see class="css-1dbjc4n r-1iusvr4 r-16y2uox r-5f2r5o r-m611by". The important part is that BeautifulSoup does not work with my version of the class name, but it does work with his. When I run print(soup.find('div', {"class":"css-1dbjc4n r-1iusvr4 r-16y2uox r-5f2r5o r-m611by"}))
it returns None.
What is going on? Please help.
from bs4 import BeautifulSoup
import urllib.request
theurl = 'https://twitter.com/1kasecorba'
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, 'html.parser')
print(soup.find('div', {"class":"css-1dbjc4n r-1iusvr4 r-16y2uox r-5f2r5o r-m611by"}))
Answer 1:
It does not find it because it is not there. When you fetch a page with a plain GET request, the HTML you receive is often not the same as what you see in the browser, whether in the Inspect panel or in View Source (Ctrl + U). The Inspect panel in particular shows the page after JavaScript has run, which urllib never does.
I wrote a script that writes the source fetched by urllib to a text file, and the class you are looking for does not appear anywhere in it. There is nothing wrong with soup.find itself, as the last line of the example shows.
from bs4 import BeautifulSoup
import urllib.request
theurl = 'https://twitter.com/1kasecorba'
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, 'html.parser')
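# Save the HTML that urllib actually received so it can be inspected by hand
with open("page_source.txt", "w", encoding="utf-8") as file:
    file.write(str(soup))
# A class that does exist in the fetched source is found without trouble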
print(soup.find('button', {"class":"modal-btn modal-close modal-close-fixed js-close"}))
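The same effect can be reproduced without touching the network. The sketch below is only an illustration: RAW_HTML and RENDERED_HTML are made-up stand-ins for "what urllib receives" and "what the browser's Inspect panel shows", and has_div_with_class and classes_in are hypothetical helper names, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the two versions of the page (not real Twitter markup)
RAW_HTML = '<div class="ProfileHeaderCard"><h1>Name</h1></div>'
RENDERED_HTML = '<div class="css-1dbjc4n r-1iusvr4"><h1>Name</h1></div>'


def has_div_with_class(html, class_string):
    """Return True if the HTML contains a div whose class matches class_string."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", {"class": class_string}) is not None


def classes_in(html):
    """Collect every class name that actually occurs in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    found = set()
    for tag in soup.find_all(class_=True):  # only tags that have a class attribute
        found.update(tag["class"])          # tag["class"] is a list of class names
    return found


# The class from the Inspect panel is simply absent from the raw HTML
print(has_div_with_class(RAW_HTML, "css-1dbjc4n r-1iusvr4"))  # False
print(has_div_with_class(RAW_HTML, "ProfileHeaderCard"))      # True
print(sorted(classes_in(RAW_HTML)))                           # ['ProfileHeaderCard']
```

Running classes_in on the soup built from the real response is a quick way to check which class names are available before writing a soup.find call.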
If you want to see the source as the browser renders it, you will need a tool that drives a real browser, such as Selenium. There are probably better options, but I can't give much advice on this topic.
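A minimal sketch of the Selenium route, under several assumptions: the selenium package and a Chrome installation are available, a fixed sleep is enough for the page's scripts to finish, and the site allows automated access. The names fetch_rendered_html and find_div_by_classes are hypothetical helpers, and whether this works against any particular site is not something this sketch can promise.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_seconds=5):
    """Load the URL in headless Chrome and return the HTML after scripts have run."""
    # Imported here so the parsing helper below works without Selenium installed
    import time
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)   # crude wait (assumption); explicit waits are more robust
        return driver.page_source  # the DOM serialized after JavaScript has run
    finally:
        driver.quit()              # always close the browser


def find_div_by_classes(html, *classes):
    """Find the first div carrying all the given classes, in any order."""
    soup = BeautifulSoup(html, "html.parser")
    selector = "div" + "".join("." + name for name in classes)
    return soup.select_one(selector)
```

With these in place, something like find_div_by_classes(fetch_rendered_html(theurl), "css-1dbjc4n", "r-1iusvr4") would search the rendered page. A CSS selector is used instead of the exact class string so the match does not depend on the order of the class names.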
Source: https://stackoverflow.com/questions/58885176/webscraping-with-python-i-cant-see-the-actual-names-of-classes-when-i-say-insp