Parsing a site with BeautifulSoup

野的像风 2021-01-14 14:57

I'm trying to learn how to parse HTML with Python, and I'm currently stuck: soup.findAll returns an empty array, even though there are elements that can be found here.

2 Answers
  •  南方客 2021-01-14 15:47

    "I'm trying to learn how to parse HTML with Python"

    You happened to pick a webpage which isn't very beginner-friendly when it comes to web scraping. Broadly speaking, most webpages use one or both of these two common methods for loading and displaying data:

    • The user makes a request to a server (visits a page, for example). The server gets the necessary data from a database. The server generates an HTML response using a templating engine, and returns the response for the user's browser to render.
    • The user makes a request to a server. The server returns an HTML-skeleton response which gets populated with data dynamically by making other requests / using APIs etc.
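    With the second kind of page, a plain requests call only gets back the skeleton, so anything BeautifulSoup looks for inside the dynamically filled part comes back empty, which matches your empty soup.findAll result. A minimal sketch of that (the URL and the selector are examples only, not a known-good recipe):

    ```python
    import requests
    from bs4 import BeautifulSoup

    # Fetch the page the way requests sees it - before any JavaScript runs.
    url = "https://www.oddsportal.com/matches/tennis/"  # example page
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})

    soup = BeautifulSoup(response.text, "html.parser")

    # The match rows are injected later by JavaScript, so this finds far fewer
    # rows than the "Elements" tab in Dev Tools shows - possibly none at all.
    rows = soup.find_all("tr")
    print(len(rows))
    ```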

    The webpage you picked is of the second type. Just because you can see the elements in the "Elements" tab of Chrome's Dev Tools doesn't mean that that's what the server sent you. By looking at the "Network" tab of Chrome's Dev Tools, you can see that requests are made to these two resources:

    https://fb.oddsportal.com/ajax-next-games/2/0/1/20191114/yje3d.dat?=1574007087150
    https://fb.oddsportal.com/ajax-next-games-odds/2/0/X0/20191114/1/yje3d.dat?=1574007087151

    (The query string parameters will not be the same for you. Visiting those URLs also won't be very interesting unless you provide the right payload.)
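    If you want to fetch those resources yourself with requests, a rough sketch is below. The Referer header is an assumption about what the server checks, and the URL will be stale, so copy the current one (query string included) from your own Network tab:

    ```python
    import requests

    # Placeholder URL taken from the answer - grab a fresh one from the
    # Network tab, because the path and query string change over time.
    ajax_url = ("https://fb.oddsportal.com/ajax-next-games/"
                "2/0/1/20191114/yje3d.dat?=1574007087150")

    headers = {
        "User-Agent": "Mozilla/5.0",
        # Assumption: the server wants to see where the request came from.
        "Referer": "https://www.oddsportal.com/matches/tennis/",
    }

    response = requests.get(ajax_url, headers=headers)
    response.raise_for_status()

    # The body is JavaScript, not plain HTML or JSON - inspect its shape first.
    print(response.text[:500])
    ```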

    The first resource seems to be a jQuery script which makes a request whose response contains the HTML for your table.
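    A hedged sketch of getting at that HTML: assuming the table is embedded in the script as one quoted string (that shape is a guess, not a documented format), you can cut it out and hand it to BeautifulSoup:

    ```python
    import re
    import requests
    from bs4 import BeautifulSoup

    # Same placeholder URL and assumed headers as in the sketch above.
    ajax_url = ("https://fb.oddsportal.com/ajax-next-games/"
                "2/0/1/20191114/yje3d.dat?=1574007087150")
    headers = {"User-Agent": "Mozilla/5.0",
               "Referer": "https://www.oddsportal.com/matches/tennis/"}

    script_text = requests.get(ajax_url, headers=headers).text

    # Assumption: the table HTML sits inside the JavaScript as a quoted string.
    # Collect quoted chunks and keep the longest one that contains a <table>.
    chunks = [c for c in re.findall(r"'(.*?)'", script_text, flags=re.DOTALL)
              if "<table" in c]

    if chunks:
        table_html = max(chunks, key=len).replace("\\/", "/")  # undo escaped slashes
        soup = BeautifulSoup(table_html, "html.parser")
        for row in soup.find_all("tr"):
            print(row.get_text(" ", strip=True))
    else:
        print("No embedded <table> found - inspect script_text by hand.")
    ```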

    In that HTML you can see that they seem to have assigned a unique ID to each match; Giron Marcos vs. Holt Brandon, in this case, has the ID ATM9GmXG.

    The second resource is similar: it is also a jQuery script, which seems to make a request to their main API. The response this time is JSON, which is always desirable for web scraping, and it references the same match IDs.
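    A minimal sketch for that one, assuming the JSON object is wrapped in a callback call (something like someCallback({...});, which is an assumption): strip everything outside the outermost braces, parse it, and then look for the match IDs, since the exact structure isn't documented:

    ```python
    import json
    import re
    import requests

    # Placeholder URL again - copy the live one from the Network tab.
    odds_url = ("https://fb.oddsportal.com/ajax-next-games-odds/"
                "2/0/X0/20191114/1/yje3d.dat?=1574007087151")
    headers = {"User-Agent": "Mozilla/5.0",
               "Referer": "https://www.oddsportal.com/matches/tennis/"}

    body = requests.get(odds_url, headers=headers).text

    # Assumption: the payload looks like someCallback({...}); keep only the
    # part between the outermost braces and parse that as JSON.
    match = re.search(r"\{.*\}", body, flags=re.DOTALL)
    data = json.loads(match.group(0)) if match else {}

    # The structure isn't documented, so inspect the top-level keys and check
    # that the match ID from the first response shows up somewhere.
    print(list(data.keys()))
    print("ATM9GmXG" in body)
    ```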
