Using Python Requests Module with Dropdown Options

Submitted by 假装没事ソ on 2019-12-22 16:42:14

Question


I am trying to scrape information from this webpage: https://www.tmea.org/programs/all-state/history

I want to select several options from the first dropdown menu and use Beautiful Soup to extract the information I need. First, I tried using Beautiful Soup to list the available options:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.tmea.org/programs/all-state/history')

soup = BeautifulSoup(page.text, 'html.parser')

# The dropdown is the element with id="organization"
body = soup.find(id='organization')
options = body.find_all('option')

# Print the visible text of each <option>
for name in options:
    child = name.contents[0]
    print(child)
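When you later submit the form, what the server receives for a dropdown is the `<select>`'s `name` as the key and the chosen `<option>`'s `value` attribute as the value. If an option has no `value` attribute, the browser sends its text instead. A minimal sketch that collects both the submitted value and the visible text, run against hypothetical markup standing in for the real page (the option names and the `option_values` helper are illustrative, not taken from the site):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page's dropdown.
SAMPLE_HTML = """
<select id="organization" name="organization">
  <option value="">-- Select --</option>
  <option value="2018 Treble Choir">2018 Treble Choir</option>
  <option>2018 Example Choir</option>
</select>
"""


def option_values(html, select_id):
    """Return (submitted value, visible text) pairs for a <select>'s options."""
    soup = BeautifulSoup(html, "html.parser")
    select = soup.find(id=select_id)
    pairs = []
    for opt in select.find_all("option"):
        text = opt.get_text(strip=True)
        # Browsers submit the value attribute; without one, the option's text is sent.
        pairs.append((opt.get("value", text), text))
    return pairs


print(option_values(SAMPLE_HTML, "organization"))
```

Empty values such as the placeholder's `""` are usually not meaningful search choices and can be filtered out before building payloads.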

That worked for listing the options, but I want to submit a particular option and retrieve the results for it. I tried adding:

payload = {'organization': '2018 Treble Choir'}
r = requests.post('https://www.tmea.org/programs/all-state/history', data=payload)
print(r.text)

I have used this approach before on other pages that use POST, and I don't understand why this case is different. Does the dropdown mean that I have to use something like Selenium? I've used Selenium before, but I'm not sure how to use it together with Beautiful Soup.
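One way to see why a POST behaves differently from the browser's is to build the request without sending it and inspect exactly what `requests` would put on the wire. With a dict passed as `data=`, `requests` form-encodes it; comparing that body with the form data shown in the browser's developer tools for a real submission makes any missing fields stand out. A minimal sketch using the payload from the question:

```python
import requests

URL = "https://www.tmea.org/programs/all-state/history"
payload = {"organization": "2018 Treble Choir"}

# Prepare the request without sending it, to see what would actually be transmitted.
prepared = requests.Request("POST", URL, data=payload).prepare()

print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # organization=2018+Treble+Choir
```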


Answer 1:


1) I'm not seeing a POST request under the XHR and Fetch filters of the browser's Network tab (see EDIT below).

2) Yes, you can use Selenium for this. Use Selenium as you normally would to get to the table; once the table is rendered, you can feed the page source into BeautifulSoup. For example:

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.tmea.org/programs/all-state/history'

driver = webdriver.Chrome()
driver.get(url)

# Your code to find/select the drop-down menu and select 2018 Treble Choir
...
...

# Once that page is rendered...
soup = BeautifulSoup(driver.page_source, 'html.parser')

Honestly, I wouldn't bother with BeautifulSoup here, since the data appears to be in a <table> tag. Let pandas do that work:

from io import StringIO

import pandas as pd
from selenium import webdriver

url = 'https://www.tmea.org/programs/all-state/history'

driver = webdriver.Chrome()
driver.get(url)

# Your code to find/select the drop-down menu and select 2018 Treble Choir
...
...

# Once that page is rendered...
tables = pd.read_html(StringIO(driver.page_source))
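`pd.read_html` returns a list with one DataFrame per `<table>` it finds, so on a page with several tables you still have to pick the right one. Two of its parameters can help. `match` keeps only tables whose text matches a string or regex. `header=0` uses the first row as column names, which matters when the header row is made of `<td>` cells rather than `<th>`. The output further down, where "Year - Organization" shows up as row 0 and the columns are numbered 0 to 4, looks like that case, though I haven't checked the page's markup. Here is a sketch on hypothetical markup; note that `read_html` needs lxml, or BeautifulSoup plus html5lib, installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup: an unrelated table, then a results table whose header row uses <td>.
html = """
<table><tr><td>Unrelated</td></tr></table>
<table>
  <tr><td>Year - Organization</td><td>City</td></tr>
  <tr><td>2018 Treble Choir</td><td>El Paso</td></tr>
  <tr><td>2018 Treble Choir</td><td>Austin</td></tr>
</table>
"""

# match= keeps only tables containing this text; header=0 promotes the first row to column names.
tables = pd.read_html(StringIO(html), match="Year - Organization", header=0)
df = tables[0]
print(df)
```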

EDIT

I found the POST request under the Doc filter. You need to include the rest of the form's fields in your payload:

from io import StringIO

import pandas as pd
import requests

payload = {
    'organization': '2018 Treble Choir',
    'instrument': 'All',
    'school_op': 'eq',
    'school': '',
    'city_op': 'eq',
    'city': '',
    's': '',
    'submit': 'Search'}

r = requests.post('https://www.tmea.org/programs/all-state/history', data=payload)
print(r.text)

# Wrap the HTML in StringIO: newer pandas versions deprecate passing a literal HTML string
tables = pd.read_html(StringIO(r.text))
table = tables[0]
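Since the question asks about several dropdown options, the request above can be wrapped in a loop. This is a minimal sketch. It assumes the other form fields can keep the values shown above for every organization, and that the results are the first table on each response page; neither is verified against the site. `build_payload` and `fetch_results` are hypothetical helper names.

```python
from io import StringIO

import pandas as pd
import requests

URL = "https://www.tmea.org/programs/all-state/history"


def build_payload(organization):
    """Form fields from the payload above; only 'organization' changes per request."""
    return {
        "organization": organization,
        "instrument": "All",
        "school_op": "eq",
        "school": "",
        "city_op": "eq",
        "city": "",
        "s": "",
        "submit": "Search",
    }


def fetch_results(organizations):
    """POST the search form once per organization and stack the first table of each response."""
    frames = []
    # A Session reuses the underlying connection across the repeated POSTs.
    with requests.Session() as session:
        for org in organizations:
            response = session.post(URL, data=build_payload(org), timeout=30)
            response.raise_for_status()
            # Assumption: the results table is the first <table> in the response.
            frames.append(pd.read_html(StringIO(response.text))[0])
    return pd.concat(frames, ignore_index=True)
```

Usage: `df = fetch_results(["2018 Treble Choir"])`. Other organization names can be taken from the option list scraped in the question.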

Output:

print(table)
                       0       ...                     4
0    Year - Organization       ...                  City
1                    NaN       ...                   NaN
2      2018 Treble Choir       ...               El Paso
3      2018 Treble Choir       ...          Flower Mound
4      2018 Treble Choir       ...               Helotes
5      2018 Treble Choir       ...                Canyon
6      2018 Treble Choir       ...               Mission
7      2018 Treble Choir       ...                Belton
8      2018 Treble Choir       ...             Mansfield
9      2018 Treble Choir       ...                 Wylie
10     2018 Treble Choir       ...               El Paso
11     2018 Treble Choir       ...           San Antonio
12     2018 Treble Choir       ...              Beeville
13     2018 Treble Choir       ...         Grand Prairie
14     2018 Treble Choir       ...           San Antonio
15     2018 Treble Choir       ...           Brownsville
16     2018 Treble Choir       ...               Houston
17     2018 Treble Choir       ...               Woodway
18     2018 Treble Choir       ...                  Katy
19     2018 Treble Choir       ...                Canyon
20     2018 Treble Choir       ...               Crowley
21     2018 Treble Choir       ...           Trophy Club
22     2018 Treble Choir       ...              Amarillo
23     2018 Treble Choir       ...             Deer Park
24     2018 Treble Choir       ...                Dallas
25     2018 Treble Choir       ...           Brownsville
26     2018 Treble Choir       ...               Houston
27     2018 Treble Choir       ...            Carrollton
28     2018 Treble Choir       ...                 Plano
29     2018 Treble Choir       ...               Helotes
..                   ...       ...                   ...
140    2018 Treble Choir       ...                Austin
141    2018 Treble Choir       ...                 Hurst
142    2018 Treble Choir       ...           League City
143    2018 Treble Choir       ...                Odessa
144    2018 Treble Choir       ...                 Heath
145    2018 Treble Choir       ...            Cedar Park
146    2018 Treble Choir       ...        Jersey Village
147    2018 Treble Choir       ...             Harlingen
148    2018 Treble Choir       ...         Grand Prairie
149    2018 Treble Choir       ...               Coppell
150    2018 Treble Choir       ...               Lubbock
151    2018 Treble Choir       ...         The Woodlands
152    2018 Treble Choir       ...                Laredo
153    2018 Treble Choir       ...                Sachse
154    2018 Treble Choir       ...              Pearland
155    2018 Treble Choir       ...           San Antonio
156    2018 Treble Choir       ...                Conroe
157    2018 Treble Choir       ...                Dallas
158    2018 Treble Choir       ...             Arlington
159    2018 Treble Choir       ...              Pearland
160    2018 Treble Choir       ...                 Klein
161    2018 Treble Choir       ...               Houston
162    2018 Treble Choir       ...                Keller
163    2018 Treble Choir       ...               Houston
164    2018 Treble Choir       ...            Fort Worth
165    2018 Treble Choir       ...                Humble
166    2018 Treble Choir       ...             Deer Park
167    2018 Treble Choir       ...               Houston
168    2018 Treble Choir       ...              Magnolia
169    2018 Treble Choir       ...                  Katy

[170 rows x 5 columns]


Source: https://stackoverflow.com/questions/54789508/using-python-requests-module-with-dropdown-options
