How to retrieve the list of values from a drop down list

Submitted by 拜拜、爱过 on 2021-02-10 19:54:41

Question


I am trying to retrieve the list of available option expiries for a given ticker on Yahoo Finance, for instance using SPY as the ticker at https://finance.yahoo.com/quote/SPY/options

The list of expiries is in the drop-down list:

<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4"> 
    <select class="Fz(s)" data-reactid="5"> 
        <option selected="" value="1576627200" data-reactid="6">December 18, 2019</option> 
        <option value="1576800000" data-reactid="7">December 20, 2019</option> 
        <option value="1577059200" data-reactid="8">December 23, 2019</option> 
        ...
    </select>
</div>

Using the div class name (or the select class name, though there seem to be several of those on the page), I get the values as a single string of concatenated expiries.

My function (I pass on ticker='SPY' from the main function):

from selenium import webdriver
from bs4 import BeautifulSoup

def get_list_expiries(ticker):
    browser = webdriver.Chrome()
    options_url = "https://finance.yahoo.com/quote/" + str(ticker) + "/options"
    browser.get(options_url)
    html_source = browser.page_source
    soup = BeautifulSoup(html_source, 'html.parser')
    expiries_dt = []


    for exp in soup.find_all(class_="Fl(start) Pend(18px) option-contract-control drop-down-selector"):
        expiries_dt.append(exp.text)

    browser.quit()
    return expiries_dt

This produces:

['December 18, 2019December 20, 2019December 23, 2019December 24, 2019December 27, 2019December 30, 2019...']

I understand I need to use Selenium for this, but I can't figure out how. The result is always a list containing a single string. Ideally I would like to return two lists: one with the Unix timestamps (e.g. option value="1576627200") and another with the 'normal' dates (i.e. 18/12/2019).

Any help will be greatly appreciated.


Answer 1:


To extract the Unix timestamps and the expiry dates, wait for the drop-down to become clickable with WebDriverWait, wrap it in Select, and read each option using the following locator strategy:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
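    # Note: executable_path was removed in Selenium 4.10+; newer versions take a Service object instead.
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')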
    
    driver.get('https://finance.yahoo.com/quote/SPY/options')
    select = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.option-contract-control.drop-down-selector>select"))))
    print("Unix datestamp: ")
    print([option.get_attribute("value") for option in select.options])
    print("Dates: ")    
    print([option.get_attribute("innerHTML") for option in select.options])
    
  • Console Output:

    Unix datestamp:
    ['1576627200', '1576800000', '1577059200', '1577145600', '1577404800', '1577664000', '1577750400', '1578009600', '1578268800', '1578441600', '1578614400', '1578873600', '1579046400', '1579219200', '1579564800', '1579824000', '1580428800', '1582243200', '1584662400', '1585612800', '1587081600', '1589500800', '1592524800', '1593475200', '1594944000', '1600387200', '1601424000', '1602806400', '1605830400', '1606780800', '1608249600', '1610668800', '1616112000', '1623974400', '1631836800', '1639699200', '1642723200']
    Dates:
    ['December 18, 2019', 'December 20, 2019', 'December 23, 2019', 'December 24, 2019', 'December 27, 2019', 'December 30, 2019', 'December 31, 2019', 'January 3, 2020', 'January 6, 2020', 'January 8, 2020', 'January 10, 2020', 'January 13, 2020', 'January 15, 2020', 'January 17, 2020', 'January 21, 2020', 'January 24, 2020', 'January 31, 2020', 'February 21, 2020', 'March 20, 2020', 'March 31, 2020', 'April 17, 2020', 'May 15, 2020', 'June 19, 2020', 'June 30, 2020', 'July 17, 2020', 'September 18, 2020', 'September 30, 2020', 'October 16, 2020', 'November 20, 2020', 'December 1, 2020', 'December 18, 2020', 'January 15, 2021', 'March 19, 2021', 'June 18, 2021', 'September 17, 2021', 'December 17, 2021', 'January 21, 2022']
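
The question also asked for dates in a form like 18/12/2019. A minimal sketch of that conversion, assuming the option values are Unix timestamps in seconds and that UTC is the intended time zone (the helper name `to_day_month_year` is hypothetical, and only the first three values from the output above are used):

```python
from datetime import datetime, timezone

# First three values taken from the console output above.
unix_stamps = ["1576627200", "1576800000", "1577059200"]

def to_day_month_year(stamp):
    """Convert a Unix timestamp in seconds (str or int) to dd/mm/YYYY, interpreted in UTC."""
    return datetime.fromtimestamp(int(stamp), tz=timezone.utc).strftime("%d/%m/%Y")

expiry_dates = [to_day_month_year(s) for s in unix_stamps]
print(expiry_dates)  # ['18/12/2019', '20/12/2019', '23/12/2019']
```

Pinning the time zone to UTC avoids dates shifting by a day on machines whose local time zone is behind UTC.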
    



Answer 2:


Try SimplifiedDoc, a library for extracting data from HTML:

from simplified_scrapy.simplified_doc import SimplifiedDoc 
html='''<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4"> 
    <select class="Fz(s)" data-reactid="5"> 
        <option selected="" value="1576627200" data-reactid="6">December 18, 2019</option> 
        <option value="1576800000" data-reactid="7">December 20, 2019</option> 
        <option value="1577059200" data-reactid="8">December 23, 2019</option> 
        ...
    </select> 
</div>
'''
doc = SimplifiedDoc(html)
div = doc.getElementByClass('Fl(start) Pend(18px) option-contract-control drop-down-selector')
options = div.options # get all options
expiries_dt = [option.html for option in options]
print (expiries_dt) # ['December 18, 2019', 'December 20, 2019', 'December 23, 2019']
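
If both lists from the question are wanted (the value attributes and the visible labels), the same snippet can also be parsed with only the standard library. This is a rough sketch rather than part of the answer above: the class name `OptionCollector` is hypothetical, and it collects every option element in the input instead of restricting itself to the one div.

```python
from html.parser import HTMLParser

class OptionCollector(HTMLParser):
    """Collect the value attribute and text label of every <option> element."""

    def __init__(self):
        super().__init__()
        self.values = []
        self.labels = []
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self.values.append(dict(attrs).get("value"))
            self.labels.append("")

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False

    def handle_data(self, data):
        # Only text inside an <option> belongs to a label.
        if self._in_option:
            self.labels[-1] += data

html = '''<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4">
    <select class="Fz(s)" data-reactid="5">
        <option selected="" value="1576627200" data-reactid="6">December 18, 2019</option>
        <option value="1576800000" data-reactid="7">December 20, 2019</option>
        <option value="1577059200" data-reactid="8">December 23, 2019</option>
    </select>
</div>
'''

collector = OptionCollector()
collector.feed(html)
collector.close()

unix_values = collector.values
labels = [label.strip() for label in collector.labels]
print(unix_values)  # ['1576627200', '1576800000', '1577059200']
print(labels)       # ['December 18, 2019', 'December 20, 2019', 'December 23, 2019']
```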



Answer 3:


You don't need Selenium for this part (and, to be honest, it is overkill for most Yahoo Finance data). You can extract the timestamps from the response text with a regex, convert the string representation of the list into an actual list with ast, and use the datetime module to convert each timestamp to the required date format.

import requests, re, ast
from datetime import datetime

r = requests.get('https://finance.yahoo.com/quote/SPY/options?guccounter=1')
p = re.compile(r'"expirationDates":(\[.*?\])')
timestamps = ast.literal_eval(p.findall(r.text)[0])
dates = [datetime.utcfromtimestamp(ts).strftime("%B %d, %Y") for ts in timestamps]

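Regex explanation: the pattern matches the literal text "expirationDates": and then captures, in group 1, the shortest run that starts with [ and ends at the first following ], i.e. the list of timestamps.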
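
The pattern can be tried offline on a small string. The fragment below is a hypothetical stand-in for r.text (the surrounding keys are invented for illustration and the real page is far larger); it only shows that the lazy `.*?` stops at the first closing bracket.

```python
import ast
import re
from datetime import datetime, timezone

# Hypothetical fragment standing in for r.text; only "expirationDates" comes from the answer.
sample_text = (
    '{"underlyingSymbol":"SPY",'
    '"expirationDates":[1576627200,1576800000,1577059200],'
    '"strikes":[100.0,105.0]}'
)

pattern = re.compile(r'"expirationDates":(\[.*?\])')
match = pattern.search(sample_text)

# Group 1 is the bracketed list as a string; literal_eval turns it into a real list of ints.
timestamps = ast.literal_eval(match.group(1)) if match else []
dates = [
    datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%B %d, %Y")
    for ts in timestamps
]

print(timestamps)  # [1576627200, 1576800000, 1577059200]
print(dates)
```

Checking `match` before reading the group avoids an IndexError or AttributeError if the page layout changes and the key is no longer present.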


Datetime conversions:

  1. See discussion by @jfs which is where I saw utcfromtimestamp originally
  2. strftime


Source: https://stackoverflow.com/questions/59401010/how-to-retrieve-the-list-of-values-from-a-drop-down-list
