Parse HTML content by passing a custom date input

Submitted by 魔方 西西 on 2019-12-11 15:30:52

Question


I am parsing data from here. On the webpage I can get the data for a given day (for example, yesterday) by selecting the desired date. How can I pass a custom date when parsing, so that I get the data for that same date?


Answer 1:


You can either use Selenium or call the site's AJAX API directly.
Here is an example of the latter:

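import requests

def get_by_date(date):
    # The AJAX endpoint returns JSON; the table markup is under the 'html' key
    url = 'https://markets.ft.com/data/world/ajax/getnextecoevents?startDate=' + date
    r = requests.get(url)
    r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
    return r.json()['html']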

The date argument should be a string formatted as yyyy-mm-dd, e.g. "2017-07-20".
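To request yesterday's data, as the question asks, the date string can be built with the standard library rather than typed by hand. This is only a minimal sketch: the helper names format_date and days_ago are hypothetical and not part of the original answer.

```python
from datetime import date, timedelta

def format_date(d):
    """Format a date object as yyyy-mm-dd (ISO 8601 calendar date)."""
    return d.isoformat()

def days_ago(n, today=None):
    """Return the date n days before `today` (default: the local current date) as yyyy-mm-dd."""
    if today is None:
        today = date.today()
    return format_date(today - timedelta(days=n))

print(format_date(date(2017, 7, 20)))        # 2017-07-20
print(days_ago(1, today=date(2017, 7, 21)))  # 2017-07-20
```

Passing days_ago(1) to get_by_date would then request yesterday's events, assuming the local date is the one you want the site to use.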

Use get_by_date together with bs4 to scrape the table contents:

from bs4 import BeautifulSoup

html = get_by_date('2017-06-20')
soup = BeautifulSoup(html, 'html.parser')
# One list of cell texts per table row
data = [[td.text for td in tr.find_all('td')] for tr in soup.find('table').find_all('tr')]
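The list comprehension returns one list per table row. Any row that contains no td cells (for example, a header row built from th cells) comes back as an empty list, and cell text may carry surrounding whitespace. The sketch below shows one way to tidy such output. clean_rows is a hypothetical helper, and the sample rows are made up for illustration rather than taken from the site.

```python
def clean_rows(rows):
    """Drop empty rows and strip surrounding whitespace from every cell."""
    return [[cell.strip() for cell in row] for row in rows if row]

# Made-up sample in the same shape as `data`: a list of rows, each a list of cell strings.
sample = [
    [],                                  # a row with no <td> cells
    ['\n 08:30 ', ' Example event\n'],
    ['09:00', 'Another example'],
]

print(clean_rows(sample))
# [['08:30', 'Example event'], ['09:00', 'Another example']]
```

Calling clean_rows(data) on the scraped result would apply the same cleanup to the real table.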


Source: https://stackoverflow.com/questions/45226861/parse-html-content-by-passing-custom-date-input
