python: get data from changing span class using lxml xpath

Question

I want to extract 'Return on Assets' from the WSJ website. However, my code is not robust enough to handle different conditions. I am able to extract the data for ticker 'SCGM' using the code below, but it fails for 'AASIA', where the value is wrapped in <span class="marketDelta deltaType-negative">.

from lxml import html
import requests

StockData = ['SCGM', 'AASIA']
for ticker in StockData:
    page_wsj1 = requests.get('http://quotes.wsj.com/MY/' + ticker + '/financials')
    wsj1 = html.fromstring(page_wsj1.content)
    # Matches only spans whose class attribute is exactly "marketDelta noChange"
    wsj_fig = wsj1.xpath('//span[@class="marketDelta noChange"]/text()')
    ROA = wsj_fig[25]

This works for SCGM, but not for AASIA, because the span class is different. For SCGM, the HTML is as follows:

<tr> <td> <span class="data_lbl">Return on Assets</span> <span class="data_data"> <span class="marketDelta noChange">18.26</span> </span> </td> </tr>

For AASIA, the HTML is as follows:

<tr> <td> <span class="data_lbl">Return on Assets</span> <span class="data_data"> <span class="marketDelta deltaType-negative">-1.36</span> </span> </td> </tr>

How can I write code that works for both cases, or that points straight to 'Return on Assets'?

Answer 1:

//td[normalize-space(span) = "Return on Assets"]/span[@class = "data_data"]/span
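
This expression selects the cell by its label text rather than by the class of the value span, so it does not depend on whether the class is "noChange" or "deltaType-negative". A minimal sketch of how it might be used with lxml follows. It runs against the two HTML snippets from the question instead of the live page; the names ROA_XPATH, extract_roa, SCGM_ROW and AASIA_ROW are made up for illustration, and the <table> wrapper is an assumption added so the parser keeps the row structure.

```python
from lxml import html

# XPath from the answer: locate the cell by its label, then take the value span.
ROA_XPATH = (
    '//td[normalize-space(span) = "Return on Assets"]'
    '/span[@class = "data_data"]/span'
)


def extract_roa(page_source):
    """Return the 'Return on Assets' figure as a float, or None if not found."""
    tree = html.fromstring(page_source)
    matches = tree.xpath(ROA_XPATH)
    if not matches:
        return None
    return float(matches[0].text_content().strip())


# The two rows quoted in the question, wrapped in <table> (an assumption).
SCGM_ROW = (
    '<table><tr> <td> <span class="data_lbl">Return on Assets</span> '
    '<span class="data_data"> <span class="marketDelta noChange">18.26</span> '
    '</span> </td> </tr></table>'
)
AASIA_ROW = (
    '<table><tr> <td> <span class="data_lbl">Return on Assets</span> '
    '<span class="data_data"> <span class="marketDelta deltaType-negative">-1.36</span> '
    '</span> </td> </tr></table>'
)

print(extract_roa(SCGM_ROW))
print(extract_roa(AASIA_ROW))
```

For the live site, the same function could presumably be given page_wsj1.content from the question's requests.get call, provided the page still uses this markup.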

Source: https://stackoverflow.com/questions/40488422/python-get-data-from-changing-span-class-using-lxml-xpath

Tags

python

html

xpath

lxml
