Grabbing non-HTML data from a website using python

烂漫一生 提交于 2019-12-06 10:53:36

问题


I'm trying to get the current contract prices on this page to a string: http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500.html

I would really like a python 2.6 solution.

It was easy to get the page html using urllib, but it seems like this number is live and not in the html. I inspected the element in Chrome and it's some td class thing.

But I don't know how to get at this with python. I tried beautifulsoup (but after several attempts gave up getting a tar.gz to work on my windows x64 system), and then elementtree, but really my programming interest is data analysis. I'm not a website designer and don't really want to become one, so it's all kind of a foreign language. Is this live price XML?

Any assistance gratefully received. Ideally a simple to install module and some actual code, but all hints and tips very welcome.


回答1:


It looks like the numbers in the table are filled in by Javascript, so just fetching the HTML with urllib or another library won't be enough since they don't run the javascript. You'll need to use a library like PyQt to simulate the browser rendering the page/executing the JS to fill in the numbers, then scrape the output HTML of that.

See this blog post on working with PyQt: http://blog.motane.lu/2009/07/07/downloading-a-pages-content-with-python-and-webkit/link text




回答2:


If you look at that website with something like firebug, you can see the AJAX calls it's making. For instance the initial values are being filled in with a AJAX call (at least for me) to:

http://www.cmegroup.com/CmeWS/md/MDServer/V1/Venue/G/Exchange/XCME/FOI/FUT/Product/ES?currentTime=1292780678142&contractCDs=,ESH1,ESM1,ESU1,ESZ1,ESH2,ESH1,ESM1,ESU1,ESZ1,ESH2

This is returning a JSON response, which is then parsed by javascript to fill in the tabel. It would be pretty simple to do that yourself with urllib and then use simplejson to parse the response.

Also, you should read this disclaimer very carefully. What you are trying to do is probably not cool with the owners of the web-site.




回答3:


Its hard to know what to tell you wothout knowing where the number is coming from. It could be php or asp also, so you are going to have to figure out which language the number is in.



来源:https://stackoverflow.com/questions/4481505/grabbing-non-html-data-from-a-website-using-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!