Can scraping be applied to this page which is actively recalculating?

Question

I would like to grab satellite positions from the page(s) below, but I'm not sure if scraping is appropriate because the page appears to be updating itself every second using some internal code (it keeps updating after I disconnect from the internet). Background information can be found in my question at Space Stackexchange: A nicer way to download the positions of the Orbcomm-2 satellites.

I need a "snapshot" of four items simultaneously:

UTC time
latitude
longitude
altitude

Right now I use screenshots and manual typing. Since these values are being updated by the page, is conventional web scraping going to work here? I found a "screen-scraping" tag; should I try to learn about that instead?

I'm looking for the simplest solution to get those four values. Can I just use urllib or urllib2 and avoid installing something new?

Example page: http://www.satview.org/?sat_id=41186U (I need 41179U through 41189U, the eleven Orbcomm-2 satellites that SpaceX just put in orbit).

Answer 1:

One option would be to fire up a real browser and continuously poll the position in an endless loop:

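import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Launch a real browser so the page's own JavaScript keeps recalculating.
driver = webdriver.Firefox()
driver.get("http://www.satview.org/?sat_id=41186U")

while True:
    # The element's text holds latitude and longitude on separate lines.
    # find_element(By.CSS_SELECTOR, ...) replaces find_element_by_css_selector,
    # which was removed in Selenium 4.
    location = driver.find_element(By.CSS_SELECTOR, "#sat_latlon .texto_track2").text
    latitude, longitude = location.split("\n")[:2]

    print(latitude, longitude)

    time.sleep(1)  # poll once per second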

Sample output (as printed under Python 2, where print shows the tuple):

(u'-16.57', u'66.63')
(u'-16.61', u'66.67')
...

This uses Selenium with Firefox. Drivers exist for other browsers too, including headless ones such as PhantomJS.
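To cover all eleven satellites from the question, the same polling could be repeated for each page. Below is a minimal sketch of the pure-Python pieces. It assumes the URL pattern of the example page holds for every ID, and that the element text is a plain latitude and longitude on separate lines, as the sample output suggests. `satellite_urls` and `parse_latlon` are hypothetical helper names, not part of Selenium.

```python
def satellite_urls(first=41179, last=41189):
    """Build one satview.org URL per catalog ID in the inclusive range.

    Assumption: every ID uses the same pattern as the example page.
    """
    return ["http://www.satview.org/?sat_id={}U".format(n)
            for n in range(first, last + 1)]


def parse_latlon(text):
    """Convert the element text (latitude line, then longitude line) to floats.

    Assumption: both lines are plain decimal numbers with no unit suffix.
    """
    latitude, longitude = text.split("\n")[:2]
    return float(latitude), float(longitude)
```

Each URL from `satellite_urls()` could be passed to `driver.get(...)` in turn, and the `.text` of the located element handed to `parse_latlon`.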

Answer 2:

No need to scrape. Just look at the source HTML of that page and copy the JavaScript code. None of the positions are fetched remotely; they're all calculated on the fly in the page, which is why it keeps updating offline. So just grab the code and run it yourself!
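As a starting point, the page's JavaScript can be located with the standard library alone, which also addresses the urllib part of the question. Below is a minimal sketch in Python 3, where `urllib2` has become `urllib.request`. `ScriptCollector`, `collect_scripts` and `fetch_page` are hypothetical names. Whether the orbit calculation sits inline or in an external `.js` file is an assumption to check against the actual page source.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ScriptCollector(HTMLParser):
    """Collect inline <script> bodies and the src of external scripts."""

    def __init__(self):
        super().__init__()
        self.inline = []      # text of inline <script> blocks
        self.external = []    # src attributes of external scripts
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            body = "".join(self._buffer).strip()
            if body:
                self.inline.append(body)
            self._in_script = False


def collect_scripts(html):
    """Return (inline_bodies, external_srcs) found in an HTML string."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.inline, parser.external


def fetch_page(url):
    """Download a page and decode it (UTF-8 is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `collect_scripts(fetch_page("http://www.satview.org/?sat_id=41186U"))` would list the scripts to read through. Running the orbit code outside the browser would still need a JavaScript runtime or a port to Python.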

Source: https://stackoverflow.com/questions/34459285/can-scraping-be-applied-to-this-page-which-is-actively-recalculating

Tags

python

web-scraping

screen-scraping