Question:
I would like to grab satellite positions from the page(s) below, but I'm not sure if scraping is appropriate because the page appears to be updating itself every second using some internal code (it keeps updating after I disconnect from the internet). Background information can be found in my question at Space Stackexchange: A nicer way to download the positions of the Orbcomm-2 satellites.
I need a "snapshot" of four items simultaneously:
- UTC time
- latitude
- longitude
- altitude
Right now I use screenshots and manual typing. Since these values are being updated by the page itself, is conventional web scraping going to work here? I found a "screen-scraping" tag; should I try to learn about that instead?
I'm looking for the simplest way to get those four values. Can I just use urllib or urllib2 and avoid installing something new?
Example page: http://www.satview.org/?sat_id=41186U
I need to do 41179U through 41189U (the eleven Orbcomm-2 satellites that SpaceX just put in orbit).
Answer 1:
One option would be to fire up a real browser and continuously poll the position in an endless loop:
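    import time

    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("http://www.satview.org/?sat_id=41186U")

    while True:
        # The page's own script rewrites this element; read its current text.
        # (Selenium 4 spells this driver.find_element(By.CSS_SELECTOR, ...).)
        location = driver.find_element_by_css_selector("#sat_latlon .texto_track2").text
        latitude, longitude = location.split("\n")[:2]
        print(latitude, longitude)
        time.sleep(1)  # poll once per second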
Sample output:
(u'-16.57', u'66.63')
(u'-16.61', u'66.67')
...
This uses Selenium with Firefox. Selenium also has drivers for other browsers, including headless ones such as PhantomJS.
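To cover all eleven satellites from the question (41179U through 41189U) and attach a UTC timestamp to each reading, the polling idea above could be wrapped roughly as follows. This is only a sketch in Python 3: the helper names `satellite_urls`, `parse_latlon`, and `snapshot` are invented for illustration, the timestamp is taken from the local clock rather than read from the page, and the one-second wait is a guess at how long the page's script needs. Altitude is left out because the answer does not show which element holds it.

```python
import time
from datetime import datetime, timezone

# URL pattern taken from the example page in the question.
BASE_URL = "http://www.satview.org/?sat_id={}U"


def satellite_urls(first=41179, last=41189):
    """Build the page URL for each catalog number in the inclusive range."""
    return [BASE_URL.format(number) for number in range(first, last + 1)]


def parse_latlon(text):
    """Turn the element text ('lat\\nlon\\n...') into two floats."""
    latitude, longitude = text.split("\n")[:2]
    return float(latitude), float(longitude)


def snapshot(driver, url):
    """Load one satellite page and return (utc_time, latitude, longitude)."""
    # Imported here so the pure helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(url)
    # Assumption: one second is enough for the page's script to fill in values.
    time.sleep(1)
    # Same selector as in the answer above.
    text = driver.find_element(By.CSS_SELECTOR, "#sat_latlon .texto_track2").text
    # Assumption: the local clock supplies UTC; the page's own clock is not read.
    return (datetime.now(timezone.utc),) + parse_latlon(text)
```

With a driver created as in the answer (`driver = webdriver.Firefox()`), calling `snapshot(driver, url)` for each `url` in `satellite_urls()` gives one reading per satellite. The readings are taken one after another, not at the same instant.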
Answer 2:
No need to scrape. Just look at the HTML source of that page and copy the JavaScript code. None of the positions are fetched remotely; they are all calculated on the fly in the page, which is why it keeps updating after you disconnect. So just grab the code and run it yourself!
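As a starting point for "grab the code", the standard library alone can download the page and pull out its inline scripts, which also fits the wish to avoid installing anything. The sketch below is Python 3, where `urllib.request` takes the place of `urllib2`; the names `InlineScriptCollector`, `extract_inline_scripts`, and `fetch_inline_scripts` are hypothetical. It makes no claim about which script on the page does the orbit calculation, and scripts loaded through a `src` attribute would have to be fetched separately.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class InlineScriptCollector(HTMLParser):
    """Collect the text of <script> elements that have no src attribute."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_inline_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and "src" not in dict(attrs):
            self._in_inline_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_inline_script = False

    def handle_data(self, data):
        # Script text may arrive in several pieces, so accumulate it.
        if self._in_inline_script:
            self.scripts[-1] += data


def extract_inline_scripts(html):
    """Return the non-empty inline script bodies found in an HTML string."""
    parser = InlineScriptCollector()
    parser.feed(html)
    parser.close()
    return [script for script in parser.scripts if script.strip()]


def fetch_inline_scripts(url):
    """Download a page and return its inline scripts (needs network access)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_inline_scripts(html)
```

Calling `fetch_inline_scripts("http://www.satview.org/?sat_id=41186U")` would return the page's inline JavaScript as a list of strings. That code still has to be read, then either run in a JavaScript engine or ported by hand.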
Source: https://stackoverflow.com/questions/34459285/can-scraping-be-applied-to-this-page-which-is-actively-recalculating