How to find when a web page was last updated

你的背包 2020-12-22 18:55

Is there a way to find out how much time has passed since a web page was changed?

For example, I have a page hosted at: www.mywebsitenotupdated.com

6 Answers
  • 2020-12-22 19:35

    To check the Last-Modified header, you can use httpie.

    Installation

    pip install httpie --user
    

    Usage

    $ http -h https://martin-thoma.com/author/martin-thoma/ | grep 'Last-Modified\|Date'
    Date: Fri, 06 Jan 2017 10:06:43 GMT
    Last-Modified: Fri, 06 Jan 2017 07:42:34 GMT
    

    The Date header matters because it reports the server's time, not your local time, so it is the right reference point for Last-Modified. Also, not every server sends Last-Modified (Super User, for example, does not seem to).
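To turn those two headers into an answer to the original question (how much time has passed), one option is to parse both and subtract. This is a minimal sketch using only the Python standard library; the function name `age_from_headers` is made up for illustration, and the sample values are the ones from the output above.

```python
from email.utils import parsedate_to_datetime


def age_from_headers(headers):
    """Return server Date minus Last-Modified as a timedelta, or None.

    `headers` is any mapping of header names to values, for example a
    dict built from the httpie output.
    """
    last_modified = headers.get("Last-Modified")
    server_date = headers.get("Date")
    if last_modified is None or server_date is None:
        return None  # the server did not send one of the two headers
    return parsedate_to_datetime(server_date) - parsedate_to_datetime(last_modified)


# Header values copied from the httpie output in this answer.
sample = {
    "Date": "Fri, 06 Jan 2017 10:06:43 GMT",
    "Last-Modified": "Fri, 06 Jan 2017 07:42:34 GMT",
}
print(age_from_headers(sample))
```

Subtracting the server's own Date, rather than your local clock, avoids errors from clock skew between your machine and the server.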

  • 2020-12-22 19:39

    This is a Pythonic way to do it:

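    import http.client  # httplib was renamed to http.client in Python 3
    from datetime import datetime

    import yaml  # third-party package: pip install pyyaml

    # Placeholder values; replace with the host and path you want to check.
    address = 'www.example.com'
    url_path = '/'
    url = 'http://' + address + url_path

    c = http.client.HTTPConnection(address)
    c.request('GET', url_path)
    r = c.getresponse()
    # get the date into a datetime object
    lmd = r.getheader('last-modified')
    if lmd is not None:
        cur_data = {url: datetime.strptime(lmd, '%a, %d %b %Y %H:%M:%S %Z')}
    else:
        print("Hmmm, no last-modified data was returned from the URL.")
        print("Returned header:")
        print(yaml.dump(dict(r.getheaders()), default_flow_style=False))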
    

    The rest of the script includes an example of archiving a page and checking for changes against the new version, and alerting someone by email.
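The archive-and-compare idea mentioned above can be sketched without relying on Last-Modified at all: keep a digest of the last copy you fetched and compare it with the digest of the new one. The snippet below is not the script this answer refers to, only an illustration; `content_digest` and `has_changed` are hypothetical helper names, and the byte strings stand in for fetched page bodies.

```python
import hashlib


def content_digest(body):
    """Return the SHA-256 hex digest of a page body given as bytes."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_digest, new_body):
    """Return True when new_body differs from the copy behind previous_digest."""
    return content_digest(new_body) != previous_digest


# Stand-ins for an archived copy and a freshly fetched copy.
archived = b"<html><body>Version 1</body></html>"
stored = content_digest(archived)  # save this next to the archived copy

print(has_changed(stored, archived))
print(has_changed(stored, b"<html><body>Version 2</body></html>"))
```

A byte-for-byte comparison like this will report a change on every fetch if the page embeds timestamps, session tokens, or rotating ads, so in practice you may need to strip such parts before hashing.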

  • 2020-12-22 19:41

    No, you cannot know when a page was last updated or last changed or uploaded to a server (which might, depending on interpretation, be three different things) just by accessing the page.

    A server may, and should (according to the HTTP 1.1 protocol), send a Last-Modified header, which you can find out in several ways, e.g. using Rex Swain’s HTTP Viewer. However, according to the protocol, this is just

    “the date and time at which the origin server believes the variant was last modified”.

    And the protocol realistically adds:

    “The exact meaning of this header field depends on the implementation of the origin server and the nature of the original resource. For files, it may be just the file system last-modified time. For entities with dynamically included parts, it may be the most recent of the set of last-modify times for its component parts. For database gateways, it may be the last-update time stamp of the record. For virtual objects, it may be the last time the internal state changed.”

    In practice, web pages are very often generated dynamically by a Content Management System or similar, and in such cases the Last-Modified header typically shows the time the response was created, which is normally very close to the time of the request. This means that the header is practically useless in such cases.

    Even in the case of a “static” page (the server simply picks up a file matching the request and sends it), the Last-Modified date stamp normally indicates just the last write access to the file on the server. This might relate to a time when the file was restored from a backup copy, or a time when the file was edited on the server without making any change to the content, or a time when it was uploaded onto the server, possibly replacing an older identical copy. In these cases, assuming that the time stamp is technically correct, it indicates a time after which the page has not been changed (but not necessarily the time of last change).
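One rough way to spot the dynamically generated case described above is to compare Last-Modified with the Date header of the same response: if the two are only a few seconds apart, the value probably reflects when the response was built rather than when the content changed. This is a heuristic sketch, not something the HTTP specification defines; the function name and the 5-second tolerance are arbitrary assumptions, and the header values are invented.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def looks_generated_per_request(date_header, last_modified_header,
                                tolerance=timedelta(seconds=5)):
    """Guess whether Last-Modified merely mirrors the response time."""
    served = parsedate_to_datetime(date_header)
    modified = parsedate_to_datetime(last_modified_header)
    return abs(served - modified) <= tolerance


# Hypothetical header values for illustration.
print(looks_generated_per_request(
    "Tue, 22 Dec 2020 19:41:00 GMT", "Tue, 22 Dec 2020 19:40:59 GMT"))
print(looks_generated_per_request(
    "Tue, 22 Dec 2020 19:41:00 GMT", "Mon, 02 Nov 2020 08:15:00 GMT"))
```

A result of False does not prove the timestamp is the time of the last content change; as noted above, it may still be a restore or re-upload time.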

  • 2020-12-22 19:42

    For me it was the

    article:modified_time
    

    in the page source.
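If you want to pull that value out programmatically, here is a minimal sketch using Python's built-in HTML parser. It assumes the page uses the Open Graph form `<meta property="article:modified_time" content="...">`; the class and function names and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class ModifiedTimeParser(HTMLParser):
    """Collect the content of <meta property="article:modified_time">."""

    def __init__(self):
        super().__init__()
        self.modified_time = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("property") == "article:modified_time":
            self.modified_time = attributes.get("content")


def find_modified_time(html_text):
    """Return the article:modified_time value, or None if the tag is absent."""
    parser = ModifiedTimeParser()
    parser.feed(html_text)
    return parser.modified_time


# Made-up page source for illustration.
sample_html = """
<html><head>
<meta property="article:published_time" content="2016-12-30T09:00:00+00:00">
<meta property="article:modified_time" content="2017-01-06T07:42:34+00:00">
</head><body></body></html>
"""
print(find_modified_time(sample_html))
```

Many pages do not include this tag at all, and its value is whatever the site's software chose to write, so treat it as a hint rather than proof.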

  • 2020-12-22 19:44

    There is another way to find a page's update date that can be useful on some occasions (when it works).

    If the page has been indexed by Google or by the Wayback Machine, you can try to find out which date(s) they saved. These methods do not work for every page and have some limitations, which are investigated extensively in the answers to this webmasters.stackexchange question. But in many cases they can help you find the page's update date(s):

    1. Google way: Go to https://www.google.com.ua/search?q=site%3Awww.example.com&biw=1855&bih=916&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2000%2Ccd_max%3A&tbm=
      • You can replace the text in the search field with any page URL you want.
      • For example, searching for the current Stack Overflow question page returns May 14, 2014, which is the date the question was created.
    2. Wayback Machine way: Go to https://web.archive.org/web/*/www.example.com
      • For this Stack Overflow page, the Wayback Machine gives more results: "Saved 6 times between June 7, 2014 and November 23, 2016", and you can view all saved copies for each date.
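The Wayback Machine lookup can also be scripted. The sketch below assumes the Internet Archive's availability endpoint at `https://archive.org/wayback/available`, which is understood to return JSON with an `archived_snapshots.closest.timestamp` field in `YYYYMMDDhhmmss` form; the function names and the sample response are hand-written for illustration, and the network call is defined but not executed here.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url):
    """Build the query URL for the availability endpoint."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot_time(payload):
    """Extract the snapshot time from a decoded JSON response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest:
        return None
    return datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")


def fetch_closest_snapshot_time(page_url):
    """Query the endpoint over the network (defined here, not called)."""
    with urlopen(availability_url(page_url)) as response:
        return closest_snapshot_time(json.load(response))


# Hand-written response in the shape the endpoint is assumed to return.
sample = {"archived_snapshots": {"closest": {
    "available": True,
    "timestamp": "20161123000000",
    "url": "http://web.archive.org/web/20161123000000/http://www.example.com/",
}}}
print(availability_url("www.example.com"))
print(closest_snapshot_time(sample))
```

A snapshot time only tells you when the archive captured the page, so it gives a date by which that version existed, not the moment it was edited.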
  • 2020-12-22 19:45

    Open your browser's console and enter the following:

    javascript:alert(document.lastModified)
    