How to tell when an HTTP web page has changed when it is of type text/html?

Submitted by 狂风中的少年 on 2020-01-14 06:10:06

Question


I'm trying to work out an algorithm to tell whether non-binary files on the web have changed. I was going to use:

  • the LastModified datetime from the header, and, if that isn't present, fall back to
  • the ContentLength from the header

However, I'm finding that for a lot of websites the LastModified value for HTML pages is just the current date and time, so the approach doesn't work: I think it would make the page look as if it were always changing...?
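One hypothetical way to detect the problem described above is to compare Last-Modified with the Date header of the same response: if the two are within a few seconds of each other, the server is probably just stamping the page with the current time. This is a minimal sketch under stated assumptions; the function name and the 5-second tolerance are illustrative choices, not something from the original thread. `headers` can be any object with a `.get` method, such as `resp.headers` from `urllib.request` (whose lookups are case-insensitive) or a plain dict.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping


def last_modified_is_usable(headers: Mapping[str, str], tolerance_seconds: float = 5.0) -> bool:
    """Hypothetical heuristic: is Last-Modified worth using as a change signal?

    Returns False when the header is missing, unparseable, or within
    `tolerance_seconds` of the response's own Date header, which suggests
    the server is simply stamping the page with the current time.
    """
    last_modified = headers.get("Last-Modified")
    if not last_modified:
        return False  # header absent: nothing to compare between fetches
    date = headers.get("Date")
    if not date:
        return True  # no Date header to check against; accept the value as-is
    try:
        delta = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return False  # malformed date, or a naive/aware mix: don't rely on it
    return abs(delta.total_seconds()) > tolerance_seconds
```

A trade-off of this design: a page that genuinely changed a few seconds before the request would also be rejected, which seems acceptable for a fallback check.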

What would be a good algorithm, then? How about this:

IF response.ContentType.StartsWith("text/html")    <== or should this just be "text"?
  THEN
    Compare the text content before and after
  ELSE IF LastModified dates are OK
    Compare based on LastModified dates
  ELSE
    Compare based on ContentLength
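The IF/ELSE outline above could be sketched in Python roughly as follows. It is a minimal sketch under several assumptions: the `Snapshot` structure and the function names are hypothetical; the text branch matches any `text/` type rather than only `text/html` (one possible answer to the comment in the outline); the body is compared through a SHA-256 hash instead of storing the full text; and "LastModified dates are OK" is reduced to "present in both responses".

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """What is remembered about one fetch of a URL (hypothetical structure)."""
    content_type: str
    body_hash: Optional[str] = None       # SHA-256 hex digest of the body (text types)
    last_modified: Optional[str] = None   # raw Last-Modified header value
    content_length: Optional[int] = None  # Content-Length header value


def hash_body(body: bytes) -> str:
    """Hash the body so only a short digest has to be stored between fetches."""
    return hashlib.sha256(body).hexdigest()


def has_changed(old: Snapshot, new: Snapshot) -> bool:
    """Mirror the IF/ELSE outline: content for text, then dates, then length."""
    if new.content_type.lower().startswith("text/"):
        # Text types (text/html included, with or without "; charset=..."):
        # compare the body hashes.
        return old.body_hash != new.body_hash
    if old.last_modified and new.last_modified:
        # "LastModified dates are OK" simplified to "present in both responses".
        return old.last_modified != new.last_modified
    # Fallback: a different length means a change, but an equal length
    # (or missing lengths) does not prove the content is unchanged.
    return old.content_length != new.content_length
```

Hashing the whole HTML also flags changes in parts you may not care about, such as embedded timestamps or rotating ads, so some pages may still look as if they always change.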

Thanks


Answer 1:


When sending the request, specify the If-Modified-Since HTTP header. It is then up to the server to reply either with the new HTML or with 304 Not Modified if the content has not changed.
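A minimal sketch of this conditional GET using Python's standard `urllib.request`, assuming you saved the Last-Modified value from the previous response; the function names are hypothetical. The saved value is sent back unchanged as If-Modified-Since. With the default opener, `urlopen` raises `HTTPError` for a 304 response, so the sketch catches it and treats it as "not changed".

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def if_modified_since_headers(last_modified: Optional[str]) -> dict:
    """Build request headers that echo a saved Last-Modified value back to the server."""
    return {"If-Modified-Since": last_modified} if last_modified else {}


def fetch_if_modified(url: str, last_modified: Optional[str]) -> Tuple[bool, Optional[bytes], Optional[str]]:
    """Conditional GET; returns (changed, body, last_modified_to_store)."""
    req = urllib.request.Request(url, headers=if_modified_since_headers(last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            # 200: the server sent a (possibly) new version of the page.
            return True, resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # 304 Not Modified: keep the previously stored value.
            return False, None, last_modified
        raise
```

Given what the question describes, a server that stamps every response with the current time as Last-Modified will probably never answer 304, so for such pages this check may still report a change on every request.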




Answer 2:


The ETag response header is a good indicator of this, if present. Send requests with the If-None-Match header (or simply use HEAD requests and compare the ETag) to find out.
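A minimal sketch of both options with Python's standard `urllib.request`, assuming you stored the ETag from an earlier response; the function names are hypothetical. The stored ETag is sent back verbatim (quotes and any `W/` prefix included) in If-None-Match, and a 304 reply, which `urlopen` raises as `HTTPError`, means the page has not changed. The HEAD variant only fetches headers, so you compare the returned ETag with the stored one yourself.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def if_none_match_headers(etag: Optional[str]) -> dict:
    """Echo a previously saved ETag back to the server, unchanged."""
    return {"If-None-Match": etag} if etag else {}


def fetch_if_etag_changed(url: str, etag: Optional[str]) -> Tuple[bool, Optional[bytes], Optional[str]]:
    """Conditional GET keyed on ETag; returns (changed, body, etag_to_store)."""
    req = urllib.request.Request(url, headers=if_none_match_headers(etag))
    try:
        with urllib.request.urlopen(req) as resp:
            return True, resp.read(), resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # 304 Not Modified: the stored ETag still matches.
            return False, None, etag
        raise


def head_etag(url: str) -> Optional[str]:
    """HEAD alternative: fetch only the headers and return the current ETag (if any)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("ETag")
```

With the HEAD approach, something like `head_etag(url) != stored_etag` signals a change; if the server sends no ETag at all, both functions return None for it and you need one of the other checks.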



Source: https://stackoverflow.com/questions/1781579/how-to-tell-when-a-http-web-page-has-changed-when-it-is-of-type-html-text
