Stop browser from making HTTP requests for images that should stay cached - mod_expires

Posted by 我的未来我决定 on 2019-11-27 17:26:49
Oliver Kurmis

You were using the wrong tool for analysing the requests.

I'd recommend the really useful Firefox addon Live HTTP headers so you can see what is really going on on the network.

And just to be sure, you can SSH into your server (for example with PuTTY) and run something like

tail -f /var/log/apache2/access.log
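If watching the log scroll by is too noisy, a small script can summarise it. The following is a minimal sketch, assuming the log uses Apache's common or combined format; the function name `count_image_statuses` and the sample lines are made up for illustration and are not taken from the original setup.

```python
import re
from collections import Counter

# Matches the request and status fields of a common/combined log line,
# e.g. ... "GET /images/logo.gif HTTP/1.1" 304 ...
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_image_statuses(lines, suffixes=(".gif", ".png", ".jpg", ".jpeg")):
    """Count (path, status) pairs for image requests found in log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that do not look like request entries
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path.lower().endswith(suffixes):
            counts[(path, match.group("status"))] += 1
    return counts


# Hypothetical sample lines; real input would be read from the access log.
sample = [
    '203.0.113.5 - - [27/Nov/2019:17:26:49 +0000] "GET /images/logo.gif HTTP/1.1" 200 8558',
    '203.0.113.5 - - [27/Nov/2019:17:27:10 +0000] "GET /images/logo.gif HTTP/1.1" 304 -',
    '203.0.113.5 - - [27/Nov/2019:17:27:10 +0000] "GET /index.html HTTP/1.1" 200 1024',
]

for (path, status), hits in sorted(count_image_statuses(sample).items()):
    print(path, status, hits)
```

An image that only ever appears with status 200 on the first visit and never again is being served from the browser cache without any request; repeated 304 entries mean the browser is revalidating it.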

Jason Buberel

The behavior you are seeing is the intended, specified behavior (see RFC 7234 for more details):

All modern browsers will send HTTP requests to the server for every page element displayed, regardless of cache status. This was a design decision made at the request of web services (especially advertising networks) to ensure that HTTP servers were able to maintain records of every display of every element.

If the browsers did not make these requests, the server would never be notified that an image had been displayed to the user. For advertising networks, this would be catastrophic. Early on, advertising networks 'hacked' their way around this by serving the same ad image using randomly generated names (ex: 'coke_ad_1_98719283719283.gif'). However, for ISPs this practice caused a huge increase in data transfers, because every one of their users was re-downloading these identical ad images, bypassing any caching/proxy servers their ISP was operating.

So a truce was reached: Browsers would always send HTTP requests, even for un-expired cached elements. Servers would respond with HTTP 304 status codes ("not modified"). This allows the servers to record the fact that the image was displayed to the client. As a result, advertising networks generally stopped using randomized image names to bypass network cache servers.

This gave the ad networks what they wanted - a record of every image displayed - and it gave ISPs what they wanted - cache-able images and static content.

That is why there isn't much you can do to prevent browsers from sending HTTP requests for cached page elements.

However, other client-side mechanisms that came along with HTML5 offer some scope to prevent resource loading:

  1. Cache Manifest (in spite of its gotchas)
  2. IndexedDB (nice asynchronous features, allows blob storage)
  3. Local Storage (not async)

There's a difference between "reloading" and "refreshing". Just navigating to a page with back and forward buttons usually doesn't initiate new HTTP requests, but specifically hitting F5 to "refresh" the page will cause the browser to double check its cache. This is browser dependent but seems to be the norm for FF and Chrome (i.e. the browsers that have the ability to easily watch their network traffic.) Hitting F6, enter should focus the URL address bar and then "go" to it, which should reload the page but not double check the assets on the page.

Update: clarification of back and forward navigation behavior. It's called "Back Forward Cache" or BFCache in browsers. When you navigate with the back/forward buttons, the intent is to show you the page exactly as it was when you saw it in your own timeline. No server requests are made when using back and forward, even if a server cache header says that a particular item has expired.

If you see (200 OK BFCache) in your developer network panel, then the server was never hit, not even with an If-Modified-Since request.

http://www.softwareishard.com/blog/firebug/firebug-tip-what-the-heck-is-bfcache/

If I force a refresh using F5 or Ctrl+F5, a request is sent. However, if I close the browser and enter the URL again, then NO request is sent. I tested whether a request was sent by using breakpoints on begin request on the server. Even when a request is not sent, it still shows up in Firebug as having done a 7 ms wait, so beware of this.

What you are describing here does not reflect my experience. If content is served with a no-store directive or you do an explicit refresh, then yes, I'd expect it to go back to the origin server. Otherwise it should be cached across browser restarts (assuming the browser is allowed to, and can, write a cache file).

Looking at your waterfalls in a bit more detail (which is tricky because they are a bit small & blurry) the browser appears to be doing exactly what it should - it has entries for the images - but these are just loading from the local cache not from the origin server - check the 'Date' header in the response (why do you think it's taking milliseconds instead of seconds?). That's why they are coloured differently.
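The 'Date' check can also be done mechanically. Below is a minimal sketch, assuming the header value has been copied out of the network panel as a string; the helper name `response_age_seconds` is hypothetical. A response that came straight from the origin server should be at most a few seconds old, while one replayed from the local cache carries the date of the original download.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def response_age_seconds(date_header, now=None):
    """Return how many seconds ago the server generated this response."""
    generated = parsedate_to_datetime(date_header)
    if generated.tzinfo is None:
        # Treat a date without zone information as UTC (an assumption).
        generated = generated.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - generated).total_seconds()


# Example with a fixed "current time" so the result is reproducible.
fixed_now = datetime(2004, 9, 23, 17, 42, 34, tzinfo=timezone.utc)
print(response_age_seconds("Thu, 23 Sep 2004 17:42:04 GMT", fixed_now))
```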

dev-vb

After spending considerable time looking for a reasonable answer myself, I found the link below most useful, and it does answer the question asked here.

https://webmasters.stackexchange.com/questions/25342/headers-to-prevent-304-if-modified-since-head-requests

If it is a matter of life or death (if you want to optimise page loading this way, or if you want to reduce the load on the server as much as possible no matter what), then there IS a workaround.

Use HTML5 local storage to cache images after they were requested for the first time.

  • [+] You can prevent the browser from sending HTTP requests, which in 99% of cases would return 304 (Not Modified), no matter how hard the user tries (F5, Ctrl+F5, simply revisiting the page, etc.)

  • [-] You have to put some extra effort into JavaScript support for this.

  • [-] Images are stored as base64 strings (binary data cannot be stored), so they have to be decoded each time on the client side. This is usually pretty fast and not a big deal, but it is still some extra CPU usage on the client side and should be kept in mind.

  • [-] Local storage is limited. You can aim at using ~5 MB of data per domain (note: base64 adds ~33% to the original size of an image).

  • [?] Supported by majority of browsers. http://caniuse.com/#search=localstorage
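To get a feel for the storage cost in the list above, the base64 overhead can be measured directly. This is a minimal sketch in Python rather than browser JavaScript, using stand-in bytes instead of a real image; the helper names `base64_overhead` and `to_data_url` are made up for illustration. The data URL is the kind of string such a workaround would keep in local storage and assign to an image's src.

```python
import base64


def base64_overhead(raw):
    """Return the relative size increase caused by base64-encoding raw bytes."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw) - 1.0


def to_data_url(raw, mime_type="image/gif"):
    """Build a data: URL that could be stored as a string and used as an image source."""
    return "data:" + mime_type + ";base64," + base64.b64encode(raw).decode("ascii")


# Stand-in payload (3072 bytes), not a real image file.
payload = bytes(range(256)) * 12
print(round(base64_overhead(payload), 3))
print(to_data_url(b"GIF89a")[:30])
```

Every 3 input bytes become 4 output characters, so the increase is about one third regardless of image content.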


What you are seeing in Chrome is not a record of the actual HTTP requests. It is a record of asset requests. Chrome does this to show you that an asset is being requested by the page. However, this view does not indicate whether an HTTP request is actually made. If an asset is cached and still fresh, Chrome will not create the underlying HTTP request.

You can also confirm this by hovering over the purple segments in the timeline. Cached resources will have a (from cache) in the tooltip.

In order to see the actual HTTP requests, you need to look on a lower level. In some browsers this can be done with a plugin (like Live HTTP Headers).

In reality though, to verify the requests are not actually being made you need to check your server logs or use a debugging proxy like Charles or Fiddler. This will work on an HTTP level to make sure the requests are not actually happening.

Cache Validation and the 304 response

There are a number of situations in which Internet Explorer needs to check whether a cached entry is valid:

  • The cached entry has no expiration date and the content is being accessed for the first time in a browser session

  • The cached entry has an expiration date but it has expired

  • The user has requested a page update by clicking the Refresh button or pressing F5
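The three cases above can be read as a small decision function. This is only a sketch of the rules as the article states them, not Internet Explorer's actual implementation; the function and parameter names are hypothetical.

```python
from datetime import datetime, timezone


def needs_validation(expires, now, first_access_in_session, user_refreshed):
    """Decide whether a cached entry must be revalidated with the server.

    expires is the entry's expiration time, or None if it has none.
    """
    if user_refreshed:
        return True  # Refresh button or F5 forces a check
    if expires is None:
        # No expiration date: check once per browser session
        return first_access_in_session
    return now >= expires  # expired entries must be checked


now = datetime(2004, 10, 4, 12, 0, 0, tzinfo=timezone.utc)
future = datetime(2005, 10, 4, 12, 0, 0, tzinfo=timezone.utc)
print(needs_validation(future, now, True, False))
print(needs_validation(None, now, True, False))
```

Under these rules, an image with a far-future expiration date set by mod_expires is only rechecked when the user explicitly refreshes.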

If the cached entry has a last modification date, IE sends it in the If-Modified-Since header of a GET request message:

GET /images/logo.gif HTTP/1.1
Accept: */*
Referer: http://www.google.com/
Accept-Encoding: gzip, deflate
If-Modified-Since: Thu, 23 Sep 2004 17:42:04 GMT
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Host: www.google.com

The server checks the If-Modified-Since header and responds accordingly. If the content has not been changed since the date/time specified, it replies with a status code of 304 and a response message that just contains headers:

HTTP/1.1 304 Not Modified
Content-Type: text/html
Server: GWS/2.1
Content-Length: 0
Date: Thu, 04 Oct 2004 12:00:00 GMT

The response can be quickly downloaded because it contains no content and causes IE to read the data it requires from the cache. In effect, it is like a redirection to the local browser cache.

If the requested object has actually changed since the date/time in the If-Modified-Since header, the server responds with a status code of 200 and supplies the modified version of the resource.
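The server-side half of this exchange can be sketched as follows. It is a simplified illustration of the If-Modified-Since comparison only, assuming the resource's modification time is known as a timezone-aware datetime; the function name `conditional_get_status` is hypothetical, and real servers also consider other validators such as ETag.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the header date, else 200."""
    if if_modified_since is None:
        return 200  # unconditional request: send the full resource
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the full resource
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


modified = datetime(2004, 9, 23, 17, 42, 4, tzinfo=timezone.utc)
print(conditional_get_status("Thu, 23 Sep 2004 17:42:04 GMT", modified))
```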

sandeepkunkunuru

This question has a better answer on the Webmasters Stack Exchange site.

More information, which is also cited in that answer, is available on HttpWatch.

According to the article:

There are a number of situations in which Internet Explorer needs to check whether a cached entry is valid:

  • The cached entry has no expiration date and the content is being accessed for the first time in a browser session
  • The cached entry has an expiration date but it has expired
  • The user has requested a page update by clicking the Refresh button or pressing F5
