I have read many posts on SO and the web regarding the keywords in my question title and learned a lot from them. Some of the questions I read are related to specific implem
If I may ask one additional thing: I came across an article somewhere that says that HTTP streaming may also be cached by proxies while WebSockets are not. What does that mean?
(StackOverflow limits the size of comment responses, so I've had to answer here rather than inline.)
That's a good point. To understand this, think about a traditional HTTP scenario... Imagine a browser opened a web page, so it requests http://example.com, say. The server responds with HTTP that contains the HTML for the page. Then the browser sees that there are resources in the page, so it starts requesting the CSS files, JavaScript files, and images of course. They are all static files that will be the same for all clients requesting them.
Some proxies will cache static resources so that subsequent requests from other clients can get those static resources from the proxy, rather than having to go all the way back to the central web server to get them. This is caching, and it's a great strategy to offload requests and processing from your central services.
So client #1 requests http://example.com/images/logo.gif, say. That request goes through the proxy all the way to the central web server, which serves up logo.gif. As logo.gif passes through the proxy, the proxy will save that image and associate it with the address http://example.com/images/logo.gif.
When client #2 comes along and also requests http://example.com/images/logo.gif, the proxy can return the image and no communication is required back to the web server in the center. This gives a faster response to the end user, which is always great, but it also means that there is less load on the center. That can translate to reduced hardware costs, reduced networking costs, etc. So it's a good thing.
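To make that concrete, here's a minimal sketch of the idea in Python. It's a toy, not how any real proxy is implemented: `CachingProxy`, `origin_server`, and the fake image bytes are all names I've made up for illustration.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy proxy that remembers responses by URL (hypothetical, for illustration)."""

    def __init__(self, origin: Callable[[str], bytes]) -> None:
        self.origin = origin              # stand-in for the central web server
        self.cache: Dict[str, bytes] = {}
        self.origin_hits = 0              # how often we went all the way back

    def get(self, url: str) -> bytes:
        if url not in self.cache:
            # Cache miss: forward the request and remember the response.
            self.origin_hits += 1
            self.cache[url] = self.origin(url)
        # Cache hit (or freshly stored): answer from the proxy.
        return self.cache[url]


def origin_server(url: str) -> bytes:
    # Pretend this is the web server returning logo.gif.
    return b"fake logo.gif bytes"


proxy = CachingProxy(origin_server)
first = proxy.get("http://example.com/images/logo.gif")   # client #1
second = proxy.get("http://example.com/images/logo.gif")  # client #2
print(proxy.origin_hits)  # the origin was only contacted for client #1
```

Both clients get the same bytes, but only the first request reaches the origin.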
The problem arises when logo.gif is updated on the web server. The proxy will continue to serve the old image, unaware that there is a new one. This is why caches use expiry: the proxy only caches the image for a short time before it "expires", and the next request after that goes through the proxy to the web server, which then refreshes the proxy's cache. There are also more advanced solutions where a central server can push updates out to known caches, and things can get pretty sophisticated.
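Expiry can be sketched roughly like this. Again, this is just an illustration under my own assumptions: `ExpiringCache`, the 60-second lifetime, and the fake clock are made up so the example can run without actually waiting.

```python
from typing import Callable, Dict, Tuple


class ExpiringCache:
    """Toy cache whose entries expire after ttl_seconds (hypothetical)."""

    def __init__(self, origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self.origin = origin
        self.ttl = ttl_seconds
        self.clock = clock
        # url -> (expiry time, cached body)
        self.entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]               # still fresh: serve from the cache
        body = self.origin(url)           # missing or expired: go to the origin
        self.entries[url] = (now + self.ttl, body)
        return body


current_logo = {"body": b"old logo"}      # what the web server has right now
fake_now = [0.0]                          # fake clock so we can "wait"
cache = ExpiringCache(lambda url: current_logo["body"],
                      ttl_seconds=60.0, clock=lambda: fake_now[0])
url = "http://example.com/images/logo.gif"

at_start = cache.get(url)                 # fetched from the origin
current_logo["body"] = b"new logo"        # logo.gif is updated on the server
fake_now[0] = 30.0
before_expiry = cache.get(url)            # proxy is unaware, serves the old one
fake_now[0] = 61.0
after_expiry = cache.get(url)             # entry expired, cache is refreshed
```

The shorter the lifetime, the less stale the cache can get, but the more requests go back to the center.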
How does this tie in to your question?
You asked about HTTP streaming where the server is streaming HTTP to a client. But streaming HTTP is just like regular HTTP except you don't stop sending data. If a web server serves an image, it sends HTTP to the client that eventually ends: you've sent the whole image. And if you want to send data, it's exactly the same, but the server just sends for a really long time (like it's a massively gigantic image, say) or even never finishes.
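Here's a rough way to picture that in Python, using generators as a stand-in for response bodies. The function names and the "price update" text are made up for illustration; no real HTTP is involved.

```python
import itertools
from typing import Iterator


def serve_image(data: bytes, chunk_size: int = 4) -> Iterator[bytes]:
    """Regular response: send the body piece by piece, then end."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


def stream_prices() -> Iterator[bytes]:
    """Streaming response: same shape, but the body never finishes."""
    for tick in itertools.count(1):
        yield f"price update {tick}\n".encode()


# The image response ends on its own once everything has been sent.
image_chunks = list(serve_image(b"0123456789"))

# The stream would run forever, so the reader decides when to stop.
first_three = list(itertools.islice(stream_prices(), 3))
```

From the outside, both are just a sequence of bytes arriving in response to one request. The only difference is whether the sequence ever ends.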
From the proxy's point of view, it cannot distinguish between HTTP for a static resource like an image and data from HTTP streaming. In both cases, the client made a request to the server, and the proxy remembers that request along with the response. The next time the same request comes in, the proxy serves up the same response.
So if your client made a request for stock prices, say, and got a response, then the next client may make the same request and get the cached data. Probably not what you want! If you request stock prices you want the latest data, right?
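A tiny sketch of that failure, assuming a proxy that naively caches everything by URL. The `quote_server` function, the ACME prices, and the URL are all invented for the example.

```python
import itertools
from typing import Dict

_prices = itertools.count(100)            # made-up, ever-changing price feed


def quote_server(url: str) -> str:
    """Stand-in origin: every call returns a newer price."""
    return f"ACME {next(_prices)}"


naive_cache: Dict[str, str] = {}


def via_naive_proxy(url: str) -> str:
    # The proxy treats the price request exactly like a static file.
    if url not in naive_cache:
        naive_cache[url] = quote_server(url)
    return naive_cache[url]


url = "http://example.com/prices/ACME"    # hypothetical URL
client_1 = via_naive_proxy(url)           # goes to the origin
client_2 = via_naive_proxy(url)           # served from the cache: stale
fresh = quote_server(url)                 # what the origin would say now
```

Client #2 gets exactly what client #1 got, even though the server has moved on to a newer price.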
So it's a problem.
There are tricks and workarounds to handle problems like that, it is true. Obviously you can get HTTP streaming to work, since it's in use today. It's all transparent to the end user, but the people who develop and maintain those architectures have to jump through hoops and pay a price. It results in over-complicated architectures, which means more maintenance, more hardware, more complexity, and more cost. It also means developers often have to care about something they shouldn't have to. They should just be focussing on the application, GUI, and business logic, not the underlying communication.
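One commonly used workaround (my example, not the only one) is for the server to mark dynamic responses with a `Cache-Control: no-store` header, which caches that follow the HTTP caching rules will respect. Here's a minimal sketch of a proxy honouring it. The `HeaderAwareProxy` class and the origin function are hypothetical, and header handling is simplified (real header names are case-insensitive, for instance).

```python
import itertools
from typing import Callable, Dict, Tuple

Response = Tuple[Dict[str, str], bytes]   # (headers, body)


class HeaderAwareProxy:
    """Toy proxy that skips caching when the origin says no-store."""

    def __init__(self, origin: Callable[[str], Response]) -> None:
        self.origin = origin
        self.cache: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url in self.cache:
            return self.cache[url]
        headers, body = self.origin(url)
        # Simplified parsing of a comma-separated Cache-Control value.
        directives = {part.strip().lower()
                      for part in headers.get("Cache-Control", "").split(",")}
        if "no-store" not in directives:
            self.cache[url] = body        # only cacheable responses are kept
        return body


_prices = itertools.count(100)            # made-up price feed


def origin(url: str) -> Response:
    if url.endswith("/prices/ACME"):
        # Dynamic data: tell caches not to keep it.
        return {"Cache-Control": "no-store"}, f"ACME {next(_prices)}".encode()
    # Static file: no restriction, so the proxy may cache it.
    return {}, b"fake logo.gif bytes"


proxy = HeaderAwareProxy(origin)
price_1 = proxy.get("http://example.com/prices/ACME")
price_2 = proxy.get("http://example.com/prices/ACME")
logo = proxy.get("http://example.com/images/logo.gif")
```

This works, but notice the cost: the application developer now has to know about proxies and get the headers right on every dynamic response, which is exactly the kind of hoop I mean.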