Scaling a decoupled realtime server alongside a standard webserver

If the scenario is

a) The main web server raises a message upon an action (say, a record is inserted)

b) It notifies the appropriate real-time server

then you could decouple these two steps by using an intermediate pub/sub layer that forwards each message to its intended recipient.

An implementation would be

1) You have a Redis pub/sub channel; when a client connects to a real-time socket, you start listening on that channel

2) When the main app wants to notify a user via the real-time server, it pushes a message to the channel; the real-time server gets it and forwards it to the intended user.

This way, you decouple the realtime notification from the main app and you don't have to keep track of where the user is.
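A minimal sketch of this flow with redis-py is below. The channel naming scheme and the `websocket` handle are assumptions, not any particular framework's API, and a real server would run the listening loop on a background thread per connection:

```python
# Minimal sketch with redis-py. Channel names and the websocket handle
# are illustrative assumptions, not a specific framework's API.
import redis

r = redis.Redis(host="localhost", port=6379)

# Real-time server side: on connect, listen on that user's channel.
def on_client_connect(user_id, websocket):
    pubsub = r.pubsub()
    pubsub.subscribe(f"notifications:{user_id}")
    for message in pubsub.listen():          # blocking loop; run it off the main thread
        if message["type"] == "message":
            websocket.send(message["data"])  # forward to the connected client

# Main app side: publish without knowing which real-time server holds the user.
def notify_user(user_id, payload):
    r.publish(f"notifications:{user_id}", payload)
```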

The problem you are describing is the common "message backplane" used, for example, in SignalR; it is also related to the "fanout" message exchange in messaging architectures. With a backplane or fanout, every message is forwarded to every messaging node, so clients can connect to any server and still get the message. This approach is reasonable when you have to support both long polling and websockets. However, as you noticed, it is a waste of traffic and resources.

You need to use a message infrastructure with intelligent routing, like RabbitMQ. Take a look at topic and headers exchanges: https://www.rabbitmq.com/tutorials/amqp-concepts.html

How Topic Exchanges Route Messages

RabbitMQ for Windows: Exchange Types

There are tons of different queuing frameworks. Pick the one you like, but make sure it offers more exchange modes than just direct or fanout ;) In the end, a WebSocket is just an endpoint for connecting to a message infrastructure. So if you want to scale out, it boils down to the backend you have :)
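As a sketch of what topic routing buys you, here is roughly how it could look with pika; the exchange name "events" and the "user.<id>.<event>" routing-key scheme are invented for illustration. Each real-time server binds its queue only for the users it actually hosts, so messages are routed instead of fanned out to every node:

```python
# Sketch with pika; exchange name and routing keys are assumptions.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.exchange_declare(exchange="events", exchange_type="topic")

# Real-time server: a private queue bound only for the users connected here.
queue = ch.queue_declare(queue="", exclusive=True).method.queue
ch.queue_bind(exchange="events", queue=queue, routing_key="user.42.*")

def handle(channel, method, properties, body):
    # Look up the local websocket for this routing key and forward the body.
    print(f"got {method.routing_key}: {body!r}")

ch.basic_consume(queue=queue, on_message_callback=handle, auto_ack=True)
# ch.start_consuming()  # blocks; this runs in the real-time server process

# Main app: publish with a routing key; only matching bindings receive it.
ch.basic_publish(exchange="events",
                 routing_key="user.42.record_inserted",
                 body=b"record inserted")
```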

For just a few realtime servers, you could conceivably keep a list of them in the main server and go through them round-robin.
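A tiny sketch of that, assuming a hypothetical `send_to` helper and placeholder host names:

```python
# Round-robin over a static list of real-time servers (hosts are placeholders).
from itertools import cycle

realtime_servers = cycle(["rt1:9000", "rt2:9000", "rt3:9000"])

def notify(payload):
    server = next(realtime_servers)  # rotates through the list forever
    send_to(server, payload)         # hypothetical transport (HTTP, queue, ...)
```

Note this only balances load; it doesn't know which server holds a given user's connection, so it fits broadcasts better than targeted messages.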

Another approach is to use a load balancer.

Basically, you'd have one dedicated node that receives the requests from the main server, and that load-balancer node takes care of choosing which websocket/realtime server to forward each request to.

Of course, this just shifts the code complexity from the main server to a new component, but conceptually I think it's better and more decoupled.
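One way that dedicated node could pick a target is a deterministic hash of the user id, so the same user always maps to the same realtime server without a lookup table. A sketch under those assumptions is below; note that plain modulo hashing remaps many users whenever the server list changes, which is what consistent hashing avoids:

```python
# Dispatcher sketch: deterministic, sticky server choice per user.
# The host list is a placeholder.
import hashlib

REALTIME_SERVERS = ["rt1:9000", "rt2:9000", "rt3:9000"]

def server_for(user_id: str) -> str:
    digest = hashlib.sha1(user_id.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(REALTIME_SERVERS)
    return REALTIME_SERVERS[index]
```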

Changed the answer because a reply indicated that the "main" and "realtime" servers are already load-balanced clusters and not individual hosts.

The central scalability question seems to be:

"My general workflow is when something occurs on the main server that triggers the need for a realtime message, the main server sends that message to the realtime server (via a message queue) and the realtime server distributes it to any related connection."

Emphasis on the word "related". Assume you have 10 "main" servers and 50 "realtime" servers, and an event occurs on main server #5: which of the websockets would be considered related to this event?

Worst case is that any event on any "main" server would need to propagate to all websockets: as the cluster grows, both the number of event sources and the number of recipients grow, so the total message work grows as O(N^2). That counts as a severe scalability impairment.

This O(N^2) complexity can only be prevented if you can group the related connections into groups that don't grow with the cluster size or the total number of connections. Grouping requires state memory to record which group(s) a connection belongs to.

Remember that there are three ways to store state:

  1. global memory (memcached / redis / DB, ...)
  2. sticky routing (load balancer configuration)
  3. client memory (cookies, browser local storage, link/redirect URLs)

Option 3 counts as the most scalable because it avoids a central state store.
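For the grouping itself, here is a concrete sketch of option 1 (global memory): group membership lives in Redis, with one pub/sub channel per group, and a realtime server subscribes only to the channels of groups its local connections joined. An event then fans out to the group, not to the whole cluster. The key and channel names are assumptions:

```python
# Option-1 sketch: group state in Redis, one channel per group.
# Key/channel naming is an illustrative assumption.
import redis

r = redis.Redis()

def join_group(connection_id, group, pubsub):
    r.sadd(f"group:{group}:members", connection_id)  # remember membership
    pubsub.subscribe(f"group:{group}")               # this server now hosts the group

def publish_to_group(group, payload):
    # Only servers with at least one member of this group are subscribed,
    # so the work scales with group size, not cluster size.
    r.publish(f"group:{group}", payload)
```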

As for passing the messages from the "main" to the "realtime" servers, that traffic is by definition much smaller than the traffic towards the clients, and there are efficient frameworks for pushing pub/sub traffic.
