Question
We're using a PUSH/PULL Scalable Formal Communication Pattern in ZeroMQ. The sender application sends a total of 30,000 messages, 10 kB each. There is a lot of data loss, so we set the following on the sender's side:
zmq_socket = context.socket(zmq.PUSH)
zmq_socket.setsockopt(zmq.SNDBUF, 10240)
zmq_socket.setsockopt(zmq.SNDHWM, 1)
zmq_socket.bind("tcp://127.0.0.1:4999")
On the receiver's side:
ZMQ.Socket zmqSocket = zmqContext.socket(ZMQ.PULL);
zmqSocket.setReceiveBufferSize(10240);
zmqSocket.setRcvHWM(1);
zmqSocket.connect("tcp://127.0.0.1:4999");
There is still data loss. We are not sure how to avoid messages being dropped silently.
EDIT 1:
Sender code in Python:
context = zmq.Context()
zmq_socket = context.socket(zmq.PUSH)
zmq_socket.setsockopt(zmq.SNDBUF, 10240)
zmq_socket.setsockopt(zmq.SNDHWM, 1)
zmq_socket.bind("tcp://127.0.0.1:4999")

for file_name in list_of_files:          # Reads data from a list of files:
    while True:                          #        data from a file_name
        with open(os.path.join(self.local_base_dir, file_name), 'r') as sf:
            socket_data = sf.read(5120)
            if socket_data == '':
                sf.close()
                break                    # until EoF
            ret = zmq_socket.send(socket_data)
            if ret == 0:
                return True
            if ret == -1:
                print zmq_errno()
Receiver code in Java:
private ZMQ.Socket zmqSocket = zmqContext.socket(ZMQ.PULL);
zmqSocket.setReceiveBufferSize(10240);
zmqSocket.setRcvHWM(1);
zmqSocket.connect(socketEndpoint);
String message = new String(zmqSocket.recv());
messages.add(message);
Answer 1:
Remarks on ZeroMQ, based just on what has been posted above so far as an MCVE:
Never prematurely optimise resource allocations: any quasi-optimisation decisions introduced into code before its runtime operations meet all functional requirements are misleading. "Optimising" an ill-functioning distributed system makes no sense at all. Only once the runtime has been tested and is guaranteed to work as specified may some resources get fine-tuned ex-post, provided that the rationale ( newly accrued cost of doing so vs. positive performance benefits + reduced resource use ) makes sense for doing so. Never vice versa. Never.
Never design a ZeroMQ infrastructure as a consumable ( ref. the Java MCVE snippet above ). It takes the O/S a remarkable amount of time to set up and adjust the ZeroMQ world "under the hood" that the zmq.Context( n_IO_threads ) instantiation and all the ad-hoc derived .Socket() instances have to create ( all inside time-frames dictated by the O/S scheduler, plus under realistic multi-agent system dynamics, given the distributed counterparties are drawn into the underlying raw-socket handshaking and multi-lateral negotiations ). So even if SLOC one-liners or naive school-book examples may make it look that way, never jump into an ALAP setup of the ZeroMQ infrastructure "right" before reading just one message, only to dispose of the whole cathedral of the ZeroMQ framework ( hidden under the hood ), be it explicitly or ( worse ) implicitly, right after a .recv() of just one message. Never.
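A minimal pyzmq sketch of this principle, assuming the receiver role from the question: the Context and the Socket are instantiated once and reused for the whole stream of messages, instead of being rebuilt and disposed of around each .recv():

import zmq

context = zmq.Context( 1 )                    # pay the setup cost once, not per message
pull    = context.socket( zmq.PULL )
pull.connect( "tcp://127.0.0.1:4999" )

try:
    for _ in range( 30000 ):                  # reuse the very same socket for all messages
        message = pull.recv()                 # blocking receive of one complete message
        # ... process message ...
finally:
    pull.close()                              # release the socket ...
    context.term()                            # ... and the Context only once, at the very end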
Never ignore documented ZeroMQ features. If the documentation states:

ØMQ does not guarantee that the socket will accept as many as ZMQ_SNDHWM messages, and the actual limit may be as much as 60-70% lower depending on the flow of messages on the socket.

then one ought never ever attempt to declare .setsockopt( ZMQ_SNDHWM, 1 ). Never.
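A minimal pyzmq sketch, using an illustrative ( not recommended ) number: size SNDHWM for the real message flow rather than 1, and read the value back before .bind() to confirm it took effect:

import zmq

context = zmq.Context()
push    = context.socket( zmq.PUSH )
push.setsockopt( zmq.SNDHWM, 10000 )          # hypothetical value sized for the workload, not 1
print( push.getsockopt( zmq.SNDHWM ) )        # confirm the option actually took effect
push.bind( "tcp://127.0.0.1:4999" )

push.close()
context.term()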
Never leave the ZeroMQ infrastructure school-book "naked". There are many fair reasons to design principal try: / except: / finally: structures around the ZeroMQ infrastructure usage, so as to release all the resources, down to the O/S-maintained port#-s, in every case, incl. unhandled exceptions ( a sketch of such a guard follows the template below ). This practice is a serious resource-management must ( not only for re-entrant Factory patterns ) that any professional design shall never skip. Never.
GLOBAL_context = zmq.Context( setMoreIOthreadsForPERFORMANCE )     # future PERF
PUSH_socket = GLOBAL_context.socket( zmq.PUSH )
# --------------------------------------------------------------- # .SET
PUSH_socket.setsockopt( zmq.SNDBUF, 123456 )                        # ^ HI 1st
PUSH_socket.setsockopt( zmq.SNDHWM, 123456 )                        # ^ HI 1st
# PULL_socket.setsockopt( zmq.MAXMSGSIZE, 12345 )                   # ~ LOCK !DDoS
PUSH_socket.setsockopt( zmq.AFFINITY, 0 )                           # [0] 1st
# --------------------------------------------------------------- # GRACEFUL TERMINATION:
PUSH_socket.setsockopt( zmq.LINGER, 0 )                             # ALWAYS
# --------------------------------------------------------------- #
PUSH_socket.bind( "tcp://127.0.0.1:4999" )                          # tcp:// here; ipc:// would avoid TCP overheads
# --------------------------------------------------------------- #
...
# app logic
...
# --------------------------------------------------------------- # ALWAYS
PUSH_socket.close()                                                 # ALWAYS
GLOBAL_context.term()                                               # ALWAYS
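A minimal sketch of the try: / except: / finally: guard mentioned above, assuming pyzmq on the sender side; the finally: branch always releases the socket and terminates the Context, even on unhandled exceptions:

import zmq

context = zmq.Context()
push    = context.socket( zmq.PUSH )
try:
    push.setsockopt( zmq.LINGER, 0 )          # GRACEFUL TERMINATION: never block on .close()
    push.bind( "tcp://127.0.0.1:4999" )
    # ... app logic: push.send( ... ) ...
except zmq.ZMQError as e:
    print( e )                                # handle / log ZeroMQ-level errors
finally:
    push.close()                              # ALWAYS release the socket ...
    context.term()                            # ... and the O/S-maintained resources, incl. the port#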
Epilogue:
ZeroMQ clearly states that one either gets a complete message or nothing at all. So the application design has to take all due care if it needs to deliver each and every message: the smart ZeroMQ tools handle many things, but they leave this at the designer's responsibility.
Given your comment has explained there is just one receiver in the proposed system, one may rather go for a PAIR/PAIR Scalable Formal Communication Pattern and avoid all the overheads associated with the layers of L3|L2|L1->-L1|L2|L3 multi-stack packet assembly / disassembly and O/S buffer management that come with the tcp:// transport-class, going instead for ipc:// ( if the O/S permits ) or vmci://, which are less complex and thus a lot faster transport-classes, with lower ( better ) latency and protocol overheads.
Using the PAIR/PAIR archetype also prevents nasty surprises once a DoS-attack .connect()-s onto the PUSH_socket side, where ZeroMQ has no means to refuse such a step; such a game-changing step efficiently devastates your design efforts, as the default behaviour is to start round-robin serving all the "potentially" connected peers ( even after some attacker(s) have been forced to disconnect ).
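A minimal sketch of the PAIR/PAIR + ipc:// idea, assuming pyzmq, a POSIX O/S that supports the ipc:// transport-class, and both peers shown in one process just for brevity; the endpoint name is purely illustrative:

import zmq

context = zmq.Context()

pair_A = context.socket( zmq.PAIR )           # former PUSH-er side
pair_A.bind( "ipc:///tmp/files.pipe" )        # hypothetical ipc:// endpoint, O/S permitting

pair_B = context.socket( zmq.PAIR )           # former PULL-er side, normally another process
pair_B.connect( "ipc:///tmp/files.pipe" )

pair_A.send( b"chunk of file data" )          # exactly one counterparty, no round-robin surprises
print( pair_B.recv() )

for s in ( pair_A, pair_B ):
    s.close()
context.term()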
Last but not least, one ought to manage such distributed processes so as to always prefer the non-blocking mode of .send() / .recv() calls in professional high-performance signalling / messaging, plus .poll()-based control tools, inside the most critical real-time sections of the data-pumping services used in the respective application domain.
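A minimal sketch of a .poll()-based, non-blocking receive loop, assuming pyzmq; the 100 ms timeout is an illustrative choice, not a recommendation:

import zmq

context = zmq.Context()
pull    = context.socket( zmq.PULL )
pull.connect( "tcp://127.0.0.1:4999" )

poller = zmq.Poller()
poller.register( pull, zmq.POLLIN )

try:
    while True:
        events = dict( poller.poll( timeout = 100 ) )   # wait at most 100 ms, never hang forever
        if pull in events:
            message = pull.recv( zmq.NOBLOCK )          # guaranteed not to block at this point
            # ... process message ...
        else:
            pass                                        # no data: do other work, check a stop-flag, ...
finally:
    pull.close()
    context.term()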
Answer 2:
Use the ZMQ_IMMEDIATE option to prevent the loss of messages that would otherwise get queued to pipes which do not yet have a completed connection.
From http://api.zeromq.org/4-2:zmq-setsockopt#toc21
ZMQ_IMMEDIATE: Queue messages only to completed connections

By default, queues will fill on outgoing connections even if the connection has not completed. This can lead to "lost" messages on sockets with round-robin routing ( REQ, PUSH, DEALER ). If this option is set to 1, messages shall be queued only to completed connections. This will cause the socket to block if there are no other connections, but will prevent queues from filling on pipes awaiting connection.

Option value type: int
Option value unit: boolean
Default value: 0 ( false )
Applicable socket types: all, only for connection-oriented transports
zmq_socket.setsockopt(zmq.IMMEDIATE, 1)   # pyzmq exposes ZMQ_IMMEDIATE as zmq.IMMEDIATE
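A minimal sketch of the option in context, assuming pyzmq and the PUSH side from the question; note that .send() may then block until a peer's connection has completed:

import zmq

context    = zmq.Context()
zmq_socket = context.socket( zmq.PUSH )
zmq_socket.setsockopt( zmq.IMMEDIATE, 1 )     # queue messages only to completed connections
zmq_socket.bind( "tcp://127.0.0.1:4999" )

zmq_socket.send( b"payload" )                 # may block until a peer's connection has completed

zmq_socket.close()
context.term()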
Source: https://stackoverflow.com/questions/43771830/zeromq-packets-being-lost-even-after-setting-hwm-and-bufsize