Java RabbitMQ client hangs on resend via thread of producer commit callback after nack due to non-existent exchange

Anonymous (unverified), submitted 2019-12-03 01:08:02

Question:

I am currently experimenting with failure scenarios that might happen when communicating via the message broker RabbitMQ. The goal is to evaluate how such communication can be made more resilient.

In particular, I want to trigger a nack (negative acknowledgement) confirm when sending messages with publisher confirms enabled (producer-commit mode). To do so, I send a message to a non-existent exchange via Spring AMQP's RabbitTemplate.send. In the callback registered via RabbitTemplate.setConfirmCallback, I then handle ack=false confirms by resending the message to an existing exchange (simulating that I took care of the cause of the nack).

A sample class and the related test are provided below; the complete sample project can be found in my GitHub repository. I use RabbitMQ 3.6 and Spring Boot/AMQP 2.0.2.

When running the test, the callback is called with ack=false as expected. However, re-sending the message hangs while re-creating a channel (with a timeout exception after 10 minutes). A dump of the call stack and logs are provided below.

A solution to the problem seems to be to send the message in a different thread as proposed here. If you uncomment the line service.runInSeparateThread = true; in the test, things work!

However, I neither truly understand why things do (or don't) work, nor have I read about this practice anywhere except in the above-mentioned post. Is this expected behavior or a bug? Can someone explain the details?

Thanks a lot for your advice!

A call stack snapshot:

"AMQP Connection 127.0.0.1:5672@3968" prio=5 tid=0xe nid=NA waiting
  java.lang.Thread.State: WAITING
      at java.lang.Object.wait(Object.java:-1)
      at com.rabbitmq.utility.BlockingCell.get(BlockingCell.java:73)
      at com.rabbitmq.utility.BlockingCell.uninterruptibleGet(BlockingCell.java:120)
      at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
      at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:494)
      at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:288)
      at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:138)
      at com.rabbitmq.client.impl.ChannelN.open(ChannelN.java:133)
      at com.rabbitmq.client.impl.ChannelManager.createChannel(ChannelManager.java:176)
      at com.rabbitmq.client.impl.AMQConnection.createChannel(AMQConnection.java:542)
      at org.springframework.amqp.rabbit.connection.SimpleConnection.createChannel(SimpleConnection.java:57)
      at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createBareChannel(CachingConnectionFactory.java:1156)
      at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.access$200(CachingConnectionFactory.java:1144)
      at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.doCreateBareChannel(CachingConnectionFactory.java:585)
      at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createBareChannel(CachingConnectionFactory.java:568)
      at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getCachedChannelProxy(CachingConnectionFactory.java:538)
      at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getChannel(CachingConnectionFactory.java:520)
      at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.access$1500(CachingConnectionFactory.java:94)
      at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createChannel(CachingConnectionFactory.java:1161)
      at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1803)
      at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1771)
      at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:859)
      ...

The logs:

...
10:21:24.613 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitAdmin - declaring Exchange 'ExistentExchange'
10:21:24.630 [main] INFO com.example.rabbitmq.ProducerService - sending `initial Message`
10:21:24.648 [main] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - Added listener org.springframework.amqp.rabbit.core.RabbitTemplate$MockitoMock$952329793@562c877a
10:21:24.648 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Added publisher confirm channel: Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1), conn: Proxy@3013909b Shared Rabbit Connection: SimpleConnection@12db3386 [delegate=amqp://guest@127.0.0.1:5672/, localPort= 1341] to map, size now 1
10:21:24.649 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Executing callback RabbitTemplate$$Lambda$175/1694519286 on RabbitMQ Channel: Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1), conn: Proxy@3013909b Shared Rabbit Connection: SimpleConnection@12db3386 [delegate=amqp://guest@127.0.0.1:5672/, localPort= 1341]
10:21:24.649 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Publishing message (Body:'[B@67001148(byte[15])' MessageProperties [headers={}, contentType=application/octet-stream, contentLength=0, deliveryMode=PERSISTENT, priority=0, deliveryTag=0])on exchange [nonExistentExchange], routingKey = [nonExistentQueue]
10:21:24.659 [main] INFO com.example.rabbitmq.ProducerService - done with sending message
10:21:24.675 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) PC:Nack:(close):1
10:21:24.677 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - Sending confirm PendingConfirm [correlationData=null cause=channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'nonExistentExchange' in vhost '/', class-id=60, method-id=40)]
10:21:24.677 [AMQP Connection 127.0.0.1:5672] INFO com.example.rabbitmq.ProducerService - In confirm callback, ack=false, cause=channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'nonExistentExchange' in vhost '/', class-id=60, method-id=40), correlationData=null
10:21:24.677 [AMQP Connection 127.0.0.1:5672] INFO com.example.rabbitmq.ProducerService - sending `resend Message`
10:21:24.678 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) PC:Nack:(close):1
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - AMQChannel(amqp://guest@127.0.0.1:5672/,1) No listener for seq:1
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Removed publisher confirm channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) from map, size now 0
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Removed publisher confirm channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) from map, size now 0
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - PendingConfirms cleared

ProducerService:

@Service
public class ProducerService {

    static final String EXISTENT_EXCHANGE = "ExistentExchange";
    private static final String NON_EXISTENT_EXCHANGE = "nonExistentExchange";
    private static final String QUEUE_NAME = "nonExistentQueue";
    private final Logger logger = LoggerFactory.getLogger(getClass());
    private final RabbitTemplate rabbitTemplate;
    private final Executor executor = Executors.newCachedThreadPool();
    boolean runInSeparateThread = false;

    public ProducerService(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
        rabbitTemplate.setConfirmCallback(this::confirmCallback);
    }

    private void confirmCallback(CorrelationData correlationData, boolean ack, String cause) {
        logger.info("In confirm callback, ack={}, cause={}, correlationData={}", ack, cause, correlationData);
        if (!ack) {
            if (runInSeparateThread) {
                executor.execute(() -> sendMessage("resend Message", EXISTENT_EXCHANGE));
            } else {
                sendMessage("resend Message", EXISTENT_EXCHANGE);
            }
        } else {
            logger.info("sending was acknowledged");
        }
    }

    public void produceMessage() {
        sendMessage("initial Message", NON_EXISTENT_EXCHANGE);
    }

    private void sendMessage(String messageBody, String exchangeName) {
        logger.info("sending `{}`", messageBody);
        rabbitTemplate.send(exchangeName, QUEUE_NAME, new Message(messageBody.getBytes(), new MessageProperties()));
        logger.info("done with sending message");
    }

}

ProducerServiceTest:

@RunWith(SpringRunner.class)
@ContextConfiguration(classes = {RabbitAutoConfiguration.class, ProducerService.class})
@DirtiesContext
public class ProducerServiceTest {

    @Autowired
    private ProducerService service;
    @SpyBean
    private RabbitTemplate rabbitTemplate;
    @Autowired
    private AmqpAdmin amqpAdmin;
    @Autowired
    private CachingConnectionFactory cachingConnectionFactory;

    @Before
    public void setup() {
        cachingConnectionFactory.setPublisherConfirms(true);
        amqpAdmin.declareExchange(new DirectExchange(ProducerService.EXISTENT_EXCHANGE));
    }

    @After
    public void cleanup() {
        amqpAdmin.deleteExchange(ProducerService.EXISTENT_EXCHANGE);
    }

    @Test
    public void sendMessageToNonexistentExchange() throws InterruptedException {
        final CountDownLatch sentMessagesLatch = new CountDownLatch(2);
        final List<Message> sentMessages = new ArrayList<>();
        doAnswer(invocation -> {
            invocation.callRealMethod();
            sentMessages.add(invocation.getArgument(2));
            sentMessagesLatch.countDown();
            return null;
        }).when(rabbitTemplate).send(anyString(), anyString(), any(Message.class));

//        service.runInSeparateThread = true;
        service.produceMessage();
        sentMessagesLatch.await();

        List<String> messageBodies = sentMessages.stream().map(message -> new String(message.getBody())).collect(toList());
        assertThat(messageBodies, equalTo(Arrays.asList("initial Message", "resend Message")));
    }

}

Answer 1:

It could be considered a bug, I suppose, but it's an artifact of the way we cache channels to improve performance. The problem is that attempting to publish on a channel on the same thread that's delivering an ack for the same channel causes a deadlock in the client library.

We have an open issue to look into a solution (for a different reason); we just haven't gotten around to it. AFAIK, you are only the second user to hit this in more than 6 years since we added support for confirms and returns.

EDIT

Actually, this is a different situation; it's not reusing the channel, since the channel is closed. It is trying to create a new channel, and that is what is deadlocked. I don't see how we (Spring AMQP) can do anything about it; it's a limitation of the Java client: you cannot perform channel operations on the thread that delivers the ack.
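To illustrate why the hand-off to another thread helps, here is a minimal simulation using only java.util.concurrent. It does not use the RabbitMQ client at all; the class AckThreadHandOff and all of its method names are hypothetical. It assumes, in line with the stack trace above (the blocked thread is "AMQP Connection ..."), that a single connection thread both delivers confirm callbacks and produces the replies that a blocking call such as channel creation waits for.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation; not the RabbitMQ client API.
public class AckThreadHandOff {

    // Stand-in for the single connection thread: it runs confirm callbacks
    // and also completes the replies that blocking RPCs wait for.
    private final ExecutorService ioThread = Executors.newSingleThreadExecutor();
    // Separate thread used to resend outside the callback.
    private final ExecutorService resendExecutor = Executors.newSingleThreadExecutor();

    // Stand-in for a blocking RPC (e.g. opening a channel): the reply can
    // only be produced by the connection thread.
    public String blockingRpc(long timeoutMs) throws Exception {
        CompletableFuture<String> reply = new CompletableFuture<>();
        ioThread.execute(() -> reply.complete("channel.open-ok"));
        return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Callback that resends inline: the connection thread waits for a reply
    // that only the connection thread itself could deliver.
    public String resendInline(long timeoutMs) throws Exception {
        return ioThread.submit(() -> {
            try {
                return blockingRpc(timeoutMs);
            } catch (TimeoutException e) {
                return "timed out";
            }
        }).get();
    }

    // Callback that only schedules the resend and returns immediately,
    // leaving the connection thread free to deliver the reply.
    public String resendOnSeparateThread(long timeoutMs) throws Exception {
        CompletableFuture<String> result = new CompletableFuture<>();
        ioThread.execute(() -> resendExecutor.execute(() -> {
            try {
                result.complete(blockingRpc(timeoutMs));
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        }));
        return result.get();
    }

    public void shutdown() {
        ioThread.shutdownNow();
        resendExecutor.shutdownNow();
    }
}
```

In this sketch, resendInline(200) gives up with "timed out", while resendOnSeparateThread(2000) returns "channel.open-ok". That mirrors the difference between calling sendMessage directly in the confirm callback and wrapping it in executor.execute(...) as the question's runInSeparateThread flag does.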


