How does pubsub know how many messages I published at a point in time?

Submitted by 会有一股神秘感。 on 2020-05-17 07:41:26

Question


Here is the code that publishes the messages:

import { PubSub } from '@google-cloud/pubsub';

async function publishMessage(topicName: string) {
  console.log(`[${new Date().toISOString()}] publishing messages`);
  const pubsub = new PubSub({ projectId: PUBSUB_PROJECT_ID });
  const topic = pubsub.topic(topicName, {
    batching: {
      maxMessages: 10,
      maxMilliseconds: 10 * 1000,
    },
  });

  const n = 5;
  const dataBufs: Buffer[] = [];
  for (let i = 0; i < n; i++) {
    const data = `message payload ${i}`;
    const dataBuffer = Buffer.from(data);
    dataBufs.push(dataBuffer);
  }

  const results = await Promise.all(
    dataBufs.map((dataBuf, idx) =>
      topic.publish(dataBuf).then((messageId) => {
        console.log(`[${new Date().toISOString()}] Message ${messageId} published. index: ${idx}`);
        return messageId;
      })
    )
  );
  console.log('results:', results.toString());
}

As you can see, I publish 5 messages. From the user's point of view, the messages are sent at await Promise.all(...), but internally the pubsub library may not send them at that moment. I set maxMessages to 10, so the library waits for 10 seconds (maxMilliseconds) and then publishes these 5 messages.

The execution result meets my expectations:

[2020-05-05T09:53:32.078Z] publishing messages
[2020-05-05T09:53:42.209Z] Message 36854 published. index: 0
[2020-05-05T09:53:42.209Z] Message 36855 published. index: 1
[2020-05-05T09:53:42.209Z] Message 36856 published. index: 2
[2020-05-05T09:53:42.209Z] Message 36857 published. index: 3
[2020-05-05T09:53:42.209Z] Message 36858 published. index: 4
results: 36854,36855,36856,36857,36858

I think topic.publish does not call the remote pubsub service directly, but instead pushes the message onto an in-memory queue. I also assume there is some time window in which the library counts the queued messages, maybe once per tick, or something like this:

// guessed internal logic of the @google-cloud/pubsub library (pseudocode)
setTimeout(() => {
  // if the number of queued messages is >= maxMessages, publish them immediately
  if (getLength(messageQueue) >= maxMessages) {
    callRemotePubsubService(messageQueue);
  }
}, /* window time = */ 100);

Or does it use setImmediate() or process.nextTick()?


Answer 1:


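Note that the conditions for sending messages to the service are combined with OR, not AND. In other words, if either maxMessages messages are waiting to be sent OR maxMilliseconds has passed since the library received the first outstanding message, it will send the outstanding messages to the server. In the question's run, only 5 messages were queued against a maxMessages of 10, so the 10-second maxMilliseconds timer is what triggered the send.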

The source code for the client library is available, so you can see exactly what it does. The publisher client holds an instance of a queue, and it adds messages to that queue when publish is called. The queue tracks messages that haven't been sent yet. When a message is added and the queue is now full (based on the batching settings), the library immediately calls publish. When the first message is added, it uses setTimeout to schedule a call that ultimately calls publish on the service.
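
The behavior described above can be illustrated with a minimal sketch. This is not the library's actual source: the class name SketchMessageQueue, its add/flush methods, and the send callback are all hypothetical names chosen for illustration. It only shows the "full queue OR elapsed timer" rule, using the batching values from the question.

```typescript
// Hypothetical sketch of size-or-time batching; NOT the real
// @google-cloud/pubsub implementation. All names here are assumptions.
interface BatchingOptions {
  maxMessages: number;
  maxMilliseconds: number;
}

class SketchMessageQueue<T> {
  private pending: T[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly options: BatchingOptions;
  private readonly send: (batch: T[]) => void;

  constructor(options: BatchingOptions, send: (batch: T[]) => void) {
    this.options = options;
    this.send = send;
  }

  // Number of messages accepted but not yet sent.
  get size(): number {
    return this.pending.length;
  }

  add(message: T): void {
    this.pending.push(message);
    if (this.pending.length >= this.options.maxMessages) {
      // Condition 1: the batch is full, so send right away.
      this.flush();
    } else if (this.timer === undefined) {
      // Condition 2: the clock starts at the first outstanding message.
      this.timer = setTimeout(() => this.flush(), this.options.maxMilliseconds);
    }
  }

  flush(): void {
    // Cancel the pending timer so a batch is never sent twice.
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}

// Usage with the question's settings: 5 messages, maxMessages of 10.
const sentBatches: string[][] = [];
const queue = new SketchMessageQueue<string>(
  { maxMessages: 10, maxMilliseconds: 10 * 1000 },
  (batch) => {
    sentBatches.push(batch);
  },
);
for (let i = 0; i < 5; i++) {
  queue.add(`message payload ${i}`);
}
// Nothing has been sent at this point; the timer set by the first add()
// will flush all 5 messages together once maxMilliseconds has elapsed.
```

In this sketch no polling window, setImmediate(), or process.nextTick() is needed: the size check runs synchronously inside add(), and a single setTimeout covers the time limit.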



Source: https://stackoverflow.com/questions/61610441/how-does-pubsub-know-how-many-messages-i-published-at-a-point-in-time
