Handling long-running tasks in pika / RabbitMQ

没有蜡笔的小新 asked on 2020-12-04 09:26

We're trying to set up a basic directed queue system where a producer will generate several tasks and one or more consumers will grab a task at a time, process it, and acknowledge the message.

6 Answers
  • 2020-12-04 09:48
    1. You can periodically call connection.process_data_events() inside your long_running_task(connection); this call sends a heartbeat to the server whenever one is due and keeps the pika client from being closed.
    2. Set the heartbeat value of your pika BlockingConnection to something greater than the interval between your connection.process_data_events() calls, as in the sketch below.
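
    A minimal sketch of this approach, assuming a BlockingConnection and an illustrative long_running_task; the heartbeat of 600 s comfortably exceeds the roughly 30 s between process_data_events() calls:

    import time

    import pika

    # The heartbeat must be greater than the longest gap between
    # process_data_events() calls below (about 30 s here).
    params = pika.ConnectionParameters(host="localhost", heartbeat=600)
    connection = pika.BlockingConnection(params)

    def long_running_task(connection):
        for _ in range(20):                   # stand-in for 20 chunks of real work
            time.sleep(30)                    # ... do one chunk of the task ...
            connection.process_data_events()  # lets pika send pending heartbeats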
  • 2020-12-04 09:58

    Please don't disable heartbeats!

    As of Pika 0.12.0, please use the technique described in this example code to run your long-running task on a separate thread and then acknowledge the message from that thread.
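
    A minimal sketch of that technique, assuming pika >= 1.0's basic_consume signature; the queue and function names are illustrative:

    import functools
    import threading
    import time

    import pika

    def long_running_task(connection, channel, delivery_tag):
        time.sleep(30)  # stand-in for the real work
        # Channel methods are not thread-safe: hand the ack back to the
        # connection's I/O loop instead of calling basic_ack from this thread.
        cb = functools.partial(channel.basic_ack, delivery_tag=delivery_tag)
        connection.add_callback_threadsafe(cb)

    def on_message(channel, method, properties, body, connection):
        worker = threading.Thread(
            target=long_running_task,
            args=(connection, channel, method.delivery_tag),
        )
        worker.start()

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="task_queue", durable=True)
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(
        queue="task_queue",
        on_message_callback=functools.partial(on_message, connection=connection),
    )
    channel.start_consuming()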


    NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

  • 2020-12-04 09:58

    Don't disable heartbeats.
    The best solution is to run the task in a separate thread and set prefetch_count to 1, so that the consumer only holds one unacknowledged message at a time, using something like channel.basic_qos(prefetch_count=1), as sketched after these links:

    • https://github.com/pika/pika/issues/753#issuecomment-318124510
    • https://www.rabbitmq.com/consumer-prefetch.html
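
    A minimal sketch of the prefetch part, assuming a BlockingConnection and an illustrative queue name:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="work", durable=True)
    # With prefetch_count=1, the broker will not deliver a second message
    # until the outstanding one has been acknowledged.
    channel.basic_qos(prefetch_count=1)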
  • 2020-12-04 09:59

    I ran into the same problem you had.
    My solution is:

    1. turn off the heartbeat on the server side
    2. estimate the maximum time the task can possibly take
    3. set the client heartbeat timeout to the time from step 2

    Why this?

    I tested the following cases:

    case one
    1. server heartbeat turned on, 1800 s
    2. client heartbeat unset

    I still got errors when a task ran for a very long time (> 1800 s).

    case two
    1. server heartbeat turned off
    2. client heartbeat turned off

    No errors on the client side, except for one problem: when the client crashes (my OS restarted on some faults), the TCP connection can still be seen in the RabbitMQ Management plugin, which is confusing.

    case three
    1. server heartbeat turned off
    2. client heartbeat turned on, set to the foreseen maximum run time

    In this case I can change the heartbeat dynamically on each individual client; in practice, I set a heartbeat on the machines that crashed frequently. Moreover, I can see offline machines through the RabbitMQ Management plugin.
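
    A minimal sketch of step 3, assuming pika 0.9.x, where the parameter is named heartbeat_interval (newer releases call it heartbeat); the timeout value is illustrative:

    import pika

    MAX_TASK_SECONDS = 3600  # illustrative worst-case run time from step 2

    params = pika.ConnectionParameters(
        host="localhost",
        heartbeat_interval=MAX_TASK_SECONDS,  # client-side heartbeat timeout
    )
    connection = pika.BlockingConnection(params)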

    Environment

    OS: CentOS x86_64
    pika: 0.9.13
    rabbitmq: 3.3.1

  • 2020-12-04 10:05

    For now, your best bet is to turn off heartbeats; this will keep RabbitMQ from closing the connection if you block for too long. I am experimenting with pika's core connection management and I/O loop running in a background thread, but it's not stable enough to release.

    In pika v1.1.0 this is ConnectionParameters(heartbeat=0)
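
    A minimal sketch, assuming pika >= 1.0; heartbeat=0 tells the broker not to negotiate heartbeats for this connection:

    import pika

    # heartbeat=0 disables heartbeats, so the broker will not drop the
    # connection while a long task blocks the client.
    params = pika.ConnectionParameters(host="localhost", heartbeat=0)
    connection = pika.BlockingConnection(params)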

  • 2020-12-04 10:06

    You can also set up a new thread, process the message in that thread, and call .sleep on the connection while the thread is alive to prevent missed heartbeats. Here is a sample code block taken from @gmr on GitHub, with a link to the issue for future reference.

    import re
    import json
    import threading
    
    from google.cloud import bigquery
    import pandas as pd
    import pika
    from unidecode import unidecode
    
    def process_export(url, tablename):
        df = pd.read_csv(url, encoding="utf-8")  # read the export from the URL passed in
        print("read in the csv")
        columns = list(df)
        ascii_only_name = [unidecode(name) for name in columns]
        cleaned_column_names = [re.sub("[^a-zA-Z0-9_ ]", "", name) for name in ascii_only_name]
        underscored_names = [name.replace(" ", "_") for name in cleaned_column_names]
        valid_gbq_tablename = "test." + tablename
        df.columns = underscored_names
    
        # try:
        df.to_gbq(valid_gbq_tablename, "some_project", if_exists="append", verbose=True, chunksize=10000)
        # print("Finished Exporting")
        # except Exception as error:
        #     print("unable to export due to: ")
        #     print(error)
        #     print()
    
    def data_handler(channel, method, properties, body):
        body = json.loads(body)
    
        thread = threading.Thread(target=process_export, args=(body["csvURL"], body["tablename"]))
        thread.start()
        while thread.is_alive():  # Loop while the thread is processing
            channel._connection.sleep(1.0)  # connection.sleep() drives the I/O loop so heartbeats keep flowing
        print('Back from thread')
        channel.basic_ack(delivery_tag=method.delivery_tag)
    
    
    def main():
        params = pika.ConnectionParameters(host='localhost', heartbeat=60)
        connection = pika.BlockingConnection(params)
        channel = connection.channel()
        channel.queue_declare(queue="some_queue", durable=True)
        channel.basic_qos(prefetch_count=1)
        # pika < 1.0 callback-first signature; with pika >= 1.0 this becomes
        # channel.basic_consume(queue="some_queue", on_message_callback=data_handler)
        channel.basic_consume(data_handler, queue="some_queue")
        try:
            channel.start_consuming()
        except KeyboardInterrupt:
            channel.stop_consuming()
        connection.close()  # closing the connection also closes its channels
    
    if __name__ == '__main__':
        main()
    

    The link: https://github.com/pika/pika/issues/930#issuecomment-360333837
