Cassandra database session reuse in AWS Lambda (python)

Question

I am trying to reuse a Cassandra cluster session across subsequent AWS Lambda function calls. I've implemented this successfully in Java, but when I reuse the session in Python the Lambda invocation times out (the first call, which actually performs the initialization, is OK).

From the CloudWatch logs I can see a "Heartbeat failed for connection" message. It looks to me as if the session cannot communicate while idle and ends up in an inconsistent state from which it cannot resume the connection. Setting idle_heartbeat_interval longer or shorter than the function timeout has no effect on the outcome.

Here is the structure of my lambda function (omitted some code for brevity):

import logging
from cassandra_client import CassandraClient

logger = logging.getLogger()
logger.setLevel(logging.INFO)

#   State of the initialization phase
flag = False

#   Cassandra instance
cassandra = None

def handle_request(event, context):

    global flag, logger, cassandra

    logger.info('Function started. Flag: %s' % (str(flag), ))

    if not flag:
        logger.info('Initialization...')
        try:
            cassandra = CassandraClient()

            #   ...

            flag = True

        except Exception as e:
            #   Let logging format the exception; e.message is not always available
            logger.error('Cannot perform initialization: %s', e)
            exit(-1)

    #   Process the request ...
    return 'OK'

Just for completeness, this is how I create the connection to the cluster:

def _connect(self, seed_nodes=default_seed_nodes, port=default_port):
    self.cluster = Cluster(seed_nodes, port=port)
    self.metadata = self.cluster.metadata
    self.session = self.cluster.connect()
    # ...

Is there some driver configuration detail or Python Lambda behavior I am not aware of that prevents the session from being reused?

I do think AWS Lambda is a really great tool, but not having much control over the execution can be somewhat confusing in certain respects. Any suggestion is really appreciated, thanks.

Answer 1:

I think I can state that this issue is caused by a difference in how Lambda behaves in the Python execution environment compared with Java.

I had time to set up a simple Lambda function, implemented in both Java and Python. The function simply spawns a thread which prints the current time in a while loop. The question was: will the thread in the Java implementation continue printing even after the Lambda function has returned, and conversely, will the Python thread stop instead? The answer is yes in both cases: the Java thread continues printing until the configured timeout, while the Python thread stops as soon as the Lambda function returns.

The CloudWatch log for the Java version confirms that:

09:55:21 START RequestId: b70e732b-e476-11e6-b2bb-e11a0dd9b311 Version: $LATEST
09:55:21 Function started: 1485510921351
09:55:21 Pre function call: 1485510921351
09:55:21 Background function: 1485510921352
09:55:21 Background function: 1485510921452
09:55:21 Background function: 1485510921552
09:55:21 Background function: 1485510921652
09:55:21 Background function: 1485510921752
09:55:21 Post function call: 1485510921852
09:55:21 Background function: 1485510921853
09:55:21 END RequestId: b70e732b-e476-11e6-b2bb-e11a0dd9b311
09:55:21 REPORT RequestId: b70e732b-e476-11e6-b2bb-e11a0dd9b311 Duration: 523.74 ms Billed Duration: 600 ms Memory Size: 256 MB Max Memory Used: 31 MB
09:55:21 Background function: 1485510921953
09:55:22 Background function: 1485510922053
...

While in the Python version:

09:01:04 START RequestId: 21ccc71e-e46f-11e6-926b-6b46f85c9c69 Version: $LATEST
09:01:04 Function started: 2017-01-27 09:01:04.189819
09:01:04 Pre function call: 2017-01-27 09:01:04.189849
09:01:04 background_call function: 2017-01-27 09:01:04.194368
09:01:04 background_call function: 2017-01-27 09:01:04.294617
09:01:04 background_call function: 2017-01-27 09:01:04.394843
09:01:04 background_call function: 2017-01-27 09:01:04.495100
09:01:04 background_call function: 2017-01-27 09:01:04.595349
09:01:04 Post function call: 2017-01-27 09:01:04.690483
09:01:04 END RequestId: 21ccc71e-e46f-11e6-926b-6b46f85c9c69
09:01:04 REPORT RequestId: 21ccc71e-e46f-11e6-926b-6b46f85c9c69 Duration: 500.99 ms Billed Duration: 600 ms Memory Size: 128 MB Max Memory Used: 8 MB
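A rough way to observe the same pause from inside a Python function is to record when each invocation ends and report the gap at the start of the next one. The sketch below assumes the container stays warm between calls. HEARTBEAT_INTERVAL_S is a hypothetical constant set to 30 seconds, which I believe is the driver's default idle_heartbeat_interval, so adjust it to your own setting.

```python
import time

# Hypothetical constant: assumed to match the driver's idle_heartbeat_interval.
HEARTBEAT_INTERVAL_S = 30.0

# Wall-clock time at which the previous invocation in this container returned.
_last_end = None


def lambda_handler(event, context):
    global _last_end
    started = time.time()
    idle = None if _last_end is None else started - _last_end

    if idle is None:
        print('Cold start: no previous invocation in this container')
    else:
        missed = idle > HEARTBEAT_INTERVAL_S
        print('Idle for %.1f s; longer than heartbeat interval: %s' % (idle, missed))

    # ... process the request ...

    _last_end = time.time()
    return idle
```

If background threads are paused between invocations, as the Python log suggests, an idle gap above the heartbeat interval means at least one heartbeat could not have been sent in time.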

Here's the code of the two functions:

Python

import thread
import datetime
import time


def background_call():
    while True:
        print 'background_call function: %s' % (datetime.datetime.now(), )
        time.sleep(0.1)

def lambda_handler(event, context):
    print 'Function started: %s' % (datetime.datetime.now(), )

    print 'Pre function call: %s' % (datetime.datetime.now(), )
    thread.start_new_thread(background_call, ())
    time.sleep(0.5)
    print 'Post function call: %s' % (datetime.datetime.now(), )

    return 'Needs more cowbell!'
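The test above is written for Python 2 (the thread module and print statements). For anyone repeating it on a Python 3 runtime, a possible port using threading might look like the sketch below. The STOP event and the thread name are my additions, not part of the original test; nothing sets STOP inside Lambda, it only lets a local run end cleanly.

```python
import datetime
import threading
import time

# Added for local runs only; nothing sets this inside Lambda.
STOP = threading.Event()


def background_call():
    # Print a timestamp roughly every 100 ms until STOP is set.
    while not STOP.is_set():
        print('background_call function: %s' % (datetime.datetime.now(), ))
        STOP.wait(0.1)


def lambda_handler(event, context):
    print('Function started: %s' % (datetime.datetime.now(), ))

    print('Pre function call: %s' % (datetime.datetime.now(), ))
    worker = threading.Thread(target=background_call,
                              name='background_call', daemon=True)
    worker.start()
    time.sleep(0.5)
    print('Post function call: %s' % (datetime.datetime.now(), ))

    return 'Needs more cowbell!'
```

I have not re-run this on Lambda, so treat the log output shown earlier as describing the Python 2 version only.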

Java

import com.amazonaws.services.lambda.runtime.*;


public class BackgroundTest implements RequestHandler<RequestClass, ResponseClass> {

    public static void main( String[] args )
    {
        System.out.println( "Hello World!" );
    }

    public ResponseClass handleRequest(RequestClass requestClass, Context context) {
        System.out.println("Function started: "+System.currentTimeMillis());
        System.out.println("Pre function call: "+System.currentTimeMillis());
        Runnable r = new Runnable() {
            public void run() {
                while(true){
                    try {
                        System.out.println("Background function: "+System.currentTimeMillis());
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        };
        Thread t = new Thread(r);
        t.start();
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println("Post function call: "+System.currentTimeMillis());
        return new ResponseClass("Needs more cowbell!");
    }
}

Answer 2:

There's a similar issue in the cassandra-driver FAQs, where WSGI applications won't work with a global connection pool:

Depending on your application process model, it may be forking after driver Session is created. Most IO reactors do not handle this, and problems will manifest as timeouts. [Here][1]
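The usual remedy for the forking case in that FAQ entry is to create the session after the fork rather than before it. One way to sketch that idea is to remember which process created the session and reconnect when the process ID changes. ForkAwareSession and the connect callable are hypothetical names for illustration; connect would be something like lambda: Cluster([...]).connect(). Whether the Lambda Python runtime actually forks after module import is an assumption I have not verified.

```python
import os


class ForkAwareSession:
    """Hand out a session that was created in the current process."""

    def __init__(self, connect):
        self._connect = connect   # hypothetical zero-argument factory
        self._session = None
        self._pid = None          # process that created self._session

    def get(self):
        pid = os.getpid()
        # Reconnect on first use, or when we are running in a different
        # process than the one that opened the connection (i.e. after a fork).
        if self._session is None or self._pid != pid:
            self._session = self._connect()
            self._pid = pid
        return self._session
```

Within one process, repeated calls to get() return the same session, so warm invocations still reuse the connection.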

This at least got me on the right track to check the available connection classes: it turns out that cassandra.io.twistedreactor.TwistedConnection works pretty well on AWS Lambda.

All in all the code looks something like this:

from cassandra.cluster import Cluster
from cassandra.io.twistedreactor import TwistedConnection
import time


SESSION = Cluster([...], connection_class=TwistedConnection).connect()


def run(event, context):
    t0 = time.time()
    x = list(SESSION.execute('SELECT * FROM keyspace.table'))  # Ensure query actually evaluated
    print('took', time.time() - t0)

You will need to install twisted in your venv though.

I ran this overnight on a 1-minute crontab and saw only a few connection errors (at most 2 in an hour), so overall I'm quite happy with the solution.
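For the occasional connection error, one option is to rebuild the session and retry the query once. This is only a sketch of that idea, not something I ran in the overnight test. ReconnectingSession and the connect factory are hypothetical names; with the real driver you would probably pass cassandra.cluster.NoHostAvailable and cassandra.OperationTimedOut as the retriable exceptions.

```python
class ReconnectingSession:
    """Wrap a session and retry a failed query once on a fresh connection."""

    def __init__(self, connect, retriable=(Exception,)):
        self._connect = connect      # hypothetical zero-argument factory
        self._retriable = retriable  # exception types that trigger a reconnect
        self._session = None

    def execute(self, *args, **kwargs):
        if self._session is None:
            self._session = self._connect()
        try:
            return self._session.execute(*args, **kwargs)
        except self._retriable:
            # Drop the stale session, reconnect, and try exactly once more.
            self._discard()
            self._session = self._connect()
            return self._session.execute(*args, **kwargs)

    def _discard(self):
        old, self._session = self._session, None
        try:
            old.shutdown()  # best effort; the old connection may already be dead
        except Exception:
            pass
```

Retrying is only safe for idempotent statements such as the SELECT above, since a write that timed out may still have been applied.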

Also I haven't tested eventlet and gevent based connections, because I can't have them monkey patching my applications, and I also didn't feel like compiling libev to use on lambda. Someone else might want to try though.

[1]: http://datastax.github.io/python-driver/faq.html#why-do-connections-or-io-operations-timeout-in-my-wsgi-application

Source: https://stackoverflow.com/questions/41850876/cassandra-database-session-reuse-in-aws-lambda-python

Tags

python

amazon-web-services

cassandra

connection-pooling

aws-lambda