Speeding up matrix-vector multiplication and exponentiation in Python, possibly by calling C/C++

清歌不尽 2021-01-12 16:21

I am currently working on a machine learning project where - given a data matrix Z and a vector rho - I have to compute the value and slope of the

2 Answers
  •  感动是毒
    2021-01-12 16:56

    Libraries of the BLAS family are already highly tuned, so linking to your own C/C++ code is unlikely to give you any benefit. You could, however, try different BLAS implementations: there are quite a few around, including some tuned for specific CPUs.
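Before swapping BLAS implementations, it helps to know which one NumPy is currently linked against and how long the matrix-vector product takes. The following is only a rough sketch: `time_matvec` is a hypothetical helper name, and the matrix sizes are placeholders, not values from the question.

```python
import time

import numpy as np


def time_matvec(n_rows, n_cols, repeats=10):
    """Return the best wall-clock time (seconds) of Z.dot(rho) over `repeats` runs."""
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((n_rows, n_cols))  # stand-in for the data matrix
    rho = rng.standard_normal(n_cols)          # stand-in for the coefficient vector
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        Z.dot(rho)                             # dispatched to the linked BLAS
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Prints the BLAS/LAPACK libraries this NumPy build was compiled against.
    np.show_config()
    print(f"best matvec time: {time_matvec(20_000, 50):.6f} s")
```

Running the same script under NumPy builds linked to different BLAS libraries gives a direct comparison on your own hardware.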

    The other thing that comes to mind is to use a library like Theano (or Google's TensorFlow) that can represent the entire computational graph (all of the operations in your function above) and apply global optimizations to it. It can then generate CPU code from that graph via C++ (and, by flipping a simple switch, GPU code as well). It can also compute symbolic derivatives for you automatically. I've used Theano for machine learning problems and it's a really great library for that, although not the easiest one to learn.

    (I'm posting this as an answer because it's too long for a comment)

    Edit:

    I had a go at this in Theano, but the result is about 2x slower on the CPU; the comments in the code explain why. I'll post it here anyway, as it may be a starting point for someone else to do something better. (This is only partial code; complete it with the code from the original post.)

    import theano
    
    def make_graph(rho, Z):
        scores = theano.tensor.dot(Z, rho)
    
        # this is very inefficient... it calculates everything twice and
        # then picks one of them depending on scores being positive or not.
        # not sure how to express this in theano in a more efficient way
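        # both branches equal log(1 + exp(-scores)); each form avoids
        # overflow in exp() on its own side of zero
        pos = theano.tensor.log(1 + theano.tensor.exp(-scores))
        neg = -scores + theano.tensor.log(1 + theano.tensor.exp(scores))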
        loss_value = theano.tensor.switch(scores > 0, pos, neg)
        loss_value = loss_value.mean()
    
        # however computing the derivative is a real joy now:
        loss_slope = theano.tensor.grad(loss_value, rho)
    
        return loss_value, loss_slope
    
    sym_rho = theano.tensor.col('rho')
    sym_Z = theano.tensor.matrix('Z')
    sym_loss_value, sym_loss_slope = make_graph(sym_rho, sym_Z)
    
    compute_logistic_loss_value_and_slope = theano.function(
            inputs=[sym_rho, sym_Z],
            outputs=[sym_loss_value, sym_loss_slope]
            )
    
    # use function compute_logistic_loss_value_and_slope() as in original code
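For checking the output of the Theano function, a plain NumPy reference can be handy. The sketch below is an assumption about what the original function computes (the mean of log(1 + exp(-Z·rho)) and its gradient with respect to rho, as read off the graph above), not the code from the question. The name `logistic_loss_value_and_slope` is made up for illustration, and `rho` is taken as a 1-D vector here rather than a column. `np.logaddexp` evaluates the loss in one pass without the two-branch switch.

```python
import numpy as np


def logistic_loss_value_and_slope(rho, Z):
    """Mean logistic loss log(1 + exp(-Z @ rho)) and its gradient w.r.t. rho."""
    scores = Z @ rho
    # logaddexp(0, -s) == log(1 + exp(-s)), evaluated without overflow
    loss_value = np.logaddexp(0.0, -scores).mean()
    # d/ds log(1 + exp(-s)) = -1 / (1 + exp(s)) = -0.5 * (1 - tanh(s / 2));
    # the tanh form stays finite for large |s|
    weights = -0.5 * (1.0 - np.tanh(0.5 * scores))
    loss_slope = Z.T @ weights / Z.shape[0]
    return loss_value, loss_slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((1000, 5))  # placeholder data
    rho = rng.standard_normal(5)        # placeholder coefficients
    value, slope = logistic_loss_value_and_slope(rho, Z)
    print(value, slope)
```

Comparing these values against the Theano outputs on the same inputs is a quick sanity check of the graph.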
    
