I'm using SWIG to glue some C++ code to Python (2.6), and part of that glue includes code that converts large fields of data (millions of values) from the C++ side to a Numpy array. The best method I've come up with implements an iterator for the class and then provides a Python method:
def __array__(self, dtype=float):
    return np.fromiter(self, dtype, self.size())
The problem is that each call to the iterator's next() is very costly, since it has to go through three or four SWIG wrappers, so it takes far too long. I can guarantee that the C++ data are stored contiguously (they live in a std::vector), and it feels like Numpy should be able to take a pointer to the beginning of that data along with the number of values it contains and read it directly.
Is there a way to pass a pointer to internal_data_[0] and the value of internal_data_.size() to numpy, so that it can directly access or copy the data without all the Python overhead?
You will want to define __array_interface__ instead (note that it is an attribute, usually implemented as a property, not a method). This will let you pass back the pointer and the shape information directly.
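For illustration, here is a minimal sketch of what that could look like on the Python side. The size() and _data_ptr() accessors are hypothetical: you would have to expose something like them through SWIG (e.g. an %extend method returning the address of the first vector element as an integer).

import numpy as np

class FieldWrapper(object):
    """Hypothetical SWIG proxy wrapping a std::vector<double>."""
    @property
    def __array_interface__(self):
        # 'data' is (address, read_only_flag); the C++ memory must stay
        # alive as long as any array built from it is in use.
        return {
            'shape':   (self.size(),),
            'typestr': '<f8',                     # little-endian float64
            'data':    (self._data_ptr(), False),
            'version': 3,
        }

np.asarray() on such an object wraps the C++ memory directly, with no per-element Python call.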
Maybe it would be possible to use f2py instead of SWIG. Despite its name, it is capable of interfacing Python with C as well as Fortran. See http://www.scipy.org/Cookbook/f2py_and_NumPy
The advantage is that it handles the conversion to numpy arrays automatically.
Two caveats: if you don't already know Fortran, you may find f2py a bit strange, and I don't know how well it works with C++.
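As a rough sketch of what "automatic" means here (the module and function names below are made up, following the pattern in the cookbook recipe linked above): once f2py has generated a wrapper from a signature file, the resulting module accepts and returns numpy arrays directly.

import numpy as np
import c_code  # hypothetical module built by f2py from a .pyf signature + .c file

x = np.arange(10, dtype=np.float64)
s = c_code.norm(x)  # if the length argument is marked intent(hide),
                    # f2py fills it in from len(x) automatically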
If you wrap your vector in an object that implements Python's buffer interface, you can pass that to the numpy array for initialization (see the ndarray docs; the buffer is the third argument). I would bet that this initialization is much faster, since it can just use memcpy to copy the data.
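To see the mechanism with a pure-Python stand-in for the wrapped vector (array.array already implements the buffer interface), something like this should work:

import numpy as np
from array import array

buf = array('d', [1.0, 2.0, 3.0])   # any object exposing the buffer interface

# Third (buffer) argument of the ndarray constructor: wraps the
# memory directly, so no per-element Python calls are involved.
a = np.ndarray(shape=(len(buf),), dtype=np.float64, buffer=buf)

# np.frombuffer is an equivalent shortcut that also avoids copying.
b = np.frombuffer(buf, dtype=np.float64)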
So it looks like the only real solution is to base something off pybuffer.i that can copy from C++ into an existing buffer. If you add this to a SWIG include file:
%insert("python") %{
import numpy as np
%}
/*! Templated function to copy contents of a container to an allocated memory
* buffer
*/
%inline %{
//==== ADDED BY numpy.i
#include <algorithm>
template < typename Container_T >
void copy_to_buffer(
const Container_T& field,
typename Container_T::value_type* buffer,
typename Container_T::size_type length
)
{
// ValidateUserInput( length == field.size(),
// "Destination buffer is the wrong size" );
// put your own assertion here or BAD THINGS CAN HAPPEN
if (length == field.size()) {
std::copy( field.begin(), field.end(), buffer );
}
}
//====
%}
%define TYPEMAP_COPY_TO_BUFFER(CLASS...)
%typemap(in) (CLASS::value_type* buffer, CLASS::size_type length)
             (int res = 0, Py_ssize_t size_ = 0, void* buffer_ = 0) {
    /* Grab a writable pointer into the Python argument (e.g. a numpy
       array) via the buffer protocol. */
    res = PyObject_AsWriteBuffer($input, &buffer_, &size_);
    if ( res < 0 ) {
        PyErr_Clear();
        %argument_fail(res, "(CLASS::value_type*, CLASS::size_type length)",
                       $symname, $argnum);
    }
    $1 = ($1_ltype) buffer_;
    /* Convert the byte count into an element count. */
    $2 = ($2_ltype) (size_ / sizeof($*1_type));
}
%enddef
%define ADD_NUMPY_ARRAY_INTERFACE(PYVALUE, PYCLASS, CLASS...)

TYPEMAP_COPY_TO_BUFFER(CLASS)

%template(_copy_to_buffer_ ## PYCLASS) copy_to_buffer< CLASS >;

%extend CLASS {
%insert("python") %{
def __array__(self):
    """Enable access to this data as a numpy array"""
    a = np.ndarray( shape=( len(self), ), dtype=PYVALUE )
    _copy_to_buffer_ ## PYCLASS(self, a)
    return a
%}
}
%enddef
then you can make a container "Numpy"-able with:

%template(DumbVectorFloat) DumbVector<double>;
ADD_NUMPY_ARRAY_INTERFACE(float, DumbVectorFloat, DumbVector<double>);

(Note that Python's float is a C double, so the float dtype here matches DumbVector<double>.)
Then in Python, just do:
# dvf is an instance of DumbVectorFloat
import numpy as np
my_numpy_array = np.asarray( dvf )
This has only the overhead of a single Python <--> C++ translation call, not the N calls that iterating over a typical length-N array would require.
A slightly more complete version of this code is part of my PyTRT project on GitHub.
Source: https://stackoverflow.com/questions/5424324/fast-conversion-of-c-c-vector-to-numpy-array