Question
I want to compile a Python function with Cython that reads a binary file while skipping some records (without reading the whole file and then slicing, since I would run out of memory). I came up with something like this:
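    import numpy

    def FromFileSkip(fid, count=1, skip=0, dtype=numpy.float64):
        if skip>=0:
            data = numpy.zeros(count, dtype=dtype)
            k = 0
            while k<count:
                try:
                    # read one item; at end of file this raises ValueError
                    data[k] = numpy.fromfile(fid, count=1, dtype=dtype)
                    # skip `skip` bytes relative to the current position
                    fid.seek(skip, 1)
                    k +=1
                except ValueError:
                    data = data[:k]
                    break
            return data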
I can then use the function like this:
    f = open(filename, 'rb')
    data = FromFileSkip(f,...
However, to compile the function "FromFileSkip" with Cython, I would like to declare the types of everything involved in the function, including "fid", the file handle. How can I declare its type in Cython, given that it is not a "standard" type such as an integer? Thanks.
Answer 1:
Declaring a type for fid won't help, because calling Python functions is still costly. Try compiling your example with the "-a" flag to see what I mean. However, you can use low-level C functions for file handling to avoid Python overhead in your loop. For the sake of example, I assumed that the data starts right at the beginning of the file and that its type is double:
    from libc.stdio cimport *
    cdef extern from "stdio.h":
        FILE *fdopen(int, const char *)
    import numpy as np
    cimport numpy as np
    DTYPE = np.double # or whatever your type is
    ctypedef np.double_t DTYPE_t # or whatever your type is
    def FromFileSkip(fid, int count=1, int skip=0):
        cdef int k
        cdef int n_read = 0
        cdef FILE* cfile
        cdef np.ndarray[DTYPE_t, ndim=1] data
        cdef DTYPE_t* data_ptr
        cfile = fdopen(fid.fileno(), b'rb') # attach the stream
        data = np.zeros(count, dtype=DTYPE)
        data_ptr = <DTYPE_t*>data.data
        # maybe skip some header bytes here
        # ...
        for k in range(count):
            # fread returns the number of items read and is never negative,
            # so a short read (end of file or error) shows up as != 1
            if fread(<void*>(data_ptr + k), sizeof(DTYPE_t), 1, cfile) != 1:
                break
            n_read += 1
            if fseek(cfile, skip, SEEK_CUR):
                break
        # drop the unfilled tail if the file ended early
        return data[:n_read]
Note that the output of "cython -a example.pyx" shows no Python overhead inside the loop.
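To sanity-check a compiled version, it can help to compare it against a slow but simple pure-Python reference on a small file. The sketch below is only an illustration: the name from_file_skip_reference and the fmt parameter are hypothetical, it assumes native-endian doubles starting at offset 0, and it uses only the standard library rather than NumPy.

```python
import os
import struct
import tempfile

def from_file_skip_reference(fid, count=1, skip=0, fmt="d"):
    """Read up to `count` items, skipping `skip` bytes after each one."""
    size = struct.calcsize(fmt)
    values = []
    for _ in range(count):
        chunk = fid.read(size)
        if len(chunk) < size:  # short read: end of file reached
            break
        values.append(struct.unpack(fmt, chunk)[0])
        fid.seek(skip, os.SEEK_CUR)  # move past the bytes to ignore
    return values

# Build a small test file holding the doubles 0.0 .. 9.0
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("10d", *range(10)))
    path = tmp.name

# Ask for more items than exist; skipping 8 bytes drops every other double
with open(path, "rb") as f:
    every_other = from_file_skip_reference(f, count=20, skip=8)

os.remove(path)
print(every_other)  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

Running the compiled function on the same file with the same count and skip should give the same values, with the result ending where the file runs out.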
Source: https://stackoverflow.com/questions/15356606/pass-file-handle-to-cython-function