Python ctypes how to read a byte from a character array passed to NASM

Anonymous (unverified), posted 2019-12-03 01:40:02

Question:

UPDATE: I solved this problem with the help of Mark Tolonen's answer below. Here is the solution (but I'm puzzled by one thing):

I begin with the encoding string shown in Mark Tolonen's answer below (UTF-8):

CA_f1 = (ctypes.c_char_p * len(f1))(*(name.encode() for name in f1))

With optimizations off, I always store rcx into a memory variable on entry. Later in the program when I need to use the pointer in rcx, I read it from memory. That works for a single pointer, but doesn't work for accessing the pointer array Mark Tolonen showed below; maybe that's because it's a pointer array, not just a single pointer. It DOES work if I store rcx into r15 on entry, and downstream in the program it works like this:

    ;To access the first char of the first name pair:
    xor rax,rax
    mov rdx,qword[r15]
    movsx eax,BYTE[rdx]
    ret

    ;To access the second char of the second name pair:
    mov rdx,qword[r15+8]
    movsx eax,BYTE[rdx+1]

That's not a problem because I usually store as many variables as possible in registers; sometimes there are not enough registers, so I have to resort to storing some in memory. Now, when processing strings, I will always reserve r15 to hold the pointer passed in rcx if it's a pointer array.

Any insight into why the memory location doesn't work?
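As a rough illustration of the access pattern above (not a diagnosis of the memory-variable case), the same two loads can be mimicked from Python: one load to fetch a string's address from the pointer array, and a second to fetch a byte from that string. The sample names and the helper `char_at` are hypothetical. The same two steps apply whether the array's base address is kept in a register or reloaded from a memory variable.

```python
import ctypes

names = ["Mark Tolonen", "John Smith"]  # hypothetical sample data
ca = (ctypes.c_char_p * len(names))(*(n.encode() for n in names))

base = ctypes.addressof(ca)            # the address the callee would see in rcx
slot = ctypes.sizeof(ctypes.c_void_p)  # pointer size: 8 on a 64-bit build


def char_at(base, index, offset):
    """Hypothetical helper: two loads, like qword[r15+8*index] then BYTE[rdx+offset]."""
    # First load: fetch the address of string number `index` from the pointer array.
    ptr = ctypes.c_void_p.from_address(base + index * slot).value
    # Second load: fetch a single byte from that string.
    return ctypes.c_ubyte.from_address(ptr + offset).value


first = char_at(base, 0, 0)   # first char of the first name pair
second = char_at(base, 1, 1)  # second char of the second name pair
print(chr(first), chr(second))
```

Reading `base + offset` directly, without the first load, would return bytes of the pointer values themselves rather than characters.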

**** END OF ANSWER ****

I'm new to string processing in NASM, and I am passing a string from ctypes. The string data is read from a text file (Windows .txt), using the following Python function:

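    def StrDataRead(fname):
        # The def line and the list initialization were omitted from the
        # original snippet; the name is taken from the wrapper further down.
        return_data = []
        with open(fname, encoding = "utf8") as f1:
            for item in f1:
                item = item.lstrip()
                item = item.rstrip()
                return_data.append(item)
        return return_data

The .txt file contains a list of first and last names, one pair per line, separated by Windows line endings (carriage return plus line feed).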

I pass a c_char_p pointer to a NASM dll using ctypes. The pointer is created with this:

CA_f1 = (ctypes.c_char_p * len(f1))()

Visual Studio confirms that it is a pointer to a byte string 50 NAMES long, which may be where the problem lies: I need bytes, not list elements. Then I pass it using this ctypes syntax:

CallName.argtypes = [ctypes.POINTER(ctypes.c_char_p),ctypes.POINTER(ctypes.c_double),ctypes.POINTER(ctypes.c_double)]
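For comparison, here is a minimal sketch of what `(ctypes.c_char_p * n)()` holds when it is created without initializers, next to the populated form. The sample names are hypothetical. With no initializers, ctypes zero-fills the array, so every slot is a NULL pointer and the array's own memory contains only zero bytes.

```python
import ctypes

names = ["Margaret Swanson", "John Smith"]  # hypothetical sample data

# No initializers: an array of NULL pointers, not a copy of the names.
empty = (ctypes.c_char_p * len(names))()

# With initializers: each slot points at the encoded bytes of one name.
filled = (ctypes.c_char_p * len(names))(*(n.encode("utf-8") for n in names))

empty_is_null = all(p is None for p in empty)
# Raw memory of the pointer array itself (this is what a single load would read).
empty_raw = ctypes.string_at(ctypes.addressof(empty), ctypes.sizeof(empty))
filled_first = filled[0]

print(empty_is_null, empty_raw == bytes(len(empty_raw)), filled_first)
```

If the callee reads a byte straight from the address of the empty array, it sees one of those zero bytes, which would be consistent with a 0 result.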

UPDATE: before passing the string, now I convert the list to a string like this:

f1_x = ' '.join(f1)

Now VS shows a pointer to a 558 byte string, which is correct, but I still can't read a byte.
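If one flat string is what the NASM side should receive, a possible sketch is to encode the joined text and copy it into a character buffer, so that the argument points directly at the bytes. The sample names are hypothetical, and the commented-out prototype is an assumption about how the function might be declared, not the original code.

```python
import ctypes

names = ["Margaret Swanson", "John Smith"]  # hypothetical sample data
joined = " ".join(names).encode("utf-8")

# create_string_buffer copies the bytes into a mutable char array and
# appends a terminating NUL byte.
buf = ctypes.create_string_buffer(joined)

# Assumed prototype for a function taking one flat string (hypothetical):
#   CallName.argtypes = [ctypes.c_char_p, ...]
#   CallName(buf, ...)

# With a flat buffer, a single load from base + offset yields a character.
base = ctypes.addressof(buf)
second_byte = ctypes.c_ubyte.from_address(base + 1).value
print(len(joined), ctypes.sizeof(buf), chr(second_byte))
```

Here the address passed to the callee is the address of the characters themselves, so no pointer array sits in between.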

In my NASM program, I test it by reading a random byte into al using the following code:

    lea rdi,[rel f1_ptr]
    mov rbp,qword [rdi] ; Pointer
    xor rax,rax
    mov al,byte[rbp+1]

But the return value in rax is 0.

If I create a local string buffer like this:

name_array: db "Margaret Swanson"

I can read it this way:

    mov rdi,name_array
    xor rax,rax
    mov al,[rdi]

But not from a pointer passed into a dll.

Here's the full code for a simple, reproducible example in NASM. Before passing it to NASM, I checked random bytes and they are what I expect, so I don't think it's encoding.

    [BITS 64]
    [default rel]

    extern malloc, calloc, realloc, free
    global Main_Entry_fn
    export Main_Entry_fn
    global FreeMem_fn
    export FreeMem_fn

    section .data align=16
    f1_ptr: dq 0
    f1_length: dq 0
    f2_ptr: dq 0
    f2_length: dq 0
    data_master_ptr: dq 0

    section .text

    String_Test_fn:
    ;______

    lea rdi,[rel f1_ptr]
    mov rbp,qword [rdi]
    xor rax,rax
    mov al,byte[rbp+10]
    ret

    ;__________
    ;Free the memory

    FreeMem_fn:
    sub rsp,40
    call free
    add rsp,40
    ret

    ; __________
    ; Main Entry

    Main_Entry_fn:
    push rdi
    push rbp
    mov [f1_ptr],rcx
    mov [f2_ptr],rdx

    mov [data_master_ptr],r8
    lea rdi,[data_master_ptr]
    mov rbp,[rdi]
    xor rcx,rcx
    movsd xmm0,qword[rbp+rcx]
    cvttsd2si rax,xmm0
    mov [f1_length],rax
    add rcx,8
    movsd xmm0,qword[rbp+rcx]
    cvttsd2si rax,xmm0
    mov [f2_length],rax
    add rcx,8

    call String_Test_fn

    pop rbp
    pop rdi
    ret

UPDATE 2:

In reply to a request, here is a ctypes wrapper to use:

    def Read_Data():

        Dir= "[FULL PATH TO DATA]"

        fname1 = Dir + "Random Names.txt"
        fname2 = Dir + "Random Phone Numbers.txt"

        f1 = Trans_02_Data.StrDataRead(fname1)
        f2 = Trans_02_Data.StrDataRead(fname2)
        f2_Int = [  int(numeric_string) for numeric_string in f2]
        StringTest_asm(f1, f2_Int)

    def StringTest_asm(f1,f2):

        f1.append("0")

        f1_x = ' '.join(f1)
        f1_x[0].encode(encoding='UTF-8',errors='strict')

        Input_Length_Array = []
        Input_Length_Array.append(len(f1))
        Input_Length_Array.append(len(f2*8))

        length_array_out = (ctypes.c_double * len(Input_Length_Array))(*Input_Length_Array)

        CA_f1 = (ctypes.c_char_p * len(f1_x))() #due to SO research
        CA_f2 = (ctypes.c_double * len(f2))(*f2)
        hDLL = ctypes.WinDLL("C:/NASM_Test_Projects/StringTest/StringTest.dll")
        CallName = hDLL.Main_Entry_fn
        CallName.argtypes = [ctypes.POINTER(ctypes.c_char_p),ctypes.POINTER(ctypes.c_double),ctypes.POINTER(ctypes.c_double)]
        CallName.restype = ctypes.c_int64

        Free_Mem = hDLL.FreeMem_fn
        Free_Mem.argtypes = [ctypes.POINTER(ctypes.c_double)]
        Free_Mem.restype = ctypes.c_int64
        start_time = timeit.default_timer()

        ret_ptr = CallName(CA_f1,CA_f2,length_array_out)

        abc = 1 #Check the value of the ret_ptr, should be non-zero

Answer 1:

Your name-reading code would return a list of Unicode strings. The following would encode a list of Unicode strings into an array of strings to be passed to a function taking a POINTER(c_char_p):

    >>> import ctypes
    >>> names = ['Mark','John','Craig']
    >>> ca = (ctypes.c_char_p * len(names))(*(name.encode() for name in names))
    >>> ca
    <__main__.c_char_p_Array_3 object at 0x000001DB7CF5F6C8>
    >>> ca[0]
    b'Mark'
    >>> ca[1]
    b'John'
    >>> ca[2]
    b'Craig'

If ca is passed to your function as the first parameter, the address of that array would be in rcx per the Microsoft x64 calling convention. The following C code and its disassembly show how the VS2017 Microsoft compiler reads it:

DLL code (test.c)

    #define API __declspec(dllexport)

    int API func(const char** instr)
    {
        return (instr[0][0] << 16) + (instr[1][0] << 8) + instr[2][0];
    }

Disassembly (compiled optimized to keep short, my comments added)

    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.00.24215.1

    include listing.inc

    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES

    PUBLIC  func
    ; Function compile flags: /Ogtpy
    ; File c:\test.c
    _TEXT   SEGMENT
    instr$ = 8
    func    PROC

    ; 5    :     return (instr[0][0] << 16) + (instr[1][0] << 8) + instr[2][0];

      00000 48 8b 51 08      mov     rdx, QWORD PTR [rcx+8]  ; address of 2nd string
      00004 48 8b 01         mov     rax, QWORD PTR [rcx]    ; address of 1st string
      00007 48 8b 49 10      mov     rcx, QWORD PTR [rcx+16] ; address of 3rd string
      0000b 44 0f be 02      movsx   r8d, BYTE PTR [rdx]     ; 1st char of 2nd string, r8d=4a
      0000f 0f be 00         movsx   eax, BYTE PTR [rax]     ; 1st char of 1st string, eax=4d
      00012 0f be 11         movsx   edx, BYTE PTR [rcx]     ; 1st char of 3rd string, edx=43
      00015 c1 e0 08         shl     eax, 8                  ; eax=4d00
      00018 41 03 c0         add     eax, r8d                ; eax=4d4a
      0001b c1 e0 08         shl     eax, 8                  ; eax=4d4a00
      0001e 03 c2            add     eax, edx                ; eax=4d4a43

    ; 6    : }

      00020 c3               ret     0
    func    ENDP
    _TEXT   ENDS
    END

Python code (test.py)

    from ctypes import *

    dll = CDLL('test')
    dll.func.argtypes = POINTER(c_char_p),
    dll.func.restype = c_int  # restype belongs on the function, not on the DLL object

    names = ['Mark','John','Craig']
    ca = (c_char_p * len(names))(*(name.encode() for name in names))
    print(hex(dll.func(ca)))

Output:

0x4d4a43

Those are the correct ASCII codes for 'M', 'J', and 'C'.
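The packed value can be cross-checked in pure Python, without the DLL, by applying the same shifts to the first character of each name. This only verifies the arithmetic and does not exercise ctypes.

```python
names = ["Mark", "John", "Craig"]

# First character of each name as an ASCII code: 0x4d, 0x4a, 0x43.
first = [ord(n[0]) for n in names]

# Same expression as the C function: (a << 16) + (b << 8) + c.
packed = (first[0] << 16) + (first[1] << 8) + first[2]
print(hex(packed))  # prints 0x4d4a43
```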


