UPDATE: I solved this problem with the help of Mark Tolonen's answer below. Here is the solution (but I'm puzzled by one thing):
I begin with the encoding line shown in Mark Tolonen's answer below (UTF-8):
CA_f1 = (ctypes.c_char_p * len(f1))(*(name.encode() for name in f1))
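To make the memory layout concrete, here is a minimal Python sketch of what that array looks like from the callee's side. The two sample names are made up for illustration. Each slot holds an address, so reaching a character takes two dereferences: one to load the slot, one to read the byte.

```python
import ctypes

# Hypothetical sample data standing in for the names read from the file.
f1 = ["Margaret Swanson", "John Smith"]

# Same construction as above: one c_char_p slot per name.
CA_f1 = (ctypes.c_char_p * len(f1))(*(name.encode() for name in f1))

# Reinterpret the array as raw pointer-sized slots, which is how the DLL sees it.
slots = ctypes.cast(CA_f1, ctypes.POINTER(ctypes.c_void_p))
slot_size = ctypes.sizeof(ctypes.c_void_p)  # 8 on a 64-bit build

# First dereference: load the address stored in a slot.
# Second dereference: read a byte at that address.
first_char_first_name = ctypes.string_at(slots[0], 1)
second_char_second_name = ctypes.string_at(slots[1] + 1, 1)

print(first_char_first_name, second_char_second_name)
```

Reading a byte straight from the array's base address, without the first dereference, returns part of a stored address rather than a character.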
With optimizations off, I always store rcx into a memory variable on entry. Later in the program when I need to use the pointer in rcx, I read it from memory. That works for a single pointer, but doesn't work for accessing the pointer array Mark Tolonen showed below; maybe that's because it's a pointer array, not just a single pointer. It DOES work if I store rcx into r15 on entry, and downstream in the program it works like this:
;To access the first char of the first name pair:
xor rax,rax
mov rdx,qword[r15]
movsx eax,BYTE[rdx]
ret

;To access the second char of the second name pair:
mov rdx,qword[r15+8]
movsx eax,BYTE[rdx+1]
That's not a problem because I usually store as many variables as possible in registers; sometimes there are not enough registers, so I have to resort to storing some in memory. Now, when processing strings, I will always reserve r15 to hold the pointer passed in rcx if it's a pointer array.
Any insight into why the memory location doesn't work?
**** END OF ANSWER ****
I'm new to string processing in NASM, and I am passing a string from ctypes. The string data is read from a text file (Windows .txt), using the following Python function:
with open(fname, encoding = "utf8") as f1:
    for item in f1:
        item = item.lstrip()
        item = item.rstrip()
        return_data.append(item)
return return_data
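For reference, a self-contained sketch of the same reading step follows. The function name `read_stripped_lines` and the temporary sample file are assumptions for illustration, not the names used in my project.

```python
import os
import tempfile

def read_stripped_lines(fname):
    """Return the lines of a UTF-8 text file with surrounding whitespace removed."""
    with open(fname, encoding="utf8") as f:
        # strip() removes leading and trailing whitespace, including the newline.
        return [line.strip() for line in f]

# Demo with a throwaway file that uses Windows-style line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"Margaret Swanson\r\nJohn Smith\r\n")
    sample_path = tmp.name

sample_names = read_stripped_lines(sample_path)
os.remove(sample_path)
print(sample_names)
```

The result is a list of Python `str` objects, so each one still has to be encoded to bytes before it can be handed to ctypes as a `c_char_p`.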
The .txt file contains a list of first and last names, one pair per line, separated by carriage-return/line-feed (CRLF) characters.
I pass a c_char_p pointer to a NASM dll using ctypes. The pointer is created with this:
CA_f1 = (ctypes.c_char_p * len(f1))()
Visual Studio confirms that it is a pointer to a byte string 50 names long, which may be where the problem lies: I need bytes, not list elements. Then I pass it using this ctypes syntax:
CallName.argtypes = [ctypes.POINTER(ctypes.c_char_p),ctypes.POINTER(ctypes.c_double),ctypes.POINTER(ctypes.c_double)]
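As a quick check of what the array constructor produces, the sketch below compares an array created without initializers to one created with them. The count and the sample names are made up. An array built with empty parentheses holds only NULL pointers, so every byte the callee can reach through it is zero. That may be why my test read returns 0, though this is an inference rather than something I have confirmed in the debugger.

```python
import ctypes

n = 3  # hypothetical number of names

# No initializers: every slot starts out as a NULL pointer.
empty = (ctypes.c_char_p * n)()
values = list(empty)

# The raw memory the callee would see is all zero bytes.
raw = ctypes.string_at(ctypes.addressof(empty), ctypes.sizeof(empty))

# With initializers, each slot points at a NUL-terminated byte string.
filled = (ctypes.c_char_p * 2)(b"Margaret Swanson", b"John Smith")

print(values, raw, filled[0])
```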
UPDATE: before passing the string, now I convert the list to a string like this:
f1_x = ' '.join(f1)
Now VS shows a pointer to a 558-byte string, which is correct, but I still can't read a byte.
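For comparison, here is a minimal sketch of passing the joined names as one contiguous byte buffer instead of a pointer array. The sample names are made up, and using `create_string_buffer` is an assumption about one workable approach, not something taken from my project. Note that `(ctypes.c_char_p * len(f1_x))()` does not create a buffer of `len(f1_x)` bytes. It creates that many NULL pointers.

```python
import ctypes

f1 = ["Margaret Swanson", "John Smith"]  # hypothetical sample names

# str.encode returns a new bytes object; the result has to be kept.
f1_bytes = " ".join(f1).encode("utf-8")

# One contiguous, NUL-terminated buffer holding the actual bytes.
buf = ctypes.create_string_buffer(f1_bytes)

# The callee would receive the buffer's address and could read bytes
# directly from it, with a single dereference.
second_byte = ctypes.string_at(ctypes.addressof(buf) + 1, 1)

print(len(buf), second_byte)
```

With this layout the matching argtype would presumably be `ctypes.c_char_p` rather than `ctypes.POINTER(ctypes.c_char_p)`, since the argument is a pointer to bytes and not a pointer to pointers.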
In my NASM program, I test it by reading a random byte into al using the following code:
lea rdi,[rel f1_ptr]
mov rbp,qword [rdi] ; Pointer
xor rax,rax
mov al,byte[rbp+1]
But the return value in rax is 0.
If I create a local string buffer like this:
name_array: db "Margaret Swanson"
I can read it this way:
mov rdi,name_array
xor rax,rax
mov al,[rdi]
But not from a pointer passed into a dll.
Here's the full code for a simple, reproducible example in NASM. Before passing it to NASM, I checked random bytes and they are what I expect, so I don't think it's encoding.
[BITS 64]
[default rel]

extern malloc, calloc, realloc, free

global Main_Entry_fn
export Main_Entry_fn
global FreeMem_fn
export FreeMem_fn

section .data align=16

f1_ptr: dq 0
f1_length: dq 0
f2_ptr: dq 0
f2_length: dq 0
data_master_ptr: dq 0

section .text

String_Test_fn:
;______
lea rdi,[rel f1_ptr]
mov rbp,qword [rdi]
xor rax,rax
mov al,byte[rbp+10]
ret

;__________
;Free the memory

FreeMem_fn:
sub rsp,40
call free
add rsp,40
ret

; __________
; Main Entry

Main_Entry_fn:
push rdi
push rbp
mov [f1_ptr],rcx
mov [f2_ptr],rdx
mov [data_master_ptr],r8
lea rdi,[data_master_ptr]
mov rbp,[rdi]
xor rcx,rcx
movsd xmm0,qword[rbp+rcx]
cvttsd2si rax,xmm0
mov [f1_length],rax
add rcx,8
movsd xmm0,qword[rbp+rcx]
cvttsd2si rax,xmm0
mov [f2_length],rax
add rcx,8

call String_Test_fn

pop rbp
pop rdi
ret
UPDATE 2:
In reply to a request, here is a ctypes wrapper to use:
def Read_Data():
    Dir= "[FULL PATH TO DATA]"
    fname1 = Dir + "Random Names.txt"
    fname2 = Dir + "Random Phone Numbers.txt"
    f1 = Trans_02_Data.StrDataRead(fname1)
    f2 = Trans_02_Data.StrDataRead(fname2)
    f2_Int = [ int(numeric_string) for numeric_string in f2]
    StringTest_asm(f1, f2_Int)

def StringTest_asm(f1,f2):
    f1.append("0")
    f1_x = ' '.join(f1)
    f1_x[0].encode(encoding='UTF-8',errors='strict')

    Input_Length_Array = []
    Input_Length_Array.append(len(f1))
    Input_Length_Array.append(len(f2*8))
    length_array_out = (ctypes.c_double * len(Input_Length_Array))(*Input_Length_Array)

    CA_f1 = (ctypes.c_char_p * len(f1_x))() #due to SO research
    CA_f2 = (ctypes.c_double * len(f2))(*f2)

    hDLL = ctypes.WinDLL("C:/NASM_Test_Projects/StringTest/StringTest.dll")
    CallName = hDLL.Main_Entry_fn
    CallName.argtypes = [ctypes.POINTER(ctypes.c_char_p),ctypes.POINTER(ctypes.c_double),ctypes.POINTER(ctypes.c_double)]
    CallName.restype = ctypes.c_int64

    Free_Mem = hDLL.FreeMem_fn
    Free_Mem.argtypes = [ctypes.POINTER(ctypes.c_double)]
    Free_Mem.restype = ctypes.c_int64

    start_time = timeit.default_timer()
    ret_ptr = CallName(CA_f1,CA_f2,length_array_out)
    abc = 1 #Check the value of the ret_ptr, should be non-zero