inline-assembly | 易学教程

How can I call a ptx function from CUDA C?

Question: I am trying to find a way to call a PTX function (.func) from CUDA C. Say I had a PTX function like this:

    .func (.reg .s32 %res) inc_ptr ( .reg .s32 %ptr, .reg .s32 %inc )
    {
        add.s32 %res, %ptr, %inc;
        ret;
    }

I know I can call it from PTX like so:

    call (%d), inc_ptr, (%s, %d);

But I have no idea how to call it from CUDA C. I know I can inline PTX assembly with asm(), but I haven't found a way to inline a function. Hope someone can help! Thanks!

Answer 1: This can be done using the separate

What does mrc p15 do in ARM inline assembly, and how does GNU C inline asm syntax work?

Question: What does this line of ARM assembly do?

    "mrc p15, 0, %0, c9, c13, 0" : : "r" (counter)

What is p15? Shouldn't it be r15? What are all the others? What is the ::? What are c9 and c13? What is the role of each argument?

Answer 1: Whilst MRC is a generic coprocessor inter-op instruction, cp15 is the control coprocessor, which all modern ARM CPUs have. ARM has used it as a means of extending the instruction set for on-chip units such as the cache, MMU, performance monitoring and lots else besides.
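As a rough sketch of the operand syntax asked about above: a GNU C extended asm statement has the shape template : outputs : inputs : clobbers, and %0 refers to the first operand. The helper names below are hypothetical. The first function is a portable illustration with an empty template. The second is guarded so it only emits the MRC on 32-bit ARM, and it assumes the value is wanted as an output ("=r"), since MRC copies a coprocessor register into a core register. On ARMv7-A this particular encoding addresses the PMU cycle counter, and reading it from user mode may trap unless the kernel has enabled access.

```c
#include <stdint.h>

/* General shape of a GNU C extended asm statement:
 *   asm volatile ("template" : outputs : inputs : clobbers);
 * %0, %1, ... in the template refer to the operands in order. */

/* Portable illustration (hypothetical helper): the empty template emits no
 * instruction, but "+r"(x) tells the compiler that x must sit in a general
 * register and is both read and written by the asm statement. */
static inline uint32_t pass_through_register(uint32_t x)
{
    __asm__ volatile("" : "+r"(x));
    return x;
}

/* Hypothetical helper. MRC moves a coprocessor register INTO a core register,
 * so the C variable is an output operand ("=r"), not an input.
 * Fields: p15 = coprocessor number, 0 = opc1, %0 = destination core register,
 * c9 = CRn, c13 = CRm, 0 = opc2. */
static inline uint32_t read_cp15_c9_c13(void)
{
    uint32_t value = 0;
#if defined(__arm__)
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
#endif
    return value; /* stays 0 on targets other than 32-bit ARM */
}
```

The two colons in the quoted line are simply an empty output list followed by the input list; with the operand moved to the output list, only one colon is needed.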

Defining a variable inside c++ inline assembly

Question: Let's say we have the following C++ code:

    int var1;
    __asm { mov var1, 2; }

Now, what I'd like to know is: if I didn't want to define var1 outside the __asm directive, what would I have to do to put it inside it? Is it even possible? Thanks

Answer 1: To do that, you'll need to create a "naked" function with __declspec(naked) and write the prolog and the epilog that are normally created by the compiler yourself. The aim of a prolog is to:

- set up EBP and ESP
- reserve space on the stack for local variables

ARM and NEON can work in parallel?

Question: This is with reference to the question "Checksum code implementation for Neon in Intrinsics". I am opening the sub-questions listed there as separate individual questions, since multiple questions shouldn't be asked in a single thread. Anyway, coming to the question: can ARM and NEON (speaking in terms of the ARM Cortex-A8 architecture) actually work in parallel? How can I achieve this? Could someone point me to or share some sample implementations (pseudo-code/algorithms/code, not the theoretical

gcc gnu assembly kernel in real mode

Question: I am trying to build a 16-bit kernel in GCC GNU assembly, while my bootloader is written in pure assembly, but I have trouble printing out strings, while single characters are okay. Here is my bootloader.asm:

    org 0x7c00
    bits 16
    section .text
    mov ax,0x1000
    mov ss,ax
    mov sp,0x000
    mov esp,0xfffe
    xor ax,ax
    mov es,ax
    mov ds,ax
    mov [bootdrive],dl
    mov bh,0
    mov bp,zeichen
    mov ah,13h
    mov bl,06h
    mov al,1
    mov cx,6
    mov dh,010h
    mov dl,01h
    int 10h
    load:
    mov dl,[bootdrive]
    xor ah,ah
    int 13h
    jc load
    load2:
    mov ax

Simple assembly example : set inputs and get output - right syntax

Question: I am trying to write a simple example that inserts a piece of 32-bit SPARC assembly into C code; this little piece of code increments the variable "sum". The code is:

    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    int n;
    int sum;

    int main ()
    {
        n = 100;
        sum = 0;
        struct timeval tv1, tv2;
        long long diff;
        gettimeofday (&tv1, NULL);
        asm volatile ("set sum, %g1\n\t" \
                      "set n, %g3\n" \
                      "loop:\n\t" \
                      "add %g1, 1, %g2\n\t" \
                      "sub %g3, 1, %g4\n\t" \
                      "bne loop\n\t" \
                      "nop\n\t" \
                      : "=r"

Fastest way to set a Carry Flag

Question: I'm writing a loop to sum two arrays. My objective is to do it without the carry checks c = a + b; carry = (c<a). I lose the CF when I do the loop test with the cmp instruction. Currently, I am using JE and STC to test and set the previously saved state of CF. But the jump takes more or less 7 cycles, which is a lot for what I want.

    //This one is working
    asm( "cmp $0,%0;"
         "je 0f;"
         "stc;"
         "0:"
         "adcq %2, %1;"
         "setc %0"
         : "+r" (carry), "+r" (anum)
         : "r" (bnum) );

I already tried to use the
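For reference, the plain-C carry check mentioned in the question (c = a + b; carry = (c<a)) can be extended to a whole array of 64-bit limbs roughly as follows. This is only a portable baseline sketch, not the asker's or any answerer's solution; the function name add_limbs and the least-significant-limb-first layout are assumptions made for illustration. Whether a compiler turns this into an adc chain depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two n-limb numbers (least significant limb
 * first) into out and returns the final carry, which is 0 or 1. */
static unsigned add_limbs(uint64_t *out, const uint64_t *a,
                          const uint64_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;   /* add the incoming carry first */
        unsigned c1 = s < carry;     /* wrapped only if a[i] was all ones */
        uint64_t t = s + b[i];
        unsigned c2 = t < s;         /* unsigned wrap-around means a carry out */
        out[i] = t;
        carry = c1 | c2;             /* at most one of the two can be set */
    }
    return carry;
}
```

Because the carry lives in an ordinary variable here, the loop test cannot destroy it, at the cost of the explicit comparisons the question wants to avoid.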

Run time overhead of compiler barrier in gcc for x86 processors

Question: I was looking into the side effects/run-time overhead of using a compiler barrier (in GCC) in an x86 environment. Compiler barrier:

    asm volatile("" ::: "memory")

The GCC documentation says something interesting (https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html). Excerpt: The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input
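A minimal sketch of how such a barrier is typically used, assuming GCC or Clang; the macro name compiler_barrier and the variables shared_data and ready_flag are made up for illustration. The empty template emits no machine instruction, so the cost comes only from the compiler having to assume memory may have changed, which can force it to store or reload values it would otherwise keep in registers.

```c
/* Compiler-only barrier: no instruction is emitted, but the "memory"
 * clobber stops the compiler from moving memory accesses across it. */
#define compiler_barrier() __asm__ volatile("" ::: "memory")

static int shared_data;   /* hypothetical payload */
static int ready_flag;    /* hypothetical "data is ready" flag */

static void publish(int value)
{
    shared_data = value;
    compiler_barrier();   /* the store above is not moved below this point */
    ready_flag = 1;
}
```

This constrains the compiler only, not the CPU. Code that shares data between threads would normally use C11 atomics or a real fence instead.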