inline-assembly | 易学教程

For loop in x86 assembly and optimising code?

Question: I am currently learning assembly programming as part of one of my university modules. I have a program written in C++ with inline x86 assembly which takes a string of 6 characters and encrypts them based on the encryption key. Here's the full program: https://gist.github.com/anonymous/1bb0c3be77566d9b791d My code for the encrypt_chars function: void encrypt_chars (int length, char EKey) { char temp_char; // char temporary store for (int i = 0; i < length; i++) // encrypt characters one at a

Code injecting/assembly inlining in Java?

I know Java is a secure language, but when matrix calculations are needed, can I try something faster? I am learning __asm{} in C++, the Digital Mars compiler, and FASM. I want to do the same in Java. How can I inline assembly code in functions? Is this even possible? Something like this (a vectorized loop to clamp all elements of an array to a value without branching, using the CPU's AVX support): JavaAsmBlock( # get pointers into registers somehow # and tell Java which registers the asm clobbers somehow vbroadcastss twenty_five(%rip), %ymm0 xor %edx,%edx .Lloop: # do { vmovups (%rsi, %rdx, 4), %ymm1

segmentation fault for `vmovaps'

Question: I wrote code to add two arrays using KNC instructions (512-bit vectors) on an Intel Xeon Phi coprocessor. However, I get a segmentation fault in the inline assembly part. Here is my code: int main(int argc, char* argv[]) { int i; const int length = 65536; const int AVXLength = length / 16; float *A = (float*) aligned_malloc(length * sizeof(float), 64); float *B = (float*) aligned_malloc(length * sizeof(float), 64); float *C = (float*) aligned_malloc(length * sizeof(float), 64); for

What does __asm__ __volatile__ do in C?

I looked into some C code from http://www.mcs.anl.gov/~kazutomo/rdtsc.html They use things like "__inline__", "__asm__", etc., as in the following: code1: static __inline__ tick gettick (void) { unsigned a, d; __asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) ); return (((tick)a) | (((tick)d) << 32)); } code2: volatile int __attribute__((noinline)) foo2 (int a0, int a1) { __asm__ __volatile__ (""); } I was wondering what code1 and code2 do. The __volatile__ modifier on an __asm__ block forces the compiler's optimizer to execute the code as-is. Without it, the optimizer may think it can be
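As a rough illustration of the point about __volatile__, here is a minimal sketch in C, assuming GCC or Clang on x86. The names read_tsc and ticks_for_loop are made up for this example and do not come from the linked page; on other architectures the sketch falls back to a stub that returns 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, modelled on the gettick() idea in the excerpt. */
static inline uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    /* The asm has outputs but no inputs, so to the optimizer it looks like a
       pure function. __volatile__ tells the compiler not to delete it and not
       to fold two executions into one. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0; /* assumption: non-x86 builds only get a stub */
#endif
}

/* Hypothetical helper: tick count spent in a small busy loop of n iterations. */
static uint64_t ticks_for_loop(unsigned n)
{
    assert(n > 0);
    uint64_t start = read_tsc();
    for (volatile unsigned i = 0; i < n; i++) {
        /* empty body; the volatile counter keeps the loop from being removed */
    }
    uint64_t end = read_tsc();
    return end - start;
}
```

Without __volatile__, a compiler would be allowed to treat the two read_tsc() calls in ticks_for_loop as producing the same value, which would likely make the measured difference meaningless.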

Impossible constraint with cmpxchg16b in extended assembly

I am trying to write inline assembly in my C code to perform a compare-and-swap operation. My code is: typedef struct node { int data; struct node * next; struct node * backlink; int flag; int mark; } node_lf; typedef struct searchfrom { node_lf * current; node_lf * next; } return_sf; typedef struct csArg { node_lf * node; int mark; int flag; } cs_arg; typedef struct return_tryFlag { node_lf * node; int result; } return_tf; static inline node_lf cs(node_lf * address, cs_arg *old_val, cs_arg *new_val) { node_lf value = *address; __asm__ __volatile__("lock; cmpxchg16b %0; setz %1;" :"=m"(*

Convert inline Intel ASM to AT&T ASM with GCC Extended ASM

Question: I've spent the last 2 days studying AT&T inline assembly, but I'm having some problems converting this one: static char vendername[50] = {0}; _asm { mov eax, 0 cpuid mov dword ptr [vendername], ebx mov dword ptr [vendername+4], edx mov dword ptr [vendername+8], ecx } Here is my try: static char vendername[50] = {0}; __asm__( "movl $0,%%eax \n" "cpuid \n" "movl %%ebx, %[vendername] \n" "movl %%edx, %[vendername+$4] \n" "movl %%ecx, %[vendername+$8] \n" :"=r"(vendername) //user vendername as
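The excerpt is cut off before any answer, so what follows is only a sketch of one common way to write this in GCC extended asm, not the answer from the original thread. Instead of storing to memory inside the asm, it lets the compiler bind the four CPUID registers through constraints and copies the bytes out in C. The function name cpu_vendor and the 13-byte buffer are assumptions made for this example; it assumes GCC or Clang, and on non-x86 targets it just leaves the buffer empty.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the 12-byte CPUID vendor string plus a
   terminating NUL into out, which must hold at least 13 bytes. */
static void cpu_vendor(char out[13])
{
    assert(out != NULL);
    memset(out, 0, 13);
#if defined(__i386__) || defined(__x86_64__)
    uint32_t eax = 0, ebx, ecx, edx;
    /* "+a" passes leaf 0 in EAX and marks EAX as overwritten;
       "=b", "=c", "=d" receive EBX, ECX and EDX after cpuid runs. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx));
    /* The vendor string is laid out in EBX, EDX, ECX order. */
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
#endif
}
```

Calling cpu_vendor(buf) with a char buf[13] should leave a NUL-terminated vendor string in buf on x86. Very old GCC versions building 32-bit position-independent code may reject the "=b" constraint because EBX is reserved there.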

Strange 'asm' operand has impossible constraints error

I'm trying to compile a simple C program (Win7 32bit, Mingw32 Shell and GCC 5.3.0). The C code is like this: #include <stdio.h> #include <stdlib.h> #define _set_tssldt_desc(n,addr,type) \ __asm__ ("movw $104,%1\n\t" \ :\ :"a" (addr),\ "m" (*(n)),\ "m" (*(n+2)),\ "m" (*(n+4)),\ "m" (*(n+5)),\ "m" (*(n+6)),\ "m" (*(n+7))\ ) #define set_tss_desc(n,addr) _set_tssldt_desc(((char *) (n)),addr,"0x89") char *n; char *addr; int main(void) { char *n = (char *)malloc(100*sizeof(int)); char *addr = (char *)malloc(100*sizeof(int)); set_tss_desc(n, addr); free(n); free(addr); return 0; } _set_tssldt_desc(n

Cannot read back from MSR

Question: I am writing a kernel module that reads and writes MSRs. I wrote a simple test program, but it still fails. All it does is write to an MSR, then read it back. Here is the code: static int __init test3_init(void) { uint32_t hi,lo; hi=0; lo=0xb; asm volatile("mov %0,%%eax"::"r"(lo)); asm volatile("mov %0,%%edx"::"r"(hi)); asm volatile("mov $0x38d,%ecx"); asm volatile("wrmsr"); printk("exit_write: hi=%08x lo=%08x\n",hi,lo); asm volatile("mov $0x38d,%ecx"); asm volatile(

Accessing C++ class member in inline assembly

Question: How can I access a member variable in assembly from within a non-POD class? Elaboration: I have written some inline assembly code for a class member function, but what eludes me is how to access class member variables. I've tried the offsetof macro, but this is a non-POD class. The current solution I'm using is to assign a pointer from global scope to the member variable, but it's a messy solution and I was hoping there was something better that I don't know about. Note: I'm using the G

segmentation fault for `vmovaps'

I wrote code to add two arrays using KNC instructions (512-bit vectors) on an Intel Xeon Phi coprocessor. However, I get a segmentation fault in the inline assembly part. Here is my code: int main(int argc, char* argv[]) { int i; const int length = 65536; const int AVXLength = length / 16; float *A = (float*) aligned_malloc(length * sizeof(float), 64); float *B = (float*) aligned_malloc(length * sizeof(float), 64); float *C = (float*) aligned_malloc(length * sizeof(float), 64); for(i=0; i<length; i++){ A[i] = 1; B[i] = 2; } float * pA = A; float * pB = B; float * pC = C; for(i=0; i