x86-64

C - Undefined symbols for architecture x86_64 when compiling on Mac OSX Lion

浪尽此生 submitted on 2019-11-30 18:19:59
I'm getting some problems compiling a very, very simple name.c file on Mac OS X Lion. I started following the Harvard CS50 course on cs50.net. I'm not totally new to programming, but I was curious about how this course is taught. This is the source of name.c:

```c
#include <stdio.h>
#include <cs50.h>

int main(void)
{
    printf("State your name:\n");
    string name = GetString();
    printf("O hai, %s!\n", name);
    return 0;
}
```

As you can see, it requires this library: https://manual.cs50.net/CS50_Library . Now, when I compile it, this happens: Undefined symbols for architecture x86_64: "_GetString",

What are the exhaustion characteristics of RDRAND on Ivy Bridge?

ぐ巨炮叔叔 submitted on 2019-11-30 17:53:52
After reviewing the Intel Digital Random Number Generator (DRNG) Software Implementation Guide, I have a few questions about what happens to the internal state of the generator when RDRAND is invoked. Unfortunately, the answers don't seem to be in the guide. According to the guide, inside the DRNG there are four 128-bit buffers that serve random bits for RDRAND to drain. RDRAND itself will provide either 16, 32, or 64 bits of random data depending on the width of the destination register:

```asm
rdrand ax   ; put 16 random bits in ax
rdrand eax  ; put 32 random bits in eax
rdrand rax  ; put 64 random
```

Using GDB to read MSRs

烂漫一生 submitted on 2019-11-30 15:51:55
Question: Is there some way to read the x86-64 model-specific registers, specifically IA32_FS_BASE and IA32_GS_BASE, while debugging a program using GDB? Less preferable would be a solution using a dynamic instrumentation package like Intel's Pintool, but it would be appreciated all the same. Answer 1: If you prefer not to change your code (or if the code is not available), you could do something similar to amdn's answer in the following way. The call to arch_prctl requires a pointer to a uint64_t, for which

Is it possible to change virtual memory page size?

空扰寡人 submitted on 2019-11-30 15:20:11
Is it possible to change the virtual memory page size? I'm asking because the x86-64 part of the MMU article on Wikipedia talks about different page sizes. If the page size can indeed be changed, how is it changed? On x86-64 you can explicitly request 2 MiB pages instead of the usual 4 KiB pages with the help of hugetlbfs. On modern kernels with transparent huge page support, small pages can be automatically coalesced into huge pages in the background, provided that memory fragmentation isn't too high and enough memory is still free. As far as I know, no operating system allows the

Why is imul used for multiplying unsigned numbers?

China☆狼群 submitted on 2019-11-30 14:31:51
I compiled the following program:

```c
#include <stdint.h>

uint64_t usquare(uint32_t x)
{
    return (uint64_t)x * (uint64_t)x;
}
```

This disassembles to:

```asm
0:  89 f8          mov  eax,edi
2:  48 0f af c0    imul rax,rax
6:  c3             ret
```

But imul is the instruction for multiplying signed numbers. Why is it used by gcc then?

Edit: when using uint64_t the assembly is similar:

```asm
0:  48 0f af ff    imul rdi,rdi
4:  48 89 f8       mov  rax,rdi
7:  c3             ret
```

WARNING: This answer is long! ... and it's full of unneeded explanations, but I have always wanted to write something more lengthy about multiplication. A bit of theory: When multiplying two

C++ 64 bit int: pass by reference or pass by value

女生的网名这么多〃 submitted on 2019-11-30 13:52:44
Question: This is an efficiency question about 64-bit ints. Assuming I don't need to modify the value of an "int" parameter, should I pass it by value or by reference? Assuming a 32-bit machine: 1) 32-bit int: I guess the answer is "pass by value", as "pass by reference" will have the overhead of an extra memory lookup. 2) 64-bit int: If I pass by reference, I only pass a 32-bit address on the stack, but need an extra memory lookup. So which one of them is better (reference or value)? What if the machine is 64-bit?

How to interpret segment register accesses on x86-64?

梦想与她 submitted on 2019-11-30 13:47:57
With this function:

```asm
mov 1069833(%rip),%rax   # 0x2b5c1bf9ef90 <_fini+3250648>
add %fs:0x0,%rax
retq
```

How do I interpret the second instruction and find out what was added to RAX? This code is returning the address of a thread-local variable. %fs:0x0 is the address of the TCB (Thread Control Block), and 1069833(%rip) is the offset from there to the variable, which is known since the variable resides either in the program or in some dynamic library loaded at program load time (libraries loaded at runtime via dlopen()

SIMD instructions for floating point equality comparison (with NaN == NaN)

为君一笑 submitted on 2019-11-30 13:19:55
Question: Which instructions would be used for comparing two 128-bit vectors consisting of 4 × 32-bit floating-point values? Is there an instruction that considers a NaN value on both sides as equal? If not, how big would the performance impact of a workaround that provides reflexivity (i.e. NaN equals NaN) be? I heard that ensuring reflexivity would have a significant performance impact compared with IEEE semantics, where NaN doesn't equal itself, and I'm wondering how big that impact would be. I know

Why don't GCC and Clang use cvtss2sd [memory]?

一世执手 submitted on 2019-11-30 13:09:07
I'm trying to optimize some code that's supposed to read single-precision floats from memory and perform arithmetic on them in double precision. This is becoming a significant performance bottleneck, as the code that stores data in memory as single precision is substantially slower than equivalent code that stores data in memory as double precision. Below is a toy C++ program that captures the essence of my issue:

```cpp
#include <cstdio>

// noinline to force main() to actually read the value from memory.
__attribute__ ((noinline)) float* GetFloat() {
    float* f = new float;
    *f = 3.14;
    return f;
}

int
```