fpu | 易学教程

use of FFREE and FDECSTP

Question: I cannot understand what instructions such as FFREE and FDECSTP are for. Can they be used to pop a value off the FPU stack, or do they serve some other purpose? Could someone explain? Thanks.

Answer 1: Yes, using FFREE, FINCSTP and FDECSTP you can manage the FPU stack manually. Note that the FPU stack grows down, like the CPU stack, so to remove (pop) something you mark the register as free and increment the stack pointer. You won't see these instructions in typical
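The manual pop described in the answer (mark the register free, then increment the stack pointer) can be sketched with a toy model in C. This is only an illustration of the documented behaviour of FLD, FFREE, FINCSTP and FDECSTP, not real FPU state; the `X87Model` type and the `model_*` function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the x87 register stack: 8 physical slots, a 3-bit TOP
   field, and one "empty" tag per slot. Hypothetical names, not real FPU state. */
typedef struct {
    double   reg[8];
    bool     empty[8];
    unsigned top;      /* physical index of ST(0) */
} X87Model;

void model_init(X87Model *m) {
    for (int i = 0; i < 8; ++i) {
        m->reg[i] = 0.0;
        m->empty[i] = true;
    }
    m->top = 0;
}

/* FLD: decrement TOP (the stack grows down), then fill the new ST(0). */
void model_push(X87Model *m, double v) {
    m->top = (m->top - 1) & 7;
    assert(m->empty[m->top]);  /* real hardware would signal a stack overflow */
    m->reg[m->top] = v;
    m->empty[m->top] = false;
}

/* FFREE ST(i): only tags the slot as empty; TOP does not move. */
void model_ffree(X87Model *m, unsigned i) {
    m->empty[(m->top + i) & 7] = true;
}

/* FINCSTP / FDECSTP: only move TOP; tags and data are untouched. */
void model_fincstp(X87Model *m) { m->top = (m->top + 1) & 7; }
void model_fdecstp(X87Model *m) { m->top = (m->top - 1) & 7; }

/* Manual pop: FFREE ST(0) followed by FINCSTP. */
void model_pop(X87Model *m) {
    model_ffree(m, 0);
    model_fincstp(m);
}

/* Read ST(i) relative to the current TOP. */
double model_st(const X87Model *m, unsigned i) {
    return m->reg[(m->top + i) & 7];
}
```

In this model, pushing 1.0 and then 2.0 leaves TOP at slot 6; calling `model_pop` tags slot 6 as empty and moves TOP to slot 7, so ST(0) is 1.0 again. Calling `model_fincstp` alone would move TOP but leave slot 6 tagged as in use.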

FPU IA-32 SIGFPE, Arithmetic exception

Question: I have a problem with the code below. It is GAS syntax for the IA-32 architecture, and it raises an arithmetic exception after the fsqrt instruction. SetDouble is an int with the value 0x0200 and input is a float. I'm compiling with gcc using the -m32 flag. Can someone point out where I made a mistake?

    pushl %ebp
    movl %esp,%ebp
    finit
    fldcw SetDouble
    fld input
    fld input
    fmulp
    fld1
    faddp
    fsqrt
    fld1
    fxch
    fsubp
    fstp output
    mov %ebp,%esp
    pop %ebp

Answer 1: Setting the control word to 0x200 switches the FPU to
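The answer excerpt is cut off, so its conclusion is not reproduced here. As a hedged sketch based on the documented x87 control-word layout (bits 0-5 are the exception masks, bits 8-9 the precision control, bits 10-11 the rounding control), the following C helper decodes a value such as 0x0200. The `X87ControlWord` type and both function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of the 16-bit x87 control word (hypothetical helper type). */
typedef struct {
    unsigned exception_masks;   /* bits 0-5: IM, DM, ZM, OM, UM, PM; 1 = masked */
    unsigned precision_control; /* bits 8-9: 0 = single, 2 = double, 3 = extended */
    unsigned rounding_control;  /* bits 10-11: 0 = round to nearest */
} X87ControlWord;

X87ControlWord decode_x87_cw(uint16_t cw) {
    X87ControlWord out;
    out.exception_masks   = cw & 0x3Fu;
    out.precision_control = (cw >> 8) & 0x3u;
    out.rounding_control  = (cw >> 10) & 0x3u;
    return out;
}

/* Returns 1 when all six exception mask bits are set, so no x87
   exception can be delivered to the program as a trap. */
int all_exceptions_masked(uint16_t cw) {
    return (cw & 0x3Fu) == 0x3Fu;
}
```

Decoded this way, 0x0200 selects double precision but leaves all six mask bits clear, so any x87 exception, including an inexact result, is delivered as a trap (SIGFPE on Linux). That is a plausible cause of the reported fault. A value such as 0x027F keeps double precision with the exceptions masked, and 0x037F is the state that finit establishes.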

Does _control87() also set the SSE MXCSR Control Register?

Question: The documentation for _control87 notes: "_control87 [...] affect[s] the control words for both the x87 and the SSE2, if present." It seems that the SSE and SSE2 MXCSR control registers are identical; however, there is no mention of the SSE unit in the documentation. Does _control87 affect an SSE unit's MXCSR control register, or is this only true for SSE2?

Answer 1: I dug out an old Pentium III and checked with the following code:

    #include <Windows.h>
    #include <float.h>
    #include <xmmintrin.h>

How to change FPU context in signal handler (C++/Linux)

Question: I wrote a signal handler to catch FPE errors, and I need execution to continue even when one occurs. I receive a ucontext_t as a parameter and can change the bad operand from 0 to another value, but the FPU context is still bad and I run into an infinite loop. Has anyone already manipulated the ucontext_t structure on Linux? I finally found a way to handle these situations by clearing the status flag of ucontext_t like this: ... const long int cFPUStatusFlag = 0x3F; aContext->uc_mcontext.fpregs-

How to set double precision in C++ on MacOSX?

Question: I'm trying to port _controlfp( _CW_DEFAULT, 0xffffffff ); from WIN32 to Mac OS X / Intel. I have absolutely no idea how to port this instruction... Do you? Thanks!

Answer 1: Try section 8.6 of Gough's Introduction to GCC, which demonstrates the x86 FLDCW instruction. But it helps if you tell us why you need it: if you want your doubles to be IEEE-754 64-bit doubles, the easiest way is to compile with -msse -mfpmath=sse.

Answer 2: What precision elements are you controlling? According to Microsoft's

How slow is NaN arithmetic in the Intel x64 FPU?

Question: Hints and allegations abound that arithmetic with NaNs can be 'slow' in hardware FPUs. Specifically in the modern x64 FPU, e.g. on a Nehalem i7, is that still true? Do FPU multiplies get churned out at the same speed regardless of the values of the operands? I have some interpolation code that can wander off the edge of our defined data, and I'm trying to determine whether it's faster to check for NaNs (or some other sentinel value) here, there and everywhere, or just at convenient points.

Yes,
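The "check only at convenient points" option in the question relies on NaN propagating through ordinary arithmetic. The C sketch below illustrates that property only; it makes no claim about speed on any particular CPU. The `interp` and `sum_samples` functions and the table layout are hypothetical stand-ins for the interpolation code the question mentions.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Linear interpolation over a table sampled at x = 0, 1, ..., n-1.
   Returns NaN when x is outside the defined data (or is itself NaN). */
double interp(const double *table, size_t n, double x) {
    assert(n >= 2);
    if (!(x >= 0.0 && x <= (double)(n - 1)))
        return NAN;
    size_t i = (size_t)x;
    if (i == n - 1)
        i = n - 2;              /* keep i + 1 inside the table */
    double t = x - (double)i;
    return table[i] * (1.0 - t) + table[i + 1] * t;
}

/* Sums several interpolated samples and checks for NaN once, after the
   loop, instead of after every call. Returns 1 if the sum is valid. */
int sum_samples(const double *table, size_t n,
                const double *xs, size_t count, double *out) {
    double acc = 0.0;
    for (size_t k = 0; k < count; ++k)
        acc += interp(table, n, xs[k]);  /* a NaN here poisons acc */
    *out = acc;
    return !isnan(acc);
}
```

Because any sum involving a NaN is itself NaN, a single `isnan` test after the loop detects an out-of-range sample anywhere in the batch. Whether this is faster than per-sample checks depends on the hardware and has to be measured.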

Fast float to int conversion and floating point precision on ARM (iPhone 3GS/4)

Question: I read http://www.stereopsis.com/FPU.html, which is mentioned in "What is the fastest way to convert float to int on x86". Does anyone know whether the slow simple cast (see the snippet below) is also a problem on the ARM architecture?

    inline int Convert(float x) { int i = (int) x; return i; }

To apply some of the tricks mentioned in the FPU article, you have to set the precision for floating-point operations. How do I do that on ARM? What is the fastest float to int conversion on the ARM architecture? Thanks!

Answer 1: Short

Pixel modifying code runs quick in main app, really slow in Delphi 6 DirectShow filter with other problems

I have a Delphi 6 application that sends bitmaps to a DirectShow DLL in real-time, 25 frames a second. The DirectShow DLL is my code too and is also written in Delphi 6 using the DSPACK DirectShow component suite. I have a simple block of code that goes through each pixel in the bitmap modifying the brightness and contrast of the image, if a certain flag is set, otherwise the bitmap is pushed out the DirectShow DLL unmodified (push source video filter). The code used to be in the main application and then I just moved it into the DirectShow DLL. When it was in the main application it ran fine.

division as multiply and LUT ? / fast float division reciprocal

Is it possible to replace a float division with a reciprocal taken from a lookup table (something like 1/f -> 1*inv[f])? How could it be done? I think some AND mask and shift would have to be applied to the float to turn it into an index. How exactly would that work?

You can guess an approximate inverse like this:

    int x = bit_cast<int>(f);
    x = 0x7EEEEEEE - x;
    float inv = bit_cast<float>(x);

In my tests, 0x7EF19D07 was slightly better (tested with the effects of 2 Newton-Raphson refinements included). You can then improve the guess with Newton-Raphson:

    inv = inv * (2 - inv * f);

Iterate as often as you want. 2 or 3
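The two steps from the answer (bit-level initial guess, then Newton-Raphson refinement) can be combined into one C function as a minimal sketch. It assumes a 32-bit IEEE-754 `float` and a positive, normal input; `approx_reciprocal` is a hypothetical name, and `memcpy` stands in for the `bit_cast` used in the answer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/f: integer-subtraction initial guess followed by
   `iterations` Newton-Raphson steps. Assumes IEEE-754 binary32 floats
   and f > 0 (hypothetical helper, not a library routine). */
float approx_reciprocal(float f, int iterations) {
    assert(f > 0.0f);

    /* Reinterpret the float's bits as an integer without aliasing issues. */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    /* Subtracting from the magic constant roughly negates the exponent. */
    bits = 0x7EEEEEEEu - bits;

    float inv;
    memcpy(&inv, &bits, sizeof inv);

    /* Each step roughly squares the relative error of the estimate. */
    for (int i = 0; i < iterations; ++i)
        inv = inv * (2.0f - inv * f);

    return inv;
}
```

For f = 3.0f the initial guess is about 0.3417 (roughly 2.5% too high), and since each refinement approximately squares the relative error, three iterations bring it close to the limit of single precision.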