How to get this sqrt inline assembly working for iOS

Posted by 与世无争的帅哥 on 2019-12-11 09:09:27

Question


I am trying to follow another SO post and implement sqrt14 within my iOS app:

double inline __declspec (naked) __fastcall sqrt14(double n)
{
    _asm fld qword ptr [esp+4]
    _asm fsqrt
    _asm ret 8
}

I have modified this to the following in my code:

double inline __declspec (naked) sqrt14(double n)
{
    __asm__("fld qword ptr [esp+4]");
    __asm__("fsqrt");
    __asm__("ret 8");
}

Above, I have removed the "__fastcall" keyword from the method definition since my understanding is that it is for x86 only. The above gives the following errors for each assembly line respectively:

Unexpected token in argument list

Invalid instruction

Invalid instruction

I have attempted to read through a few inline ASM guides and other posts on how to do this, but I am generally just unfamiliar with the language. I know MIPS quite well, but these commands/registers seem to be very different. For example, I don't understand why the original author never uses the passed in "n" value anywhere in the assembly code.

Any help getting this to work would be greatly appreciated! I am trying to do this because I am building an app where I need to calculate sqrt (ok, yes, I could do a lookup table, but for right now I care a lot about precision) on every pixel of a live-video feed. I am currently using the standard sqrt, and in addition to the rest of the computation, I'm running at around 8fps. Hoping to bump that up a frame or two with this change.

If it matters: I'm building the app to ideally be compatible with any current iOS device that can run iOS 7.1. Again, many thanks for any help.


Answer 1:


The compiler is perfectly capable of generating the fsqrt instruction on its own from a plain sqrt call; you don't need inline asm for that. You might get some extra speed if you use -ffast-math.

For completeness' sake, here is the inline asm version:

__asm__ __volatile__ ("fsqrt" : "=t" (n) : "0" (n));

The fsqrt instruction has no explicit operands; it implicitly uses the top of the FPU stack. The =t constraint tells the compiler to expect the output on the top of the FPU stack, and the 0 constraint instructs the compiler to place the input in the same place as output #0 (i.e. the top of the FPU stack again).

Note that fsqrt is of course x86-only, meaning it won't work on ARM CPUs, for example.
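Since the inline asm only applies to x86 targets, one way to combine it with the portability note above is to guard it with the preprocessor and fall back to the standard sqrt elsewhere. The following is only a rough sketch under that assumption: sqrt_any and sqrt_buffer are hypothetical names that do not come from the original post, and the guard macros assume a GCC- or Clang-compatible compiler.

```c
#include <math.h>
#include <stddef.h>

/* sqrt_any is a hypothetical name used only for this sketch. */
static inline double sqrt_any(double n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* x86 builds only: fsqrt replaces the value on top of the FPU stack. */
    __asm__ ("fsqrt" : "=t" (n) : "0" (n));
    return n;
#else
    /* ARM and all other targets: plain sqrt, which the compiler is free
       to lower to whatever native instruction the target provides. */
    return sqrt(n);
#endif
}

/* Hypothetical per-pixel loop: takes the square root of each value in place. */
static void sqrt_buffer(double *values, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        values[i] = sqrt_any(values[i]);
}
```

On an actual iOS device only the fallback branch is compiled, so any speedup there would have to come from compiler flags such as -ffast-math rather than from the asm.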



Source: https://stackoverflow.com/questions/23301293/how-to-get-this-sqrt-inline-assembly-working-for-ios
