arm

GCC generated assembly for unaligned float access on ARM

陌路散爱 submitted on 2021-02-20 17:56:41
Question: Hello, I am currently working on a program that needs to process a data blob containing a series of floats that may be unaligned (and sometimes are). I am compiling with gcc 4.6.2 for an ARM Cortex-A8, and I have a question about the generated assembly code. As a minimal example, for the following test code float aligned[2]; float *unaligned = (float*)(((char*)aligned)+2); int main(int argc, char **argv) { float f = unaligned[0]; return (int)f; } the compiler (gcc 4.6.2 -
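A common portable way to read a float from a possibly unaligned address is to copy its bytes with `memcpy` instead of dereferencing a cast pointer. The sketch below illustrates that idea only; the helper names `load_float_unaligned` and `load_floats_unaligned` are hypothetical and do not come from the question, and which instructions the compiler emits for them depends on the target and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the question): read one float from an
 * address that may not be 4-byte aligned by copying its bytes. */
static float load_float_unaligned(const void *src)
{
    float value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}

/* Hypothetical helper: copy `count` floats out of an unaligned blob
 * into a properly aligned destination array. */
static void load_floats_unaligned(float *dst, const void *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, count * sizeof *dst);
}
```

With the question's variables, `float f = load_float_unaligned((char *)aligned + 2);` would replace the direct `unaligned[0]` read.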

Replacing memcpy with neon intrinsics

北战南征 submitted on 2021-02-20 04:26:28
Question: I am trying to beat the memcpy function by writing NEON intrinsics that do the same job. Below is my logic: uint8_t* m_input; //Size as 400 x300 uint8_t* m_output; //Size as 400 x300 //not mentioning the complete code base for memory creat memcpy(m_output, m_input, sizeof(m_output[0]) * 300* 400); Neon: int32_t ht_index,wd_index; uint8x16_t vector8x16_image; for(int32_t htI =0;htI < m_roiHeight;htI++){ ht_index = htI * m_roiWidth ; for(int32_t wdI = 0;wdI < m_roiWidth;wdI+=16){ wd_index = ht
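If the image is one contiguous buffer, the row and column loops can be flattened into a single pass that moves 16 bytes per iteration. The following is a minimal sketch under that assumption, and it also assumes the two buffers do not overlap. The function name `copy_bytes` is hypothetical; `vld1q_u8` and `vst1q_u8` are the standard NEON load/store intrinsics from `arm_neon.h`. On targets without NEON the sketch falls back to `memcpy`, and nothing here implies it is faster than the library `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copy n bytes from src to dst, 16 bytes at a time
 * with NEON when it is available. Buffers are assumed not to overlap. */
static void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
#if defined(__ARM_NEON)
    /* Main loop: one 128-bit load and one 128-bit store per iteration. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t block = vld1q_u8(src + i);
        vst1q_u8(dst + i, block);
    }
#endif
    /* Tail (or the whole buffer when NEON is unavailable). */
    if (i < n) {
        memcpy(dst + i, src + i, n - i);
    }
}
```

For the 400 x 300 image in the question the call would be `copy_bytes(m_output, m_input, 400 * 300);`. That size is a multiple of 16, so the tail copy is not exercised there.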

arm compiler 5 do not fully respect volatile qualifier

瘦欲@ submitted on 2021-02-19 02:59:27
Question: Consider the following code: volatile int status; status = process_package_header(&pack_header, PACK_INFO_CONST); if ((((status) == (SUCCESS_CONST)) ? ((random_delay() && ((SUCCESS_CONST) == (status))) ? 0 : side_channel_sttack_detected()) : 1)) { ... } Which generates this machine code (produced with the toolchain's objdump): 60: f7ff fffe bl 0 <process_package_header> 64: 9000 str r0, [sp, #0] /* <- storing to memory as status is volatile */ 66: 42a0 cmp r0, r4 /* <- where is the load

Optimizing horizontal boolean reduction in ARM NEON

时光总嘲笑我的痴心妄想 submitted on 2021-02-18 10:59:09
Question: I'm experimenting with a cross-platform SIMD library à la ecmascript_simd (aka SIMD.js), and part of this is providing a few "horizontal" SIMD operations. In particular, the API that library offers includes any(<boolN x M>) -> bool and all(<boolN x M>) -> bool functions, where <T x K> is a vector of K elements of type T and boolN is an N-bit boolean, i.e. all ones or all zeros, as SSE and NEON return for their comparison operations. For example, let v be a <bool32 x 4> (a 128-bit vector), it
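To pin down the semantics described above, here is a plain scalar reference for the `<bool32 x 4>` case. It is only a sketch of what `any` and `all` are expected to return, not an optimized NEON reduction. The type `bool32x4` and both function names are hypothetical, and each lane is assumed to be either all ones or all zeros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for <bool32 x 4>: four 32-bit lanes, each
 * assumed to be 0x00000000 (false) or 0xFFFFFFFF (true). */
typedef struct {
    uint32_t lane[4];
} bool32x4;

/* any: true if at least one lane is set (OR-reduce, compare with zero). */
static bool any_bool32x4(bool32x4 v)
{
    uint32_t acc = v.lane[0] | v.lane[1] | v.lane[2] | v.lane[3];
    return acc != 0;
}

/* all: true only if every lane is set (AND-reduce, compare with all ones). */
static bool all_bool32x4(bool32x4 v)
{
    uint32_t acc = v.lane[0] & v.lane[1] & v.lane[2] & v.lane[3];
    return acc == UINT32_MAX;
}

/* Small self-check showing the expected results for a mixed vector. */
static void check_bool32x4_examples(void)
{
    bool32x4 mixed = {{UINT32_MAX, 0, UINT32_MAX, 0}};
    assert(any_bool32x4(mixed));
    assert(!all_bool32x4(mixed));
}
```

A vectorized version has to produce the same answers as these two functions, which makes them useful as a test oracle.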

segmentation fault at every assembly code

社会主义新天地 submitted on 2021-02-17 07:10:03
Question: I am trying to learn assembly on a Raspberry Pi, but I can't get started: every program I write gets "Segmentation Fault". .text .global _start _start: MOV R0, #2 SWI 0 This code gets a segmentation fault. Even if I delete the MOV line, it still gets a segmentation fault. Answer 1: Try: bx lr @ Exit if use gcc as linker or mov r7, #1 @ Exit if use ld as linker svc #0 @ Exit if use ld as linker Some versions use swi; I have had success with svc using ld as the linker. If you use gcc as the linker, the lr register has

Inline assembly statements in C code and extended ASM for ARM Cortex architectures

你。 submitted on 2021-02-17 05:18:30
Question: I am trying to compile the following two pieces of code with ARM Compiler 5 for a Cortex-A microprocessor: Part 1: static inline void cp15_write_sctlr(uint32_t value) { asm("mcr p15, 0, %0, c1, c0, 0" :: "r"(value)); } static inline uint32_t cp15_read_actlr(void) { uint32_t actlr; asm("mrc p15, 0, %0, c1, c0, 1" : "=r"(actlr)); return actlr; } Part 2: static inline void dmb(void) { asm("dmb" ::: "memory"); } static inline void dsb(void) { asm("dsb" ::: "memory"); } static inline void isb

Xcode build target difference - arm64 and armv7, arm64

浪尽此生 submitted on 2021-02-17 04:54:23
Question: I created 2 new projects in Xcode this week for 2 different apps. For some reason one of the projects always fails to compile for the device target. I then realized that the device target is different for the two projects. The working project has "Any iOS Project (arm64)". The project that fails to build has "Any iOS Project (armv7, arm64)". What causes the device target to change this way, and what is the difference? Answer 1: armv7 is a 32-bit architecture that was supported by earlier iOS versions

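Undercurrents in China's homegrown processors: we have Loongson, Phytium...

丶灬走出姿态 submitted on 2021-02-12 11:03:37
In recent years, with subsidies from the Hegaoji (核高基) national project and support from the National Integrated Circuit Industry Investment Fund, the number of domestic organizations and companies designing high-performance CPUs has kept growing. They include veteran IC design houses with deep technical foundations such as Loongson (龙芯), Phytium (飞腾), and Sunway (申威), as well as newcomers such as Hongxin (宏芯) and Zhaoxin (兆芯); there are state-controlled companies such as Spreadtrum (展讯) and non-state-owned enterprises such as HiSilicon (海思). Some describe this as "a hundred schools of thought contending, a hundred flowers blooming", while others call it "redundant construction and mutual undercutting". In fact, among this dazzling array of design houses and companies, three difficulty modes can be distinguished by how independently controllable the technology is and how easy it is to operate in the market. The first is Hard mode, which takes a fully independent route and builds its own technology system; its representatives are Loongson and Sunway. The second is Normal mode, in which a company designs its own microarchitecture to keep the chip secure and controllable but attaches itself to the Wintel or AA ecosystem and stays compatible with that software ecosystem; its representatives are Phytium, Ingenic (君正), and Zhongzhi (众志). The last is Simple mode, which means either cooperating or forming joint ventures with vendors outside mainland China, or depending entirely on the AA ecosystem in both software and hardware; the former is represented by Zhaoxin and Hongxin, the latter by HiSilicon and Spreadtrum. Below, we survey the technical routes and market prospects of domestic CPUs along the development paths of these three difficulty modes. I. The Hard-mode development route. As the name suggests, the independent route means having the final say over intellectual property and over the choice of development direction. Taking the independent route has the following characteristics: 1. Autonomy over development. A company that owns its instruction set can extend it on its own and choose its own direction of development. For example, while obtaining a permanent MIPS license, Loongson extended the instruction set on its own with 148 loongEXT instructions and 5 loongVM instructions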