cortex-a

Replacing memcpy with neon intrinsics

北战南征 submitted on 2021-02-20 04:26:28
Question: I am trying to beat memcpy by writing the same copy with NEON intrinsics. Below is my logic:
uint8_t* m_input; //Size as 400 x300
uint8_t* m_output; //Size as 400 x300
//not mentioning the complete code base for memory creat
memcpy(m_output, m_input, sizeof(m_output[0]) * 300* 400);
Neon:
int32_t ht_index,wd_index;
uint8x16_t vector8x16_image;
for(int32_t htI =0;htI < m_roiHeight;htI++){
ht_index = htI * m_roiWidth ;
for(int32_t wdI = 0;wdI < m_roiWidth;wdI+=16){
wd_index = ht
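A minimal sketch of the kind of copy loop the question describes, assuming the 400 x 300 buffer is contiguous so that one flat loop over bytes is enough. The name copy_bytes is hypothetical and does not come from the question. The non-NEON branch exists only so the file also builds on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Copy n bytes from src to dst (buffers must not overlap).
 * copy_bytes is a hypothetical helper name used for illustration. */
static void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Main loop: one 128-bit load and one 128-bit store per iteration. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t v = vld1q_u8(src + i);
        vst1q_u8(dst + i, v);
    }
#endif
    /* Tail bytes (or the whole buffer on builds without NEON). */
    if (i < n) {
        memcpy(dst + i, src + i, n - i);
    }
}
```

For the buffers in the question the call would be copy_bytes(m_output, m_input, 300 * 400). A tuned libc memcpy is often hard to beat with a plain loop like this, so it is best treated as a baseline to measure against rather than a guaranteed improvement.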

ARM single-copy atomicity

我的梦境 submitted on 2020-06-26 04:11:25
Question: I am currently wading through the ARM architecture manual for the ARMv7 core. Section A3.5.3, on the atomicity of memory accesses, states: If a single-copy atomic load overlaps a single-copy atomic store and for any of the overlapping bytes the load returns the data written by the write inserted into the Coherence order of that byte by the single-copy atomic store then the load must return data from a point in the Coherence order no earlier than the writes inserted into the Coherence

warning: format '%ld' expects argument of type 'long int', but argument has type '__builtin_neon_di'

家住魔仙堡 submitted on 2020-01-06 13:14:16
Question: Regarding this question of mine, I am not able to cross-check the output. I am getting wrong printed values after execution. Can someone tell me whether the printf() statements are wrong or my logic is wrong? CODE:
int64_t arr[2] = {227802,9896688};
int64x2_t check64_2 = vld1q_s64(arr);
for(int i = 0;i < 2; i++){ printf("check64_2[%d]: %ld\n",i,check64_2[i]); }
int64_t way1 = check64_2[0] + check64_2[1];
int64x1_t way2 = vset_lane_s64(vgetq_lane_s64(check64_2, 0) + vgetq_lane_s64(check64
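The warning in the title arises because %ld expects long, while the 64-bit lane type is not long on 32-bit ARM. A sketch of one way around it, assuming the lanes are first extracted into int64_t with vgetq_lane_s64 and then printed with the PRId64 macro from <inttypes.h>. The helper name format_lanes is hypothetical, and the non-NEON branch is only there so the file builds elsewhere.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Write both 64-bit lanes loaded from arr into buf as "lane0 lane1".
 * Returns the snprintf result. format_lanes is a hypothetical name. */
static int format_lanes(char *buf, size_t len, const int64_t arr[2])
{
    int64_t lane0, lane1;
    assert(buf != NULL && len > 0);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int64x2_t v = vld1q_s64(arr);
    lane0 = vgetq_lane_s64(v, 0);   /* lane index must be a constant */
    lane1 = vgetq_lane_s64(v, 1);
#else
    lane0 = arr[0];                 /* scalar stand-in for the vector load */
    lane1 = arr[1];
#endif
    /* PRId64 expands to the right conversion for int64_t on this target. */
    return snprintf(buf, len, "%" PRId64 " %" PRId64, lane0, lane1);
}
```

The same "%" PRId64 pattern works directly in printf; the buffer is used here only to make the result easy to check.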

warning: format '%ld' expects argument of type 'long int', but argument has type '__builtin_neon_di'

丶灬走出姿态 submitted on 2020-01-06 13:13:14
Question: Regarding this question of mine, I am not able to cross-check the output. I am getting wrong printed values after execution. Can someone tell me whether the printf() statements are wrong or my logic is wrong? CODE:
int64_t arr[2] = {227802,9896688};
int64x2_t check64_2 = vld1q_s64(arr);
for(int i = 0;i < 2; i++){ printf("check64_2[%d]: %ld\n",i,check64_2[i]); }
int64_t way1 = check64_2[0] + check64_2[1];
int64x1_t way2 = vset_lane_s64(vgetq_lane_s64(check64_2, 0) + vgetq_lane_s64(check64

state of TTBR0/1 wrt to multiple guests in case of virtualization in arm

老子叫甜甜 submitted on 2020-01-01 18:37:26
Question: TTBR0/1 are CP15 registers that are programmed by the PL1 OS. If PL1 OS1 programs TTBR0 and PL1 OS2 is then scheduled on the same core, would OS2 see the TTBR0/1 values set by OS1? I am sure there is some way that sanity is maintained. Is the following true? While switching between guests, the hypervisor saves all CP15 registers in the guest context and later restores them before switching to the guest. If yes, wouldn't that be time-consuming for the hypervisor, as the list of CP15 registers would

Neon Comparison [duplicate]

最后都变了- submitted on 2019-12-25 05:36:08
Question: This question already has answers here: arm neon compare operations generate negative one (2 answers). Closed 3 years ago. As per the NEON documentation: If the comparison is true for a lane, the result in that lane is all bits set to one. If the comparison is false for a lane, all bits are set to zero. The return type is an unsigned integer type. I have written a small piece of code to check this, and I observed the results 0 and -1 instead of 0 and 1. Can anyone tell me the reason behind
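The quoted documentation says "all bits set to one", which for a 32-bit lane is 0xFFFFFFFF rather than 1; read back through a signed type, that bit pattern is -1. A small sketch of this, assuming 32-bit lanes and a greater-than comparison (the question's own code is not shown). The helper names greater_mask and greater_flag are hypothetical, and the non-NEON branch is a scalar model of the same rule.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Lane 0 of the comparison (a > b): all ones if true, all zeros if false. */
static uint32_t greater_mask(int32_t a, int32_t b)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t m = vcgtq_s32(vdupq_n_s32(a), vdupq_n_s32(b));
    return vgetq_lane_u32(m, 0);
#else
    return (a > b) ? UINT32_MAX : 0u;   /* scalar model, not a NEON call */
#endif
}

/* Reduce the mask to 0 or 1 when a boolean-style value is wanted. */
static uint32_t greater_flag(int32_t a, int32_t b)
{
    uint32_t m = greater_mask(a, b);
    assert(m == 0u || m == UINT32_MAX);
    return m >> 31;
}
```

Printing the mask with an unsigned conversion shows 4294967295; printing it through a signed type shows -1. The all-ones form is convenient as a select mask for bitwise operations, which is presumably why a plain 1 is not produced.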

Interrupt handling on an SMP ARM system with a GIC

 ̄綄美尐妖づ submitted on 2019-12-14 03:43:30
Question: I want to know how interrupt handling works from the point at which a device is interrupted. I know interrupt handling only in bits and pieces and would like a clear end-to-end picture. Let me lay out what little I know. Suppose an FPGA device is interrupted through electrical lines and gets some data. The device driver for this FPGA device has already registered code (an interrupt handler) using the request_irq function. So now the FPGA device has an IRQ line

Cortex-A9 SMP GICC_RPR always be 0, interrupt not triggered

爷,独闯天下 submitted on 2019-12-13 05:36:56
Question: Context: On an i.MX6Quad board, while the system is running, I found that Core3 cannot handle any interrupt. Viewing the GIC interface registers with Trace32, GICC_RPR is always 0, which means the highest-priority event is running. That explains the problem above: lower-priority events cannot be processed. Question: I inserted an instruction that writes 0 to GICC_EOI, which should change GICC_RPR to the idle priority (0xFF), but it doesn't work and the value stays 0. Goal: I want to do priority drop and deactivate success

pairwise addition in neon

瘦欲@ submitted on 2019-12-12 03:35:39
Question: I want to add the values at indices 0 and 1 of an int64x2_t vector in NEON. I cannot find any pairwise-add instruction that does this.
int64x2_t sum_64_2;
//I am expecting result should be..
//int64_t result = sum_64_2[0] + sum_64_2[1];
Is there any NEON instruction that does this?
Answer 1: You can write it in two ways. This one explicitly uses the NEON VADD.I64 instruction:
int64x1_t f(int64x2_t v) { return vadd_s64 (vget_high_s64 (v), vget_low_s64 (v)); }
and the

Enable neon on ARM cortex-a series

萝らか妹 submitted on 2019-12-11 14:46:31
Question: I want to initialize the NEON coprocessor on a bare-metal Cortex-A15. Following ARM's instructions, I wrote this sequence at the end of my platform init sequence:
MOV r0, #0x00F00000
MRC p15, 0, r0, c1, c1, 2
ORR r0, r0, #0x0C00
BIC r0, r0, #0xC000
MCR p15, 0, r0, c1, c1, 2
ISB
MRC p15, 4, r0, c1, c1, 2
BIC r0, r0, #0x0C00
BIC r0, r0, #(3<<14)
MCR p15, 4, r0, c1, c1, 2
ISB
MOV r3, #0x40000000
VMSR FPEXC, r3
I get this error:
Error: operand 0 must be FPSCR -- `vmsr FPEXC,r3'
I am using arm-eabi-as