cortex-a8

Tronlong development board based on the TI AM335x ARM Cortex-A8 CPU (up to 1 GHz): CAN bus interface and RTC holder

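被刻印的时光 ゝ submitted on 2020-02-25 21:37:28
The TL335x-EVM is an industrial-grade development board designed by Guangzhou Tronlong around the TI ARM Cortex-A8. It gives users a test platform for the SOM-TL335x core board, so that the core board's overall performance can be evaluated quickly. The TL335x-EVM baseboard is a 4-layer board built with a lead-free immersion-gold process. Tronlong provides extensive AM335x getting-started tutorials and demo programs, assists customers with baseboard development, and offers long-term, comprehensive technical support, helping customers complete secondary development as quickly as possible and bring products to market fast.
CAN bus interface: The development board carries one CAN bus interface. CON9 is the corresponding terminal block; the interface definition is shown in the figure below:
RTC holder: The chip has a built-in RTC clock controller. Its interface is brought out through a holder for a rechargeable ML2032 RTC battery, at 3 V, on connector CON3. The hardware location and schematic are shown in the figure below:
Source: CSDN Author: Tronlong_ Link: https://blog.csdn.net/Tronlong_/article/details/104496879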

Using ARM NEON intrinsics to add alpha and permute

醉酒当歌 submitted on 2020-02-01 04:34:25
Question: I'm developing an iOS app that needs to convert images from RGB -> BGRA fairly quickly. I would like to use NEON intrinsics if possible. Is there a faster way than simply assigning the components? void neonPermuteRGBtoBGRA(unsigned char* src, unsigned char* dst, int numPix) { numPix /= 8; //process 8 pixels at a time uint8x8_t alpha = vdup_n_u8 (0xff); for (int i=0; i<numPix; i++) { uint8x8x3_t rgb = vld3_u8 (src); uint8x8x4_t bgra; bgra.val[0] = rgb.val[2]; //these lines are slow bgra.val[1]
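For reference, here is a minimal sketch of the load/permute/store pattern the question describes. It is not the poster's full code, which is cut off above. The function name `rgb_to_bgra` is hypothetical, and the scalar tail loop and the non-NEON fallback are additions so the sketch handles pixel counts that are not a multiple of 8 and still compiles on other targets. Whether the `val[]` assignments cost anything depends on the compiler, so the generated assembly is worth checking before drawing conclusions about speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: packed RGB (3 bytes/pixel) to BGRA (4 bytes/pixel),
   with alpha forced to 0xff. */
static void rgb_to_bgra(const uint8_t *src, uint8_t *dst, size_t num_pixels)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const uint8x8_t alpha = vdup_n_u8(0xff);
    /* 8 pixels per iteration: vld3 de-interleaves R, G, B into three
       registers; vst4 re-interleaves four registers as B, G, R, A. */
    for (; i + 8 <= num_pixels; i += 8) {
        uint8x8x3_t rgb = vld3_u8(src + 3 * i);
        uint8x8x4_t bgra;
        bgra.val[0] = rgb.val[2];
        bgra.val[1] = rgb.val[1];
        bgra.val[2] = rgb.val[0];
        bgra.val[3] = alpha;
        vst4_u8(dst + 4 * i, bgra);
    }
#endif

    /* Scalar path: leftover pixels on NEON targets, all pixels elsewhere. */
    for (; i < num_pixels; ++i) {
        dst[4 * i + 0] = src[3 * i + 2];
        dst[4 * i + 1] = src[3 * i + 1];
        dst[4 * i + 2] = src[3 * i + 0];
        dst[4 * i + 3] = 0xff;
    }
}
```

Indexing from the base pointers on every iteration avoids having to advance `src` and `dst` by hand, which is an easy step to forget in a loop like this.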

Tronlong development board based on the TI AM335x ARM Cortex-A8 CPU (up to 1 GHz): buttons and serial ports

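浪子不回头ぞ submitted on 2020-01-18 06:47:04
Processor: The TI Sitara AM335x is a high-performance, industrial-grade, 32-bit embedded Cortex-A8 processor clocked at up to 1 GHz, with up to 2000 DMIPS of compute. It pairs with DDR3, is compatible with eMMC and NAND flash, and offers a range of industrial interfaces. The AM335x CPU resource block diagram follows:
Buttons: There are five buttons: one reset button (KEY1), one long-press sleep button (KEY2), one wake-up button (KEY3), and two programmable input buttons (KEY4 and KEY5, one of which is a non-maskable interrupt button). Their hardware locations and schematic are shown in the figure below:
Serial ports: The board brings out three serial ports on CON4, CON6, and CON8. CON4 is the Micro USB debug serial port (UART3), CON6 is the RS232 port (UART0), and CON8 is the RS485 port (UART1). Their hardware locations and schematic are shown in the figure below:
Table 1
Serial port | Board position | Description
Micro USB | CON4 | Converted to a Micro USB interface by a CH340 chip
RS232 | CON6 | Converted to RS232 by an SP3232EEY-L/TR level-shifter chip; uses a 9-pin DB9 connector
RS485 | CON8 | Uses a 3-position terminal block
Source: CSDN Author: Tronlong_ Link: https://blog.csdn.net/Tronlong_/article/details/103976319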

ARM and NEON can work in parallel?

谁都会走 submitted on 2020-01-09 09:16:06
Question: This is with reference to the question "Checksum code implementation for Neon in Intrinsics". I am opening the sub-questions listed in that link as separate questions, since multiple questions shouldn't be asked in a single thread. Anyway, coming to the question: Can ARM and NEON (speaking in terms of the ARM Cortex-A8 architecture) actually work in parallel? How can I achieve this? Could someone point me to or share some sample implementations (pseudo-code/algorithms/code, not the theoretical

Profiling on ARM Cortex-A8

别等时光非礼了梦想. submitted on 2019-12-25 03:53:31
Question: I want to profile my application on an ARM processor. I found that OProfile doesn't work. Someone tested the following code a few years ago: the cycle counter works, but the performance monitor counter does not. I tested it again and got the same result. With the following code I get cycle count: 2109, performance monitor count: 0. I have searched Google and have not found a solution so far. Has anyone fixed this issue? uint32_t value = 0; uint32_t count = 0; struct timeval tv;

Write directly to the global history buffer (GHB) or BTB in the branch predictor of an ARM Cortex-A8?

烈酒焚心 submitted on 2019-12-24 00:22:50
Question: I'm interested in tinkering directly with the contents of the BTB (branch target buffer) and GHB on the Cortex A8. The ARM manual says stuff like: To write one entry in the instruction side GHB array, for example: LDR R0, =0x3333AAAA; MCR p15, 0, R0, c15, c1, 0; Move R0 to I-L1 Data 0 Register LDR R1, =0x0000020C; MCR p15, 0, R1, c15, c5, 2; Write I-L1 Data 0 Register to GHB To read one entry in the instruction side GHB array, for example: LDR R1, =0x0000020C; MCR p15, 0, R1, c15, c7, 2; Read

Some doubts in optimizing the neon code

给你一囗甜甜゛ submitted on 2019-12-23 05:44:17
Question: I wrote some NEON code in assembly and was aiming for maximum optimization. Though the numbers seem satisfactory, I was interested in understanding the possibilities of optimizing it further. Then I came across an online tool which helps in counting the cycles of each instruction. Here is the link to my code: http://pulsar.webshaker.net/ccc/sample-115d4c29 It clearly marked the areas of my concern, but I could not clearly understand why those statements carry the overheads.

How to get call graph profiling working with gcc compiled code and ARM Cortex A8 target?

丶灬走出姿态 submitted on 2019-12-21 18:00:30
Question: I am biting my teeth out on this one... I need to do profiling on an ARM board and need to view call graphs. I tried OProfile, kernel perf, and Google performance tools. All work fine but do not output any call-graph information. This led me to the conclusion that I am not compiling my code correctly. I use the following flags when compiling my C++ code: Arch specific: -march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=vfpv3 General: -fexceptions -fno-strict-aliasing -D_REENTRANT

Why is ARM NEON not faster than plain C++?

旧城冷巷雨未停 submitted on 2019-12-18 09:56:17
Question: Here is the C++ code: #define ARR_SIZE_TEST ( 8 * 1024 * 1024 ) void cpp_tst_add( unsigned* x, unsigned* y ) { for ( register int i = 0; i < ARR_SIZE_TEST; ++i ) { x[ i ] = x[ i ] + y[ i ]; } } Here is a NEON version: void neon_assm_tst_add( unsigned* x, unsigned* y ) { register unsigned i = ARR_SIZE_TEST >> 2; __asm__ __volatile__ ( ".loop1: \n\t" "vld1.32 {q0}, [%[x]] \n\t" "vld1.32 {q1}, [%[y]]! \n\t" "vadd.i32 q0 ,q0, q1 \n\t" "vst1.32 {q0}, [%[x]]! \n\t" "subs %[i], %[i], $1 \n\t" "bne
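For comparison, the same element-wise add can be sketched with NEON intrinsics instead of inline assembly, which leaves register allocation and pointer updates to the compiler. This is an illustration only, not the poster's code: the name `add_u32`, the explicit length parameter, and the scalar tail and fallback are assumptions added here. It makes no claim about speed; a loop this simple does very little arithmetic per byte loaded and stored, so timings should be measured rather than assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: x[i] += y[i] for i in [0, n). */
static void add_u32(uint32_t *x, const uint32_t *y, size_t n)
{
    assert(x != NULL && y != NULL);
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four 32-bit lanes per iteration, mirroring vld1.32/vadd.i32/vst1.32. */
    for (; i + 4 <= n; i += 4) {
        uint32x4_t vx = vld1q_u32(x + i);
        uint32x4_t vy = vld1q_u32(y + i);
        vst1q_u32(x + i, vaddq_u32(vx, vy));
    }
#endif

    /* Scalar path: leftover elements on NEON targets, everything elsewhere. */
    for (; i < n; ++i) {
        x[i] += y[i];
    }
}
```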

How does one do integer (signed or unsigned) division on ARM?

喜夏-厌秋 submitted on 2019-12-17 11:43:59
Question: I'm working on Cortex-A8 and Cortex-A9 in particular. I know that some architectures don't come with integer division, but what is the best way to do it other than convert to float, divide, convert to integer? Or is that indeed the best solution? Cheers! = ) Answer 1: The compiler normally includes a divide in its library, gcclib for example. I have extracted them from gcc and use them directly: https://github.com/dwelch67/stm32vld/ then stm32f4d/adventure/gcclib. Going to float and back is probably
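To illustrate what a software divide routine of the kind the answer mentions does, here is a minimal shift-and-subtract (restoring) division sketch. It is not the libgcc implementation, which is more heavily optimized; the names `udiv32` and `sdiv32` are hypothetical, and the signed wrapper assumes the usual two's-complement conversion behaviour found on GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit division by shift-and-subtract.
   Returns the quotient and, if rem is non-NULL, stores the remainder. */
static uint32_t udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);               /* division by zero is not handled here */
    uint32_t q = 0;
    uint32_t r = 0;

    /* Bring down one dividend bit at a time, most significant first. */
    for (int bit = 31; bit >= 0; --bit) {
        r = (r << 1) | ((n >> bit) & 1u);
        if (r >= d) {             /* divisor fits: subtract, set quotient bit */
            r -= d;
            q |= 1u << bit;
        }
    }
    if (rem != NULL) {
        *rem = r;
    }
    return q;
}

/* Hypothetical helper: signed division truncating toward zero, built on
   udiv32 by dividing magnitudes and fixing up the sign afterwards. */
static int32_t sdiv32(int32_t n, int32_t d)
{
    uint32_t un = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;
    uint32_t ud = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;
    uint32_t q = udiv32(un, ud, NULL);
    return (int32_t)(((n < 0) != (d < 0)) ? 0u - q : q);
}
```

For example, `udiv32(100, 7, &r)` yields a quotient of 14 with `r` set to 2, and `sdiv32(-7, 2)` yields -3. The loop always runs 32 iterations, which keeps the sketch simple; production routines typically skip leading zeros first.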