Optimizing ARM Cortex-M3 code
I have a C function that copies a framebuffer to FSMC RAM. It drops the frame rate of my game loop to 10 FPS. How should I analyze the disassembled function: should I count the cycles of each instruction? I want to find out which part of the function the CPU spends its time in. I'm sure the algorithm is also part of the problem, because it's O(N^2).

The C function is:

    void LCD_Flip()
    {
        u8 i,j;
        LCD_SetCursor(0x00, 0x0000);
        LCD_WriteRegister(0x0050,0x00);//GRAM horizontal start position
        LCD_WriteRegister(0x0051,239);//GRAM horizontal end position
        LCD_WriteRegister(0x0052,0);//Vertical GRAM