I have a C function that copies a framebuffer to FSMC RAM.
The function drags the frame rate of the game loop down to 10 FPS. I would like to know how to analyze
Start by compiling the C code with speed optimizations enabled (for example, `-O2` if you are using GCC). The disassembly you provided appears to keep the i and j counters on the stack, which adds three load/store operations to every iteration of the inner loop. You might also want to inline LCD_WriteData into the inner loop to avoid the call overhead.
On the other hand, if you really are writing to the LCD on every iteration of the inner loop, the performance may be limited by that interface rather than by the loop code.
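To illustrate the loop-side suggestions, here is a minimal sketch of what the copy could look like. The names `copy_framebuffer` and `lcd_write_data`, the 16-bit pixel width, and passing the LCD data register in as a pointer are all my assumptions, not something taken from your code, so adapt them to your setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inlined stand-in for LCD_WriteData: a single 16-bit store
 * to the memory-mapped LCD data register (assumed interface). */
static inline void lcd_write_data(volatile uint16_t *lcd_data, uint16_t value)
{
    *lcd_data = value;
}

/* Copy pixel_count pixels from fb to the LCD data register.
 * One flat loop over a local pointer replaces the nested i/j counters,
 * so an optimizing compiler can keep the loop state in registers. */
void copy_framebuffer(volatile uint16_t *lcd_data,
                      const uint16_t *fb,
                      size_t pixel_count)
{
    assert(lcd_data != NULL);
    assert(fb != NULL);

    const uint16_t *end = fb + pixel_count;
    while (fb != end) {
        lcd_write_data(lcd_data, *fb++);
    }
}
```

On the target you would pass the address of the LCD data register in your FSMC bank, which depends on how the display is wired. If the frame rate barely changes after this, that points to the interface itself being the limit.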