Post process `objdump --disassemble` with ARM cycle counts

百般思念 提交于 2019-11-29 08:45:18

There is an online tool which estimates cycle counts on Cortex-A8. However, this CPU is quite old, and programs optimized for it might be suboptimal on newer CPUs.

AFAIK ARM also provides Cortex-A9 and Cortex-A5 cycle-accurate emulators in their RVDS software, but it is quite expensive.

Cycle counts are not something that can be assessed by looking at the instruction alone on a modern high end ARM. There is a lot of runtime state that affects the real world retirement rate of an instruction. Does the data it needs exist in the cache? Does the instruction have any dependencies on previous instruction results? If so, what latencies does the forwarding unit remove? How full is the load/store buffer? What kind of memory mapping is it touching? How full are the processor pipelines that this instruction needs? Are there synchronizing instructions in the stream? Has speculation brought forward some data it depends on? What is the state of the register renamer? Have conditional instructions been filling the pipeline or was the decoder smart enough to skip them completely? What are the ratios between the core clock and the bus and memory clocks? What's the size of the branch prediction table?

Without a full processor simulation all you can get are guesses. Whether those numbers are meaningful to you depends on what you are trying to accomplish with them.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!