Optimizing an incrementing ASCII decimal counter in video RAM on 7th gen Intel Core
Question

I'm trying to optimize the following subroutine for a specific Kaby Lake CPU (i5-7300HQ), ideally to make the code at least 10 times faster compared to its original form. The code runs as a floppy-style bootloader in 16-bit real mode. It displays a ten-digit decimal counter on screen, counting from 0 to 9999999999 and then halting.

I have taken a look at Agner Fog's optimization guides for microarchitecture and assembly, his instruction tables, and Intel's Optimization Reference Manual. Only
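Since the original routine is not included in this excerpt, here is a minimal, unoptimized sketch of the kind of counter described above, just to make the setup concrete. It is not the asker's code; the use of the color text-mode segment 0xB800, the top-left screen position, and the exact halting behavior are assumptions based on the description.

```nasm
; Hypothetical baseline (NASM syntax): a boot sector that keeps ten ASCII
; digits in text-mode video RAM and increments them until they all wrap.
        org 0x7c00
        bits 16

        cld
        mov ax, 0xb800          ; segment of color text-mode video RAM
        mov es, ax
        xor di, di
        mov cx, 10
        mov ax, 0x0700 | '0'    ; attribute 0x07, character '0'
init:   stosw                   ; draw "0000000000" in the top-left corner
        loop init

count:  mov di, 18              ; byte offset of the least significant digit
carry:  inc byte [es:di]        ; bump the ASCII digit in place
        cmp byte [es:di], '9'
        jbe count               ; no overflow: keep counting
        mov byte [es:di], '0'   ; digit overflowed: reset it to '0'
        sub di, 2               ; move one character cell to the left
        jns carry               ; propagate the carry until DI underflows

        cli
halt:   hlt                     ; all ten digits wrapped: stop
        jmp halt

        times 510-($-$$) db 0
        dw 0xaa55               ; boot signature
```

Every increment writes straight to video RAM, so the counter is visible while it runs; this direct, carry-propagating byte loop is the sort of "original form" a 10x speedup would be measured against.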