I had a little too much time on my hands and started wondering if I could write a self-modifying program. To that end, I wrote a "Hello World" in C, then used a hex editor
If we're talking about doing this in an x86 environment, it shouldn't be impossible. It should be done with caution, though, because x86 instructions are variable-length: a longer instruction may overwrite the following instruction(s), and a shorter one leaves residual bytes from the overwritten instruction, which should be padded out with NOP instructions.
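To make the length problem concrete, here is a minimal sketch in C that patches one instruction inside a plain byte buffer. The function name `patch_instruction` and the `X86_NOP` macro are made up for illustration; the only x86 facts assumed are that `0x90` is the single-byte NOP, `B8 imm32` encodes `mov eax, imm32`, and `31 C0` encodes `xor eax, eax`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define X86_NOP 0x90 /* single-byte x86 NOP opcode */

/* Hypothetical helper: overwrite an instruction that occupies old_len bytes
 * at buf with new_code (new_len bytes).
 * Returns -1 if the new instruction is longer and would clobber the
 * following instruction; otherwise writes it, pads the leftover bytes
 * with NOPs, and returns 0. */
int patch_instruction(unsigned char *buf, size_t old_len,
                      const unsigned char *new_code, size_t new_len)
{
    assert(buf != NULL && new_code != NULL);

    if (new_len > old_len)
        return -1; /* refuse to overwrite the next instruction */

    memcpy(buf, new_code, new_len);
    /* Fill the residue of the old instruction so it is not decoded as garbage. */
    memset(buf + new_len, X86_NOP, old_len - new_len);
    return 0;
}
```

For example, replacing the 5-byte `mov eax, 1` (`B8 01 00 00 00`) with the 2-byte `xor eax, eax` (`31 C0`) leaves `31 C0 90 90 90`, and a following `ret` (`C3`) stays untouched. The sketch only edits a byte array; patching live code additionally requires the page to be writable.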
When the x86 first gained protected mode, the Intel reference manuals recommended the following method for debugging access to XO (execute-only) areas:
So the answer to the problem is in the last step. RW access is necessary if you want to be able to insert the breakpoint instruction, which is what debuggers do. Processors more modern than the 80286 have internal debug registers that enable non-intrusive monitoring, which can result in a breakpoint being issued.
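As a rough illustration of why a debugger needs write access, the sketch below plants and removes a software breakpoint in a byte buffer. It assumes the standard one-byte x86 `INT3` opcode (`0xCC`); the `struct breakpoint`, `set_breakpoint` and `clear_breakpoint` names are hypothetical, and a real debugger would write into the target's code pages rather than a local array.

```c
#include <assert.h>
#include <stddef.h>

#define X86_INT3 0xCC /* one-byte software breakpoint opcode */

/* Hypothetical record of one planted breakpoint. */
struct breakpoint {
    unsigned char *addr;  /* location that now holds the INT3 byte */
    unsigned char saved;  /* original byte, needed to resume execution */
};

/* Save the byte at code[offset] and replace it with INT3.
 * This is the write that requires the code to be mapped RW. */
struct breakpoint set_breakpoint(unsigned char *code, size_t offset)
{
    struct breakpoint bp;

    assert(code != NULL);
    bp.addr = code + offset;
    bp.saved = *bp.addr;
    *bp.addr = X86_INT3;
    return bp;
}

/* Put the original byte back so the real instruction can run again. */
void clear_breakpoint(const struct breakpoint *bp)
{
    assert(bp != NULL && bp->addr != NULL);
    *bp->addr = bp->saved;
}
```

Because `INT3` is a single byte, it can replace the first byte of any instruction without touching the next one, which sidesteps the variable-length problem for this particular patch.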
Windows made available the building blocks for doing this starting with Win16. They are probably still in place. I think Microsoft calls this class of pointer manipulation "thunking."
I once wrote a very fast 16-bit database engine in PL/M-86 for DOS. When Windows 3.1 arrived (running on 80386s) I ported it to the Win16 environment. I wanted to make use of the 32-bit memory available but there was no PL/M-32 available (or Win32 for that matter).
To solve the problem, my program used thunking in the following way:
Once the mechanism was bug-free it worked without a hitch. The largest memory areas my program used were 2304 × 2304 double-precision values (2304 × 2304 × 8 bytes), which comes out to around 40 MB. Even today, I would call this a "large" block of memory. In 1995 it was 30% of a typical SDRAM stick (128 MB PC100).
Self-modifying code is used for modifications in memory, not in the file (as run-time unpackers such as UPX do). Also, the file representation of a program is more difficult to work with because of relative virtual addresses, possible relocations, and the header modifications needed for most updates (e.g., to change Hello world! to a longer Hello World, you'll need to extend the data segment in the file).
I'd suggest that you first learn to do it in memory. For file updates, the simplest and most generic approach would be to run a copy of the program and have it modify the original.
EDIT: And don't forget the main reasons self-modifying code is used:
1) Obfuscation, so that the code that is actually executed isn't the code you'll see with simple static analysis of the file.
2) Performance, something like JIT.
Neither of them benefits from modifying the executable file.
If you are using Windows, you can do the following:
Step-by-Step Example:
1) Call VirtualProtect() on the code pages you want to modify, with the PAGE_WRITECOPY protection.
2) Modify the code pages.
3) Call VirtualProtect() on the modified code pages, with the PAGE_EXECUTE protection.
4) Call FlushInstructionCache().
For more information, see How to Modify Executable Code in Memory (Archived: Aug. 2010)
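The VirtualProtect()/FlushInstructionCache() sequence above might look roughly like this in C. This is a sketch, not the code from the linked article: `patch_code` is a made-up name, error handling is minimal, and the protection constants are simply the ones named above. If the patching code lives on the same page it is modifying, `PAGE_EXECUTE_READWRITE` is the commonly used alternative for the first call, since `PAGE_WRITECOPY` does not grant execute access. The `#else` branch is my own assumption, added only so the sketch can be tried on a POSIX system with GCC or Clang; it is not part of the Windows recipe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: overwrite len bytes of code at dst with src.
 * Returns 0 on success, -1 on failure. */
int patch_code(void *dst, const void *src, size_t len)
{
    DWORD old_protect; /* VirtualProtect requires a valid out-pointer */

    assert(dst != NULL && src != NULL);

    /* Make the code pages writable (copy-on-write). */
    if (!VirtualProtect(dst, len, PAGE_WRITECOPY, &old_protect))
        return -1;

    /* Modify the code. */
    memcpy(dst, src, len);

    /* Make the modified pages executable again. */
    if (!VirtualProtect(dst, len, PAGE_EXECUTE, &old_protect))
        return -1;

    /* Discard any stale copy of the old instructions. */
    if (!FlushInstructionCache(GetCurrentProcess(), dst, len))
        return -1;
    return 0;
}

#else /* Assumption: POSIX stand-in, not part of the Windows steps. */
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int patch_code(void *dst, const void *src, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)dst & ~(page - 1); /* mprotect needs a page-aligned address */
    size_t span = (size_t)(((uintptr_t)dst + len) - start);

    assert(dst != NULL && src != NULL);

    if (mprotect((void *)start, span, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(dst, src, len);
    if (mprotect((void *)start, span, PROT_READ | PROT_EXEC) != 0)
        return -1;
    /* GCC/Clang builtin that flushes the instruction cache for a range. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
    return 0;
}
#endif
```

Both branches change the protection of every page the range touches, so anything else living on those pages is affected too.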