I would recommend LUA (or eLUA http://www.eluaproject.net/ ). I've "ported" LUA to a Cortex-M3 a while back. From the top of my head it had a flash size of 60~100KB and needed about 20KB RAM to run. I did strip down to the bare essentials, but depending on your application, that might be enough. There's still room for optimization, especially about RAM requirements, but I doubt you can run it comfortable in 8KB.