I'd like to build a C pre-processor / compiler that allows functions to be collected from local and online sources, i.e.:
#fetch MP3FileBuilder http://scripts
This isn't trivial, but it's not that hard.
You can run binary code in a sandbox. Every operating system does this all day long.
They're going to have to use your standard library (vs a generic C lib). Your standard library will enforce whatever controls you want to impose.
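As a rough sketch of what "the standard library enforces the controls" could look like, here is a file-open wrapper that only lets sandboxed code touch files under one directory. The names `sandbox_fopen`, `sandbox_path_allowed` and the `/tmp/sandbox/` root are made up for illustration. The check is deliberately simplistic: it rejects any path containing `..` (including harmless names like `a..b`) and does not resolve symlinks, which a real runtime would have to handle.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical policy: sandboxed code may only open files under this root. */
#define SANDBOX_ROOT "/tmp/sandbox/"

/* Returns 1 if the path is acceptable under the (assumed) policy, else 0. */
int sandbox_path_allowed(const char *path)
{
    if (path == NULL)
        return 0;
    /* The path must start with the sandbox root... */
    if (strncmp(path, SANDBOX_ROOT, strlen(SANDBOX_ROOT)) != 0)
        return 0;
    /* ...and must not contain "..", which could climb back out of it. */
    if (strstr(path, "..") != NULL)
        return 0;
    return 1;
}

/* Drop-in replacement for fopen that the sandboxed code is forced to link against. */
FILE *sandbox_fopen(const char *path, const char *mode)
{
    if (!sandbox_path_allowed(path))
        return NULL; /* policy violation: behave as if the open failed */
    return fopen(path, mode);
}
```

The sandboxed program never sees the real `fopen`; it only gets the wrapper, so the policy is applied on every call.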
Next, you'll want to ensure that they cannot create "runnable code" at run time. That is, the stack isn't executable, they can't allocate any memory that's executable, etc. That means that only the code generated by the compiler (YOUR compiler) will be executable.
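One way the library side of this might look on a POSIX system is sketched below: the only memory-mapping calls the sandboxed code can reach refuse any request for executable pages. `sandbox_map` and `sandbox_protect` are hypothetical names, and this only covers the library layer; it assumes the sandboxed code has no other route to the raw `mmap`/`mprotect` system calls.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical replacement for mmap: anonymous memory only, never executable. */
void *sandbox_map(size_t length, int prot)
{
    if (prot & PROT_EXEC) {
        errno = EPERM; /* executable mappings are forbidden by policy */
        return NULL;
    }
    void *p = mmap(NULL, length, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical replacement for mprotect: existing pages cannot be made executable. */
int sandbox_protect(void *addr, size_t length, int prot)
{
    if (prot & PROT_EXEC) {
        errno = EPERM;
        return -1;
    }
    return mprotect(addr, length, prot);
}
```

With both entry points filtered, data the program writes at run time can be read and modified but never jumped into.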
If your compiler signs its executable cryptographically, your runtime will be able to detect tampered binaries, and simply not load them. This prevents them from "poking" things into the binaries that you simply don't want them to have.
With a controlled compiler generating "safe" code, and a controlled system library, that should give a reasonably controlled sandbox, even with actual machine language code.
Want to impose memory limits? Put a check in malloc. Want to restrict how much stack is allocated? Limit the stack segment.
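For the malloc check, a minimal sketch could look like this. The 1 MiB cap and the `sandbox_*` names are assumptions for illustration, and the counter is not thread-safe. Each block carries a small header recording its size so that freeing it gives the budget back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap: 1 MiB of live heap per sandboxed program. */
#define SANDBOX_HEAP_LIMIT ((size_t)1 << 20)

/* Header placed in front of every block so sandbox_free knows its size.
 * The union pads the header so the user pointer stays suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} sandbox_header;

static size_t sandbox_heap_used = 0;

void *sandbox_malloc(size_t size)
{
    /* Refuse anything that would push live usage past the cap. */
    if (size > SANDBOX_HEAP_LIMIT - sandbox_heap_used)
        return NULL;
    sandbox_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    sandbox_heap_used += size;
    return h + 1; /* user memory starts right after the header */
}

void sandbox_free(void *ptr)
{
    if (ptr == NULL)
        return;
    sandbox_header *h = (sandbox_header *)ptr - 1;
    assert(sandbox_heap_used >= h->size); /* accounting invariant */
    sandbox_heap_used -= h->size;
    free(h);
}

/* Reports how many bytes are currently charged against the cap. */
size_t sandbox_heap_in_use(void)
{
    return sandbox_heap_used;
}
```

The stack side does not need library code at all on POSIX systems: the launcher can lower the limit with `setrlimit(RLIMIT_STACK, ...)` before the untrusted program starts.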
Operating systems create these kinds of constrained environments using their Virtual Memory managers all day long, so you can readily do these things on modern OS's.
Whether the effort to do this is worthwhile vs using an off the shelf Virtual Machine and byte code runtime, I can't say.
I haven't investigated this very closely, but the guys working on Chromium (aka Google Chrome) are working on a sandbox almost like this already, which might be worth looking into.
http://dev.chromium.org/developers/design-documents/sandbox/Sandbox-FAQ
It's open source, so it should be possible to use it.
8 years later and I've discovered a new platform that meets all of my original requirements. WebAssembly allows you to run a C/C++ subset safely inside a browser and comes with safety restrictions similar to my requirements, such as restricting memory access and preventing unsafe operations on the OS and parent process. It's been implemented in Firefox 52 and there are promising signs other browsers will support it in the future.
Nice idea, but I'm fairly sure what you're trying to do is impossible with C or C++. If you dropped the sandbox idea it might work.
Java's already got a similar system (as in a large library of 3rd party code) in Maven2.
Liran pointed out codepad.org in a comment above. It isn't suitable because it relies on a very heavy environment (consisting of ptrace, chroot, and an outbound firewall). However, I found a few g++ safety switches there which I thought I'd share here:
gcc 4.1.2 flags: -O -fmessage-length=0 -fno-merge-constants -fstrict-aliasing -fstack-protector-all
g++ 4.1.2 flags: -O -std=c++98 -pedantic-errors -Wfatal-errors -Werror -Wall -Wextra -Wno-missing-field-initializers -Wwrite-strings -Wno-deprecated -Wno-unused -Wno-non-virtual-dtor -Wno-variadic-macros -fmessage-length=0 -ftemplate-depth-128 -fno-merge-constants -fno-nonansi-builtins -fno-gnu-keywords -fno-elide-constructors -fstrict-aliasing -fstack-protector-all -Winvalid-pch
The options are explained in the GCC manual.
What really caught my eye was the stack-protector flag. I believe it is a merge of this IBM research project (Stack-Smashing Protector) with the official GCC.
The protection is realized by buffer overflow detection and the variable reordering feature to avoid the corruption of pointers. The basic idea of buffer overflow detection comes from StackGuard system.
The novel features are (1) the reordering of local variables to place buffers after pointers to avoid the corruption of pointers that could be used to further corrupt arbitrary memory locations, (2) the copying of pointers in function arguments to an area preceding local variable buffers to prevent the corruption of pointers that could be used to further corrupt arbitrary memory locations, and the (3) omission of instrumentation code from some functions to decrease the performance overhead.
If I were going to do this, I would investigate one of two approaches:
However, I agree with others that this is probably a horribly involved project. Look at the problems that web browsers have had with buggy or hung plugins destabilizing the entire browser. Or look at the release notes for the Wireshark project; almost every release, it seems, contains security fixes for problems in one of its protocol dissectors that then affect the entire program. If a C/C++ sandbox were feasible, I'd expect these projects to have latched onto one by now.