Building a truly safe Python sandbox (well, in CPython at least) is widely considered a fool's errand [0]. However, a Python sandbox can be relatively safely done in two ways:
1. By completely reimplementing Python at a lower level and intercepting system calls, like PyPy's sandbox feature [1].
2. By utilizing other, more mature OS-level constructs (i.e. containerization like LXC (which is imperfect), stacking OS features like seccomp/chroot/etc., or better, true virtualization like Xen).
Ideally, you'd combine them both and then run them on a "dumb box" which is just a REST API with no other keys or system access. The key to designing moderately secure sandboxes is just accepting that there are angles of attack you haven't considered yet, so design in layers.
If you find yourself inspecting code to decide if it is safe, you are fighting a losing battle... I'd love to hear of any success stories here though!
Great detailed write-up of the pitfalls @op, thanks.
The approach there is to rewrite your Python code to disallow any potentially bad operation. E.g. your code cannot access identifiers starting with _ and the list of builtins is restricted (no "open" by default obviously, but also no "type"!)
Any sandbox code doing X.Y is rewritten to _your_getattr(X, "Y") which can decide whether to allow access or not at runtime.
The main thing to be careful of is not accidentally injecting any callables like "file" into there, since merely calling an object does not undergo a security check (given that you might want to pass some data INTO your sandbox).
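The rewrite described above can be sketched with the stdlib `ast` module. This is a minimal illustration, not the actual library's implementation; `AttributeRewriter`, `_your_getattr`, and `run_guarded` are hypothetical names:

```python
import ast

class AttributeRewriter(ast.NodeTransformer):
    """Rewrite every attribute read `X.Y` into `_your_getattr(X, "Y")`."""

    def visit_Attribute(self, node):
        self.generic_visit(node)
        if isinstance(node.ctx, ast.Load):
            return ast.Call(
                func=ast.Name(id="_your_getattr", ctx=ast.Load()),
                args=[node.value, ast.Constant(value=node.attr)],
                keywords=[],
            )
        return node

def _your_getattr(obj, name):
    # Runtime guard: deny access to private/dunder attributes.
    if name.startswith("_"):
        raise AttributeError(f"access to {name!r} is denied")
    return getattr(obj, name)

def run_guarded(expr, env):
    tree = AttributeRewriter().visit(ast.parse(expr, mode="eval"))
    ast.fix_missing_locations(tree)
    code = compile(tree, "<sandbox>", "eval")
    return eval(code, {"__builtins__": {}, "_your_getattr": _your_getattr}, env)
```

So `run_guarded("x.upper()", {"x": "hi"})` works, while `run_guarded("x.__class__", {"x": "hi"})` is refused at runtime by the guard rather than at rewrite time.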
This does not limit resource usage, however; you can still try to allocate memory or use up CPU time.
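On Unix, those resource limits can at least be imposed from outside the rewriter by running the code in a child interpreter with `resource.setrlimit`. A minimal sketch, where `run_limited` is a made-up helper and the limit values are illustrative:

```python
import resource
import subprocess
import sys

def run_limited(code, mem_bytes=512 * 1024**2, cpu_seconds=5):
    """Run untrusted code in a child interpreter with hard caps on
    address space and CPU time (Unix-only)."""
    def set_limits():
        # Applied in the child just before exec, so the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        preexec_fn=set_limits,
        capture_output=True,
        timeout=cpu_seconds + 10,
    )
```

A 2 GB allocation then dies with a MemoryError inside the child instead of exhausting the host. This only caps resources, of course; it does nothing about what the code is allowed to call.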
> Restricted code doesn't actually protect against malicious users. It's only a best-effort attempt against introducing accidental security issues for semi-professional developers.
BTW, another attack vector (besides resource use) is to use code that causes Python itself to crash. For example, "{[{}]}".format({"{}": 5}) will segfault Python 3.3 (see http://bugs.python.org/issue17644 ).
The harder question is, are any of the ways to segfault Python exploitable?
I wrote a very locked down python "expression evaluator" library ( https://github.com/danthedeckie/simpleeval ) which uses code inspection and is reasonably safe (caveats listed in the README).
Any comments or loopholes found would be appreciated...
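For readers unfamiliar with the approach: the core of such an expression evaluator is a whitelist over the AST, where anything not explicitly recognized is rejected. This is not simpleeval's actual implementation, just a minimal sketch supporting numeric arithmetic:

```python
import ast
import operator

# Whitelisted binary operators; every AST node type not handled below
# is rejected outright.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"disallowed construct: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))
```

Here `safe_eval("2 + 3 * 4")` returns 14, while `safe_eval("__import__('os')")` raises because `Call` nodes are simply not on the whitelist. Real libraries also have to bound recursion depth, number size, etc. to limit resource abuse.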
> If you find yourself inspecting code to decide if it is safe, you are fighting a losing battle... I'd love to hear of any success stories here though!
NaCl, seL4, and the original proof-carrying-code packet-filter examples inspect code, but at a very rigorous level.
NaCl is a good example showing that inspecting code is not actually the mistake made by the Python sandbox attacked in the article. The mistake was that the inspection was based on a blacklist rather than a whitelist.
Sandboxing based on blacklists is a recipe for disaster, and it's an unfortunate accident of history that the default Python implementation (CPython) makes whitelisting practically impossible. Luckily, there are alternatives as others in the thread have pointed out.
Yup. I remember working on a game engine many years ago; we wanted to use Python as a scripting language and found the trail of broken dreams that is the various attempts to sandbox Python.
Just use Lua if you need a sandboxed dynamically-typed environment.
Of course, you make all kinds of sacrifices using ZeroVM. Nevertheless, it's pretty robust and lightweight.
I managed to get Python-on-ZeroVM running on Docker (i.e., LXC), which gave me a pretty decently safe platform for running arbitrary code. It's atrophied since it was working, but I think it had some real potential.
#2 alone - running items in an LXC jail without root access - is quite secure (and what we relied on at PiCloud to set up our Python sandboxes).
If you still insist on trying to build safe-code inspectors, be very skeptical of what your protection is capable of. For instance, why even bother blocking static strings? Such trivially-breakable "security" being placed in your code only serves to distract you and give false confidence.
As a rule, an LXC jail is definitely very much better than code-inspection, but it is worth taking the time reading up on some of the rather specific configuration needed to tighten up LXC/namespaces. Docker (previously DotCloud) obviously has a lot riding on this, so they are taking secure-by-default configuration pretty seriously. [0]
I'm wondering if it's possible to get a simpler/better attempt at sandboxing by inspecting the generated bytecode, instead of focusing on the .py text file.
There's an in-between step, the "AST" (Abstract Syntax Tree), which is a better place to do this stuff than in the actual byte code. You still have context and a lot more useful information at that point, and can re-write the AST to use workarounds, etc.
Generally speaking it's the data associated with the calls that is risky, not the bytecode itself.
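To make that concrete: rather than decoding opcodes, you can scan a compiled code object's `co_names`, which is exactly the data that `LOAD_NAME`/`LOAD_ATTR` reference, including nested code objects for inner functions. `forbidden_names` is a hypothetical helper, and note this is still a blacklist, with all the weaknesses discussed in this thread:

```python
import types

BANNED = frozenset({"open", "exec", "eval", "__import__"})

def forbidden_names(source, banned=BANNED):
    """Compile untrusted source and collect every banned name referenced
    by the resulting code object or any nested code object."""
    found = set()
    def scan(code):
        found.update(set(code.co_names) & banned)
        for const in code.co_consts:  # nested functions live in co_consts
            if isinstance(const, types.CodeType):
                scan(const)
    scan(compile(source, "<untrusted>", "exec"))
    return found
```

This catches `open('/etc/passwd')` even when buried inside a `def`, but it is trivially bypassed by anything that constructs a name at runtime, which is the article's whole point about blacklists.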
The way the JVM works is by having specific permission checks at points within the code, and they are enforced by the JVM. I think the CLR works in a similar fashion.
You can run Python on top of the JVM and take advantage of this, but I think the version of Python (Jython) is pretty old.
Very interesting read, I did something similar[1] on a Microsoft site. I didn't think of creating a code object from scratch though, great idea.
I did a web app test once for an application created by IBM. They offered the ability for administrators to run Python code in a 'restricted' sandbox to manipulate data. The app was Java so it was run through Jython, and they locked down all the usual suspects like open(), file() etc. But because it was Jython you could just use the Java io.* classes and bypass all their restrictions.
I was able to connect to any database from a multi-tenant ERP written in Python. The idea was to get hold of the class of the connection object and re-instantiate it, passing a new database name as a parameter. That ERP lets users (of different tenants) write Python code in various places. The kind of code you could write was already restricted, but not sufficiently. Now all the dunders are forbidden, and things like getattr() are also forbidden, which makes the trick used in that blog post impossible. One of the funny things I had to do was use lambdas to name intermediate values, because statements were prohibited.
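That lambda trick looks roughly like this: since an expression-only sandbox has no assignment statements, an immediately-applied lambda binds a name for an intermediate value instead. `get_items` is a made-up stand-in, not the ERP's actual API:

```python
# Statement form (forbidden):        Expression form (allowed):
#   items = get_items()                (lambda items: ...)(get_items())

def get_items():
    # Hypothetical data source, just to make the trick runnable.
    return [3, 1, 2]

# Bind `items`, then bind `s` to a sorted copy, all in one expression.
result = (lambda items: (lambda s: s[0] + s[-1])(sorted(items)))(get_items())
```

Each nested lambda plays the role of one assignment statement, at the cost of readability growing quickly with the number of intermediates.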
[0] https://github.com/haypo/pysandbox/ or https://lwn.net/Articles/574215/
[1] http://pypy.readthedocs.org/en/latest/sandbox.html