Building a truly safe Python sandbox (well, in CPython at least) is widely considered a fool's errand [0]. However, a Python sandbox can be relatively safely done in two ways:
1. By completely reimplementing Python at a lower level and intercepting system calls, like PyPy's sandbox feature [1].
2. By utilizing other, more mature OS-level constructs (i.e. containerization like LXC (which is imperfect), stacking OS features like seccomp/chroot/etc., or better, true virtualization like Xen).
Ideally, you'd combine them both and then run them on a "dumb box" which is just a REST API with no other keys or system access. The key to designing moderately secure sandboxes is just accepting that there are angles of attack you haven't considered yet, so design in layers.
If you find yourself inspecting code to decide if it is safe, you are fighting a losing battle... I'd love to hear of any success stories here though!
Great detailed write-up of the pitfalls @op, thanks.
The approach there is to rewrite your Python code to disallow any potentially bad operation. E.g. your code cannot access identifiers starting with _ and the list of builtins is restricted (no "open" by default obviously, but also no "type"!)
Any sandbox code doing X.Y is rewritten to _your_getattr(X, "Y") which can decide whether to allow access or not at runtime.
The main thing to be careful of is not accidentally injecting any callables like "file" into there, since merely calling an object does not undergo a security check (given that you might want to pass some data INTO your sandbox).
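The rewrite described above can be sketched with the stdlib `ast` module. This is a minimal illustration, not the actual library's implementation; `AttributeRewriter`, `_your_getattr`, and `run_guarded` are hypothetical names:

```python
import ast

class AttributeRewriter(ast.NodeTransformer):
    """Rewrite every attribute read `X.Y` into `_your_getattr(X, "Y")`."""

    def visit_Attribute(self, node):
        self.generic_visit(node)
        if isinstance(node.ctx, ast.Load):
            return ast.Call(
                func=ast.Name(id="_your_getattr", ctx=ast.Load()),
                args=[node.value, ast.Constant(value=node.attr)],
                keywords=[],
            )
        return node

def _your_getattr(obj, name):
    # Runtime guard: deny access to private/dunder attributes.
    if name.startswith("_"):
        raise AttributeError(f"access to {name!r} is denied")
    return getattr(obj, name)

def run_guarded(expr, env):
    tree = AttributeRewriter().visit(ast.parse(expr, mode="eval"))
    ast.fix_missing_locations(tree)
    code = compile(tree, "<sandbox>", "eval")
    return eval(code, {"__builtins__": {}, "_your_getattr": _your_getattr}, env)
```

So `run_guarded("x.upper()", {"x": "hi"})` works, while `run_guarded("x.__class__", {"x": "hi"})` is refused at runtime by the guard rather than at rewrite time.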
This does not limit resource usage, however; you can still try to allocate memory or use up CPU time.
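On Unix, those resource limits can at least be imposed from outside the rewriter by running the code in a child interpreter with `resource.setrlimit`. A minimal sketch, where `run_limited` is a made-up helper and the limit values are illustrative:

```python
import resource
import subprocess
import sys

def run_limited(code, mem_bytes=512 * 1024**2, cpu_seconds=5):
    """Run untrusted code in a child interpreter with hard caps on
    address space and CPU time (Unix-only)."""
    def set_limits():
        # Applied in the child just before exec, so the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        preexec_fn=set_limits,
        capture_output=True,
        timeout=cpu_seconds + 10,
    )
```

A 2 GB allocation then dies with a MemoryError inside the child instead of exhausting the host. This only caps resources, of course; it does nothing about what the code is allowed to call.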
> Restricted code doesn't actually protect against malicious users. It's only a best-effort attempt against introducing accidental security issues for semi-professional developers.
BTW, another attack vector (besides resource use) is to use code that causes Python itself to crash. For example, "{[{}]}".format({"{}": 5}) will segfault Python 3.3 (see http://bugs.python.org/issue17644 ).
The harder question is, are any of the ways to segfault Python exploitable?
I wrote a very locked down python "expression evaluator" library ( https://github.com/danthedeckie/simpleeval ) which uses code inspection and is reasonably safe (caveats listed in the README).
Any comments or loopholes found would be appreciated...
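For readers unfamiliar with the approach: the core of such an expression evaluator is a whitelist over the AST, where anything not explicitly recognized is rejected. This is not simpleeval's actual implementation, just a minimal sketch supporting numeric arithmetic:

```python
import ast
import operator

# Whitelisted binary operators; every AST node type not handled below
# is rejected outright.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"disallowed construct: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))
```

Here `safe_eval("2 + 3 * 4")` returns 14, while `safe_eval("__import__('os')")` raises because `Call` nodes are simply not on the whitelist. Real libraries also have to bound recursion depth, number size, etc. to limit resource abuse.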
> If you find yourself inspecting code to decide if it is safe, you are fighting a losing battle... I'd love to hear of any success stories here though!
NaCl, seL4, and the original proof-carrying-code packet-filter examples inspect code, but at a very rigorous level.
NaCl is a good example showing that inspecting code is not actually the mistake made by the Python sandbox attacked in the article. The mistake was that the inspection was based on a blacklist rather than a whitelist.
Sandboxing based on blacklists is a recipe for disaster, and it's an unfortunate accident of history that the default Python implementation (CPython) makes whitelisting practically impossible. Luckily, there are alternatives as others in the thread have pointed out.
Yup. I remember working on a game engine many years ago; we wanted to use Python as a scripting language and found the trail of broken dreams that is the various attempts to sandbox Python.
Just use Lua if you need a sandboxed dynamically-typed environment.
Of course, you make all kinds of sacrifices using ZeroVM. Nevertheless, it's pretty robust and lightweight.
I managed to get Python-on-ZeroVM running on Docker (i.e., LXC), which gave me a pretty decently safe platform for running arbitrary code. It's atrophied since it was working, but I think it had some real potential.
#2 alone - running items in an LXC jail without root access - is quite secure (and what we relied on at PiCloud to set up our Python sandboxes).
If you still insist on trying to build safe-code inspectors, be very skeptical of what your protection is capable of. For instance, why even bother blocking static strings? Such trivially-breakable "security" being placed in your code only serves to distract you and give false confidence.
As a rule, an LXC jail is definitely very much better than code-inspection, but it is worth taking the time reading up on some of the rather specific configuration needed to tighten up LXC/namespaces. Docker (previously DotCloud) obviously has a lot riding on this, so they are taking secure-by-default configuration pretty seriously. [0]
I'm wondering if it's possible to get a simpler/better attempt at sandboxing by inspecting the generated bytecode, instead of focusing on the .py text file.
There's an in-between step, the "AST" (Abstract Syntax Tree), which is a better place to do this stuff than in the actual byte code. You still have context and a lot more useful information at that point, and can re-write the AST to use workarounds, etc.
Generally speaking it's the data associated with the calls that is risky, not the bytecode itself.
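To make that concrete: rather than decoding opcodes, you can scan a compiled code object's `co_names`, which is exactly the data that `LOAD_NAME`/`LOAD_ATTR` reference, including nested code objects for inner functions. `forbidden_names` is a hypothetical helper, and note this is still a blacklist, with all the weaknesses discussed in this thread:

```python
import types

BANNED = frozenset({"open", "exec", "eval", "__import__"})

def forbidden_names(source, banned=BANNED):
    """Compile untrusted source and collect every banned name referenced
    by the resulting code object or any nested code object."""
    found = set()
    def scan(code):
        found.update(set(code.co_names) & banned)
        for const in code.co_consts:  # nested functions live in co_consts
            if isinstance(const, types.CodeType):
                scan(const)
    scan(compile(source, "<untrusted>", "exec"))
    return found
```

This catches `open('/etc/passwd')` even when buried inside a `def`, but it is trivially bypassed by anything that constructs a name at runtime, which is the article's whole point about blacklists.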
The way the JVM works is by having specific permission checks at points within the code, and they are enforced by the JVM. I think the CLR works in a similar fashion.
You can run Python on top of the JVM and take advantage of this, but I think the version of Python (Jython) is pretty old.
Very interesting read, I did something similar[1] on a Microsoft site. I didn't think of creating a code object from scratch though, great idea.
I did a web app test once for an application created by IBM. They offered the ability for administrators to run Python code in a 'restricted' sandbox to manipulate data. The app was Java so it was run through Jython, and they locked down all the usual suspects like open(), file() etc. But because it was Jython you could just use the Java io.* classes and bypass all their restrictions.
I was able to connect to any database from a multi-tenant ERP written in Python. The idea was to get hold of the class of the connection object and re-instantiate it, passing a new database name as a parameter. That ERP lets users (of different tenants) write Python code in various places. The kind of code you could write was already restricted, but not sufficiently. Now all the dunders are forbidden, and things like getattr() are also forbidden, which makes the trick used in that blog post impossible. One of the funny things I had to do was use lambdas to name intermediate values, because statements were prohibited.
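That lambda trick looks roughly like this: since an expression-only sandbox has no assignment statements, an immediately-applied lambda binds a name for an intermediate value instead. `get_items` is a made-up stand-in, not the ERP's actual API:

```python
# Statement form (forbidden):        Expression form (allowed):
#   items = get_items()                (lambda items: ...)(get_items())

def get_items():
    # Hypothetical data source, just to make the trick runnable.
    return [3, 1, 2]

# Bind `items`, then bind `s` to a sorted copy, all in one expression.
result = (lambda items: (lambda s: s[0] + s[-1])(sorted(items)))(get_items())
```

Each nested lambda plays the role of one assignment statement, at the cost of readability growing quickly with the number of intermediates.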
[0] https://github.com/haypo/pysandbox/ or https://lwn.net/Articles/574215/
[1] http://pypy.readthedocs.org/en/latest/sandbox.html