Truffle/Graal is also able to do some insanely cool things related to optimizing across FFI boundaries: if you have a Java program that uses the Truffle JavaScript engine for scripting, Truffle is able to do JIT optimization transparently across the FFI boundary so it has 0 overhead. IIRC they even have some special API to allow a Truffle->native C library to be exposed to the runtime in a way that allows it to optimize away a lot of FFI overhead or inline the native function into the trace. They were advertising this "Polyglot VM" functionality a lot a few years ago, although now their marketing mostly focuses on the NativeImage part (which helps a lot with the slow startup you mention).
TruffleRuby even had the extremely big brain idea of running a Truffle interpreter for C for native extensions instead of actually compiling them to native code, just so that Truffle can optimize transparently across FFI boundaries. https://chrisseaton.com/truffleruby/cext/
I don't have anything to contribute to the Truffle discussion, but for those not familiar: Chris Seaton was an active participant on Hacker News, until his tragic death in late 2022. Wish he was still with us.
> TruffleRuby even had the extremely big brain idea of running a Truffle interpreter for C for native extensions […]
TruffleC was a research project and the first attempt at running C code on Truffle that I'm aware of. It directly interpreted C source code, and while that works for small self-contained programs, you quickly run into a lot of problems as soon as you want to run larger real-world programs. You need everything, including the C library, available as pure C code, and you have to deal with the fact that a lot of C code relies on some UB/IB. In addition, your C parser has to fully adhere to the C standard, and once you want to support C++ too because a lot of code is written in C++, you have to restart from scratch. I don't know if TruffleC was ever released as open source.
The next / current attempt is Sulong, which uses LLVM to compile C/C++/Rust/… to LLVM IR ("bitcode") and then directly interprets that bitcode. It's a lot better, because you don't have to write your own complete C/C++/… parser/compiler, but bitcode still has various limitations. Essentially, as soon as the program uses handwritten assembler code somewhere, or if it does some low-level things like setjmp/longjmp, things get hairy pretty quickly. Bitcode itself is also platform dependent (think of constants/macros/… that get expanded during compilation), you still need all code / libraries in bitcode, every language uses an ever so slightly different set of IR nodes and requires a different runtime library so you have to explicitly support each of them, and even then you can't make it fully memory safe because typical programs will just break. In addition, the optimization level you choose when compiling the source program can result in very different bitcode with very different IR nodes, some of which were not supported for a long time (e.g., everything related to vectorization). Sulong can load libraries and expose them via the Truffle FFI, and it can be used for C extensions in GraalPython and TruffleRuby AFAIK. It's open source [1] and part of GraalVM, so you can play around with it.
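To make the "bitcode is platform dependent" point concrete, here is a minimal sketch in C. The function names platform_name and long_size are made up for illustration and are not part of Sulong or any real project. Both the macro branch and the sizeof are resolved by the compiler front end, so by the time an interpreter sees the IR, the platform choice is already baked in.

```c
#include <stddef.h>

/* Resolved by the preprocessor: only one branch survives into the IR,
   so the bitcode carries a single hard-coded string. */
#if defined(__linux__)
#define PLATFORM_NAME "linux"
#elif defined(__APPLE__)
#define PLATFORM_NAME "macos"
#elif defined(_WIN32)
#define PLATFORM_NAME "windows"
#else
#define PLATFORM_NAME "other"
#endif

/* Hypothetical helper: returns the string literal chosen at compile time. */
const char *platform_name(void) {
    return PLATFORM_NAME;
}

/* Hypothetical helper: sizeof is folded to a constant by the front end.
   It is 8 on 64-bit Linux (LP64) but 4 on 64-bit Windows (LLP64). */
size_t long_size(void) {
    return sizeof(long);
}
```

If you compile this with something like clang -S -emit-llvm and read the output, you should see plain constants where the macro and the sizeof used to be, which is why bitcode built for one target can't simply be interpreted as if it were built for another.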
Another research project was then to directly interpret AMD64 machine code and emulate a Linux userspace environment, because that would solve all the problems with inline assembly and language compatibility. Although that works, it has an entirely different set of problems: Graal/Truffle is simply not made for this type of code, and as a result the performance is significantly worse than Sulong's. You also end up re-implementing the Linux syscall interface in your interpreter, you have to deal with all the low-level memory features that are available on Linux like mmap/mprotect/… and they have to behave exactly as on a real Linux system, and you can't easily export subroutines via Truffle FFI in a way that they also work with foreign language objects. It does work with various guest languages like C/C++/Rust/Go/… without modifying the interpreter, as long as the program is available as a native Linux/AMD64 executable and doesn't use any of the unimplemented features. This project is also available as open source [2], but its focus has somewhat shifted to using the interpreter for execution-trace-based program analysis.
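For a feel of what "re-implementing the Linux syscall interface" means, here is a rough sketch in C of a syscall dispatcher as a machine code interpreter might have one. This is not code from the project mentioned above: GuestState, emulate_syscall and the enum names are all hypothetical. Only the syscall numbers are real (Linux x86-64), and the return convention of a negated errno follows what the kernel does.

```c
#include <errno.h>
#include <stdint.h>

/* Linux x86-64 syscall numbers. The enum names are made up for this sketch. */
enum {
    SYS_WRITE_NR = 1,
    SYS_MMAP_NR = 9,
    SYS_MPROTECT_NR = 10,
    SYS_GETPID_NR = 39,
    SYS_FORK_NR = 57
};

/* Hypothetical guest state: only what this sketch needs. */
typedef struct {
    int64_t pid;            /* PID the guest program should observe */
    uint64_t bytes_written; /* stand-in for real output handling */
} GuestState;

/* Emulate one syscall and return the value the guest would see in rax:
   a non-negative result, or a negated errno on failure. */
int64_t emulate_syscall(GuestState *g, int64_t nr,
                        int64_t a0, int64_t a1, int64_t a2) {
    (void)a0; /* file descriptor, ignored in this sketch */
    (void)a1; /* guest buffer address, ignored in this sketch */
    switch (nr) {
    case SYS_WRITE_NR:
        if (a2 < 0) {
            return -EINVAL;
        }
        /* A real emulator would copy a2 bytes out of guest memory here. */
        g->bytes_written += (uint64_t)a2;
        return a2;
    case SYS_GETPID_NR:
        return g->pid;
    default:
        /* mmap, mprotect, fork and everything else: not implemented. */
        return -ENOSYS;
    }
}
```

Every case that falls into the default branch is a program that won't run, and the implemented cases have to match real Linux behavior closely enough that libc and the guest program can't tell the difference.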
Things that none of these projects fully support, AFAIK, are multithreading, multiprocessing, IPC, and so on. Sulong partially works around this by calling into the native C library loaded in the VM for subroutines that aren't available as bitcode and aborting on certain unsupported calls like fork/clone, but then you obviously lose the advantage of having everything in the interpreter.
The conclusion is, however you try to interpret C/C++/… code, get ready for a world of pain and incompatibilities if you intend to run real-world programs.