Another great library from the author of pybind11. I was blown away (as a non-CS person) that they approached GPU arrays by bypassing the CUDA C++ toolchain altogether: the library generates PTX on the fly and has the driver JIT-compile it. It might be a trivial concept for more experienced folks here, but it did expand my feeble mind. I've even been thinking about trying such an approach in Rust, which I'm learning at the moment.
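To make the "generate PTX on the fly" idea concrete, here is a toy sketch (in Python for brevity): build the kernel source as a string, specialized at runtime to the requested elementwise operation. Everything here — the template, the `emit_kernel` name, the register layout — is my own illustration, not the library's actual codegen. A real system would hand the resulting string to the CUDA driver (e.g. `cuModuleLoadData`) to JIT it; that step is omitted since it needs a GPU.

```python
# Toy "PTX on the fly": the kernel source is just a string we specialize
# at runtime. Bounds checking is omitted for brevity.
PTX_TEMPLATE = """\
.version 7.0
.target sm_52
.address_size 64

.visible .entry {name}(
    .param .u64 pa, .param .u64 pb, .param .u64 pc
)
{{
    .reg .b64 %rd<8>;
    .reg .b32 %r<5>;
    .reg .f32 %f<4>;

    ld.param.u64    %rd1, [pa];
    ld.param.u64    %rd2, [pb];
    ld.param.u64    %rd3, [pc];
    cvta.to.global.u64 %rd1, %rd1;
    cvta.to.global.u64 %rd2, %rd2;
    cvta.to.global.u64 %rd3, %rd3;

    // global thread index: blockIdx.x * blockDim.x + threadIdx.x
    mov.u32         %r1, %ntid.x;
    mov.u32         %r2, %ctaid.x;
    mov.u32         %r3, %tid.x;
    mad.lo.s32      %r4, %r2, %r1, %r3;
    mul.wide.s32    %rd4, %r4, 4;

    add.s64         %rd5, %rd1, %rd4;
    ld.global.f32   %f1, [%rd5];
    add.s64         %rd6, %rd2, %rd4;
    ld.global.f32   %f2, [%rd6];
    {op}.f32        %f3, %f1, %f2;     // the op is chosen at codegen time
    add.s64         %rd7, %rd3, %rd4;
    st.global.f32   %f3, [%rd7];
    ret;
}}
"""

def emit_kernel(op, name="vec_kernel"):
    """Emit PTX source for c[i] = a[i] <op> b[i], op in {'add','mul','sub'}."""
    assert op in ("add", "mul", "sub")
    return PTX_TEMPLATE.format(op=op, name=name)

print(emit_kernel("add"))
```

The appeal is that this sidesteps invoking nvcc entirely: the only dependency is the driver's built-in PTX JIT, and each traced expression can get its own specialized kernel.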
The author (Wenzel Jakob, also here on HN) is an absolute legend, and co-author of the famous second bible of physically based rendering: http://www.pbr-book.org/
He also wrote the Mitsuba renderer (for which this library was created), the Nori UI library, ... the guy is a machine, and extremely personable / friendly in person.
I hope to succeed in marrying his Field-Aligned Remeshing technique to surface-refinement mesh extraction from depth maps. It's _so_ good that I honestly believe it's worth the (significant) effort.
Don't hesitate to share your project if you ever do that (maybe here: http://www.arewelearningyet.com/gpu-computing/); there is clearly room for a great GPU library in Rust.
It seems like it's basically a library for program transformation, with more or less two kinds of transformation: parallelization/vectorization and automatic differentiation.
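For the autodiff half, the classic minimal illustration is forward-mode AD with dual numbers — done here via operator overloading rather than the tracing/JIT transformation the library performs, but it shows the core idea of turning a program into one that also computes its own derivative. The `Dual` class and `derivative` helper below are my own illustration, not the library's API:

```python
class Dual:
    """Forward-mode AD value: carries f(x) and f'(x) through every operation."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (fg)' = f'g + fg'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def derivative(f, x):
    # Seed the input with dot=1.0 and read the derivative off the output.
    return f(Dual(x, 1.0)).dot

# f(x) = x*x + 3*x  =>  f'(x) = 2x + 3, so f'(2) = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # 7.0
```

The same user-visible program runs twice the arithmetic under the hood — which is exactly why pairing AD with a vectorizing/fusing backend is attractive.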
The question I would have is: suppose you have some large program that gets transformed into a system that pipes vectors from location to location, doing operations along the way. How do you deal with issues of data/memory divergence [1]? Basically, how do you tune the system so that the data you are using stays in cache? Without considering the cache, many of the advantages of a GPU can be lost. These messy issues tend to appear whenever one engages in code generation. Purely piecewise (elementwise) vectorizations don't have the problem, but anything where you're partially reducing and such could have it.
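A rough model of why this matters: a chain of elementwise ops can be fused into a single pass, so each intermediate value stays in registers, whereas unfused evaluation materializes every temporary to memory and reads it back. A reduction, by contrast, forces a result to be materialized before the next stage can run. The toy Python below (names and traffic counts in comments are mine, a proxy for what a GPU backend contends with) computes `d = (a + b) * c` both ways:

```python
def unfused(a, b, c):
    # Each op materializes a full temporary. For large n the temporary
    # is evicted from cache before the second loop reads it back.
    t = [x + y for x, y in zip(a, b)]        # read 2n, write n
    return [x * y for x, y in zip(t, c)]     # read 2n, write n  -> 6n traffic

def fused(a, b, c):
    # Single pass: (x + y) never touches memory between the add and the mul.
    return [(x + y) * z for x, y, z in zip(a, b, c)]  # read 3n, write n -> 4n

a, b, c = [1.0, 2.0], [3.0, 4.0], [0.5, 0.5]
assert unfused(a, b, c) == fused(a, b, c) == [2.0, 3.0]
```

Same answer, two-thirds of the memory traffic in the fused case — and on a GPU, where elementwise kernels are almost always bandwidth-bound, that ratio translates fairly directly into runtime. Partial reductions break this because the reduced value is needed by many downstream elements, which is presumably where the tuning questions above get hard.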