From the perspective of a CUDA beginner, this doesn't seem simpler than writing CUDA with C (not C++, just C). If you're going to pick up CUDA, starting with C gives you the best tooling support and community docs. Not to mention that managing pointers and explicit types in C will genuinely help your understanding of how the CPU and GPU interact.
If you already know Clojure, this is probably the best chance to extend something you already love using. If you don't, you're probably better off learning either CUDA or Clojure rather than both at the same time. Debugging CUDA errors is already painful; I wouldn't add a new host language on top of that.
For context, I'm currently taking my school's GPGPU course. We've just started actually writing non-trivial code.
The development feedback loop is incredibly tight when using a Clojure (or Lisp in general) REPL. You can develop your code interactively, including (it appears) the C/CUDA code, since you can compile it at the REPL and then upload it to the GPU for execution.
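For the curious, the loop looks roughly like this with ClojureCUDA (a sketch from memory of the uncomplicate.clojurecuda.core API; treat exact names and argument orders as approximate, and it obviously needs an Nvidia GPU plus the CUDA Toolkit):

```clojure
(require '[uncomplicate.clojurecuda.core :refer :all])

(init)
(with-context (context (device 0))
  (let [src    "extern \"C\" __global__ void inc(int n, float *a) {
                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  if (i < n) a[i] += 1.0f;
                }"
        ;; compile the CUDA C source string right here, at the REPL...
        kernel (-> (program src) compile! module (function "inc"))
        n      256
        gpu-a  (mem-alloc (* n Float/BYTES))]
    ;; ...then upload data, launch, and read back, all interactively
    (memcpy-host! (float-array n) gpu-a)
    (launch! kernel (grid-1d n) (parameters n gpu-a))
    (take 3 (memcpy-host! gpu-a (float-array n)))))
```

The point is that editing the kernel string and re-evaluating the form is the whole recompile cycle; there is no separate build step.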
You should give it a shot; interactively working with stuff is an excellent learning experience. My 11-year-old daughter is having a great time — she might be a bit stunned if I said "here, now you have to do it like this in C".
CUDA is C++, and benefits greatly from templates, as they let you optimize the amount of work per thread. Many years ago I tried to add GPU support to a C codebase and was surprised to learn that the flag for C compilation didn't work!
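A sketch of what templates buy you here: making the per-thread work count a compile-time template parameter lets the compiler fully unroll the inner loop, which is awkward to express in plain C. (Hypothetical example, not from any particular library.)

```cuda
// ITEMS = elements processed per thread, fixed at compile time,
// so the loop below can be completely unrolled.
template <int ITEMS>
__global__ void scale(float *a, float s, int n) {
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * ITEMS;
    #pragma unroll
    for (int k = 0; k < ITEMS; ++k) {
        int i = base + k;
        if (i < n) a[i] *= s;
    }
}

// Each instantiation is a separately optimized kernel;
// pick the work-per-thread that suits the problem size:
//   scale<1><<<grid, block>>>(d_a, 2.0f, n);
//   scale<4><<<grid4, block>>>(d_a, 2.0f, n);
```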
FWIW, all my libraries work with both CUDA and OpenCL.
While I agree with your sentiment, unfortunately Nvidia is the only vendor that pays a considerable number of people to develop the ecosystem. AMD basically says "get lost" by refusing to put more than a handful of people on the job of providing OpenCL libraries. And, BTW, they change their minds every few years. I hope that HIP won't be abandonware...
> unfortunately Nvidia is the only vendor that pays a considerable number of people to develop the ecosystem. AMD basically says "get lost" by refusing to put more than a handful of people on the job of providing OpenCL libraries.
Vulkan itself is developed and supported well, and it already can be used for compute as far as I know. But apparently there are some features that come from the OpenCL world that need to be filled in. It wouldn't be AMD's exclusive effort. So hopefully things will start moving.
The language and basic platform are not the problem. OpenCL was and is OK. However, the libraries are few and far between. CUDA offers cuBLAS, cuFFT, cuDNN, cuSOLVER, etc. For OpenCL, even the one decent BLAS library (CLBlast) had to be written by a guy who did it for free, while AMD's clBLAS is more or less stalled (and I never managed to build it on Linux in the first place), and that's it...
The ability to just swap in the cuFFTW header for FFTW3's, making the calls execute on the GPU (even though it doesn't give the best performance), is also nice for beginners.
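Concretely, the drop-in looks something like this (an untested sketch; it needs the CUDA Toolkit installed, and you link with -lcufftw instead of -lfftw3):

```c
/* Same FFTW3-style source; only the header and the link line change:
 *   #include <fftw3.h>   + -lfftw3   -> runs on the CPU
 *   #include <cufftw.h>  + -lcufftw  -> the same calls run on the GPU */
#include <cufftw.h>   /* instead of <fftw3.h> */

int main(void) {
    int n = 1024;
    fftwf_complex *in  = fftwf_malloc(sizeof(fftwf_complex) * n);
    fftwf_complex *out = fftwf_malloc(sizeof(fftwf_complex) * n);
    fftwf_plan p = fftwf_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftwf_execute(p);            /* executed by cuFFT under the hood */
    fftwf_destroy_plan(p);
    fftwf_free(in);
    fftwf_free(out);
    return 0;
}
```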
Khronos is to blame for making OpenCL a C-only game, while CUDA supported C, C++, and Fortran, with PTX for anyone else who wanted to write a compiler frontend for CUDA.
It took being beaten by Nvidia for them to actually care to add SPIR and C++ support to OpenCL.
Even now, while CUDA ships a C++ compiler out of the box with its SDK, for OpenCL one needs to go to Codeplay and download their ComputeCpp Community Edition compiler for SYCL support, which may or may not support a given card. Hardly any better.
Followed along until I had to compile a kernel; now I'm facing a java.lang.UnsatisfiedLinkError: Error while loading native library "JNvrtc-0.9.0-windows-x86_64". This seems to be a dependency of ClojureCUDA, but I don't see anything about it in their installation instructions. I have the CUDA Toolkit installed. Everything worked up to this point.
CUDA uses a single-source approach, meaning that the host (CPU) and device (GPU) code live in the same file. So it requires a special compiler (nvcc) that splits the original source files, compiles the host and device parts separately, and then merges the results back together.
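Single-source means a minimal .cu file mixes both sides, e.g. (a sketch; needs nvcc and a GPU to actually run):

```cuda
#include <cstdio>

// Device code: nvcc's device path compiles this to PTX/SASS.
__global__ void add1(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

// Host code: handed off to the host compiler (gcc/clang/MSVC).
int main() {
    int n = 256;
    float *a;
    cudaMallocManaged(&a, n * sizeof(float));
    add1<<<(n + 127) / 128, 128>>>(a, n);  // <<<...>>> is nvcc-only syntax
    cudaDeviceSynchronize();
    printf("%f\n", a[0]);
    cudaFree(a);
    return 0;
}
```

A plain C or C++ compiler cannot even parse this file, which is why nvcc has to sit in front and split it.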
This requires nvcc and the device compiler to have exact knowledge of how the host compiler compiles every single construct (think e.g. of alignment and padding in complex structures), and they must at least be able to parse the syntax of the host include files (which fails, for example, if the include files use C++11 syntax but the device compiler only knows how to parse C++98).
The CUDA compiler itself (nvcc) is far behind the features of more recent compilers. For instance, C++11 is supported, but not the full standard. It will take a while before C++14/17 are supported.
How does that stop it from using the latest version of clang++ or g++? They are backwards compatible with older C++ versions. The context is Linux, and a makefile failing with a message that your g++ or clang++ must be older than some version.
Unlike jcuda (which people typically recommend despite it not being updated as often), we actually depend on this for the nd4j and deeplearning4j projects.
These CUDA bindings are meant to be a 1-to-1 mapping to the CUDA API as well. Hope this helps!
Okay, so it's easier than directly using the CUDA C toolchains, perhaps, but why not compare to Python + Numba, which has had GPU support for quite a while, and likewise avoids direct exposure to the underlying C toolchains, provides interactive compilation, can be used with a nice REPL (or Jupyter Notebook), etc.?