From the perspective of a CUDA beginner, this doesn't seem simpler than writing CUDA with C (not C++, just C). If you're going to pick up CUDA, starting with C gives you the best tooling support and community docs. Not to mention that managing pointers and explicit types in C will genuinely help your understanding of how the CPU and GPU interact.
If you already know Clojure, this is probably the best chance to extend something you already love using. If you don't, you're probably better off learning either CUDA or Clojure rather than both at the same time. Debugging CUDA errors is already painful; I wouldn't add a new host language on top of that.
For context, I'm currently taking my school's GPGPU course. We've just started actually writing non-trivial code.
The development feedback loop is incredibly tight when using a Clojure (or Lisp in general) REPL. You can develop your code interactively, including (it appears) the C/CUDA code, since you can compile it at the REPL and then upload it to the GPU for execution.
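For the curious, the loop looks roughly like this with ClojureCUDA (a sketch from memory of the uncomplicate.clojurecuda.core API; treat exact names and argument orders as approximate, and it obviously needs an Nvidia GPU plus the CUDA Toolkit):

```clojure
(require '[uncomplicate.clojurecuda.core :refer :all])

(init)
(with-context (context (device 0))
  (let [src    "extern \"C\" __global__ void inc(int n, float *a) {
                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  if (i < n) a[i] += 1.0f;
                }"
        ;; compile the CUDA C source string right here, at the REPL...
        kernel (-> (program src) compile! module (function "inc"))
        n      256
        gpu-a  (mem-alloc (* n Float/BYTES))]
    ;; ...then upload data, launch, and read back, all interactively
    (memcpy-host! (float-array n) gpu-a)
    (launch! kernel (grid-1d n) (parameters n gpu-a))
    (take 3 (memcpy-host! gpu-a (float-array n)))))
```

The point is that editing the kernel string and re-evaluating the form is the whole recompile cycle; there is no separate build step.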
You should give it a shot; interactively working with stuff is an excellent learning experience. My 11-year-old daughter is having a great time — she might be a bit stunned if I said "here, now you have to do it like this in C".
CUDA is C++, and benefits greatly from templates, as they let you optimize the amount of work per thread. Many years ago I tried to add GPU support to a C codebase and was surprised to learn that the flag for C compilation didn't work!
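A sketch of what templates buy you here: making the per-thread work count a compile-time template parameter lets the compiler fully unroll the inner loop, which is awkward to express in plain C. (Hypothetical example, not from any particular library.)

```cuda
// ITEMS = elements processed per thread, fixed at compile time,
// so the loop below can be completely unrolled.
template <int ITEMS>
__global__ void scale(float *a, float s, int n) {
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * ITEMS;
    #pragma unroll
    for (int k = 0; k < ITEMS; ++k) {
        int i = base + k;
        if (i < n) a[i] *= s;
    }
}

// Each instantiation is a separately optimized kernel;
// pick the work-per-thread that suits the problem size:
//   scale<1><<<grid, block>>>(d_a, 2.0f, n);
//   scale<4><<<grid4, block>>>(d_a, 2.0f, n);
```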
FWIW, all my libraries work with both CUDA and OpenCL.
While I agree with your sentiment, unfortunately Nvidia is the only vendor that pays a considerable number of people to develop the ecosystem. AMD basically says "get lost" by refusing to put more than a handful of people on the job of providing OpenCL libraries. And, BTW, they change their minds every few years. I hope that HIP won't be abandonware...
> unfortunately Nvidia is the only vendor that pays a considerable number of people to develop the ecosystem. AMD basically says "get lost" by refusing to put more than a handful of people on the job of providing OpenCL libraries.
Vulkan itself is developed and supported well, and it already can be used for compute as far as I know. But apparently there are some features that come from the OpenCL world that need to be filled in. It wouldn't be AMD's exclusive effort. So hopefully things will start moving.
The language and basic platform are not the problem. OpenCL was and is OK. However, the libraries are few and far between. CUDA offers cuBLAS, cuFFT, cuDNN, cuSOLVER, etc. For OpenCL, even the one decent BLAS library (CLBlast) had to be written by a guy who did it for free, while AMD's clBLAS is more or less stalled (and I never managed to build it on Linux in the first place), and that's it...
The ability to just swap in the cuFFTW header for FFTW3's, making the calls execute on the GPU (even though it doesn't give the best performance), is also nice for beginners.
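Concretely, the drop-in looks something like this (an untested sketch; it needs the CUDA Toolkit installed, and you link with -lcufftw instead of -lfftw3):

```c
/* Same FFTW3-style source; only the header and the link line change:
 *   #include <fftw3.h>   + -lfftw3   -> runs on the CPU
 *   #include <cufftw.h>  + -lcufftw  -> the same calls run on the GPU */
#include <cufftw.h>   /* instead of <fftw3.h> */

int main(void) {
    int n = 1024;
    fftwf_complex *in  = fftwf_malloc(sizeof(fftwf_complex) * n);
    fftwf_complex *out = fftwf_malloc(sizeof(fftwf_complex) * n);
    fftwf_plan p = fftwf_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftwf_execute(p);            /* executed by cuFFT under the hood */
    fftwf_destroy_plan(p);
    fftwf_free(in);
    fftwf_free(out);
    return 0;
}
```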
Khronos is to blame for making OpenCL a C-only game, while CUDA supported C, C++, and Fortran, with PTX for anyone else who wanted to write a compiler frontend for CUDA.
It took being beaten by Nvidia for them to actually care to add SPIR and C++ support to OpenCL.
Even now, while CUDA ships a C++ compiler out of the box with its SDK, for OpenCL one needs to go to Codeplay and download their ComputeCpp Community Edition compiler for SYCL support, which may or may not support a given card. Hardly any better.
Followed along until I had to compile a kernel; now I'm facing a java.lang.UnsatisfiedLinkError: Error while loading native library "JNvrtc-0.9.0-windows-x86_64". This seems to be a dependency of ClojureCUDA, but I don't see anything about it in their installation instructions. I have the CUDA Toolkit installed. Everything worked up to this point.
CUDA uses a single-source approach, meaning that the host (CPU) and device (GPU) code live in the same file. So it requires a special compiler (nvcc) that splits the original source files, compiles the host and device parts separately, and then merges the results back together.
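Single-source means a minimal .cu file mixes both sides, e.g. (a sketch; needs nvcc and a GPU to actually run):

```cuda
#include <cstdio>

// Device code: nvcc's device path compiles this to PTX/SASS.
__global__ void add1(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

// Host code: handed off to the host compiler (gcc/clang/MSVC).
int main() {
    int n = 256;
    float *a;
    cudaMallocManaged(&a, n * sizeof(float));
    add1<<<(n + 127) / 128, 128>>>(a, n);  // <<<...>>> is nvcc-only syntax
    cudaDeviceSynchronize();
    printf("%f\n", a[0]);
    cudaFree(a);
    return 0;
}
```

A plain C or C++ compiler cannot even parse this file, which is why nvcc has to sit in front and split it.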
This requires nvcc and the device compiler to have exact knowledge of how the host compiler compiles every single construct (think e.g. of alignment and padding in complex structures), and they must at least be able to parse the syntax of the host include files (which fails, for example, if the include files use C++11 syntax but the device compiler only knows how to parse C++98).
The CUDA compiler itself (nvcc) is far behind the features of more recent compilers. For instance, C++11 is supported, but not the full standard. It will take a while before C++14/17 are supported.
How does that stop it from using the latest version of clang++ or g++? They are backwards compatible with older C++ versions. The context is Linux, and a makefile failing with a message that your g++ or clang++ must be older than some version.
Unlike jcuda (which people typically recommend despite it not being updated as often), we actually depend on this for the nd4j and deeplearning4j projects.
These CUDA bindings are meant to be a 1-to-1 mapping to the CUDA API as well. Hope this helps!
Okay, so it's easier than directly using the CUDA C toolchains, perhaps, but why not compare to Python + Numba, which has had GPU support for quite a while, and likewise avoids direct exposure to the underlying C toolchains, provides interactive compilation, can be used with a nice REPL (or Jupyter Notebook), etc.?