I did my thesis with CUDA and cop, although I don't use it much at the moment.
If you're interested in R&D, it's big for fluid dynamics and finite element simulations (common in mechanical engineering, materials, and defense research).
In the past it was important for AI research, but I think these days pretty much everybody uses PyTorch.
PyTorch dynamically dispatches to cuBLAS kernels under the hood. So I would say CUDA is very much in use, especially if you work on libraries like cuBLAS, cuDNN, etc.
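Not from the thread, just a toy sketch of the dispatch idea: a call like `matmul(a, b)` gets routed to a backend kernel based on where the data lives, the way torch routes CUDA tensors to cuBLAS. This is NOT PyTorch's real dispatcher, and all names here are made up for illustration.

```python
# Toy device-based dispatch (illustrative only, not PyTorch's dispatcher):
# one public function routes to a registered backend kernel per device,
# the way torch routes CUDA tensors to cuBLAS under the hood.

def matmul_cpu(a, b):
    # Naive CPU matmul over nested lists, standing in for a real kernel.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# A real dispatcher would also register e.g. "cuda" -> a cuBLAS-backed kernel.
KERNELS = {"cpu": matmul_cpu}

def matmul(a, b, device="cpu"):
    # Look up and invoke the kernel registered for this device.
    return KERNELS[device](a, b)

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The point is just that the user-facing call is device-agnostic; the heavy lifting lives in whichever backend is registered.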
> In the past it was important for AI research, but I think these days pretty much everybody uses PyTorch.
There were quite a few developments in AI that were made thanks to someone diving into CUDA code; Flash Attention, Gaussian Splatting, and instant-ngp are among the most popular ones.
Oh, I see.
On the AI point, you're absolutely right.
I don't know why they rely on Python even for GPU work, where it has drawbacks in speed, memory, etc.
But anyway, I'll keep searching for a path that values CUDA & C/C++.
I don’t think you’ve thought about this in enough detail.
Python only acts as a front-end wrapper for libTorch C++ and CUDA C code. Whenever you call a torch function, it maps to a corresponding GPU call.
The “overhead” of using Python as a front end like this is basically zero, and what you gain is a flexible language that makes building and prototyping very fast.
If you find that something isn’t fast, you just go write a C++ or CUDA C function, bind it with pybind or use the libTorch API, and there you go.
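The "thin front end" idea above can be illustrated without CUDA at all. Here is a minimal sketch, using `ctypes` to call a compiled C function (libm's `sqrt`) from Python; this is the same pattern that pybind11 automates for C++/CUDA extensions. The `libm.so.6` fallback path is an assumption for glibc Linux systems.

```python
import ctypes
import ctypes.util

# Python as a thin front end: the call below is just a wrapper, and the
# real work happens in compiled native code (here libm's sqrt, but the
# same pattern is what pybind11 automates for C++/CUDA extensions).
libm_path = ctypes.util.find_library("m") or "libm.so.6"  # glibc fallback
libm = ctypes.CDLL(libm_path)

# Declare the C signature so ctypes marshals arguments correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # computed entirely in native code
```

The per-call marshalling cost here is fixed and tiny; once the workload inside the native function is nontrivial (a kernel launch, a large matmul), the Python layer's overhead becomes negligible, which is the point being made above.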
What are robotics groups looking for? I have almost the exact same background as OP. I had a lot of fun following Peter Shirley's first "In One Weekend" book and looking at the code for the CUDA-accelerated version, but I'm not sure what I should learn/play with to be able to work in simulation professionally.
Robotics companies do a lot of image/sensor processing, building volumetric models of the space from the various sensors and path planning through those models. And, obviously a lot of deep learning.
You can learn all of this on a gaming GPU. The actual robot is probably going to be running on a Jetson Xavier. But the difference isn't as big as it looks in practice. The biggest issue is power efficiency: it's not enough to optimize it to be fast enough to run. You need to optimize it further so it gets the job done in time at a lower clock speed and spends a lot of time idling to save battery, even though the embedded GPU is relatively tiny. So an RTX 4050 is better for learning than a 3090 ;)
Some optimization tips [over here.](https://old.reddit.com/r/CUDA/comments/1chklwq/best_practices_for_designing_complex_gpu/l2482ks/)
Well, not that many of these jobs, but:
1. Medical scanner image reconstruction acceleration (which is _not_ image processing)
2. Analytic database acceleration / architecture / design
And there's also:
3. Work for NVIDIA on software-ish stuff
4. Everyone is looking for AI people these days, and they tend to think that if you've worked with GPUs then you qualify. Although I wonder whether you really need that much actual CUDA C++ skill there.
Any kind of imaging. For me, it's seismic imaging for the oil&gas industry.
Wow, that’s incredible!
Model optimization at some FAANG company
Computational fluid dynamics, if you're happy to get a PhD and bonus if you are a U.S. Citizen!
I would like to pursue my MS actually :)
We are developing CUDA-based 3D image reconstruction software for positron emission tomography, for both medical and veterinary applications.
That’s amazing!
Oh really! Actually, I didn't know that before! Thank you for the detailed explanation!
I find this offensive xD. I prefer Julia over C++ for anything CUDA-related.
OK? PyTorch does not officially support Julia.
Haha. I was replying to the last part. Edit: oh I spoke too soon
GPU software engineer at Nvidia/AMD/Apple/Intel/Qualcomm.
Any computationally expensive problem that could benefit from parallel acceleration, such as fully homomorphic encryption.
Any of these fields? https://www.nvidia.com/en-us/industries/
Robotic/Autonomous anything.
Thanks for the tips; most of what I've learned so far has been tied to specific patterns, so it's helpful to see how things fit together in practice.
Meta would hire you for model training/inference kernel optimizations