feat: add CUDA 13.2 support via cudarc 0.19.4 #312

Merged
jafioti merged 2 commits into luminal-ai:main from kimjune01:cuda-13.2-support on May 13, 2026

@kimjune01 (Contributor)

Summary

  • Update the cudarc dependency from 0.18.2 to 0.19.4, which adds CUDA 13.2 support
  • Migrate the embed kernel from __constant__ memory (previously written through get_global) to the shared dyn_dims buffer that every other kernel already uses, since the get_global call is gone after the cudarc 0.19 upgrade

Fixes #291

Test plan

  • Compiles against cudarc 0.19.4
  • The embed kernel reads its dynamic dimensions from the shared dyn_dims buffer, like the other CUDA kernels

kimjune01 added 2 commits May 11, 2026 21:16
Fixes luminal-ai#291

Changes:
- Upgrade cudarc from 0.18.2 to 0.19.4
- Remove get_global call for __constant__ memory tracking

Rationale:
cudarc 0.19.0 changed get_global to return CudaViewMut instead of
CudaSlice to prevent double-free of __constant__ memory managed by
the CUDA module. The old code worked around this by storing the
CudaSlice and calling std::mem::forget on cleanup. With the new API,
the view's lifetime is tied to the module borrow, making the
workaround unnecessary. Since the constants HashMap was only used
for this workaround and never accessed otherwise, we now return an
empty HashMap.

CUDA 13.2 support was added in cudarc 0.19.4.
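
A toy model in plain Rust (no GPU or cudarc required) of the ownership problem described in the rationale above. Module, OwnedSlice, get_global_owned and get_global_view are hypothetical stand-ins for the CUDA module, CudaSlice and CudaViewMut, not cudarc's real types, and "freeing" the __constant__ memory is simulated with a counter. It shows why an owning handle to memory the module already owns is freed twice unless it goes through std::mem::forget, while a view borrowed from the module has nothing of its own to free and cannot outlive the module.

```rust
use std::cell::Cell;
use std::mem;

// Toy model only: these types are hypothetical stand-ins, not cudarc's API.
// Each "free" of the __constant__ memory bumps the shared counter.

/// Stand-in for the old owning handle (like a CudaSlice): dropping it frees.
struct OwnedSlice<'c> {
    frees: &'c Cell<u32>,
}

impl Drop for OwnedSlice<'_> {
    fn drop(&mut self) {
        self.frees.set(self.frees.get() + 1);
    }
}

/// Stand-in for a loaded module that owns its __constant__ memory and
/// frees it exactly once when the module itself is dropped.
struct Module<'c> {
    constant: Vec<u8>,
    frees: &'c Cell<u32>,
}

impl Drop for Module<'_> {
    fn drop(&mut self) {
        self.frees.set(self.frees.get() + 1);
    }
}

impl<'c> Module<'c> {
    /// Old-style accessor: an owning handle to memory the module also owns.
    fn get_global_owned(&self) -> OwnedSlice<'c> {
        OwnedSlice { frees: self.frees }
    }

    /// New-style accessor: a mutable view borrowed from the module. It has
    /// no Drop of its own and the borrow checker stops it outliving the module.
    fn get_global_view(&mut self) -> &mut [u8] {
        self.constant.as_mut_slice()
    }
}

/// Returns how many times the constant memory was freed.
fn old_pattern(forget: bool) -> u32 {
    let frees = Cell::new(0);
    {
        let module = Module { constant: vec![0; 4], frees: &frees };
        let handle = module.get_global_owned();
        if forget {
            // The old workaround: skip the handle's Drop so only the module frees.
            mem::forget(handle);
        }
        // Without forget, `handle` drops here and then `module` drops: two frees.
    }
    frees.get()
}

/// Returns how many times the constant memory was freed.
fn new_pattern() -> u32 {
    let frees = Cell::new(0);
    {
        let mut module = Module { constant: vec![0; 4], frees: &frees };
        let view = module.get_global_view();
        view.copy_from_slice(&[1, 2, 3, 4]); // write through the borrowed view
        // Dropping the view frees nothing; only `module` frees its memory.
    }
    frees.get()
}

fn main() {
    println!("old, no forget: {}", old_pattern(false)); // prints 2 (double free)
    println!("old, forget:    {}", old_pattern(true)); // prints 1
    println!("new, view:      {}", new_pattern()); // prints 1
}
```

The borrowed view makes the forget workaround unnecessary because ownership stays with the module; the lifetime on the view, not a manual mem::forget, is what prevents the second free.
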

The cudarc 0.18→0.19 bump removed the get_global call, but simply
dropping it left the embed kernel's __constant__ memory declared but
never written, which produced wrong results for models with
dynamic-shape embeddings. Migrate the embed kernel to the same
dyn_dims parameter + #define pattern that every other kernel uses.
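
A minimal sketch of the dyn_dims parameter + #define pattern, assuming each dynamic dimension is named by a single character and all of them reach the kernel through one const int* dyn_dims argument. The helper names (dyn_dims_defines, embed_kernel_source, dyn_dims_values), the embed kernel body and the symbol s are hypothetical illustrations, not luminal's actual code generator. Static sizes such as the hidden size are baked in as literals, while each dynamic size becomes a macro that reads its slot of dyn_dims, so its value arrives with every launch instead of living in __constant__ memory that has to be written separately.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the "dyn_dims parameter + #define" pattern; names
// and the kernel body are illustrative, not luminal's code generator.

/// Emits one `#define` per dynamic-dimension symbol, mapping it to its slot
/// in the `dyn_dims` kernel argument.
fn dyn_dims_defines(symbols: &[char]) -> String {
    symbols
        .iter()
        .enumerate()
        .map(|(i, s)| format!("#define {s} (dyn_dims[{i}])\n"))
        .collect()
}

/// Builds CUDA source for a hypothetical embedding gather where the sequence
/// length `s` is dynamic and the hidden size is a compile-time literal.
fn embed_kernel_source(symbols: &[char], hidden: usize) -> String {
    let mut src = dyn_dims_defines(symbols);
    src.push_str(&format!(
        r#"extern "C" __global__ void embed(float* out, const float* table,
                                   const int* ids, const int* dyn_dims) {{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= s * {hidden}) return;
    out[i] = table[ids[i / {hidden}] * {hidden} + i % {hidden}];
}}
"#
    ));
    src
}

/// Host side: values for the `dyn_dims` buffer, in the same order as the
/// `#define` indices. Panics if a symbol has no binding.
fn dyn_dims_values(symbols: &[char], bindings: &HashMap<char, i32>) -> Vec<i32> {
    symbols.iter().map(|s| bindings[s]).collect()
}

fn main() {
    let symbols = ['s'];
    print!("{}", embed_kernel_source(&symbols, 4096));
    let bindings = HashMap::from([('s', 128)]);
    println!("dyn_dims = {:?}", dyn_dims_values(&symbols, &bindings));
}
```

Because #define substitutes whole preprocessor tokens, a single-letter symbol such as s replaces every standalone s token in the kernel, so generated kernels must not use those letters as ordinary identifiers. The host also has to fill dyn_dims in the same order the macros index it, which is what dyn_dims_values keeps in sync with dyn_dims_defines.
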
tucker-luminal added the modal-ready label (When it's ready for modal) on May 12, 2026
jafioti merged commit 1dcd037 into luminal-ai:main on May 13, 2026 (16 checks passed)

Development

Linked issue: Add support for CUDA 13.2

3 participants