- [2026-06-10] DefTruth, Butterfingrz (2026). FFPA: Efficient Flash Prefill Attention for Large Head Dimensions via Split-D. Zenodo, 2026.
xlite-dev
Repositories
- LeetCUDA Public
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
- sglang Public Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
- GCMP Public Forked from VicBilibily/GCMP
Integrates China's mainstream native large-model providers to give developers a richer choice of AI coding assistants better suited to local needs. Built-in support currently covers Zhipu AI, MiniMax, MoonshotAI, DeepSeek, Alibaba Cloud Bailian, Kuaishou Wanqing, Volcano Ark, Tencent Cloud, Xiaomi MiMo, and other native large-model providers. The extension also supports models compatible with the OpenAI and Anthropic API interfaces, and allows custom integration of any third-party cloud service model that exposes a compatible interface.
- cutlass Public Forked from NVIDIA/cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
- flash-attention Public Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
- diffusers Public Forked from huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
- vllm Public Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
- Awesome-DiT-Inference Public
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
