
ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul #24753

Merged
ggerganov merged 3 commits into ggml-org:master from shalinib-ibm:master on Jun 19, 2026

Conversation

@shalinib-ibm

Contributor

Overview

This patch removes the requirement that K be divisible by kc in the tinyBlas_Q0_PPC tiled matmul path. The final K panel is now processed at its actual depth, and that reduced panel size is passed through both packing and kernel execution. As a result, more workloads can use the MMA kernel instead of falling back to mnpack.
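The K-tail handling can be illustrated with a minimal scalar C++ sketch. It is not the tinyBlas_Q0_PPC code: it works on plain float matrices rather than Q8_0/Q4_0 blocks, uses no MMA instructions, and the names pack_panel, panel_kernel and matmul_k_panels are hypothetical. It assumes, as ggml does, that both operands are stored with K contiguous (C = A * B^T). The loop walks K in panels of kc and clamps the last panel to the K - k0 elements that remain, passing that effective depth to both the packing step and the kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a kc_eff-deep slice (columns k0 .. k0+kc_eff-1) of a
// row-major rows x K matrix into a contiguous rows x kc_eff buffer.
static void pack_panel(const std::vector<float>& src, std::size_t rows, std::size_t K,
                       std::size_t k0, std::size_t kc_eff, std::vector<float>& buf) {
    buf.resize(rows * kc_eff);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t k = 0; k < kc_eff; ++k)
            buf[r * kc_eff + k] = src[r * K + k0 + k];
}

// Hypothetical scalar stand-in for the MMA kernel: C += Ap * Bp^T over one panel
// whose depth is kc_eff (not necessarily the nominal kc).
static void panel_kernel(const std::vector<float>& Ap, const std::vector<float>& Bp,
                         std::size_t M, std::size_t N, std::size_t kc_eff,
                         std::vector<float>& C) {
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < kc_eff; ++k)
                acc += Ap[i * kc_eff + k] * Bp[j * kc_eff + k];
            C[i * N + j] += acc;
        }
}

// C (M x N) = A (M x K) * B^T, with B stored as N x K, processed in K panels of kc.
std::vector<float> matmul_k_panels(const std::vector<float>& A, const std::vector<float>& B,
                                   std::size_t M, std::size_t N, std::size_t K,
                                   std::size_t kc) {
    assert(kc > 0);  // kc == 0 would never advance k0
    std::vector<float> C(M * N, 0.0f), Ap, Bp;
    for (std::size_t k0 = 0; k0 < K; k0 += kc) {
        // Full panels have depth kc; the final panel is clamped to the K - k0
        // elements that remain, so K need not be a multiple of kc.
        const std::size_t kc_eff = std::min(kc, K - k0);
        pack_panel(A, M, K, k0, kc_eff, Ap);
        pack_panel(B, N, K, k0, kc_eff, Bp);
        panel_kernel(Ap, Bp, M, N, kc_eff, C);
    }
    return C;
}
```

For example, with K = 5 and kc = 2 the panels have depths 2, 2 and 1. Under the old divisibility requirement, a shape like this could not take the tiled path and would have needed the mnpack fallback.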

Performance Impact:

~60% gain in prompt-processing (PP) speed with granite-3.3-8b-instruct Q8_0 and Q4_0 models, measured with llama-bench -p 512 -n 1 on a Power10 ppc64le machine.

ggml-cpu: support K tails in Power10 MMA Q8/Q4 matmul
@shalinib-ibm shalinib-ibm requested a review from ggerganov as a code owner June 18, 2026 07:17
@github-actions github-actions Bot added the ggml changes relating to the ggml tensor library for machine learning label Jun 18, 2026
@shalinib-ibm

Contributor Author

@taronaeo @ggerganov could you please help review this PR?

@ggerganov ggerganov added the merge ready A maintainer can use this label to indicate that they consider the changes final and ready to merge. label Jun 18, 2026
Review comment thread on ggml/src/ggml-cpu/llamafile/sgemm.cpp (outdated)
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
@ggerganov ggerganov merged commit 8141e73 into ggml-org:master Jun 19, 2026
26 of 27 checks passed

Labels

ggml: changes relating to the ggml tensor library for machine learning
merge ready: a maintainer can use this label to indicate that they consider the changes final and ready to merge.


3 participants