ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul by shalinib-ibm · Pull Request #24753 · ggml-org/llama.cpp

shalinib-ibm · 2026-06-18T07:17:21Z

Overview

This patch removes the requirement that K be divisible by kc in the tinyBlas_Q0_PPC tiled matmul path. The final K panel is now processed using its actual depth, and the reduced panel size is passed through both packing and kernel execution. This allows more workloads to use the MMA kernel and reduces fallback to mnpack.
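The K-tail idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in sgemm.cpp: the names `pack_b_panel`, `kernel_panel`, and `matmul_k_tail` are hypothetical, the data is plain int8 without the per-block scales that Q8_0/Q4_0 carry, and a scalar loop stands in for the MMA kernel. The only point it shows is that the last panel uses `kb = min(kc, K - k0)` and that the same `kb` reaches both the packing step and the kernel.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy rows [k0, k0 + kb) of row-major B (K x N)
// into a contiguous kb x N buffer. kb may be smaller than kc for the tail.
static void pack_b_panel(const int8_t *B, std::size_t N,
                         std::size_t k0, std::size_t kb,
                         std::vector<int8_t> &packed) {
    packed.resize(kb * N);
    for (std::size_t k = 0; k < kb; ++k) {
        for (std::size_t j = 0; j < N; ++j) {
            packed[k * N + j] = B[(k0 + k) * N + j];
        }
    }
}

// Hypothetical scalar stand-in for the MMA kernel: accumulate the
// contribution of one K panel of depth kb into C (M x N, row-major).
static void kernel_panel(const int8_t *A, std::size_t lda,
                         const int8_t *packedB,
                         std::size_t M, std::size_t N,
                         std::size_t k0, std::size_t kb,
                         int32_t *C) {
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t k = 0; k < kb; ++k) {
            const int32_t a = A[i * lda + k0 + k];
            for (std::size_t j = 0; j < N; ++j) {
                C[i * N + j] += a * static_cast<int32_t>(packedB[k * N + j]);
            }
        }
    }
}

// C = A * B with K split into panels of at most kc (kc must be > 0).
// K does not need to be a multiple of kc: the final panel is processed
// with its actual depth instead of being rejected or padded.
void matmul_k_tail(const int8_t *A, const int8_t *B, int32_t *C,
                   std::size_t M, std::size_t N, std::size_t K,
                   std::size_t kc) {
    std::fill(C, C + M * N, 0);
    std::vector<int8_t> packed;
    for (std::size_t k0 = 0; k0 < K; k0 += kc) {
        const std::size_t kb = std::min(kc, K - k0);  // actual panel depth
        pack_b_panel(B, N, k0, kb, packed);
        kernel_panel(A, K, packed.data(), M, N, k0, kb, C);
    }
}
```

With K = 10 and kc = 4, for example, the loop runs panels of depth 4, 4, and 2, so a shape that would otherwise fail a `K % kc == 0` check stays on the tiled path.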

Performance Impact:

~60% gain in prompt-processing (PP) speed with granite-3.38b-instruct Q8_0 and Q4_0 models, tested with llama-bench -p 512 -n 1 on a Power10 ppc64le box.

Additional information

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:

ggml-cpu: support K tails in Power10 MMA Q8/Q4 matmul

shalinib-ibm · 2026-06-18T07:18:11Z

@taronaeo @ggerganov can you please help review this PR?

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

shalinib-ibm added 2 commits June 18, 2026 12:23

Merge pull request #27 from shalinib-ibm/shalinib-ibm-patch-1

02e86cf

ggml-cpu: support K tails in Power10 MMA Q8/Q4 matmul

shalinib-ibm requested a review from ggerganov as a code owner June 18, 2026 07:17

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Jun 18, 2026

ggerganov approved these changes Jun 18, 2026

ggerganov added the merge ready label (a maintainer can use this label to indicate that they consider the changes final and ready to merge) Jun 18, 2026

taronaeo reviewed Jun 18, 2026

Comment thread ggml/src/ggml-cpu/llamafile/sgemm.cpp Outdated

Apply suggestion from @taronaeo

2e8b42c

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

taronaeo approved these changes Jun 18, 2026

ggerganov merged commit 8141e73 into ggml-org:master Jun 19, 2026
26 of 27 checks passed

ggerganov merged 3 commits into ggml-org:master from shalinib-ibm:master
