server : disable context shift by default #15416

Merged: ggerganov merged 2 commits into master from gg/server-disable-context-shift-default on Aug 19, 2025

@ggerganov

Member

Context shift was a useful feature in the past, with pre-trained models and the raw /completions API. Today it causes a lot of confusion, so it is better to disable it by default. It can be re-enabled with the --context-shift CLI argument.

ggerganov requested a review from ngxson as a code owner on August 19, 2025 07:11
github-actions bot added the examples, python, and server labels on Aug 19, 2025
@GuillaumeBruand

@ggerganov I'm looking for resources about the behaviour when the context overflows. I was planning to run experiments using --context-shift along with the --keep N option (I'm still not sure whether that one is relevant) and a --ctx-size smaller than the training context.

What should I get from this change? Is there a link with the attention sink support recently added to llama.cpp? Is the --context-shift option irrelevant for instruct fine-tuned models?

@ngxson

ngxson commented Aug 19, 2025

Collaborator

> What should I get from this change?

This only changes the default behavior: context shift used to be on by default, and it is now off by default.

You can still enable it manually.
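
As a rough illustration of opting back in, here is a minimal sketch that assembles a server command line using only the flags named in this thread (--context-shift, --keep, --ctx-size). The helper name server_command, the model path, and the numeric values are placeholders chosen for the example, not recommendations.

```python
import shlex


def server_command(model_path, ctx_size, context_shift=False, n_keep=0):
    """Build an argv list for llama-server (model_path is a placeholder)."""
    argv = ["llama-server", "-m", model_path, "--ctx-size", str(ctx_size)]
    if context_shift:
        # Opt back in explicitly; after this PR the feature is off by default.
        argv.append("--context-shift")
    if n_keep > 0:
        argv += ["--keep", str(n_keep)]
    return argv


# Hypothetical values, for illustration only.
cmd = server_command("model.gguf", ctx_size=4096, context_shift=True, n_keep=256)
print(shlex.join(cmd))
```

Leaving context_shift at its default of False produces a command without the flag, which matches the new server default.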

Comment on lines +28 to +30
server.enable_ctx_shift = True
server.start()
server.enable_ctx_shift = False

Member Author

@ngxson I noticed that the server parameters are stateful - i.e. if we change a parameter in one test, it will remain changed for the rest of the tests. This is the reason I do it like this here.

Is there a better way to set the parameter just for the scope of the current test?

Collaborator

It's possible that scope="module" on the fixture is the problem. Could you try removing it (while keeping autouse)?

I was a bit confused about the notion of scope in pytest.

Member Author

Thanks - this seems to work.
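
The thread above settled the leaking-state problem by changing the fixture scope. As a separate illustration of confining a parameter change to a single test, here is a minimal sketch using a context manager that restores the previous value on exit. FakeServer and override are hypothetical names invented for this example and are not part of the llama.cpp test utilities; only the enable_ctx_shift attribute comes from the snippet under review.

```python
from contextlib import contextmanager


class FakeServer:
    """Hypothetical stand-in for the test suite's server object."""

    def __init__(self):
        self.enable_ctx_shift = False  # mirrors the new default (off)


@contextmanager
def override(obj, **params):
    """Temporarily set attributes on obj, restoring the old values on exit."""
    saved = {name: getattr(obj, name) for name in params}
    try:
        for name, value in params.items():
            setattr(obj, name, value)
        yield obj
    finally:
        # Runs even if the body raises, so the override cannot leak.
        for name, value in saved.items():
            setattr(obj, name, value)


server = FakeServer()
with override(server, enable_ctx_shift=True):
    inside = server.enable_ctx_shift  # overridden only within this block
after = server.enable_ctx_shift       # restored once the block exits
```

The try/finally is the important part: the original value comes back even when the test body fails, which is what the manual True/False toggling in the reviewed lines cannot guarantee.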

@ggerganov ggerganov merged commit d2fcd91 into master Aug 19, 2025
50 checks passed
@ggerganov ggerganov deleted the gg/server-disable-context-shift-default branch August 19, 2025 13:46
@ggerganov

Member Author

@GuillaumeBruand Context shift is difficult to handle with formatted endpoints such as /chat/completions because it can destroy the structure of the chat template, which degrades output quality. I strongly recommend against using it in such cases.
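
To make the failure mode concrete, here is a toy sketch of how a shift can cut through chat-template structure. It assumes a simplified model in which the first n_keep tokens are kept and the first half of the remaining tokens is discarded; the real server works on token ids and the KV cache, and the exact discard amount is an assumption here. The role markers are made up for the example and do not correspond to any real chat template.

```python
def context_shift(tokens, n_keep):
    """Keep the first n_keep tokens, then drop the first half of the rest."""
    n_left = len(tokens) - n_keep
    n_discard = n_left // 2
    return tokens[:n_keep] + tokens[n_keep + n_discard:]


# Toy conversation, one string per "token"; the markers are hypothetical.
chat = [
    "<system>", "Be", "brief", "</system>",
    "<user>", "What", "is", "RoPE", "?", "</user>",
    "<assistant>", "A", "positional", "encoding", "</assistant>",
    "<user>", "And", "ALiBi", "?", "</user>",
]

shifted = context_shift(chat, n_keep=4)
print(" ".join(shifted))

# The assistant turn now closes without ever opening.
orphaned_close = "</assistant>" in shifted and "<assistant>" not in shifted
```

In this toy case the cut lands in the middle of the assistant turn, so the model sees a closing marker with no matching opener, which is the kind of broken structure the comment above warns about.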

@GuillaumeBruand

Thanks for the insight. I'll go with this PR's default and leave it disabled for my experiments.

@DamonFool

Contributor

Hi @ggerganov, the help message for --context-shift seems incorrect.
Please see #15448.
Thanks.

blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* server : disable context shift by default

ggml-ci

* server : make scope of test parameters local