Fix extract_cacheable_prefix to handle string content with message-level cache_control #19266
Merged
Sameerlite merged 1 commit into BerriAI:litellm_staging_01_20_2026 on Jan 20, 2026
…vel cache_control (fixes BerriAI#19228)
931998f into BerriAI:litellm_staging_01_20_2026 (4 of 8 checks passed)
fzowl pushed a commit to fzowl/litellm that referenced this pull request on Jun 24, 2026
…-string-content Fix extract_cacheable_prefix to handle string content with message-level cache_control
Relevant issues
Fixes #19228
Summary
Fixed extract_cacheable_prefix in PromptCachingCache to correctly handle messages where cache_control is a sibling key of string content.
Problem
When a message has string content with a sibling cache_control key, like:
    {
      "role": "user",
      "content": "large_message",
      "cache_control": {"type": "ephemeral", "ttl": "5m"}
    }
The extract_cacheable_prefix function would return an empty prefix because it only looked for cache_control inside content blocks (when content is a list), not at the message level.
This is a valid message format per LiteLLM's ChatCompletionUserMessage type definition (lines 692-693 in types/llms/openai.py).
Solution
Added a check for cache_control at the message level before checking within content blocks. When a message has message-level cache_control with type="ephemeral", the entire message is now correctly included in the cacheable prefix.
Changes
- litellm/router_utils/prompt_caching_cache.py:
  - Added a check for message-level cache_control in extract_cacheable_prefix
  - last_cacheable_content_idx = None is used to indicate that the entire message is cacheable
- tests/router_unit_tests/test_router_prompt_caching.py
Pre-Submission checklist
- tests/litellm/ directory
- make test-unit (unable to run locally due to dependencies)
Type
Bug Fix
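For readers who want to see the behavior described under Problem and Solution in concrete terms, here is a minimal sketch. It is not the code from litellm/router_utils/prompt_caching_cache.py: the names find_last_cacheable and extract_prefix_sketch are hypothetical, and the return shape (a list of messages) is an assumption made for illustration. It only mirrors the idea that a content index of None means the whole message is cacheable.

```python
from typing import Any, Dict, List, Optional, Tuple

Message = Dict[str, Any]


def _is_ephemeral(cache_control: Any) -> bool:
    """True when the value looks like {"type": "ephemeral", ...}."""
    return isinstance(cache_control, dict) and cache_control.get("type") == "ephemeral"


def find_last_cacheable(messages: List[Message]) -> Optional[Tuple[int, Optional[int]]]:
    """Hypothetical helper: locate the last cache marker.

    Returns (message_idx, content_idx). content_idx is None when the marker
    sits at the message level, meaning the entire message is cacheable.
    """
    last: Optional[Tuple[int, Optional[int]]] = None
    for msg_idx, message in enumerate(messages):
        # Message-level check first: this is the case the PR adds.
        if _is_ephemeral(message.get("cache_control")):
            last = (msg_idx, None)
            continue
        # Block-level check: only applies when content is a list of blocks.
        content = message.get("content")
        if isinstance(content, list):
            for block_idx, block in enumerate(content):
                if isinstance(block, dict) and _is_ephemeral(block.get("cache_control")):
                    last = (msg_idx, block_idx)
    return last


def extract_prefix_sketch(messages: List[Message]) -> List[Message]:
    """Hypothetical stand-in for the prefix extraction (assumed return shape)."""
    marker = find_last_cacheable(messages)
    if marker is None:
        return []  # no cache marker anywhere, so nothing is cacheable
    msg_idx, content_idx = marker
    prefix = list(messages[:msg_idx])
    last_message = messages[msg_idx]
    if content_idx is None:
        # Message-level marker: include the whole message.
        prefix.append(last_message)
    else:
        # Block-level marker: keep blocks up to and including the marked one.
        truncated = dict(last_message)
        truncated["content"] = last_message["content"][: content_idx + 1]
        prefix.append(truncated)
    return prefix


# The message shape from the Problem section, followed by an uncached message.
messages = [
    {
        "role": "user",
        "content": "large_message",
        "cache_control": {"type": "ephemeral", "ttl": "5m"},
    },
    {"role": "user", "content": "follow-up"},
]
prefix = extract_prefix_sketch(messages)
```

In this sketch, removing the message-level check in find_last_cacheable reproduces the reported symptom: string content is never a list, so no marker is found and the prefix comes back empty.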