What happened?
PromptCachingCache's extract_cacheable_prefix function may return an empty prefix when message.content is a string:
for msg_idx, message in enumerate(messages):
    content = message.get("content")
    if not isinstance(content, list):
        continue
This appears to break API requests with bodies like the following, even though in my testing with LiteLLM connected to AWS Bedrock such requests do allow cache creation and reads:
{
    "model": model_id,
    "stream": False,
    "max_tokens": 1024,
    "messages": [
        {"role": "system", "content": "You are an LLM named Prompt Cache Helper"},
        {
            "role": "user",
            "content": large_message,
            "cache_control": {
                "type": "ephemeral",
                "ttl": "5m"
            }
        },
    ],
}
But this approach works:
{
    "model": model_id,
    "stream": False,
    "max_tokens": 1024,
    "messages": [
        {
            "role": "system",
            "content": "You are an LLM named Prompt Cache Helper"
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": large_message,
                    "cache_control": {
                        "type": "ephemeral",
                        "ttl": "5m"
                    }
                },
            ]
        },
    ],
}
Note that the API specs for Claude and OpenAI don't say that cache_control can be a sibling key of content, so I'm not sure whether this is actually a bug, or whether LiteLLM or Bedrock is just more flexible and allows cache_control as a sibling key.
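To make the suspected behavior concrete, here is a minimal sketch that mimics only the loop quoted above. `find_cacheable_messages` is a hypothetical stand-in written for illustration, not LiteLLM's actual `extract_cacheable_prefix`, and it assumes the real function looks for cache_control only inside list-type content:

```python
from typing import Any, Dict, List


def find_cacheable_messages(messages: List[Dict[str, Any]]) -> List[int]:
    """Hypothetical stand-in: indices of messages carrying a cache_control block.

    Mirrors only the quoted loop: a message whose content is not a list is
    skipped, so a cache_control key on the message itself is never examined.
    """
    hits = []
    for msg_idx, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # string content: the sibling cache_control is ignored
        if any(isinstance(block, dict) and "cache_control" in block for block in content):
            hits.append(msg_idx)
    return hits


large_message = "lorem ipsum " * 1000  # placeholder for a large prompt
cache_control = {"type": "ephemeral", "ttl": "5m"}

# cache_control as a sibling of a string content
sibling_style = [
    {"role": "system", "content": "You are an LLM named Prompt Cache Helper"},
    {"role": "user", "content": large_message, "cache_control": cache_control},
]

# cache_control inside a content block
block_style = [
    {"role": "system", "content": "You are an LLM named Prompt Cache Helper"},
    {
        "role": "user",
        "content": [{"type": "text", "text": large_message, "cache_control": cache_control}],
    },
]

print(find_cacheable_messages(sibling_style))
print(find_cacheable_messages(block_style))
```

With the sibling form, no message is recognized as cacheable in this sketch, which would match the empty prefix described above; with the content-block form, the user message is found.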
Steps to Reproduce
- Config to use the prompt caching precheck:
  router_settings:
    enable_pre_call_checks: true
    optional_pre_call_checks: ["prompt_caching"]
- Send an API request for caching with the cache_control as a sibling of a content that is a string:
  {
      "model": model_id,
      "stream": False,
      "max_tokens": 1024,
      "messages": [
          {"role": "system", "content": "You are an LLM named Prompt Cache Helper"},
          {
              "role": "user",
              "content": large_message,
              "cache_control": {
                  "type": "ephemeral",
                  "ttl": "5m"
              }
          },
      ],
  }
- Send a few of those API requests to see cache writes and reads. For example, in the response you might see something like:
  "usage": {
      "completion_tokens": 1024,
      "prompt_tokens": 5425,
      "total_tokens": 6449,
      "prompt_tokens_details": {
          "cached_tokens": 5425,
          "cache_creation_tokens": 0
      },
      "cache_creation_input_tokens": 0,
      "cache_read_input_tokens": 5425
  }
- If you debug into the code, you'll notice that the cache key prefix is None, which is incorrect because the usage results in step 3 above showed that the cache is being used:
  # Generate cache key using cacheable prefix
  cache_key = PromptCachingCache.get_prompt_caching_cache_key(messages, tools)
  if cache_key is None:
      return None
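Until it is clear whether the sibling form is meant to be supported, one possible client-side workaround is to rewrite such messages into the content-block form reported to work above before sending them. A minimal sketch; `move_cache_control_into_content` is a hypothetical helper, not part of LiteLLM:

```python
import copy
from typing import Any, Dict, List


def move_cache_control_into_content(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of messages with message-level cache_control moved into content.

    Only messages whose content is a plain string and that carry a sibling
    cache_control key are rewritten; everything else is copied unchanged.
    """
    normalized = []
    for message in messages:
        message = copy.deepcopy(message)  # leave the caller's data untouched
        content = message.get("content")
        if isinstance(content, str) and "cache_control" in message:
            cache_control = message.pop("cache_control")
            message["content"] = [
                {"type": "text", "text": content, "cache_control": cache_control}
            ]
        normalized.append(message)
    return normalized


example = [
    {"role": "system", "content": "You are an LLM named Prompt Cache Helper"},
    {
        "role": "user",
        "content": "a large message",
        "cache_control": {"type": "ephemeral", "ttl": "5m"},
    },
]
converted = move_cache_control_into_content(example)
```

This only changes the request shape on the client side; whether the router should also accept the sibling form is the open question of this issue.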
Relevant log output
What part of LiteLLM is this about?
Proxy
What LiteLLM version are you on ?
v1.80.11-stable
Twitter / LinkedIn details
No response
Referenced code (commit b86aae0): litellm/litellm/router_utils/prompt_caching_cache.py, lines 77 to 80 (the extract_cacheable_prefix loop quoted under "What happened?") and lines 214 to 217 (the cache key check quoted under "Steps to Reproduce").