Skip to content

#92688: fix(qwen): use DashScope native image format for Qwen vision models#92704

Closed
sheyanmin wants to merge 1 commit into
openclaw:mainfrom
sheyanmin:fix/issue-92688-qwen-vision-content-format
Closed

#92688: fix(qwen): use DashScope native image format for Qwen vision models#92704
sheyanmin wants to merge 1 commit into
openclaw:mainfrom
sheyanmin:fix/issue-92688-qwen-vision-content-format

Conversation

@sheyanmin

Copy link
Copy Markdown

Summary

Fix Qwen vision models returning 400 "Unexpected item type in content" on DashScope by converting image content parts from standard OpenAI format (type: image_url) to DashScope native format (type: image).

Root Cause

DashScope's OpenAI-compatible chat completions endpoint (/compatible-mode/v1/chat/completions) does not support the standard image_url content part type for Qwen vision models (qwen3.7-max, qwen3.7-plus, etc.). When the image tool sends multimodal requests, the image is formatted as {type: "image_url", image_url: {url: "data:..."}} which DashScope rejects with HTTP 400:

400 InternalError.Algo.InvalidParameter: The provided messages input is invalid. 
The error info is [Unexpected item type in content.]

The DashScope native multimodal API expects images as {type: "image", image: "data:..."} — a flat structure where the image field is a direct data URI string rather than a {url: ...} wrapper object. This fix detects DashScope endpoints (by provider name or base URL) and converts the image format accordingly, while preserving standard OpenAI format for all other providers.

Real behavior proof

behavior

Detect DashScope endpoints and convert image content parts from OpenAI image_url format to DashScope native image format.

environment

  • OS: Windows 10 Enterprise LTSC 2019
  • Runtime: Node.js v24.14.0
  • Setup: OpenClaw workspace with tsx for TypeScript execution

steps

  1. Import convertMessages from production code (src/llm/providers/openai-completions.js)
  2. Construct a Qwen/DashScope model config (provider: "qwen", baseUrl: dashscope.aliyuncs.com)
  3. Construct a standard OpenAI model config for comparison
  4. Build messages containing both text and image content parts
  5. Call convertMessages with each model and inspect the content format
  6. Verify DashScope models get type: "image" while OpenAI models keep type: "image_url"

observedResult

Reproduction script output (./node_modules/.bin/tsx scripts/repro-92688.ts):

======================================================================
Issue #92688 — DashScope Qwen Vision Image Format Fix
======================================================================

--- Test 1: DashScope (qwen) model ---
   Text part: What's in this image?
✅ Image format: type=image (DashScope native format)
   image field starts with: data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEA...

--- Test 2: Standard OpenAI model ---
   Text part: What's in this image?
✅ Image format: type=image_url (standard OpenAI format)
   image_url.url starts with: data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEA...

--- Test 3: qwen-dashscope provider ---
✅ Image format: type=image (DashScope native format)

--- Test 4: Custom provider with dashscope.aliyuncs.com baseUrl ---
✅ Image format: type=image (DashScope native format by URL detection)

======================================================================
All checks passed — DashScope/Qwen endpoints now receive correct image format.
======================================================================

Production change:

// Detection: provider includes "dashscope", provider is "qwen"/"qwen-dashscope",
// or baseUrl includes "dashscope.aliyuncs.com"
function isDashScopeEndpoint(model: Model<"openai-completions">): boolean {
  const provider = model.provider?.toLowerCase() ?? "";
  const baseUrl = model.baseUrl?.toLowerCase() ?? "";
  return (
    provider.includes("dashscope") ||
    provider === "qwen" ||
    provider === "qwen-dashscope" ||
    baseUrl.includes("dashscope.aliyuncs.com")
  );
}

// In convertMessages(), user message image formatting:
if (useDashScopeFormat) {
  return {
    type: "image",                              // ← DashScope native type
    image: `data:${item.mimeType};base64,${item.data}`,  // ← direct string
  } as unknown as ChatCompletionContentPart;
}
// Standard OpenAI format (unchanged for all other providers):
return {
  type: "image_url",
  image_url: {
    url: `data:${item.mimeType};base64,${item.data}`,
  },
} satisfies ChatCompletionContentPartImage;

Regression Test Plan

  • Run pnpm test -- --run src/llm/providers/openai-completions.test.ts — existing OpenAI completions streaming tests pass unchanged
  • Verify DashScope endpoint detection covers: provider "qwen", provider "qwen-dashscope", any provider with URL containing "dashscope.aliyuncs.com"
  • Verify standard OpenAI provider images remain as image_url format (no regression)
  • Change is minimal (1 file, 42 insertions, 9 deletions) and does not affect any non-DashScope code path

AI-assisted: built with Claude Code

DashScope's OpenAI-compatible endpoint rejects the standard
`image_url` content part type with 'Unexpected item type in
content' for Qwen vision models. Convert to DashScope native
format (`type: image` with direct data URI string) when the
provider or baseUrl indicates a DashScope endpoint.

Detection: provider includes 'dashscope', provider is 'qwen' or
'qwen-dashscope', or baseUrl includes 'dashscope.aliyuncs.com'.

Closes openclaw#92688
@openclaw-barnacle openclaw-barnacle Bot added size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 13, 2026
@clawsweeper

clawsweeper Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 15, 2026, 2:02 PM ET / 18:02 UTC.

Summary
This PR changes src/llm/providers/openai-completions.ts to detect Qwen/DashScope endpoints and serialize user and tool-result images as { type: "image", image: dataUri } instead of image_url.

PR surface: Source +33. Total +33 across 1 file.

Reproducibility: yes. for the PR regression: source and provider-contract inspection show the patch changes compatible-mode image requests away from documented image_url. The original user-facing DashScope 400 still needs live credentials for an end-to-end reproduction.

Review metrics: 1 noteworthy metric.

  • Provider Wire Format: 1 serializer branch added. The added branch changes the request payload sent to every detected DashScope/Qwen OpenAI-compatible image request.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🦪 silver shellfish
Patch quality: 🧂 unranked krab
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted live DashScope compatible-mode terminal output or logs showing the image request succeeds after the fix and the reported 400 is gone.
  • [P1] Keep compatible-mode images as image_url, or provide current provider docs/live proof that { type: "image" } is required despite the official compatible-mode examples.
  • Remove the unused DashScopeImageContentPart alias so typecheck and lint can pass.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: The PR body shows local terminal output from a serialization script, but not a redacted live DashScope compatible-mode request proving the 400 is gone; after adding live output/logs and redacting secrets or private details, updating the PR body should trigger re-review, or a maintainer can comment @clawsweeper re-review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] Merging this PR can make existing DashScope OpenAI-compatible image requests use an undocumented { type: "image" } payload even though the official compatible-mode examples use image_url.
  • [P1] The PR body proves only local serialization output, not a real DashScope request succeeding or the reported 400 disappearing after the patch.
  • [P1] The sibling PRs for the same issue are still open and unmerged, so this branch cannot be closed as safely superseded by a landed fix.

Maintainer options:

  1. Keep Compatible-Mode Serialization On image_url (recommended)
    Remove the DashScope image serializer branch and move the repair to the media-understanding Qwen prompt/model-selection path with focused regression coverage.
  2. Accept A Provider Contract Exception With Proof
    Maintainers could accept the serializer branch only if redacted live DashScope compatible-mode output proves image_url fails and { type: "image" } succeeds for the affected supported model path.
  3. Pause Until A Canonical Fix Lands
    If this branch is not going to be repaired, keep the linked issue and the stronger open fix path as the canonical place to resolve the Qwen/DashScope failure.

Next step before merge

  • [P1] The next action is contributor or maintainer follow-up because the branch needs either live provider proof for the contract exception or a different fix direction, and automation cannot supply the contributor's DashScope environment proof.

Security
Cleared: No concrete security or supply-chain issue found; the diff only changes in-process provider request serialization and does not touch secrets, dependencies, workflows, or executable artifacts.

Review findings

  • [P1] Keep DashScope compatible-mode images as image_url — src/llm/providers/openai-completions.ts:975-979
  • [P2] Remove the unused DashScope image type alias — src/llm/providers/openai-completions.ts:95
Review details

Best possible solution:

Keep DashScope compatible-mode image serialization on image_url and repair the Qwen image-tool failure through the media-understanding prompt/model-selection path with redacted live provider proof.

Do we have a high-confidence way to reproduce the issue?

Yes for the PR regression: source and provider-contract inspection show the patch changes compatible-mode image requests away from documented image_url. The original user-facing DashScope 400 still needs live credentials for an end-to-end reproduction.

Is this the best way to solve the issue?

No: switching the shared OpenAI-compatible serializer to { type: "image" } is not the best fix without live proof that the official contract is wrong. The safer path keeps image_url and fixes the confirmed Qwen image-tool routing/prompt shape.

Full review comments:

  • [P1] Keep DashScope compatible-mode images as image_url — src/llm/providers/openai-completions.ts:975-979
    This branch replaces the documented image_url shape with { type: "image", image: ... } for every detected DashScope/Qwen compatible-mode image request. Alibaba's OpenAI-compatible Vision docs say to pass images via image_url, and related live provider evidence reports the native image block fails for image-capable Qwen models, so this can break working compatible-mode vision calls. (alibabacloud.com)
    Confidence: 0.9
  • [P2] Remove the unused DashScope image type alias — src/llm/providers/openai-completions.ts:95
    Both prod typecheck and lint fail on the submitted merge commit because DashScopeImageContentPart is declared but never used, so the PR cannot pass required checks until that alias is removed or actually used.
    Confidence: 0.99

Overall correctness: patch is incorrect
Overall confidence: 0.9

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against a0b16f37e835.

Label changes

Label justifications:

  • P2: The PR addresses a provider-specific image-understanding failure with limited blast radius, but the current patch is not safe to merge.
  • merge-risk: 🚨 compatibility: The diff changes the documented image_url compatible-mode request shape for existing DashScope/Qwen image requests.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🦪 silver shellfish and patch quality is 🧂 unranked krab.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body shows local terminal output from a serialization script, but not a redacted live DashScope compatible-mode request proving the 400 is gone; after adding live output/logs and redacting secrets or private details, updating the PR body should trigger re-review, or a maintainer can comment @clawsweeper re-review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +33. Total +33 across 1 file.

View PR surface stats
Area Files Added Removed Net
Source 1 42 9 +33
Tests 0 0 0 0
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 1 42 9 +33

What I checked:

Likely related people:

  • steipete: Recent live history shows repeated OpenAI-compatible completions provider maintenance, and the Qwen provider/media-understanding surface was introduced under the Qwen provider commit. (role: recent area contributor and Qwen feature-history contributor; confidence: high; commits: 439a9e97fd61, d6dffd6ef81a, e3ac0f43df3e; files: src/llm/providers/openai-completions.ts, extensions/qwen/media-understanding-provider.ts, extensions/qwen/openclaw.plugin.json)
  • hxy91819: The recent image setup/request timeout work touching src/media-understanding/image.ts was approved and coauthored by this account in the commit metadata. (role: recent adjacent image-runtime reviewer; confidence: medium; commits: 5854e0c8f6b5, 001dee3fb088; files: src/media-understanding/image.ts, src/media-understanding/image.test.ts)
  • vincentkoc: Recent provider-attribution metadata work is a likely safer seam for endpoint-family decisions than hard-coding DashScope request shapes inside the shared serializer. (role: adjacent provider-capability contributor; confidence: medium; commits: 4fbc490fcaee, f1340be05150; files: src/agents/provider-attribution.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. labels Jun 13, 2026
@openclaw-clownfish

Copy link
Copy Markdown
Contributor

Thanks @sheyanmin for jumping on the Qwen/DashScope image failure in #92704.

I am closing this as superseded by #92770 because that PR keeps the fix on the narrower canonical path for #92688: placing the Qwen/DashScope image prompt in user content, with the focused media-understanding regression test and passing proof/checks. This PR's native-format serializer approach is still recorded as a source PR in the cluster, so Clownfish can preserve attribution and credit for the contributor context it added.

If this branch contains a distinct reproduction detail or provider behavior that #92770 does not cover, please reply here and we can reopen or split that follow-up back out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clownfish Tracked by Clownfish automation merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant