fix: seal MCP OAuth tokens with keyring-backed file#3185
Conversation
Replace the single keyring-bundle layout with a keyring-sealed encrypted file. The OS keyring now holds only a random AES-256 key; all tokens are AES-256-GCM encrypted into ~/.cagent/mcp-oauth-tokens.enc with atomic writes and 0600 permissions. This fixes the per-item size limits of keyring backends (Windows Credential Manager caps items at 2560 bytes, the Linux kernel keyring at a per-user quota) that a multi-token bundle could exceed, while keeping keyring access to a single item so macOS prompts only once. Includes best-effort migration from both legacy layouts (the single oauth:tokens bundle and the older per-resource items) and graceful fallbacks for an unavailable keyring or a corrupt/unreadable file.
Two issues found in review of the keyring-sealed token store: - Migration deleted the legacy keyring entries before the encrypted file was written, so a failed write would lose every token irrecoverably. Migration now persists first and only removes the legacy entries once the write succeeds; on failure the legacy entries are kept for retry and the in-memory cache still serves the current session. - persistLocked re-fetched the encryption key from the keyring on every write, defeating the once-per-process guarantee. The key is now cached in memory after the first fetch. Adds regression tests for both behaviors.
Address review findings in the keyring-sealed OAuth token store: - use the shared atomicfile writer so replacement works on Windows - add a cross-process lock and reload-before-write to avoid lost updates - make legacy bundle tokens win over older per-resource entries, while the encrypted file remains the newest source of truth during migration - avoid creating the keyring item for a simple missing-token lookup - refuse to overwrite when an existing token file is temporarily unreadable - use deterministic failure injection in migration tests Adds regression tests for the new safety properties.
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The AES-256-GCM implementation is well-structured: fresh random nonces are generated per encryption call, authenticated decryption is used correctly (GCM tag verification via cipher.AEAD.Open), atomic file writes are handled by natefinch/atomic which correctly handles existing-file replacement on Windows, cross-process file locking uses flock (Unix) / LockFileEx (Windows), and the migration logic preserves tokens with a persist-before-delete strategy. One minor hardening opportunity was found.
| if len(item.Data) != encryptionKeySize { | ||
| return nil, fmt.Errorf("stored encryption key has wrong size %d", len(item.Data)) | ||
| } | ||
| s.key = item.Data |
There was a problem hiding this comment.
[LOW] Encryption key aliased from keyring item.Data — no defensive copy
In encryptionKey(), when fetching an existing key from the keyring, s.key is assigned directly from item.Data:
s.key = item.DataThis copies only the slice header (pointer, length, cap), not the underlying bytes. While Go's GC ensures the backing array won't be collected as long as s.key is live, the alias means any keyring backend that reuses or mutates its internal buffer after Get() returns could silently corrupt the cached key — causing all subsequent encrypt/decrypt operations to fail or produce wrong ciphertext.
In contrast, the newly-generated key path (line 193) does not have this issue since key is a freshly-allocated slice that is never shared.
Suggestion: Use a defensive copy to be safe across all possible keyring backends:
s.key = bytes.Clone(item.Data)
Summary