fix(runtime): persist sub-session transcript on error path#3151
Merged
Conversation
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
This fix correctly addresses the sub-session transcript loss on the error path in runForwarding. The implementation looks sound:
- Error capture pattern: Captures the first
ErrorEventinto a local variable while continuing to drain the channel — correct approach for keeping the TUIstreamDepthcounter balanced. - Unconditional persistence:
parent.AddSubSession(s)andSubSessionCompletedare now always emitted regardless of error state — matches the persistence contract documented inpersistence_observer.go. - ToolsApproved guard: Correctly scoped to the success path only, preventing accidental approval escalation from a failed sub-session scope.
- Test coverage:
TestTransferTaskPersistsSubSessionOnErrorwithmockProviderWithErrorvalidates the primary fix path well.
No bugs found in the changed code.
When a sub-session emitted an ErrorEvent (model stream failure, loop detector, hook-driven termination, tool-load error), runForwarding early-returned at the first error, skipping both `parent.AddSubSession` and the `SubSessionCompletedEvent` emission. The persistence pipeline relies on that event to atomically write the sub-session row and all its messages to the store — sub-session events are deliberately ignored during normal streaming because they're "absorbed by SubSessionCompleted" per the comment on PersistenceObserver.OnEvent. The result: any sub-agent that hit an error mid-stream had its entire transcript silently dropped. The parent agent's tool result for transfer_task became "Error calling tool: <opaque message>" and the work the sub-agent actually performed (often dozens of tool calls and many assistant turns) was invisible to anyone walking session_items. The fix: - Capture the first ErrorEvent into a local instead of early-returning. - Keep draining the channel so the TUI's stream-depth counter stays balanced and any trailing events (notifications, hook output) reach the parent's event stream. - Always run the persistence emission, then return the captured error. Test: TestTransferTaskPersistsSubSessionOnError uses mockProviderWithError to make a sub-agent fail to start its model stream. It asserts that the parent session's Messages list contains the sub-session item AND that the events channel saw a SubSessionCompletedEvent — both of which fail on the pre-fix code (verified by stashing the fix and re-running the test).
18e3c3b to
86ce1ed
Compare
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The fix correctly addresses the transcript-loss bug in runForwarding. The approach — capture the first ErrorEvent into a local variable, continue draining the child event channel, then unconditionally call parent.AddSubSession and emit SubSessionCompletedEvent before returning — is sound and matches the documented contract in persistence_observer.go.
Changes reviewed:
pkg/runtime/agent_delegation.go— error-capture loop, unconditional persistence,ToolsApprovedguardpkg/runtime/runtime_test.go—TestTransferTaskPersistsSubSessionOnError
Key observations:
- The drain loop correctly forwards every event (including the
ErrorEvent) to the parent sink, keeping the TUI'sstreamDepthcounter balanced. - Guarding
parent.ToolsApproved = s.ToolsApprovedto the success path only is the right semantic: a failed sub-session should not silently escalate the parent's tool-approval gate. - The test's
close(evts)afterhandleTaskTransferreturns is safe becausechannelSink.Emitrecovers from send-on-closed-channel panics, andhandleTaskTransferis synchronous — all writes complete before it returns. - Only capturing the first
ErrorEventis intentional and correct for control flow.
No bugs were found in the changed code.
dgageot
approved these changes
Jun 17, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When a sub-agent's run loop emits an
ErrorEvent(model stream failure, loop detector trip, hook-driven termination, tool-load error),runForwardingearly-returned at the first error event. This skipped bothparent.AddSubSessionand theSubSessionCompletedEventemission.The persistence pipeline relies on
SubSessionCompletedEventto write the sub-session to the store. This contract is documented explicitly inpersistence_observer.go:68–72:The result: any sub-agent that hit an error mid-stream had its entire transcript silently dropped. The parent agent received
"Error calling tool: <message>"with no record of what the sub-agent actually did — invisible in session_items, invisible in the TUI.Fix
ErrorEventinto a local variable instead of early-returning.streamDepthcounter stays balanced and any trailing events (notifications, hook output) reach the parent's event stream.SubSessionCompletedEvent(and callparent.AddSubSession) before returning, regardless of whether the sub-session errored.parent.ToolsApprovedpropagation to the success path only — a failed sub-session must not silently escalate the parent's tool-approval gate.Test plan
TestTransferTaskPersistsSubSessionOnErrorusesmockProviderWithErrorto make a sub-agent fail to start its model stream. It asserts:parent.Messagescontains the sub-session item (fails on pre-fix code)ErrorEventwas forwarded to the parent sinkSubSessionCompletedEventfired (fails on pre-fix code)Verify by stashing the fix and re-running the test — it fails without the change.
Related
A companion fix for
runCollecting(the background-agent path) is in a separate PR — it has the same structural bug but theSubSessionCompletedEventemission requires a larger interface change.