fix(cli): wait for all image pulls before starting containers#5681
Conversation
`supabase start` ran two uncoordinated image-pull paths: a best-effort concurrent compose pre-pull (PullOptions.IgnoreFailures) and a lazy per-container pull inside DockerStart. Any image the pre-pull failed to cache was pulled later, during the "Starting..." phase, so containers started before pulls had finished. Add ensureImagesCached: a completeness pass that resolves every project image through the same multi-registry fallback DockerStart uses, before any container starts. The compose pre-pull is kept as the fast, best- effort progress UI, but is no longer relied on for completeness. Surface the Docker install hint from DockerResolveImageIfNotCached so a "Docker not running" failure during the new pre-start resolve keeps the helpful suggestion it previously only got from DockerStart. Also add regression guards to the TypeScript port asserting that start aborts (rather than deferring the pull) when image preparation fails. Fixes #5068
4270a1f to
d6d0e4e
Compare
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Supabase CLI previewnpx --yes https://pkg.pr.new/supabase/cli/supabase@a244947c339f4280c6e1d08a730e60afd9f4ef33Preview package for commit |
jgoux
left a comment
There was a problem hiding this comment.
Review summary
Well-constructed, correct fix. The diagnosis is accurate and the fix reuses existing primitives (WaitAll, DockerResolveImageIfNotCached, GetServices) rather than reinventing them. I verified the load-bearing assumption: GetRegistryImageUrls only uses the last path segment to rebuild the candidate set, so feeding it an already-normalized service.Image is idempotent and produces exactly the set DockerStart later resolves against. Coverage also matches — project.Services is the same notExcluded-filtered set that gets started. Test coverage (Go gock.IsDone() invariant check + TS regression guards) is strong.
One real-but-benign concern (CmdSuggestion concurrent write) and one minor UX note inline. Both non-blocking; approving in spirit pending a conscious decision on the first.
… data race ensureImagesCached resolves images via WaitAll (one goroutine per image), so writing the package-global CmdSuggestion from inside DockerResolveImageIfNotCached raced under `go test -race` when the daemon was down. Extract SuggestDockerInstallIfConnectionFailed and set the hint once, sequentially, in the caller after errors.Join — same user-facing hint, no race. DockerStart now routes through the same helper. review: PR #5681
What changed
supabase startcould start containers before their Docker images had finished downloading.The command ran two uncoordinated image-acquisition paths:
pullImagesUsingCompose) using docker-compose'sPullwithPullOptions{IgnoreFailures: true}. It only targets the primary registry and, by design, silently swallows per-image pull failures (theIgnoreFailuresflag is the hook that lets the registry fallback recover).utils.DockerStart→DockerResolveImageIfNotCached(multi-registry fallback: ECR → GHCR → Docker Hub).So any image the concurrent pre-pull failed to cache — a transient registry/network/rate-limit hiccup, common on a fresh machine pulling 10+ images at once — was pulled later, during the
Starting database… / Starting containers…phase. That is the "start doesn't wait for pulls" behaviour from the issue. The pre-pull was added in #4394, matching the reporter's "last few versions" regression window.The fix
Add
ensureImagesCached, a completeness pass that runs immediately after the best-effort pre-pull and before any container starts. It resolves every project image through the same multi-registry fallback resolverDockerStartalready uses (DockerResolveImageIfNotCached), fanned out concurrently via the existingutils.WaitAllprimitive.After it returns, every required image is guaranteed present in the local cache, so the per-container
DockerStartcalls become pure cache hits and never pull mid-start. On the happy path it is just N cheap image inspects; an image that genuinely cannot be pulled from any registry now fails the start cleanly before any container is created, instead of limping into a half-pulled start. The compose pre-pull (and itsIgnoreFailures) is kept as the fast concurrent progress UI — it is simply no longer relied on for completeness.TypeScript port
The native-TS port already pulls in a preparation phase that is awaited before startup and fails hard on pull errors, so it does not have this bug. This PR adds regression guards locking that contract in:
Stack.unit.test.ts:stack.start()aborts and starts zero containers when a docker pull fails.prefetch.unit.test.ts: preparation fails withDockerPullErrorwhen the whole registry fallback chain fails.Fixes #5068