Geo: Multi-arch container images do not replicate non-primary architectures to secondary Geo nodes, although the UI shows replication as successful
<!---
Please read this!

Before opening a new issue, make sure to search for keywords in the issues
filtered by the "regression" or "bug" label:

- https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=regression
- https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=bug

and verify the issue you're about to submit isn't a duplicate.
--->

### Summary

<!-- Summarize the bug encountered concisely. -->

Multi-architecture images show as replicated in the UI, but non-primary architectures are not available from the secondary node when trying to stat or inspect images.

### Steps to reproduce

<!-- Describe how one can reproduce the issue - this is very important. Please use an ordered list. -->

This behavior was reported by a US federal customer in [federal ticket 1050](https://gitlab-federal-support.zendesk.com/agent/tickets/1050) (GitLab internal, US citizenship required), and I have been able to reproduce it.

**Initial setup**

1. Have Geo instances with container registry replication enabled. I have two instances in GCP; the customer has more.
1. Build and push a multi-architecture image to the primary node (I used `docker buildx` to build a [BusyBox](https://github.com/docker-library/busybox) image for `amd64` and `arm64`).
1. Wait for sync and verification to complete.
1. Compare GUI output between primary and secondary nodes. Both of my nodes report synchronized, and container size and hashes are shown to be the same between nodes.

**Comparison and troubleshooting**

1. Inspecting the remote images with `skopeo`, I observe that my primary node identifies both architectures in the container, but the secondary does **not** identify any architecture at all. Also note that the digests do not match:

```
brad@DebianRulez:~$ skopeo inspect --raw docker://geo1.bradsevy.online:5050/root/busybox-multi
{
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:ac3408ba45f5038129cefd401d3828bca2a32e54dc0bf6ff44056936457bf1c5",
      "size": 740,
      "platform": {
        "architecture": "amd64", <-----------------------------------------------------------
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:1e21fbd67772efeb971f0b97be99572219823216b6b1c47a1308fc27e5076335",
      "size": 740,
      "platform": {
        "architecture": "arm64", <-----------------------------------------------------------
        "os": "linux"
      }
    }
  ]
}
---
brad@DebianRulez:~$ skopeo inspect --raw docker://geo2.bradsevy.online:5050/root/busybox-multi
{
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:27f909e5658cb519e5175bc681d5c605f01b613503ce8dcf3fe3c1847d37f8c7",
    "size": 844
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:0bc3020d05f1e08b41f1c5d54650a157b1690cde7fedb1fafbc9cda70ee2ec5c",
      "size": 50435617
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:f875f728594f35a040e0e4b122c67fe6b05592c71912a6ad4a3136a907fe3eaa",
      "size": 14124231
    }
  ]
}
```

2. On-disk sizes and checksums of the repository directory do not match:

Primary:

```
root@brad-geo1:~# du -sh /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi
148K    /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi
root@brad-geo1:~# find /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi -type f -exec md5sum {} \; | sort -k 2 | md5sum
ca6f7edcd95ade00548dc258e2b40af1  -
```

Secondary:

```
root@brad-geo2:~# du -sh /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi
92K     /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi
root@brad-geo2:~# find /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi -type f -exec md5sum {} \; | sort -k 2 | md5sum
e4d0a98fd71c3a6b7ec3f2b62e3bc64d  -
```

3. Pulling with `--platform=arm64` and then inspecting the pulled image with `docker image inspect <id>` shows that the primary node serves the `arm64` image, while the secondary node still serves `amd64`:

Primary node:

```
brad@DebianRulez:~$ docker pull geo1.bradsevy.online:5050/root/busybox-multi --platform=arm64
Using default tag: latest
latest: Pulling from root/busybox-multi
310b368da982: Pull complete
dc96c5f90a6f: Pull complete
Digest: sha256:eaf1fdf80669e7338ab1edfeabd8b96f2fac673eaa971f8480d4006e29ec7a72
Status: Downloaded newer image for geo1.bradsevy.online:5050/root/busybox-multi:latest
geo1.bradsevy.online:5050/root/busybox-multi:latest
brad@DebianRulez:~$ docker images -a
REPOSITORY                                     TAG       IMAGE ID       CREATED       SIZE
geo1.bradsevy.online:5050/root/busybox-multi   latest    798012f55906   12 days ago   126MB
brad@DebianRulez:~$ docker image inspect 798012f55906
[
    {
        "Id": "sha256:798012f55906d247c79ea2e9acfbc6d53593b7751c5d851bf1f41eaff4237f52",
        "RepoTags": [
            "geo1.bradsevy.online:5050/root/busybox-multi:latest"
        ],
        "RepoDigests": [
            "geo1.bradsevy.online:5050/root/busybox-multi@sha256:eaf1fdf80669e7338ab1edfeabd8b96f2fac673eaa971f8480d4006e29ec7a72"
        ],
        "Parent": "",
        "Comment": "buildkit.dockerfile.v0",
        "Created": "2021-06-30T18:46:37.189300648Z",
        "Container": "",
        "ContainerConfig": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": null,
            "Cmd": null,
            "Image": "",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "DockerVersion": "",
        "Author": "",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "bash"
            ],
            "Image": "",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "Architecture": "arm64", <------------------------------------------------------------------------------
        "Os": "linux",
        "Size": 126480503,
        "VirtualSize": 126480503,
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/bcfca43187e0079723d6fdc91b17d003bbd2789ebfc1838909a79f30b9aa99ef/diff",
                "MergedDir": "/var/lib/docker/overlay2/fe35f24b09d6f2af2fb39807d7f11531b24f5e445edb9452370a5fcfd32e58de/merged",
                "UpperDir": "/var/lib/docker/overlay2/fe35f24b09d6f2af2fb39807d7f11531b24f5e445edb9452370a5fcfd32e58de/diff",
                "WorkDir": "/var/lib/docker/overlay2/fe35f24b09d6f2af2fb39807d7f11531b24f5e445edb9452370a5fcfd32e58de/work"
            },
            "Name": "overlay2"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:bee1275ae7ac87065d84e2e06aec6254579ac19d9b84e325cbbe03d46e8730e7",
                "sha256:f48735d31fdcbfb2125502fd4530a17b53d373e61bb8683cb6be9a1c8e1edea3"
            ]
        },
        "Metadata": {
            "LastTagTime": "0001-01-01T00:00:00Z"
        }
    }
]
```

Secondary node:

```
brad@DebianRulez:~$ docker pull geo2.bradsevy.online:5050/root/busybox-multi --platform=arm64
Using default tag: latest
latest: Pulling from root/busybox-multi
0bc3020d05f1: Pull complete
f875f728594f: Pull complete
Digest: sha256:ac3408ba45f5038129cefd401d3828bca2a32e54dc0bf6ff44056936457bf1c5
Status: Downloaded newer image for geo2.bradsevy.online:5050/root/busybox-multi:latest
geo2.bradsevy.online:5050/root/busybox-multi:latest
brad@DebianRulez:~$ docker images -a
REPOSITORY                                     TAG       IMAGE ID       CREATED       SIZE
geo2.bradsevy.online:5050/root/busybox-multi   latest    27f909e5658c   12 days ago   133MB
brad@DebianRulez:~$ docker image inspect 27f909e5658c
[
    {
        "Id": "sha256:27f909e5658cb519e5175bc681d5c605f01b613503ce8dcf3fe3c1847d37f8c7",
        "RepoTags": [
            "geo2.bradsevy.online:5050/root/busybox-multi:latest"
        ],
        "RepoDigests": [
            "geo2.bradsevy.online:5050/root/busybox-multi@sha256:ac3408ba45f5038129cefd401d3828bca2a32e54dc0bf6ff44056936457bf1c5"
        ],
        "Parent": "",
        "Comment": "buildkit.dockerfile.v0",
        "Created": "2021-06-30T18:46:21.646255956Z",
        "Container": "",
        "ContainerConfig": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": null,
            "Cmd": null,
            "Image": "",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "DockerVersion": "",
        "Author": "",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "bash"
            ],
            "Image": "",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "Architecture": "amd64", <----------------------------------------------------------------------------------
        "Os": "linux",
        "Size": 132681348,
        "VirtualSize": 132681348,
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/07930508f682b867663201ea759fc6e2d01ed9283ce0f07e3068397aff530388/diff",
                "MergedDir": "/var/lib/docker/overlay2/5b0d1d456fd34ecfbee0096491eb81c0f01f67f4e5564bf23e2b1a5847c036fa/merged",
                "UpperDir": "/var/lib/docker/overlay2/5b0d1d456fd34ecfbee0096491eb81c0f01f67f4e5564bf23e2b1a5847c036fa/diff",
                "WorkDir": "/var/lib/docker/overlay2/5b0d1d456fd34ecfbee0096491eb81c0f01f67f4e5564bf23e2b1a5847c036fa/work"
            },
            "Name": "overlay2"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:4e006334a6fdea37622f72b21eb75fe1484fc4f20ce8b8526187d6f7bd90a6fe",
                "sha256:51ea4e37f486d3064055a010939db3384b70e33240ff478cb09cf4d3858ca709"
            ]
        },
        "Metadata": {
            "LastTagTime": "0001-01-01T00:00:00Z"
        }
    }
]
```

### Internal discussions

I engaged the Registry and Geo teams in their respective internal Slack channels. Messages are available until approximately 10 October 2021. Relevant messages are copied into the internal ticket for posterity.

Registry: https://gitlab.slack.com/archives/CRD4A8HG8/p1625686073105400

Geo: https://gitlab.slack.com/archives/CRD4A8HG8/p1625686073105400

### Example Project

<!-- If possible, please create an example project here on GitLab.com that exhibits the problematic behavior, and link to it here in the bug report. If you are using an older version of GitLab, this will also determine whether the bug is fixed in a more recent version. -->

### What is the current *bug* behavior?

<!-- Describe what actually happens. -->

Only `amd64` is made available on the secondary node.

### What is the expected *correct* behavior?

<!-- Describe what you should see instead. -->

Secondary architectures (`arm64` in this case) should be available on all secondary nodes.

### Relevant logs and/or screenshots

<!-- Paste any relevant logs - please use code blocks (```) to format console output, logs, and code as it's tough to read otherwise.
-->

404 errors appear every few seconds in the output of `gitlab-ctl tail registry` on the primary node:

```
2021-07-12_21:03:13.63310 time="2021-07-12T21:03:13Z" level=warning msg="httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events} encountered too many errors, backing off"
2021-07-12_21:03:14.66027 time="2021-07-12T21:03:14Z" level=error msg="retryingsink: error writing events: httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events}: response status 404 Not Found unaccepted, retrying"
2021-07-12_21:03:14.66032 time="2021-07-12T21:03:14Z" level=warning msg="httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events} encountered too many errors, backing off"
2021-07-12_21:03:15.68155 time="2021-07-12T21:03:15Z" level=error msg="retryingsink: error writing events: httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events}: response status 404 Not Found unaccepted, retrying"
2021-07-12_21:03:15.68158 time="2021-07-12T21:03:15Z" level=warning msg="httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events} encountered too many errors, backing off"
2021-07-12_21:03:16.70599 time="2021-07-12T21:03:16Z" level=error msg="retryingsink: error writing events: httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events}: response status 404 Not Found unaccepted, retrying"
```

### Output of checks

<!-- If you are reporting a bug on GitLab.com, write: This bug happens on GitLab.com -->

#### Results of GitLab environment info

<!-- Input any relevant GitLab environment information if needed.
-->

<details>
<summary>Expand for output related to GitLab environment info</summary>

<pre>

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)

</pre>
</details>

#### Results of GitLab application Check

<!-- Input any relevant GitLab application check information if needed. -->

<details>
<summary>Expand for output related to the GitLab application check</summary>

<pre>

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:check SANITIZE=true`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`)

(we will only investigate if the tests are passing)

</pre>
</details>

### Possible fixes

<!-- If you can, link to the line of code that might be responsible for the problem. -->
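
To check whether a candidate fix restores full replication, the `skopeo` comparison from the reproduction steps could be scripted. The following is a minimal Python sketch and not part of GitLab: the function names `platforms` and `missing_on_secondary` are hypothetical, and the input is assumed to be the unmodified JSON printed by `skopeo inspect --raw` (without the arrow annotations added in this report). The OCI index media type is included as an assumption, since it does not appear in the output captured here.

```python
from __future__ import annotations

import json

# Media types that identify a multi-arch manifest list.
# The Docker one is what the primary node returns in this report;
# the OCI one is an assumption for images pushed in OCI format.
INDEX_MEDIA_TYPES = {
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.index.v1+json",
}


def platforms(raw_manifest: str) -> list[str]:
    """Return sorted "os/architecture" strings for a manifest list.

    A single-image manifest (what the secondary returns in this report)
    carries no platform entries, so it yields an empty list.
    """
    doc = json.loads(raw_manifest)
    if doc.get("mediaType") not in INDEX_MEDIA_TYPES:
        return []
    return sorted(
        f'{entry["platform"]["os"]}/{entry["platform"]["architecture"]}'
        for entry in doc.get("manifests", [])
    )


def missing_on_secondary(primary_raw: str, secondary_raw: str) -> list[str]:
    """Return platforms advertised by the primary but not by the secondary."""
    return sorted(set(platforms(primary_raw)) - set(platforms(secondary_raw)))


if __name__ == "__main__":
    # Trimmed-down stand-ins for the two `skopeo inspect --raw` outputs.
    primary = json.dumps({
        "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
        "manifests": [
            {"platform": {"architecture": "amd64", "os": "linux"}},
            {"platform": {"architecture": "arm64", "os": "linux"}},
        ],
    })
    secondary = json.dumps({
        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
        "layers": [],
    })
    print(missing_on_secondary(primary, secondary))
```

An empty result would indicate that the secondary advertises every platform the primary does. With the behavior described in this issue, both platforms are reported as missing, because the secondary serves a plain image manifest rather than a manifest list.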