github: use libgit2 transport for ref resolution #15470
Force-pushed from 42e0d86 to 2fa2411.
Upstreamed as NixOS/nix#15470. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 0fd340e to 3492d21.
Force-pushed from 3492d21 to d61b4f2.
I don't see any issues with this, but I could be missing something.
@xokdvium, shall we merge this one as well? I'm looking forward to not hammering the GitHub HTTP API.
Force-pushed from 1ae0dfa to c4e92a7.
Force-pushed from c4e92a7 to 6919a7d.
Use libgit2's high-level remote API (git_remote_create_detached, git_remote_connect, git_remote_ls) instead of manually downloading and parsing the smart HTTP pkt-line format. This delegates protocol handling to libgit2 while keeping auth via custom HTTP headers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
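After connecting, `git_remote_ls` exposes the advertised refs as name/oid pairs, and the caller still has to map a user-supplied name such as `master` or `v1.0` onto one of them. The following is a minimal sketch of that lookup under stated assumptions: `AdvertisedRef` and `resolveRef` are hypothetical names, the plain struct stands in for libgit2's `git_remote_head`, and the tags-before-heads order is borrowed from git's own disambiguation rules rather than taken from this PR.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the (name, oid) pairs git_remote_ls() exposes;
// this is not libgit2's git_remote_head.
struct AdvertisedRef {
    std::string name; // e.g. "refs/heads/master" or "refs/tags/v1^{}"
    std::string oid;  // hex object id
};

// Exact-name lookup in the advertisement.
static std::optional<std::string> findOid(const std::vector<AdvertisedRef> & refs,
                                          const std::string & name)
{
    for (const auto & r : refs)
        if (r.name == name)
            return r.oid;
    return std::nullopt;
}

// Resolve "master", "v1", "refs/heads/x" or "HEAD" to a commit id.
// Annotated tags are assumed to be advertised twice: once as the tag object
// and once peeled ("^{}") to the commit, so the peeled id is preferred.
std::optional<std::string> resolveRef(const std::vector<AdvertisedRef> & refs,
                                      const std::string & ref)
{
    std::vector<std::string> candidates;
    if (ref == "HEAD" || ref.rfind("refs/", 0) == 0) {
        candidates.push_back(ref);
    } else {
        // Assumed order, borrowed from git's disambiguation rules.
        candidates.push_back("refs/tags/" + ref);
        candidates.push_back("refs/heads/" + ref);
    }
    for (const auto & name : candidates) {
        if (auto peeled = findOid(refs, name + "^{}"))
            return peeled;
        if (auto oid = findOid(refs, name))
            return oid;
    }
    return std::nullopt;
}
```

Preferring the peeled entry matters because resolving an annotated tag to the tag object's id, rather than the commit it points to, would produce a rev that is not a commit.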
Force-pushed from 6919a7d to 250b1ce.
Could we merge this one?
@domenkozar, could you address the outstanding caching question? That was the only concern I noticed.
Restore the TTL caching that downloadFile provided before switching to the libgit2 transport. Mirrors hgRefToRev in mercurial.cc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
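As a rough illustration of what a TTL cache for ref lookups involves, here is an in-memory sketch keyed on (repository URL, ref). `RefCache` is a hypothetical name, and Nix's real fetcher cache is persistent rather than a `std::map`. The TTL would plausibly come from a setting such as Nix's `tarball-ttl`, but that wiring is an assumption.

```cpp
#include <chrono>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a persistent fetcher cache:
// (repo URL, ref) -> resolved rev plus the time it was stored.
class RefCache {
public:
    using Clock = std::chrono::steady_clock;

    explicit RefCache(std::chrono::seconds ttl) : ttl_(ttl) {}

    // Return the cached rev only while it is younger than the TTL.
    std::optional<std::string> lookup(const std::string & url,
                                      const std::string & ref,
                                      Clock::time_point now) const
    {
        auto it = entries_.find({url, ref});
        if (it == entries_.end())
            return std::nullopt;
        if (now - it->second.storedAt >= ttl_)
            return std::nullopt; // expired: the caller must query the remote again
        return it->second.rev;
    }

    // Record a freshly resolved rev together with the time of resolution.
    void store(const std::string & url, const std::string & ref,
               const std::string & rev, Clock::time_point now)
    {
        entries_[{url, ref}] = Entry{rev, now};
    }

private:
    struct Entry {
        std::string rev;
        Clock::time_point storedAt;
    };
    std::chrono::seconds ttl_;
    std::map<std::pair<std::string, std::string>, Entry> entries_;
};
```

The current time is passed in explicitly so expiry can be tested deterministically; a production version would more likely read the clock internally.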
See b67be79.
Thanks, I can go fix the nitpicks myself.
@xokdvium, I fixed some edge cases when fetching refs; good to go.
Force-pushed from 0625784 to bf3dda7.
@xokdvium, ping.
I pushed a fix for an issue we caught in cachix/devenv#2842.
resolveRemoteRef() resolves a GitHub ref to a rev with a libgit2 smart-transport ls-remote. libgit2 honours the user's url.<base>.insteadOf git config, so a "https://github.com -> ssh://git@github.com" rewrite turns the connection into SSH. The credentials callback only handled HTTPS token auth and returned GIT_PASSTHROUGH for everything else, so libgit2 gave up with "authentication required but no callback set". Provide SSH credentials from the user's ssh-agent (and the username from the URL), the same way git and ssh do, so ref resolution keeps working over SSH without falling back to the GitHub REST API. Fixes cachix/devenv#2842. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
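The rewrite described above follows git's documented `insteadOf` rule: a URL that starts with a configured prefix has that prefix replaced by `<base>`, and when several prefixes match, the longest one wins. The sketch below illustrates that rule and then extracts the user name from the rewritten `ssh://` URL, which is the value an SSH credential (for example one built with libgit2's `git_credential_ssh_key_from_agent`) needs. `rewriteUrl`, `sshUsername`, and `InsteadOfMap` are hypothetical. libgit2 already performs the rewrite itself and passes `username_from_url` to the credentials callback, so this only shows what those values are.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical: insteadOf prefix -> <base> replacement, as read from git config.
using InsteadOfMap = std::map<std::string, std::string>;

// Apply url.<base>.insteadOf; if several prefixes match, the longest wins.
std::string rewriteUrl(const std::string & url, const InsteadOfMap & rules)
{
    const std::pair<const std::string, std::string> * best = nullptr;
    for (const auto & rule : rules)
        if (url.rfind(rule.first, 0) == 0 &&
            (!best || rule.first.size() > best->first.size()))
            best = &rule;
    if (!best)
        return url;
    return best->second + url.substr(best->first.size());
}

// Extract "user" from "ssh://user@host/path"; nullopt for non-SSH URLs
// or when no user is present in the authority part.
std::optional<std::string> sshUsername(const std::string & url)
{
    const std::string scheme = "ssh://";
    if (url.rfind(scheme, 0) != 0)
        return std::nullopt;
    auto authorityEnd = url.find('/', scheme.size());
    auto at = url.find('@', scheme.size());
    if (at == std::string::npos ||
        (authorityEnd != std::string::npos && at > authorityEnd))
        return std::nullopt;
    return url.substr(scheme.size(), at - scheme.size());
}
```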
Force-pushed from 4906790 to 654cd6a.
Is GitHub going to hate us for doing this? For nixpkgs this has to list a whopping […] And it takes a whopping 6-10 s to return this result, so something is doing a non-trivial amount of work, and it doesn't seem to be Nix, because […] Is that reproducible on your end?
Tbh, I'm a bit wary of how much of an improvement this is. At the very least, Nix seems to wait significantly longer to get the resolved ref. It also seems like a loophole rather than a genuinely cheaper operation (though I really don't have any visibility into the internals). The only thing I know (@emilazy mentioned this to me at some point) is that GitHub really doesn't want us hammering the Git API (almost certainly for cloning; I don't know about ref resolution).
Insofar as this is basically a loophole, I think we should just ask someone at GitHub first. I'd rather get karma for being nice than anti-karma for exploiting something they forgot to rate-limit.
@JamieMagee, do you have more insight into the overhead of doing this? Am I crazy, or is this much more expensive than what we currently do with […]?
@xokdvium, thanks for the ping. This isn't my area of expertise, but I can make sure it gets raised with the right team.
I raised this with the team that runs GitHub's Git infrastructure. They'd rather Nix didn't switch to […] The current […] So their recommended fix is to use a token (which raises the cap from 60 to 5,000 requests per hour) and keep the caching this PR adds, which is worth landing on its own. There's another trick too, which Homebrew leans on: store the ETag from each […]
tl;dr: I'd hold off on the transport swap, split out the cache and SSH-auth fix, and push token auth and ETag conditional requests as the near-term mitigation. I'm happy to remain the point of contact at GitHub.
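The ETag trick amounts to HTTP revalidation: remember the `ETag` header from each successful response, send it back as `If-None-Match`, and reuse the stored body when the server answers `304 Not Modified`. Below is a minimal sketch, assuming a pluggable transport so it runs without network access. `conditionalGet`, `HttpResponse`, `Transport`, and `CachedResponse` are hypothetical, and the `Authorization: Bearer` header is only one way of attaching a token, not necessarily what Nix sends.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical minimal HTTP types; a real implementation would use an
// actual HTTP client instead of an injected function.
struct HttpResponse {
    int status = 0;
    std::string etag; // value of the ETag response header, if any
    std::string body;
};
using Headers = std::map<std::string, std::string>;
using Transport = std::function<HttpResponse(const std::string & url, const Headers &)>;

struct CachedResponse {
    std::string etag;
    std::string body;
};

// GET `url`, revalidating any cached body with If-None-Match and
// optionally authenticating with a token.
std::string conditionalGet(const std::string & url,
                           std::map<std::string, CachedResponse> & cache,
                           const std::optional<std::string> & token,
                           const Transport & send)
{
    Headers headers;
    if (token)
        headers["Authorization"] = "Bearer " + *token;
    auto cached = cache.find(url);
    if (cached != cache.end() && !cached->second.etag.empty())
        headers["If-None-Match"] = cached->second.etag;

    HttpResponse res = send(url, headers);
    if (res.status == 304 && cached != cache.end())
        return cached->second.body; // unchanged: reuse the stored body
    if (res.status == 200) {
        cache[url] = CachedResponse{res.etag, res.body};
        return res.body;
    }
    throw std::runtime_error("unexpected HTTP status " + std::to_string(res.status));
}
```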
AFAIK the current approach already uses basically all of that (ETag caching, and we do send a custom user agent: […]). So from what I understand, there's not much we can improve beyond what's already implemented (other than asking users to make authenticated requests, which is already the case).
So based on @JamieMagee's helpful insight, it doesn't seem like much can be done about unauthenticated requests without making the situation worse. We do in fact use caching in […]
What we can improve is making forge login easier: an upstream version of https://github.com/numtide/nix-auth.
Resolve branch/tag names to commit SHAs via the git HTTP protocol (/info/refs endpoint) instead of the GitHub REST API. This is the same approach already used by SourceHut and avoids API rate limits.
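To make the `/info/refs` approach concrete: the response to `GET <repo>/info/refs?service=git-upload-pack` (protocol v0) is a sequence of pkt-lines. Each is prefixed with a four-hex-digit length that counts the prefix itself, and `0000` is a flush packet. The first ref line also carries the server's capabilities after a NUL byte. The following sketch parses that advertisement into (refname, oid) pairs. `parseRefAdvertisement` is a hypothetical name; it ignores protocol v2 entirely and is not the code from this PR.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Parse a smart-HTTP ref advertisement into (refname, oid) pairs.
std::vector<std::pair<std::string, std::string>>
parseRefAdvertisement(const std::string & body)
{
    std::vector<std::pair<std::string, std::string>> refs;
    size_t pos = 0;
    while (pos + 4 <= body.size()) {
        // 4 hex digits giving the packet length, including these 4 bytes.
        size_t len = std::stoul(body.substr(pos, 4), nullptr, 16);
        if (len == 0) { // flush-pkt: section separator, no payload
            pos += 4;
            continue;
        }
        if (len < 4 || pos + len > body.size())
            throw std::runtime_error("malformed pkt-line");
        std::string line = body.substr(pos + 4, len - 4);
        pos += len;

        if (!line.empty() && line.back() == '\n')
            line.pop_back();
        if (line.rfind("# service=", 0) == 0)
            continue; // service announcement, not a ref
        // The first ref line carries capabilities after a NUL byte; drop them.
        if (auto nul = line.find('\0'); nul != std::string::npos)
            line.resize(nul);
        auto space = line.find(' ');
        if (space == std::string::npos)
            throw std::runtime_error("malformed ref line");
        refs.emplace_back(line.substr(space + 1), line.substr(0, space));
    }
    return refs;
}
```

Peeled entries such as `refs/tags/v1^{}` come through unchanged; mapping a branch or tag name to its commit is a separate lookup over the returned pairs.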