perf: Change release profile to use codegen-units=1 by davidlattimore · Pull Request #1500 · wild-linker/wild

davidlattimore · 2026-01-31T00:52:43Z

This gives more stable performance, i.e. it reduces the extent to which
small changes in code affect the performance of unrelated bits of code.

This also makes the release profile more or less match what we release, which is
useful if people `cargo install wild-linker`.

Optimised builds that are faster to build are now available with the opt
profile.
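As a rough illustration of the kind of `Cargo.toml` change described above, a minimal sketch might look like the following. The exact contents of Wild's profiles are not shown in this thread, so the definition of the `opt` profile here (its `inherits` line and its `codegen-units` value) is an assumption.

```toml
# Sketch only: not copied from Wild's actual Cargo.toml.

[profile.release]
# A single codegen unit per crate: slower to build, but intended to give
# more stable performance between builds.
codegen-units = 1

# Hypothetical definition of the faster-to-build `opt` profile.
[profile.opt]
inherits = "release"
# Assumed value: 16 is Cargo's default for release builds.
codegen-units = 16
```

With a profile defined this way, `cargo build --profile opt` would produce the faster-to-build optimised binary under `target/opt/`.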

davidlattimore · 2026-01-31T00:55:06Z

I don't see large differences in either runtime performance or in build times with or without thin LTO. So this is more about consistency than anything else. Open to any thoughts people have as to whether this is a good idea or not.

lapla-cogito

How about also setting codegen-units to 1 (or lower than the default of 16)?

davidlattimore · 2026-01-31T03:34:44Z

Based on some preliminary benchmarks, setting codegen-units=1 does have some benefit, although I probably need to do some more benchmarking. The setting really hurts build times, though: a warm release build goes from 16 seconds to 54 seconds. So if we decide to set that for our release builds, then I suspect we'll want a separate profile that's optimised, but not quite as much.

Another possibility is to try to identify which bits are making a difference. e.g. do some perf runs then use perf-diff to try to identify which functions benefited the most from the change. If we can identify that, then we might be able to get similar speedups with a sprinkling of inline annotations, although I guess those are slightly harder with third-party crates.

davidlattimore · 2026-01-31T07:46:58Z

A second motivation for this change is, or at least was, that a lot of my previous benchmarking has been done by comparing release builds, but I had been thinking that I should really compare dist builds, since that's what we distribute. By making them basically the same, I figured I'd be less likely to accidentally benchmark a release without thin-lto.

However, today while I've been running various benchmarks of my linker-plugin PR, I've noticed some odd results. The oddest was that the linker-plugin PR with the feature disabled at build time was showing a 9% slowdown on the bevy-dylib benchmark relative to a build without the PR. However, with a slightly different build configuration (strip disabled), the same benchmark showed a 6% speedup. All these benchmarks were with thin-lto enabled. My theory is that thin-lto is pretty inconsistent with which functions it inlines and which it doesn't. So changes in one part of the codebase can have significant performance effects in unrelated parts of the codebase.

This is not a great property and makes me wonder if we should actually be doing the opposite of this change and turning off thin-lto for the dist profile. Basically I'm wondering if the pretty small performance boost that we maybe get from thin-lto is worth it given how unreliable it makes benchmarking.

To that end, I've just done a bunch of benchmark runs with various different build configurations. The results are here. I did a second run with exactly the same configurations. The results look pretty similar.

It looks like LTO (thin or fat) results in significant variation of timings. Combining strip=true with either fat or thin LTO affects the result. Regular release builds, builds with codegen-units=1 and stripped builds (with or without codegen-units=1) look pretty stable.
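For readers who want to try a similar comparison, the kinds of build configuration mentioned above could be expressed as custom Cargo profiles along these lines. The profile names are made up, the sketch assumes a release profile that still uses Cargo's defaults, and the exact settings used for the benchmark runs are not given in this thread.

```toml
# Hypothetical comparison profiles; build each with `cargo build --profile <name>`.

[profile.cmp-thin-lto]
inherits = "release"
lto = "thin"

[profile.cmp-fat-lto]
inherits = "release"
lto = "fat"

[profile.cmp-thin-lto-strip]
inherits = "release"
lto = "thin"
strip = true

[profile.cmp-cgu1]
inherits = "release"
codegen-units = 1

[profile.cmp-cgu1-strip]
inherits = "release"
codegen-units = 1
strip = true
```

Each profile writes its output to its own directory under `target/`, so the resulting binaries can be benchmarked side by side.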

mati865 · 2026-01-31T11:49:36Z

Strip difference is confusing to me since it's done by the linker: https://github.com/rust-lang/rust/blob/1e9be1b77fe89d9757d6179973b2fc970c6e83b7/compiler/rustc_codegen_ssa/src/back/linker.rs#L744
So, I wouldn't expect any differences to originate from it if debuginfo is disabled.

davidlattimore · 2026-01-31T22:25:13Z

I agree it's weird. I think something else must be being affected by the strip setting. I just did some more tests. I captured save-dirs for how the linker was invoked with and without strip=true then changed them so that they both only did --strip-debug not --strip-all. The result was two binaries that still had a significant performance difference. So it seems that setting strip=true is affecting more than just the flags passed to the linker.

davidlattimore · 2026-01-31T23:06:36Z

It's possible it's just that different tracked flags cause a different SVH (strict version hash) which means the names of things passed to LLD are slightly different. Those differences in names might cause LLD to make different decisions. That's just a guess though.

davidlattimore · 2026-02-13T05:10:24Z

I tracked down the main performance loss for the linker-plugin change (with plugins disabled)... it turned out to be a bit of code that got deleted. The code wasn't even being run by the benchmark since it was in an error path. I then tried repeating the benchmark with codegen-units=1 and the performance loss went away. I think the way rust merges codegen units, although presumably deterministic, is not at all stable in the presence of code changes. So a small code change could affect which initial codegen units get merged with which others resulting in significant changes in performance in parts of the code unrelated to where the code was changed. So I'm leaning more towards the suggestion of setting codegen-units=1 for release builds.


lapla-cogito · 2026-02-14T05:22:33Z

-that enables compiler's internal ThinLTO and strips the binaries. The benefit from ThinLTO
-is very mild in Wild's case, so it's up to you whether to use it. Musl releases also enable
-`--feature mimalloc`, see below for the explanation.


Since ThinLTO isn't specified for any builds now, I think there's no need to remove the reason why it isn't enabled here (though I am curious about how much the benefits would change if FatLTO were used instead).

davidlattimore · 2026-02-15

If I don't remove the bit about ThinLTO, then this would be saying that the dist profile uses it, which isn't true anymore.

lapla-cogito reviewed Jan 31, 2026


davidlattimore marked this pull request as draft February 8, 2026 04:07

davidlattimore force-pushed the push-qourowxvuxzu branch from 6ce73e6 to 3e3aa79 Compare February 13, 2026 05:24

davidlattimore changed the title ~~perf: Change release profile to use thin LTO~~ perf: Change release profile to use codegen-units=1 Feb 13, 2026

davidlattimore marked this pull request as ready for review February 13, 2026 10:55

lapla-cogito reviewed Feb 14, 2026


davidlattimore merged commit f5b25f1 into main Feb 23, 2026
20 checks passed

davidlattimore deleted the push-qourowxvuxzu branch February 23, 2026 21:53
