perf: Rewrite scheduling of string merging tasks #1240

Merged: davidlattimore merged 1 commit into main from push-pwmuvnpvxwyw on Oct 30, 2025

@davidlattimore

Member

The old do_splitting_work, if there were lots of threads and insufficient work to do, could end up effectively busy-waiting, which was terrible for performance.

We now spawn a separate task for each bit of work. We also now constrain the number of split input section Vecs that we'll keep in memory.
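
To make the two ideas concrete (one task per piece of work, and a cap on how many split buffers exist at once), here is a minimal sketch. It is not Wild's implementation: it uses only the standard library, with scoped OS threads standing in for rayon tasks, and `BufferLimit`, `split_section` and `count_unique_strings` are hypothetical names invented for illustration. A task that cannot get a buffer slot sleeps on a condition variable instead of spinning.

```rust
use std::collections::HashSet;
use std::sync::mpsc::channel;
use std::sync::{Condvar, Mutex};
use std::thread;

/// Counting semaphore that caps how many split buffers may exist at once.
/// (Hypothetical helper, not taken from the PR.)
struct BufferLimit {
    available: Mutex<usize>,
    freed: Condvar,
}

impl BufferLimit {
    fn new(max_buffers: usize) -> Self {
        Self { available: Mutex::new(max_buffers), freed: Condvar::new() }
    }

    /// Sleeps (rather than spinning) until a buffer slot is free, then takes it.
    fn acquire(&self) {
        let mut available = self.available.lock().unwrap();
        while *available == 0 {
            available = self.freed.wait(available).unwrap();
        }
        *available -= 1;
    }

    /// Returns a slot and wakes one waiting task.
    fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}

/// Splits one input section into its non-empty NUL-terminated strings.
fn split_section(section: &[u8]) -> Vec<&[u8]> {
    section.split(|&b| b == 0).filter(|s| !s.is_empty()).collect()
}

/// Counts distinct strings across all sections, keeping at most
/// `max_buffers` split Vecs alive at any moment.
fn count_unique_strings(sections: &[&[u8]], max_buffers: usize) -> usize {
    assert!(max_buffers > 0, "need at least one buffer slot");
    let limit = BufferLimit::new(max_buffers);
    let (tx, rx) = channel();
    thread::scope(|scope| {
        // One task per section; each waits for a slot before splitting.
        for &section in sections {
            let tx = tx.clone();
            let limit = &limit;
            scope.spawn(move || {
                limit.acquire();
                tx.send(split_section(section)).expect("receiver is alive");
            });
        }
        // Drop the original sender so the loop below ends once all tasks finish.
        drop(tx);
        let mut unique = HashSet::new();
        for strings in rx {
            unique.extend(strings);
            // The split Vec has been consumed, so its slot can be reused.
            limit.release();
        }
        unique.len()
    })
}
```

Calling `count_unique_strings(&[b"foo\0bar\0", b"bar\0baz\0"], 2)` would merge the two sections while never holding more than two split Vecs. In a real thread pool, blocking inside a task has its own costs, so this only illustrates the bounding idea.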

Issue #1085

@davidlattimore

Member Author

On my laptop at least, this is relatively performance-neutral. On benchmarks with very little by way of strings to merge, it seems to be an improvement. It now seems not to slow down when more threads are added. It doesn't support separately limiting the number of threads used for string merging.

There are a few parameters that can, at least for the moment, be tweaked experimentally by passing three numeric values via --wild-experiments=80,16,1024 (the values shown are the current defaults). The three values are, respectively: a multiplier that adjusts how many buffers are available for intermediate work, how much to split input sections, and the minimum size in bytes for a section to be split.

It's a bit of guesswork as to what good values for these would be. At least for me, I don't see a heap of difference when I adjust them, but I also haven't tried anything especially scientific yet. I'll probably do more experiments on this before I merge. I'm interested in how this looks performance-wise for others.
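
As an illustration of how the three --wild-experiments values described above map onto tuning knobs, here is a small sketch. Only the flag name, the value order and the defaults (80, 16, 1024) come from the comment; the `MergeTuning` struct, its field names, `parse_tuning`, and the rule that missing trailing values fall back to the defaults are assumptions, and Wild's actual argument handling may differ.

```rust
/// Tuning values for string merging. The struct and field names are
/// hypothetical; only the order and defaults come from the PR comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MergeTuning {
    /// Multiplier adjusting how many intermediate buffers are available.
    buffer_multiplier: u64,
    /// How much to split input sections.
    split_factor: u64,
    /// Minimum size in bytes for a section to be split.
    min_split_bytes: u64,
}

impl Default for MergeTuning {
    fn default() -> Self {
        // Defaults quoted in the comment: 80,16,1024.
        Self { buffer_multiplier: 80, split_factor: 16, min_split_bytes: 1024 }
    }
}

/// Parses `--wild-experiments=a,b,c`. Assumption: values that are not
/// supplied keep their defaults, and any values after the third are ignored.
fn parse_tuning(arg: &str) -> Result<MergeTuning, String> {
    let list = arg
        .strip_prefix("--wild-experiments=")
        .ok_or_else(|| format!("not an experiments flag: {arg}"))?;
    let values = list
        .split(',')
        .map(|part| {
            part.trim()
                .parse::<u64>()
                .map_err(|err| format!("invalid value {part:?}: {err}"))
        })
        .collect::<Result<Vec<u64>, String>>()?;
    let defaults = MergeTuning::default();
    Ok(MergeTuning {
        buffer_multiplier: values.first().copied().unwrap_or(defaults.buffer_multiplier),
        split_factor: values.get(1).copied().unwrap_or(defaults.split_factor),
        min_split_bytes: values.get(2).copied().unwrap_or(defaults.min_split_bytes),
    })
}
```

With this sketch, `parse_tuning("--wild-experiments=80,16,1024")` yields the default tuning, which makes it easy to vary one value at a time when benchmarking.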

@mati865

mati865 commented Oct 25, 2025

Member

I don't have time to dive into it, so I'm just posting a quick run on Clang without debug info:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./run-with ~/Projects/wild/target/release/wild-base --threads=1` | 410.8 ± 3.2 | 407.2 | 416.3 | 5.53 ± 0.11 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=1` | 413.0 ± 4.6 | 406.1 | 420.6 | 5.56 ± 0.12 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=2` | 223.4 ± 2.4 | 220.7 | 229.3 | 3.01 ± 0.06 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=2` | 223.2 ± 4.8 | 213.7 | 234.6 | 3.00 ± 0.08 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=3` | 158.8 ± 1.5 | 156.0 | 161.1 | 2.14 ± 0.04 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=3` | 159.7 ± 1.8 | 157.1 | 162.9 | 2.15 ± 0.05 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=4` | 126.5 ± 3.7 | 117.8 | 132.7 | 1.70 ± 0.06 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=4` | 129.6 ± 4.3 | 118.4 | 133.9 | 1.74 ± 0.07 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=5` | 105.2 ± 5.4 | 98.2 | 114.0 | 1.42 ± 0.08 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=5` | 108.8 ± 4.0 | 99.9 | 114.1 | 1.46 ± 0.06 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=6` | 96.6 ± 4.9 | 87.7 | 102.5 | 1.30 ± 0.07 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=6` | 100.0 ± 1.5 | 96.1 | 102.6 | 1.35 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=7` | 91.4 ± 3.0 | 82.6 | 95.2 | 1.23 ± 0.05 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=7` | 90.1 ± 2.3 | 84.2 | 93.7 | 1.21 ± 0.04 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=8` | 86.9 ± 1.7 | 83.1 | 90.6 | 1.17 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=8` | 86.3 ± 1.1 | 84.0 | 88.4 | 1.16 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=9` | 82.4 ± 1.4 | 77.6 | 84.7 | 1.11 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=9` | 81.5 ± 1.4 | 78.3 | 84.3 | 1.10 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=10` | 79.1 ± 1.2 | 76.7 | 83.0 | 1.06 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=10` | 79.4 ± 1.1 | 77.1 | 82.7 | 1.07 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=11` | 77.2 ± 1.2 | 75.1 | 80.3 | 1.04 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=11` | 78.0 ± 1.7 | 75.5 | 82.7 | 1.05 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=12` | 76.0 ± 1.0 | 74.7 | 78.7 | 1.02 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=12` | 76.1 ± 0.9 | 74.4 | 77.9 | 1.02 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=13` | 75.6 ± 1.0 | 73.2 | 78.4 | 1.02 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=13` | 75.3 ± 1.1 | 73.2 | 77.8 | 1.01 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=14` | 74.7 ± 0.9 | 72.8 | 78.1 | 1.01 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=14` | 74.9 ± 1.2 | 72.7 | 77.8 | 1.01 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=15` | 75.0 ± 1.7 | 71.9 | 78.3 | 1.01 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=15` | 75.3 ± 1.1 | 73.5 | 79.2 | 1.01 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=16` | 75.0 ± 1.4 | 72.2 | 80.4 | 1.01 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=16` | 75.5 ± 1.4 | 72.9 | 79.6 | 1.02 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=17` | 74.8 ± 1.1 | 72.7 | 77.4 | 1.01 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=17` | 76.0 ± 1.1 | 73.6 | 78.7 | 1.02 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=18` | 74.6 ± 1.1 | 72.4 | 77.0 | 1.00 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=18` | 76.4 ± 1.5 | 73.5 | 80.1 | 1.03 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=19` | 74.3 ± 1.4 | 71.5 | 78.7 | 1.00 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=19` | 75.9 ± 1.2 | 73.9 | 79.1 | 1.02 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=20` | 75.3 ± 1.5 | 72.4 | 78.1 | 1.01 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=20` | 75.8 ± 1.3 | 73.2 | 79.8 | 1.02 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=21` | 75.1 ± 1.3 | 72.7 | 78.2 | 1.01 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=21` | 77.0 ± 1.1 | 74.9 | 80.2 | 1.04 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=22` | 75.1 ± 1.1 | 73.2 | 77.4 | 1.01 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=22` | 77.0 ± 1.5 | 74.4 | 79.9 | 1.04 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=23` | 75.0 ± 1.3 | 72.3 | 78.8 | 1.01 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=23` | 78.2 ± 1.2 | 75.7 | 81.2 | 1.05 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=24` | 75.7 ± 1.1 | 73.2 | 77.7 | 1.02 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=24` | 78.9 ± 1.3 | 75.9 | 81.2 | 1.06 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=25` | 76.1 ± 1.3 | 73.2 | 78.8 | 1.02 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=25` | 78.3 ± 1.3 | 75.5 | 81.8 | 1.05 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=26` | 76.1 ± 1.5 | 72.4 | 79.3 | 1.02 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=26` | 79.6 ± 1.1 | 77.2 | 82.7 | 1.07 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=27` | 76.4 ± 1.4 | 73.6 | 78.9 | 1.03 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=27` | 78.1 ± 1.3 | 76.1 | 82.8 | 1.05 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=28` | 76.9 ± 1.3 | 73.0 | 79.4 | 1.03 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=28` | 79.7 ± 1.6 | 77.5 | 84.4 | 1.07 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=29` | 77.1 ± 1.3 | 75.0 | 79.9 | 1.04 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=29` | 81.7 ± 1.1 | 79.4 | 84.7 | 1.10 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=30` | 77.7 ± 1.2 | 75.7 | 80.6 | 1.05 ± 0.02 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=30` | 83.1 ± 1.3 | 81.1 | 85.9 | 1.12 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=31` | 77.6 ± 1.6 | 75.2 | 81.2 | 1.04 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=31` | 81.5 ± 1.4 | 78.6 | 85.7 | 1.10 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-base --threads=32` | 79.0 ± 1.9 | 75.9 | 83.6 | 1.06 ± 0.03 |
| `./run-with ~/Projects/wild/target/release/wild-pwmu --threads=32` | 84.5 ± 1.4 | 82.3 | 88.1 | 1.14 ± 0.03 |

Comment thread libwild/src/string_merging.rs Outdated
}
});
}
let num_threads = rayon::current_num_threads();

Member

Leaving it at the default number of threads seems to cause a rather big performance hit beyond 20 threads.
In the benchmarks linked below, wild-pwmu is this PR and wild-pwmu2 is this PR with a hack:

Suggested change
- let num_threads = rayon::current_num_threads();
+ let num_threads = args.numeric_experiment(Experiment::MergeStringCapMultiplier, 80) as usize
+     / 10;

https://gist.github.com/mati865/e90546d1e9d31b880db57a92e0de84ce
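
For readers skimming the suggestion, the arithmetic of the hack can be sketched in isolation. The function below is a hypothetical stand-in: it does not call Wild's `numeric_experiment`, it only mirrors the expression in the suggested change, where an experiment value (default 80) is divided by 10 instead of using the thread pool size.

```rust
/// Mirrors the suggested expression: `value as usize / 10`, with 80 as the
/// fallback when no experiment value was supplied. Hypothetical helper.
fn hacked_num_threads(experiment_value: Option<u64>) -> usize {
    const DEFAULT_MULTIPLIER: u64 = 80;
    // Integer division: 80 gives 8, and anything below 10 gives 0.
    experiment_value.unwrap_or(DEFAULT_MULTIPLIER) as usize / 10
}
```

With the default of 80 the result is 8 regardless of how many threads the pool has. Because the division truncates, experiment values below 10 would give 0, so a non-experimental version of this idea would presumably need a lower bound.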

@mati865

mati865 commented Oct 26, 2025

Member

Some more benchmarks using Clang binary. Using multiple groups per thread seems to hurt (at least for Clang) and the slowdown scales: https://gist.github.com/mati865/8a85880857049f7b4f9ad5ce2843d22d

@davidlattimore

Member Author

Thanks for running those benchmarks! I ended up getting rid of the code that was targeting a particular number of groups. We now just have a constant byte size after which we'll split to a new group. This and various other tweaks that I made gave some further improvements to some benchmarks. I mostly benchmarked clang with debug info, which with 32 threads now shows about a 20% improvement for me. Other benchmarks, especially with a smaller number of threads, showed smaller improvements or in some cases no real change.
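
The idea of splitting to a new group after a constant byte size can be sketched as follows. This is an illustration only: `group_by_bytes` is a hypothetical helper, the threshold is passed in as a parameter because the actual constant is not stated here, and the real code may group different data or close groups at a slightly different point.

```rust
/// Assigns section indices to groups, closing the current group once its
/// accumulated size reaches `split_after` bytes. Hypothetical helper.
fn group_by_bytes(section_sizes: &[usize], split_after: usize) -> Vec<Vec<usize>> {
    let mut groups = Vec::new();
    let mut current = Vec::new();
    let mut current_bytes = 0usize;
    for (index, &size) in section_sizes.iter().enumerate() {
        current.push(index);
        current_bytes += size;
        if current_bytes >= split_after {
            // Threshold reached: finish this group and start a fresh one.
            groups.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
    }
    // Keep any trailing sections that never reached the threshold.
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}
```

Unlike targeting a fixed number of groups, the number of groups here follows from the input size alone, so it does not change with the thread count.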

@mati865

mati865 commented Oct 29, 2025

Member

Impressive!
Using Clang without debuginfo it's not only faster than the base, but also doesn't suffer from the scaling issues: https://gist.github.com/mati865/9e2ea19483981ad4d976f0a9ab147326

@davidlattimore davidlattimore merged commit ddf5b3c into main Oct 30, 2025
20 checks passed
@davidlattimore davidlattimore deleted the push-pwmuvnpvxwyw branch October 30, 2025 23:12