feat: Support --compress-debug-sections #1881

Merged
davidlattimore merged 1 commit into main from push-nssoyquwyzkk
May 4, 2026

@davidlattimore

Member

Fixes #493

@davidlattimore force-pushed the push-nssoyquwyzkk branch 4 times, most recently from 250d9dc to e21be4f on May 2, 2026 11:48
Comment thread libwild/src/args/elf.rs Outdated
Comment on lines +1113 to +1118
match value {
    "zlib" => args.debug_compression_kind = Some(CompressionKind::Zlib),
    "zstd" => args.debug_compression_kind = Some(CompressionKind::Zstd),
    value => {
        args.warn_unsupported(&format!("--compress-debug-sections={value}"))?;
    }

Member

Suggested change
  match value {
+     "none" => args.debug_compression_kind = None,
      "zlib" => args.debug_compression_kind = Some(CompressionKind::Zlib),
      "zstd" => args.debug_compression_kind = Some(CompressionKind::Zstd),
      value => {
          args.warn_unsupported(&format!("--compress-debug-sections={value}"))?;
      }

Member

I have no idea what GH did with the rendering of that suggestion, but it only adds the "none" branch.

Member Author

Good idea. Done. I added a test for =none as well.
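
For readers following along, here is a minimal, standalone sketch of the value handling discussed in this thread, including the =none case. It is not wild's actual code: the `Parsed` enum and the `parse_compress_debug_sections` function are hypothetical names invented for illustration, and the real implementation reports unknown values through `args.warn_unsupported` rather than returning them.

```rust
/// Compression algorithms named in the diff above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompressionKind {
    Zlib,
    Zstd,
}

/// Hypothetical result type, used only in this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Parsed {
    /// A recognised value; `None` means "do not compress".
    Kind(Option<CompressionKind>),
    /// An unrecognised value, carrying the full flag text for a warning.
    Unsupported(String),
}

/// Maps the text after `--compress-debug-sections=` to a `Parsed` value.
fn parse_compress_debug_sections(value: &str) -> Parsed {
    match value {
        "none" => Parsed::Kind(None),
        "zlib" => Parsed::Kind(Some(CompressionKind::Zlib)),
        "zstd" => Parsed::Kind(Some(CompressionKind::Zstd)),
        other => Parsed::Unsupported(format!("--compress-debug-sections={other}")),
    }
}
```

Handling "none" explicitly means a later `--compress-debug-sections=none` can switch off compression requested by an earlier flag, instead of falling through to the unsupported-value warning.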

@mati865 left a comment

Member

Impressive. Thought the code would be messier, but also less "verbose".

Linking an old release+debuginfo build of Clang with the default -O0:

❯ OUT=/tmp/bin powerprofilesctl launch -p performance hyperfine -w 3 './run-with ~/Projects/wild/target/release/wild --compress-debug-sections=zlib' './run-with ~/Projects/wild/target/release/wild --compress-debug-sections=zstd' './run-with ~/Projects/wild/target/release/wild --compress-debug-sections=none'
Benchmark 1: ./run-with ~/Projects/wild/target/release/wild --compress-debug-sections=zlib
  Time (mean ± σ):      2.834 s ±  0.028 s    [User: 0.001 s, System: 0.001 s]
  Range (min … max):    2.803 s …  2.885 s    10 runs

Benchmark 2: ./run-with ~/Projects/wild/target/release/wild --compress-debug-sections=zstd
  Time (mean ± σ):      2.555 s ±  0.028 s    [User: 0.001 s, System: 0.001 s]
  Range (min … max):    2.514 s …  2.611 s    10 runs

Benchmark 3: ./run-with ~/Projects/wild/target/release/wild --compress-debug-sections=none
  Time (mean ± σ):      1.936 s ±  0.028 s    [User: 0.001 s, System: 0.001 s]
  Range (min … max):    1.888 s …  1.963 s    10 runs

Summary
  ./run-with ~/Projects/wild/target/release/wild --compress-debug-sections=none ran
    1.32 ± 0.02 times faster than ./run-with ~/Projects/wild/target/release/wild --compress-debug-sections=zstd
    1.46 ± 0.03 times faster than ./run-with ~/Projects/wild/target/release/wild --compress-debug-sections=zlib

And the sizes:

❯ ls /tmp/bin.*
.rwxr-xr-x 5.2G mateusz  2 May 18:02  /tmp/bin.nocompress
.rwxr-xr-x 3.0G mateusz  2 May 18:03  /tmp/bin.zlib
.rwxr-xr-x 2.5G mateusz  2 May 18:03  /tmp/bin.zstd
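
To put the time and size numbers side by side, here is a small sketch of the arithmetic. The helper names `slowdown` and `size_saving` are made up for this example, and the sizes are the rounded values printed by ls, so the savings are approximate.

```rust
/// Ratio of a compressed link time to the uncompressed baseline.
fn slowdown(compressed_secs: f64, baseline_secs: f64) -> f64 {
    compressed_secs / baseline_secs
}

/// Fraction of the output size saved relative to the uncompressed baseline.
fn size_saving(compressed_size: f64, baseline_size: f64) -> f64 {
    1.0 - compressed_size / baseline_size
}

/// One line per algorithm, using the mean times and rounded sizes quoted above.
fn summary() -> Vec<String> {
    // (name, mean link time in seconds, output size in GB as printed by ls)
    let baseline = (1.936, 5.2);
    let runs = [("zlib", 2.834, 3.0), ("zstd", 2.555, 2.5)];
    runs.iter()
        .map(|(name, secs, size)| {
            format!(
                "{name}: {:.2}x link time, {:.0}% smaller output",
                slowdown(*secs, baseline.0),
                size_saving(*size, baseline.1) * 100.0
            )
        })
        .collect()
}
```

On these numbers zstd comes out ahead of zlib on both axes: less extra link time and a smaller binary.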

Higher opt levels are nasty for linking time.

Just a question regarding compression levels.
With --compress-debug-sections=zstd LLD 22 is a bit faster than Wild (8.3s vs 8.6s), but produces slightly bigger binaries. Looks like there is a disagreement on compression levels?

❯ /bin/ls -l /tmp/bin.o*
-rwxr-xr-x 1 mateusz mateusz 2247017048 05-02 18:30 /tmp/bin.o3.ld
-rwxr-xr-x 1 mateusz mateusz 2299742096 05-02 18:28 /tmp/bin.o3.lld
-rwxr-xr-x 1 mateusz mateusz 2118940207 05-02 18:28 /tmp/bin.o3.wild

That's 2.093 GiB, 2.142 GiB and 1.973 GiB respectively. The difference seems to come mostly from the size of .debug_info.
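
The GiB figures can be reproduced from the byte counts in the listing with a short sketch like the following. `bytes_to_gib` and `describe` are hypothetical helper names used only here.

```rust
/// Converts a byte count to GiB (1 GiB = 2^30 bytes).
fn bytes_to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}

/// Formats each (linker, size in bytes) pair as "name: X.XXX GiB".
fn describe(sizes: &[(&str, u64)]) -> Vec<String> {
    sizes
        .iter()
        .map(|(name, bytes)| format!("{name}: {:.3} GiB", bytes_to_gib(*bytes)))
        .collect()
}
```

Calling `describe(&[("ld", 2247017048), ("lld", 2299742096), ("wild", 2118940207)])` gives the three values quoted above, in the same order as the listing.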

@marxin left a comment

Collaborator

Really good job!

@davidlattimore

Member Author

I did some more experiments. It appears I might have been mistaken about -O levels affecting the compression level, at least for lld. --compress-debug-sections doesn't let you specify a compression level, but --compress-sections (which wild doesn't yet support) does. So I used --compress-sections to compress all the debug sections, then varied both the -O level and the compression level. It turned out, assuming I didn't make a mistake in my methodology, that lld always uses level 1 for zlib and level 3 for zstd. So for now at least, I've removed the -O handling code and we now unconditionally use those same values.

@mati865

mati865 commented May 4, 2026

Member

Level 3 for zstd is the default one, so it makes sense: https://docs.rs/zstd/latest/zstd/constant.DEFAULT_COMPRESSION_LEVEL.html

Level 1 for zlib seems to match the "best speed" profile: https://docs.rs/zlib-rs/0.6.3/zlib_rs/struct.DeflateConfig.html#method.best_speed with the value at: https://docs.rs/zlib-rs/0.6.3/src/zlib_rs/c_api.rs.html#151
I'm less sure about that one.

@davidlattimore

Member Author

I converted the experiments I was doing before into a script:

#!/bin/bash
set -e
D=$HOME/tmp/comp-level-test
OPT=1
FLAGS=-O$OPT
RUN_WITH=$HOME/save/wild-debug/run-with
LINKER=ld.lld
COMP=zlib

mkdir -p $D
rm -f $D/*

for L in $(seq 0 9); do
    if [ $L -eq 0 ]; then
        X=$COMP
        LNAME=default
    else
        X=$COMP:$L
        LNAME=level-$L
    fi
    OUT=$D/${LINKER}.OPT${OPT}.${COMP}.$LNAME \
    $RUN_WITH $LINKER $FLAGS \
        "--compress-sections=.debug_loc=$X" \
        "--compress-sections=.debug_abbrev=$X" \
        "--compress-sections=.debug_info=$X" \
        "--compress-sections=.debug_aranges=$X" \
        "--compress-sections=.debug_ranges=$X" \
        "--compress-sections=.debug_str=$X" \
        "--compress-sections=.debug_line=$X"
done

OUT=$D/${LINKER}.OPT${OPT}.${COMP}.debug \
    $RUN_WITH $LINKER $FLAGS \
    --compress-debug-sections=$COMP

ls -l $D
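
The per-section flags that the script passes could also be generated programmatically. This is only a sketch that mirrors the flag shape used in the script (`--compress-sections=<section>=<algorithm>[:<level>]`). The function name `compress_sections_args` is hypothetical, and the section list is copied from the script rather than being an exhaustive list of debug sections.

```rust
/// Debug sections listed in the script above.
const DEBUG_SECTIONS: [&str; 7] = [
    ".debug_loc",
    ".debug_abbrev",
    ".debug_info",
    ".debug_aranges",
    ".debug_ranges",
    ".debug_str",
    ".debug_line",
];

/// Builds one `--compress-sections=` argument per debug section.
/// With `level == None` the algorithm is passed without a level,
/// which corresponds to the "default" case in the script.
fn compress_sections_args(algorithm: &str, level: Option<u32>) -> Vec<String> {
    let spec = match level {
        Some(l) => format!("{algorithm}:{l}"),
        None => algorithm.to_string(),
    };
    DEBUG_SECTIONS
        .iter()
        .map(|section| format!("--compress-sections={section}={spec}"))
        .collect()
}
```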

If I run this, I get the output:

-rwxrwxr-x 1 david david 39067096 May  4 20:54 ld.lld.OPT1.zlib.debug
-rwxrwxr-x 1 david david 39067096 May  4 20:54 ld.lld.OPT1.zlib.default
-rwxrwxr-x 1 david david 39067096 May  4 20:54 ld.lld.OPT1.zlib.level-1
-rwxrwxr-x 1 david david 38541232 May  4 20:54 ld.lld.OPT1.zlib.level-2
-rwxrwxr-x 1 david david 38168280 May  4 20:54 ld.lld.OPT1.zlib.level-3
-rwxrwxr-x 1 david david 37463144 May  4 20:54 ld.lld.OPT1.zlib.level-4
-rwxrwxr-x 1 david david 37010096 May  4 20:54 ld.lld.OPT1.zlib.level-5
-rwxrwxr-x 1 david david 36755392 May  4 20:54 ld.lld.OPT1.zlib.level-6
-rwxrwxr-x 1 david david 36699568 May  4 20:54 ld.lld.OPT1.zlib.level-7
-rwxrwxr-x 1 david david 36660176 May  4 20:54 ld.lld.OPT1.zlib.level-8
-rwxrwxr-x 1 david david 36647768 May  4 20:54 ld.lld.OPT1.zlib.level-9

Note that the following three file sizes are exactly the same:

  • Output of --compress-debug-sections=zlib
  • Output when using --compress-sections= and listing each debug section, but not specifying level.
  • Output when using --compress-sections= and listing each debug section and setting level to 1.

If I change COMP to zstd, I get the following:

-rwxrwxr-x 1 david david 35579448 May  4 20:58 ld.lld.OPT1.zstd.debug
-rwxrwxr-x 1 david david 35579448 May  4 20:58 ld.lld.OPT1.zstd.default
-rwxrwxr-x 1 david david 37199216 May  4 20:58 ld.lld.OPT1.zstd.level-1
-rwxrwxr-x 1 david david 36406336 May  4 20:58 ld.lld.OPT1.zstd.level-2
-rwxrwxr-x 1 david david 35579448 May  4 20:58 ld.lld.OPT1.zstd.level-3
-rwxrwxr-x 1 david david 35514800 May  4 20:58 ld.lld.OPT1.zstd.level-4
-rwxrwxr-x 1 david david 34932424 May  4 20:58 ld.lld.OPT1.zstd.level-5
-rwxrwxr-x 1 david david 34519336 May  4 20:58 ld.lld.OPT1.zstd.level-6
-rwxrwxr-x 1 david david 34388736 May  4 20:58 ld.lld.OPT1.zstd.level-7
-rwxrwxr-x 1 david david 34225968 May  4 20:58 ld.lld.OPT1.zstd.level-8
-rwxrwxr-x 1 david david 34219744 May  4 20:58 ld.lld.OPT1.zstd.level-9

In this case it's the level-3 file that is equal to the debug and default files.

Changing the opt-level (as passed to -O) affects the file size, but doesn't seem to affect which level matches.
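
The comparison above can be expressed as a small check: find the explicit levels whose output size equals the size produced by plain --compress-debug-sections. The sketch below uses the sizes from the two listings. `matching_levels` and the constant names are hypothetical and exist only in this example.

```rust
/// (level, output size in bytes) for lld with zlib, from the first listing.
const ZLIB_SIZES: [(u32, u64); 9] = [
    (1, 39067096), (2, 38541232), (3, 38168280),
    (4, 37463144), (5, 37010096), (6, 36755392),
    (7, 36699568), (8, 36660176), (9, 36647768),
];

/// (level, output size in bytes) for lld with zstd, from the second listing.
const ZSTD_SIZES: [(u32, u64); 9] = [
    (1, 37199216), (2, 36406336), (3, 35579448),
    (4, 35514800), (5, 34932424), (6, 34519336),
    (7, 34388736), (8, 34225968), (9, 34219744),
];

/// Returns every explicit level whose output size equals `default_size`,
/// the size produced when no level is specified.
fn matching_levels(default_size: u64, by_level: &[(u32, u64)]) -> Vec<u32> {
    by_level
        .iter()
        .filter(|(_, size)| *size == default_size)
        .map(|(level, _)| *level)
        .collect()
}
```

Equal file sizes are strong evidence rather than proof that the same level was used. A byte-for-byte comparison of the outputs would be a stricter check.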

@davidlattimore

Member Author

When you observed LLD being a tiny bit faster than Wild, was this on tmpfs or a different filesystem? I ask because I get quite different results:

Benchmark 1 (3 runs): /home/david/save/clang-debug/run-with env-rand ld.lld --compress-debug-sections=zstd
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.4s  ± 6.74ms    10.3s  … 10.4s           0 ( 0%)        0%
  peak_rss           26.7GB ± 5.52MB    26.7GB … 26.7GB          0 ( 0%)        0%
  cpu_cycles          162G  ± 91.7M      162G  …  162G           0 ( 0%)        0%
  instructions        346G  ± 6.55M      346G  …  346G           0 ( 0%)        0%
  cache_references   13.0G  ± 98.9M     12.9G  … 13.1G           0 ( 0%)        0%
  cache_misses       2.12G  ± 63.7M     2.04G  … 2.16G           0 ( 0%)        0%
  branch_misses       783M  ± 1.77M      781M  …  784M           0 ( 0%)        0%
Benchmark 2 (3 runs): /home/david/save/clang-debug/run-with env-rand mold --compress-debug-sections=zstd
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          5.16s  ± 24.6ms    5.14s  … 5.19s           0 ( 0%)        ⚡- 50.1% ±  0.4%
  peak_rss           8.70MB ± 39.8KB    8.66MB … 8.74MB          0 ( 0%)        ⚡-100.0% ±  0.0%
  cpu_cycles          273G  ± 1.02G      272G  …  274G           0 ( 0%)        💩+ 68.6% ±  1.0%
  instructions        299G  ± 23.2M      299G  …  299G           0 ( 0%)        ⚡- 13.7% ±  0.0%
  cache_references   13.8G  ± 5.30M     13.8G  … 13.8G           0 ( 0%)        💩+  6.2% ±  1.2%
  cache_misses       4.24G  ± 4.47M     4.24G  … 4.25G           0 ( 0%)        💩+100.5% ±  4.8%
  branch_misses       805M  ±  803K      804M  …  806M           0 ( 0%)        💩+  2.8% ±  0.4%
Benchmark 3 (3 runs): /home/david/save/clang-debug/run-with env-rand target/cg1/wild --compress-debug-sections=zstd
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          4.43s  ± 53.9ms    4.37s  … 4.48s           0 ( 0%)        ⚡- 57.2% ±  0.8%
  peak_rss           7.57MB ±  113KB    7.45MB … 7.66MB          0 ( 0%)        ⚡-100.0% ±  0.0%
  cpu_cycles          227G  ±  453M      227G  …  228G           0 ( 0%)        💩+ 40.4% ±  0.5%
  instructions        320G  ±  217M      320G  …  321G           0 ( 0%)        ⚡-  7.4% ±  0.1%
  cache_references   13.2G  ± 17.1M     13.2G  … 13.2G           0 ( 0%)          +  1.3% ±  1.2%
  cache_misses       4.54G  ± 5.81M     4.54G  … 4.55G           0 ( 0%)        💩+114.7% ±  4.8%
  branch_misses       807M  ± 1.82M      806M  …  809M           0 ( 0%)        💩+  3.0% ±  0.5%

That's with my latest changes, which switched to level 3 for zstd. I'm pretty sure I was using level 1 for zstd before, so it should have been even faster (about 3.5 seconds).

@mati865

mati865 commented May 4, 2026

Member

Nice idea with that test. I'll leave it up to you whether you want to keep the compression level constants or use the mechanisms from the crates. It'd be nicer if the zlib crate provided a public constant like zstd does.

"When you observed LLD being a tiny bit faster than Wild, was this on tmpfs or a different filesystem?"

Oops, that was with -O3 on tmpfs. So the previous size and performance differences make sense now.

@davidlattimore merged commit 6afbc31 into main on May 4, 2026
24 checks passed
@davidlattimore deleted the push-nssoyquwyzkk branch on May 4, 2026 12:25
@davidlattimore

Member Author

I opted to stick with the constants that we define, since I felt that made it clearer exactly what level we were using.
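
As an illustration of what locally defined constants might look like, here is a minimal sketch. The names `ZLIB_LEVEL`, `ZSTD_LEVEL` and `compression_level` are hypothetical and not taken from wild's source. The values are the levels that lld appeared to use in the experiments earlier in this thread.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompressionKind {
    Zlib,
    Zstd,
}

/// zlib level that matched lld's output in the experiments above.
const ZLIB_LEVEL: i32 = 1;

/// zstd level that matched lld's output in the experiments above.
const ZSTD_LEVEL: i32 = 3;

/// Returns the fixed compression level for the given algorithm.
fn compression_level(kind: CompressionKind) -> i32 {
    match kind {
        CompressionKind::Zlib => ZLIB_LEVEL,
        CompressionKind::Zstd => ZSTD_LEVEL,
    }
}
```

Spelling the numbers out keeps the chosen level visible at the definition site, whereas a crate-provided default could change between releases.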


Development

Successfully merging this pull request may close these issues.

Compressed debug sections

3 participants