[Optimization] Zero padding in typed copies to help LLVM merge stores by ChuanqiXu9 · Pull Request #157690 · rust-lang/rust

ChuanqiXu9 · 2026-06-10T02:56:38Z

For a repr(C) struct with inner padding like:

struct S { a: u16, b: u8, /* 1 byte pad */ c: u32 }

ptr::write generates suboptimal code (3 stores) while MaybeUninit::write generates optimal code (1 wide store).

Root cause: ptr::write lowers to a single alloca + memcpy. LLVM's SROA decomposes the memcpy into per-field stores but skips padding bytes (emitting store undef, which DSE removes), leaving a gap:

store i16 0, ptr %dest        ; a [0,2)
store i8  0, ptr %dest+2      ; b [2,3)
                               ;   [3,4) gap — no store
store i32 0, ptr %dest+4      ; c [4,8)

The backend's store merging cannot combine stores across this gap.

MaybeUninit::write avoids this because its deeper inlining chain produces multiple chained memcpys, which triggers SROA's integer promotion path (treating the whole alloca as i64) instead of per-field forwarding.

Fix: after each typed memcpy, emit explicit memset(0) for padding gaps. This fills the hole so SROA + store merging produce:

store i64 0, ptr %dest        ; single wide store

Constraints to avoid unnecessary overhead or miscompilation:

Only FieldsShape::Arbitrary (structs, not arrays/unions/primitives)
Only Variants::Single (not multi-variant enums, where gaps may hold other variants' data)
Only structs ≤ 16 bytes (larger structs cannot merge into a single wide store anyway)

Close rust-lang#157373 For a `repr(C)` struct with inner padding like: struct S { a: u16, b: u8, /* 1 byte pad */ c: u32 } `ptr::write` generates suboptimal code (3 stores) while `MaybeUninit::write` generates optimal code (1 wide store). Root cause: `ptr::write` lowers to a single alloca + memcpy. LLVM's SROA decomposes the memcpy into per-field stores but skips padding bytes (emitting `store undef`, which DSE removes), leaving a gap: store i16 0, ptr %dest ; a [0,2) store i8 0, ptr %dest+2 ; b [2,3) ; [3,4) gap — no store store i32 0, ptr %dest+4 ; c [4,8) The backend's store merging cannot combine stores across this gap. `MaybeUninit::write` avoids this because its deeper inlining chain produces multiple chained memcpys, which triggers SROA's integer promotion path (treating the whole alloca as i64) instead of per-field forwarding. Fix: after each typed memcpy, emit explicit `memset(0)` for padding gaps. This fills the hole so SROA + store merging produce: store i64 0, ptr %dest ; single wide store Constraints to avoid unnecessary overhead or miscompilation: - Only `FieldsShape::Arbitrary` (structs, not arrays/unions/primitives) - Only `Variants::Single` (not multi-variant enums, where gaps may hold other variants' data) - Only structs ≤ 16 bytes (larger structs cannot merge into a single wide store anyway)

rust-log-analyzer · 2026-06-10T05:46:47Z

The job tidy failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)

[TIMING:end] builder::Libdir { compiler: Compiler { stage: 0, host: x86_64-unknown-linux-gnu, forced_compiler: false }, target: x86_64-unknown-linux-gnu } -- 0.001
##[group]Building stage1 tidy (stage0 -> stage1, x86_64-unknown-linux-gnu)
error: process didn't exit successfully: `sccache /checkout/obj/build/bootstrap/debug/rustc -vV` (exit status: 2)
--- stderr
sccache: error: Timed out waiting for server startup. Maybe the remote service is unreachable?
Run with SCCACHE_LOG=debug SCCACHE_NO_DAEMON=1 to get more information

Bootstrap failed while executing `test src/tools/tidy tidyselftest --extra-checks=py,cpp,js,spellcheck`
Build completed unsuccessfully in 0:00:31
  local time: Wed Jun 10 05:46:29 UTC 2026
  network time: Wed, 10 Jun 2026 05:46:29 GMT

ChuanqiXu9 · 2026-06-12T03:09:00Z

r? rust-lang/codegen

dianqk · 2026-06-12T03:52:35Z

@bors try @rust-timer queue

[Optimization] Zero padding in typed copies to help LLVM merge stores

dianqk

I think we should leave this to LLVM. Maybe we can emit store undef in SROA and let instcombine or memcpyopt consume it.

View changes since this review

dianqk · 2026-06-12T05:06:07Z

+// CHECK-LABEL: @via_ptr_write(
+#[no_mangle]
+pub fn via_ptr_write(dest: &mut MaybeUninit<InnerPadded>) {
+    let val = InnerPadded { a: 0, b: 0, c: 0 };


Suggested change

let val = InnerPadded { a: 0, b: 0, c: 0 };

let val = InnerPadded { a: 0, b: 1, c: 0 };

Could the new test case be merged into one store with your PR?

rust-bors · 2026-06-12T06:01:05Z

☀️ Try build successful (CI)
Build commit: 68b77d3 (68b77d32f1786d41b84fa6dfba36f5c3809b1f9f, parent: b30f3df3ba3c4c9de2f58f1a75dd9500b79b3f8d)

rust-timer · 2026-06-12T06:41:59Z

Finished benchmarking commit (68b77d3): comparison URL.

Overall result: ❌ regressions - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @rustbot label: +perf-regression-triaged. If not, fix the regressions and do another perf run. Neutral or positive results will clear the label automatically.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.4%	[0.2%, 0.6%]	16
Regressions ❌ (secondary)	0.4%	[0.2%, 0.6%]	10
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.4%	[0.2%, 0.6%]	16

Max RSS (memory usage)

Results (secondary 2.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.3%	[2.3%, 2.3%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

This perf run didn't have relevant results for this metric.

Binary size

Results (primary 0.1%, secondary 0.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.0%, 0.2%]	22
Regressions ❌ (secondary)	0.3%	[0.0%, 0.4%]	8
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[0.0%, 0.2%]	22

Bootstrap: 518.98s -> 524.616s (1.09%)
Artifact size: 400.92 MiB -> 400.88 MiB (-0.01%)

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 10, 2026

This comment has been minimized.

Sign in to view

ChuanqiXu9 force-pushed the PaddingZero branch from 0c29089 to de5e31b Compare June 10, 2026 05:42

rustbot assigned dianqk Jun 12, 2026

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 12, 2026

This comment has been minimized.

Sign in to view

rust-bors Bot pushed a commit that referenced this pull request Jun 12, 2026

Auto merge of #157690 - ChuanqiXu9:PaddingZero, r=<try>

68b77d3

[Optimization] Zero padding in typed copies to help LLVM merge stores

dianqk reviewed Jun 12, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Optimization] Zero padding in typed copies to help LLVM merge stores#157690

[Optimization] Zero padding in typed copies to help LLVM merge stores#157690
ChuanqiXu9 wants to merge 1 commit into
rust-lang:mainfrom
ChuanqiXu9:PaddingZero

ChuanqiXu9 commented Jun 10, 2026

Uh oh!

This comment has been minimized.

rust-log-analyzer commented Jun 10, 2026

Uh oh!

ChuanqiXu9 commented Jun 12, 2026

Uh oh!

dianqk commented Jun 12, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

dianqk left a comment •

edited by rustbot

Loading

Uh oh!

dianqk Jun 12, 2026

Uh oh!

rust-bors Bot commented Jun 12, 2026

Uh oh!

This comment has been minimized.

rust-timer commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

	let val = InnerPadded { a: 0, b: 0, c: 0 };
	let val = InnerPadded { a: 0, b: 1, c: 0 };

Uh oh!

Conversation

ChuanqiXu9 commented Jun 10, 2026

Uh oh!

This comment has been minimized.

rust-log-analyzer commented Jun 10, 2026

Uh oh!

ChuanqiXu9 commented Jun 12, 2026

Uh oh!

dianqk commented Jun 12, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

dianqk left a comment • edited by rustbot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dianqk Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

rust-bors Bot commented Jun 12, 2026

Uh oh!

This comment has been minimized.

rust-timer commented Jun 12, 2026

Overall result: ❌ regressions - please read:

Instruction count

Max RSS (memory usage)

Cycles

Binary size

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

dianqk left a comment •

edited by rustbot

Loading