[Optimization] Zero padding in typed copies to help LLVM merge stores #157690
ChuanqiXu9 wants to merge 1 commit into
Close rust-lang#157373

For a `repr(C)` struct with inner padding like:

    struct S { a: u16, b: u8, /* 1 byte pad */ c: u32 }

`ptr::write` generates suboptimal code (3 stores) while `MaybeUninit::write` generates optimal code (1 wide store).

Root cause: `ptr::write` lowers to a single alloca + memcpy. LLVM's SROA decomposes the memcpy into per-field stores but skips padding bytes (emitting `store undef`, which DSE removes), leaving a gap:

    store i16 0, ptr %dest      ; a  [0,2)
    store i8  0, ptr %dest+2    ; b  [2,3)
                                ;    [3,4) gap: no store
    store i32 0, ptr %dest+4    ; c  [4,8)

The backend's store merging cannot combine stores across this gap. `MaybeUninit::write` avoids this because its deeper inlining chain produces multiple chained memcpys, which triggers SROA's integer promotion path (treating the whole alloca as i64) instead of per-field forwarding.

Fix: after each typed memcpy, emit explicit `memset(0)` for padding gaps. This fills the hole so SROA + store merging produce:

    store i64 0, ptr %dest      ; single wide store

Constraints to avoid unnecessary overhead or miscompilation:

- Only `FieldsShape::Arbitrary` (structs, not arrays/unions/primitives)
- Only `Variants::Single` (not multi-variant enums, where gaps may hold other variants' data)
- Only structs ≤ 16 bytes (larger structs cannot merge into a single wide store anyway)
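To make the "padding gaps" in the description concrete, here is a minimal sketch that finds the byte ranges of `S` not covered by any field. It is an illustration only, not the compiler change itself: `offset_in`, `field_ranges`, and `padding_gaps` are hypothetical helper names, and the offsets are measured on a live value rather than taken from rustc's layout data.

```rust
use std::mem::size_of;

#[repr(C)]
struct S {
    a: u16,
    b: u8,
    // 1 byte of padding is expected here
    c: u32,
}

/// Byte offset of `field` inside `base`, computed from the two addresses.
fn offset_in<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

/// (offset, length) of every field of `S` (hypothetical helper).
fn field_ranges() -> Vec<(usize, usize)> {
    let s = S { a: 0, b: 0, c: 0 };
    vec![
        (offset_in(&s, &s.a), size_of::<u16>()),
        (offset_in(&s, &s.b), size_of::<u8>()),
        (offset_in(&s, &s.c), size_of::<u32>()),
    ]
}

/// Half-open [start, end) byte ranges that no field covers (hypothetical helper).
fn padding_gaps(mut fields: Vec<(usize, usize)>, total: usize) -> Vec<(usize, usize)> {
    fields.sort();
    let mut gaps = Vec::new();
    let mut cursor = 0;
    for (off, len) in fields {
        if off > cursor {
            gaps.push((cursor, off));
        }
        cursor = cursor.max(off + len);
    }
    if cursor < total {
        gaps.push((cursor, total));
    }
    gaps
}

fn main() {
    let gaps = padding_gaps(field_ranges(), size_of::<S>());
    println!("{:?}", gaps); // prints [(3, 4)]
}
```

The single range `[3, 4)` is the byte that the description proposes to fill with `memset(0)` after the typed copy.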
r? rust-lang/codegen
@bors try @rust-timer queue
    // CHECK-LABEL: @via_ptr_write(
    #[no_mangle]
    pub fn via_ptr_write(dest: &mut MaybeUninit<InnerPadded>) {
        let val = InnerPadded { a: 0, b: 0, c: 0 };

Suggested change:

    - let val = InnerPadded { a: 0, b: 0, c: 0 };
    + let val = InnerPadded { a: 0, b: 1, c: 0 };
With this PR, could the modified test case also be merged into a single store?
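As background for this question, the sketch below shows that once the padding byte is fixed to zero, the value with `b: 1` still corresponds to one 64-bit constant. It does not show what LLVM actually emits, which is the open point here. It assumes `InnerPadded` has the same fields as `S` in the description (`a: u16, b: u8, c: u32`) and a little-endian target; `image_with_zeroed_padding` is a hypothetical helper.

```rust
/// Byte image of `InnerPadded { a, b, c }` with the padding byte zeroed.
/// Assumed layout: a at [0,2), b at [2,3), padding at [3,4), c at [4,8).
fn image_with_zeroed_padding(a: u16, b: u8, c: u32) -> [u8; 8] {
    let mut bytes = [0u8; 8]; // index 3 (the padding byte) stays zero
    bytes[0..2].copy_from_slice(&a.to_le_bytes());
    bytes[2] = b;
    bytes[4..8].copy_from_slice(&c.to_le_bytes());
    bytes
}

fn main() {
    // The suggested test value: a = 0, b = 1, c = 0.
    let wide = u64::from_le_bytes(image_with_zeroed_padding(0, 1, 0));
    println!("{:#x}", wide); // prints 0x10000
}
```

Under these assumptions a single `store i64 65536` would write the same bytes as the three field stores plus the zeroed padding byte.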
Finished benchmarking commit (68b77d3): comparison URL.

Overall result: ❌ regressions - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage)

Results (secondary 2.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles

This perf run didn't have relevant results for this metric.

Binary size

Results (primary 0.1%, secondary 0.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 518.98s -> 524.616s (1.09%)