
Redesigned Components iterator to use front and back indexing instead mutating and subslicing path field #156496

Open
asder8215 wants to merge 8 commits into rust-lang:main from asder8215:components_rewrite

Conversation

@asder8215

@asder8215 asder8215 commented May 12, 2026

Contributor


This PR entirely changes how Components<'_> is implemented. Currently, the Components<'_> iterator 'consumes' components by mutating its path field into a subslice that holds the remaining unconsumed path components (the consumed component is what Components::next or Components::next_back returns). This PR instead keeps the path field unmodified and uses front and back indices to separate the consumed components from the unconsumed ones.
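To make the indexing idea concrete, here is a minimal sketch. It assumes a hypothetical `IndexedComponents` type over `&str` with `/` as the only separator; the name, fields, and parsing rules are illustrative assumptions and not the PR's actual code, which also handles prefixes, `.` components, and the root directory.

```rust
/// Hypothetical, simplified stand-in for the indexing strategy: `path` is
/// never re-sliced; only the `front` and `back` indices move.
struct IndexedComponents<'a> {
    path: &'a str,
    front: usize, // start of the unconsumed region
    back: usize,  // end of the unconsumed region
}

impl<'a> IndexedComponents<'a> {
    fn new(path: &'a str) -> Self {
        Self { path, front: 0, back: path.len() }
    }

    /// The unconsumed remainder, borrowed from the original `path`.
    fn remaining(&self) -> &'a str {
        &self.path[self.front..self.back]
    }
}

impl<'a> Iterator for IndexedComponents<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let path: &'a str = self.path;
        loop {
            let rest = &path[self.front..self.back];
            if rest.is_empty() {
                return None;
            }
            let end = rest.find('/').unwrap_or(rest.len());
            // Advance past the component and one separator, if present.
            self.front += end + usize::from(end < rest.len());
            if end > 0 {
                return Some(&rest[..end]);
            }
            // Empty component (repeated separator): keep scanning.
        }
    }
}

impl<'a> DoubleEndedIterator for IndexedComponents<'a> {
    fn next_back(&mut self) -> Option<&'a str> {
        let path: &'a str = self.path;
        loop {
            let rest = &path[self.front..self.back];
            if rest.is_empty() {
                return None;
            }
            let start = rest.rfind('/').map_or(0, |i| i + 1);
            // Drop the component and the separator before it, if any.
            self.back = self.front + start.saturating_sub(1);
            if start < rest.len() {
                return Some(&rest[start..]);
            }
        }
    }
}
```

Because `remaining` only reads two indices, it needs no copy of the iterator state, which is the property the description relies on.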

This PR benefits implementations like Components::as_path, which is used in multiple areas of the standard library. Previously, Components::as_path had to clone the iterator inside the function to present the unconsumed path, because our original consuming behavior on path would not allow the returned &'a Path from Components::as_path to last after a Components::next or Components::next_back call. Because the current implementation of the Components iterator has a size of 64 bytes, calling Components::as_path after each Components::next/Components::next_back means cloning 64 bytes again and again, which is unfortunate when each of your path components is only a few bytes (e.g., "foo/bar/baz").
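The usage pattern in question looks roughly like the sketch below, which uses only the public `std::path` API. The helper name `remainders` is hypothetical, and this is only an approximation of the kind of loop the `as_path` benchmark exercises.

```rust
use std::path::{Path, PathBuf};

/// Collects the remaining path after each consumed component.
/// (`remainders` is an illustrative helper, not part of std.)
fn remainders(path: &Path) -> Vec<PathBuf> {
    let mut comps = path.components();
    let mut out = Vec::new();
    while comps.next().is_some() {
        // `as_path` is called once per consumed component.
        out.push(comps.as_path().to_path_buf());
    }
    out
}
```

For `"foo/bar/baz"` this yields `bar/baz`, then `baz`, then an empty path.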

On the point of size, the indexing strategy also lets this PR shrink Components<'_> from 64 bytes to 40 bytes, since a large chunk of Components<'_> was taken up by the Option<Prefix> field (40 bytes on its own). Instead of holding a prefix field in Components<'_>, we can encode the length of the Prefix in the front index and use another enum called FirstComponent to record whether the first component of the given path is a Prefix (or something else). If it is a Prefix, we can call parse_prefix on the subslice self.path[..self.front], since the front index encodes the Prefix length.
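The sizes quoted above can be checked on a given toolchain with a small helper like the following. `layout_report` is a hypothetical name, and the numbers depend on the target and on which version of the standard library is in use, so none are hard-coded here.

```rust
use std::mem::size_of;
use std::path::{Components, Prefix};

/// Returns (type name, size in bytes) for the types discussed above.
fn layout_report() -> [(&'static str, usize); 3] {
    [
        ("Prefix", size_of::<Prefix<'static>>()),
        ("Option<Prefix>", size_of::<Option<Prefix<'static>>>()),
        ("Components", size_of::<Components<'static>>()),
    ]
}
```

Printing each pair, for example with `println!("{name}: {size} bytes")`, shows how much of the iterator the prefix field accounts for.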

Since Components<'_> no longer has the prefix: Option<Prefix> field, all the prefix helper functions in Components<'_> have been removed in favor of calling parse_prefix, Prefix::is_verbatim, Prefix::is_drive, etc.

I was curious whether this redesign of Components<'_> improves Path equality, which @clarfonthey pointed out as slow in #154521 (not to be confused with Path ordering as mentioned in the issue, since that uses Components::compare_components and the example code shows equality). I have benchmarked it, and this implementation currently improves Path equality because Components::next_back runs faster than with the current implementation that mutates path into a subslice. However, Path ordering runs slightly slower. You can check the benchmark code I used here and play around with the number of bytes in a component, the number of components, etc.
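For reference, the two operations being distinguished are both component-based in the public API. A small sketch, where `same_path` and `order` are hypothetical wrapper names:

```rust
use std::cmp::Ordering;
use std::path::Path;

/// Path equality: compares component by component.
fn same_path(a: &str, b: &str) -> bool {
    Path::new(a) == Path::new(b)
}

/// Path ordering: uses the `Ord` impl on `Path`.
fn order(a: &str, b: &str) -> Ordering {
    Path::new(a).cmp(Path::new(b))
}
```

Repeated and trailing separators do not affect either result, so `same_path("foo/bar", "foo//bar/")` holds even though the byte strings differ.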

Right now, when I tested it locally on my PC (Fedora), it passed all the standard library tests and rust-analyzer didn't crash on me (I had a few crash reports from rust-analyzer early on while I was changing Components<'_>, related to threads using Path::components, but that's all resolved now). I have not tested this on Windows yet, and I would probably need someone to help me test on that platform, as my Windows VM is not working well enough to run the standard library test suite.

There are a lot of things being done here, and there may be better approaches or neater ways to write this implementation. I am open to any advice or feedback on this approach.

Update: I got to testing some things out with Prefixes on my Windows VM manually, and the prefix length encoded into the Components<'_> front field seems to work out nicely. I've also accounted for a root directory being able to exist after a Prefix component, e.g. "\\?\checkout\src\tools" having the following components: PrefixVerbatim -> RootDir -> Normal -> Normal -> None (I learnt this from a failure in the miri tests; it's nice to see this Components<'_> implementation work on the Windows tests in CI).
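A quick way to inspect the component sequence for a given path is a helper like the one below, where `component_kinds` is a hypothetical name. On Windows, per the update above, `component_kinds(r"\\?\checkout\src\tools")` is expected to report a prefix, then `RootDir`, then two `Normal` entries. On other platforms the same string parses differently because backslashes are not separators there.

```rust
use std::path::{Component, Path};

/// Maps each component of `path` to the name of its `Component` variant.
fn component_kinds(path: &str) -> Vec<&'static str> {
    Path::new(path)
        .components()
        .map(|c| match c {
            Component::Prefix(_) => "Prefix",
            Component::RootDir => "RootDir",
            Component::CurDir => "CurDir",
            Component::ParentDir => "ParentDir",
            Component::Normal(_) => "Normal",
        })
        .collect()
}
```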

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 12, 2026
@rustbot

rustbot commented May 12, 2026

Collaborator

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @ChrisDenton, libs
  • @ChrisDenton, libs expanded to 8 candidates


@asder8215 asder8215 force-pushed the components_rewrite branch from 1627e2f to 33e69e1 Compare May 12, 2026 09:09
@rustbot

rustbot commented May 12, 2026

Collaborator

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.


@asder8215 asder8215 force-pushed the components_rewrite branch from 33e69e1 to ed9d33d Compare May 12, 2026 17:05

@asder8215 asder8215 force-pushed the components_rewrite branch from ed9d33d to 0b0f84c Compare May 12, 2026 17:19

… of mutating and subslicing path field; as a result, Components iterator memory size goes from 64 bytes to 40 bytes and as_path does not use cloning at all
@asder8215 asder8215 force-pushed the components_rewrite branch from 0b0f84c to 8ed33ea Compare May 12, 2026 22:05

@asder8215 asder8215 force-pushed the components_rewrite branch from 2151b8f to 83cdbed Compare May 13, 2026 22:21

…ity, added safety comments, and check for root dir after Prefix component (e.g., '\\?\checkout\src\tools' should produce Prefix, RootDir, Normal, Normal, None, ...) in Components::parse_single_component
@asder8215 asder8215 force-pushed the components_rewrite branch from 83cdbed to 3921fff Compare May 15, 2026 00:30
@asder8215 asder8215 marked this pull request as draft May 16, 2026 12:22
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 16, 2026
asder8215 added 2 commits May 16, 2026 19:56
…here to use iter().position()/.iter().rposition(), refactored code in compare_components, and removed stale comments
@asder8215 asder8215 force-pushed the components_rewrite branch from 0a25dda to 92e0132 Compare May 17, 2026 16:09
@asder8215

asder8215 commented May 17, 2026

Contributor Author

New benchmarking results. You can see the benchmark code here and run it yourself to see if there are any differences in measurements on your end:

This is the measurement of the current implementation of Components<'_> (without black box):

Std Components (No BB)  time:   [21.546 µs 21.800 µs 22.096 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Std Components Next (No BB)
                        time:   [20.434 µs 20.482 µs 20.538 µs]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

Std Components Next Back (No BB)
                        time:   [38.367 µs 38.757 µs 39.199 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Std Path Iter (No BB)   time:   [21.547 µs 21.730 µs 21.921 µs]

Std As Path Iter (No BB)
                        time:   [87.680 µs 88.439 µs 89.231 µs]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

Std Eq Comps (No BB)    time:   [591.21 ns 593.35 ns 595.82 ns]
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe

Std Uneq Comps (No BB)  time:   [60.953 ns 61.419 ns 61.911 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Std Uneq 2 Comps (No BB)
                        time:   [75.454 µs 75.734 µs 76.027 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Std Compare Comps (No BB)
                        time:   [46.182 µs 46.621 µs 47.192 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Std Compare Uneq Comps (No BB)
                        time:   [46.679 µs 46.980 µs 47.291 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Std Compare Uneq 2 Comps (No BB)
                        time:   [41.480 ns 41.827 ns 42.160 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

This is the measurement of the new implementation of Components<'_> I'm working on (without black box):

Components Rewrite (No BB)
                        time:   [24.982 µs 25.267 µs 25.570 µs]

Components Next Rewrite (No BB)
                        time:   [24.388 µs 24.655 µs 24.937 µs]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

Components Next Back Rewrite (No BB)
                        time:   [18.184 µs 18.567 µs 19.034 µs]
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

Path Iter Rewrite (No BB)
                        time:   [23.485 µs 23.659 µs 23.829 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

As Path Iter Rewrite (No BB)
                        time:   [22.936 µs 23.066 µs 23.208 µs]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Eq Comps Rewrite (No BB)
                        time:   [605.12 ns 608.83 ns 612.98 ns]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe

Uneq Comps Rewrite (No BB)
                        time:   [31.799 ns 32.108 ns 32.433 ns]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

Uneq Comps 2 Rewrite (No BB)
                        time:   [47.091 µs 48.186 µs 49.085 µs]

Compare Comps Rewrite (No BB)
                        time:   [50.234 µs 50.725 µs 51.254 µs]
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) high mild
  1 (1.00%) high severe

Compare Uneq Comps Rewrite (No BB)
                        time:   [49.262 µs 49.631 µs 50.067 µs]
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) high mild
  12 (12.00%) high severe

Compare Uneq Comps 2 Rewrite (No BB)
                        time:   [43.397 ns 43.767 ns 44.171 ns]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

Edit: Updated Components::as_path to match directly on Option<FirstComponent> (self.first_comp) instead of using if let Some(_) = self.first_comp and then matching on the result. The benchmarks for this PR's Components<'_> have been updated as a result; everything else is unaffected by this change.

@asder8215

asder8215 commented May 17, 2026

Contributor Author

Here are the benchmark results with black box:

From current Components<'_> implementation:

Std Components          time:   [20.947 µs 21.010 µs 21.084 µs]
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

Std Components Next     time:   [20.967 µs 20.993 µs 21.021 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Std Components Next Back
                        time:   [35.715 µs 35.802 µs 35.925 µs]
Found 20 outliers among 100 measurements (20.00%)
  6 (6.00%) high mild
  14 (14.00%) high severe

Std Path Iter           time:   [20.883 µs 20.992 µs 21.152 µs]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

Std As Path Iter        time:   [80.673 µs 80.935 µs 81.261 µs]
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

Std Eq Comps            time:   [589.43 ns 593.36 ns 597.88 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high severe

Std Uneq Comps          time:   [63.919 ns 64.262 ns 64.765 ns]
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

Std Uneq 2 Comps        time:   [75.284 µs 75.939 µs 76.599 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

From this Components<'_> implementation PR:

Components Rewrite      time:   [24.190 µs 24.425 µs 24.687 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Components Next Rewrite time:   [24.230 µs 24.550 µs 24.889 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Components Next Back Rewrite
                        time:   [17.339 µs 17.488 µs 17.655 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Path Iter Rewrite       time:   [23.845 µs 23.996 µs 24.154 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

As Path Iter Rewrite    time:   [22.431 µs 22.676 µs 23.010 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Eq Comps Rewrite        time:   [586.16 ns 588.10 ns 590.14 ns]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

Uneq Comps Rewrite      time:   [31.733 ns 32.023 ns 32.378 ns]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

Uneq 2 Comps Rewrite    time:   [36.318 µs 36.574 µs 36.913 µs]
Found 23 outliers among 100 measurements (23.00%)
  23 (23.00%) high severe

Edit: Updated Components::as_path to match directly on Option<FirstComponent> (self.first_comp) instead of using if let Some(_) = self.first_comp and then matching on the result. The benchmarks for this PR's Components<'_> have been updated as a result; everything else is unaffected by this change.

Edit 2: Removed the Path ordering benchmarks from here since they were incorrect; see below for the corrected Path ordering benchmarks.

@asder8215 asder8215 force-pushed the components_rewrite branch from 92e0132 to 574d7f2 Compare May 17, 2026 18:41
@asder8215 asder8215 marked this pull request as ready for review May 17, 2026 18:59
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 17, 2026
@clarfonthey

Contributor

Yeah, I suspect that probably the best solution would be to do something similar to what Python does and offer some sort of PosixPath / WindowsPath types instead of them all being supported under Path, but that seems a little ahead of the game here.

I'll take any wins if the code ends up working better.

@asder8215

asder8215 commented May 21, 2026

Contributor Author

I realized my benchmarking for path ordering is incorrect: I thought the cmp call would use the PartialOrd impl of Components<'_>, but it uses Iterator::cmp (I forgot about that; I will update the benchmark soon to use either > or <). That being said, I think I've got an idea to preserve some of the previous code in Components::compare_components, which should bring the performance to the same or a similar level.
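The two code paths can be made explicit with fully qualified calls, which sidesteps method resolution entirely. A sketch, where the function names are hypothetical:

```rust
use std::cmp::Ordering;
use std::path::Path;

/// Generic element-by-element comparison from the `Iterator` trait.
fn via_iterator_cmp(a: &Path, b: &Path) -> Ordering {
    Iterator::cmp(a.components(), b.components())
}

/// The `Ord` impl on `Components`, which is also what `<` and `>`
/// go through via `PartialOrd`.
fn via_ord_impl(a: &Path, b: &Path) -> Ordering {
    Ord::cmp(&a.components(), &b.components())
}
```

Both return the same ordering; only the code path being measured differs, which is why the choice matters for a benchmark.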

@asder8215

asder8215 commented May 21, 2026

Contributor Author

New benchmarks for path ordering comparisons (BB abbrev for Black Box):

Compare Comps Rewrite   
                        time:   [13.882 µs 13.942 µs 14.037 µs]
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

Compare Uneq Comps Rewrite
                        time:   [14.475 µs 14.641 µs 14.831 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Compare Uneq 2 Comps Rewrite
                        time:   [41.087 ns 41.521 ns 41.973 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Compare Comps Rewrite (No BB)
                        time:   [14.077 µs 14.152 µs 14.238 µs]
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

Compare Uneq Comps Rewrite (No BB)
                        time:   [14.023 µs 14.032 µs 14.042 µs]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

Compare Uneq Comps 2 Rewrite (No BB)
                        time:   [39.542 ns 39.735 ns 39.950 ns]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

Std Compare Comps       
                        time:   [13.667 µs 13.690 µs 13.716 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Std Compare Uneq Comps  
                        time:   [13.694 µs 13.709 µs 13.726 µs]
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

Std Compare Uneq 2 Comps
                        time:   [40.555 ns 40.650 ns 40.758 ns]
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

Std Compare Comps (No BB)
                        time:   [13.738 µs 13.779 µs 13.827 µs]
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

Std Compare Uneq Comps (No BB)
                        time:   [14.011 µs 14.134 µs 14.255 µs]
Found 11 outliers among 100 measurements (11.00%)
  10 (10.00%) high mild
  1 (1.00%) high severe

Std Compare Uneq 2 Comps (No BB)
                        time:   [41.148 ns 41.277 ns 41.430 ns]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

Performance is nearly the same when reusing what the current implementation of Components<'_> did (tweaked to use the front index from Components<'_>).

Edit: I had to correct the fast path None match condition (it should compare the back field, since that encodes the length of the path we've subsliced). I noticed that the compiler couldn't optimize the fast path well if I used left.back/right.back in both None matches, but it was able to optimize it if I used variables containing left.back/right.back in one of the None matches. I updated the benchmarks for the compare cases as a result (the others are unaffected by this change because they don't rely on comparison operators like <, >).

…e in previous implementation, but making it work with Components<'_> front index
@asder8215 asder8215 force-pushed the components_rewrite branch from cb82f61 to 1a25002 Compare May 23, 2026 15:34
Comment thread library/std/src/path.rs Outdated
// causes this function to run slower than using a variable that stores
// the `left.back` and `right.back` information (which `back` field
// encodes the length of the `Components<'_>` unconsumed path)
let left_back = left.back;

@asder8215 asder8215 May 23, 2026

Contributor Author

@clarfonthey I had to update the None matching condition here because the lengths that need to be compared between the left and right Components<'_> are actually the back fields, not path.len() (otherwise the fast path could make a mistake on an existing Components<'_> that had already used Components::next_back). I updated the benchmarking code to reflect this change as well.

However, I noticed a strange thing while benchmarking in that if I do:

None if left.back == right.back => { ... },
None => left.back.min(right.back),

This runs two times slower than storing left.back and right.back in separate variables and using those in the default None condition. Alternatively, if I use the left_back and right_back variables in both None matching conditions, it also causes a 2x performance degradation. Does this performance degradation occur on your end if you use left.back and right.back in both None matches (or left_back and right_back in both)? If so, do you happen to know why this occurs?
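The two shapes being compared look roughly like this. The sketch assumes a hypothetical `Cursor` struct and function names, and it only shows that the two forms are semantically equivalent; it is not expected to reproduce the reported slowdown, which depends on the surrounding code in `compare_components`.

```rust
/// Hypothetical stand-in for the iterator state; `back` is the end of the
/// unconsumed region and must not exceed `path.len()`.
struct Cursor<'a> {
    path: &'a [u8],
    back: usize,
}

/// Form A: read `back` through the struct in both `None` arms.
fn first_difference_fields(left: &Cursor<'_>, right: &Cursor<'_>) -> Option<usize> {
    let (l, r) = (&left.path[..left.back], &right.path[..right.back]);
    match l.iter().zip(r).position(|(a, b)| a != b) {
        None if left.back == right.back => None, // same bytes, same length
        None => Some(left.back.min(right.back)), // one is a prefix of the other
        Some(i) => Some(i),
    }
}

/// Form B: copy `back` into locals and use those in the fallback arm.
fn first_difference_locals(left: &Cursor<'_>, right: &Cursor<'_>) -> Option<usize> {
    let left_back = left.back;
    let right_back = right.back;
    let (l, r) = (&left.path[..left_back], &right.path[..right_back]);
    match l.iter().zip(r).position(|(a, b)| a != b) {
        None if left.back == right.back => None,
        None => Some(left_back.min(right_back)),
        Some(i) => Some(i),
    }
}
```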

I have the godbolt link here, but I couldn't figure out what changed.


Contributor

So this appears way more pronounced if you look at the generated MIR (post-opt):

For the "faster" case:
scope 1 {
    debug left_back => _7;
    let _8: usize;
    scope 2 {
        debug right_back => _8;
        let _9: usize;
        let _24: usize;
        scope 3 {
            debug first_difference => _9;
            scope 5 {
                debug previous_sep => _32;
                let _32: usize;
            }
        }
        scope 4 {
            debug diff => _24;
        }
    }
}
For the "slower" case:
scope 1 {
    debug first_difference => _17;
    scope 3 {
        debug previous_sep => _22;
        let _22: usize;
        scope 35 (inlined #[track_caller] core::slice::index::<impl Index<RangeTo<usize>> for [u8]>::index) {
            debug self => _31;
            debug ((index: RangeTo<usize>).0: usize) => _17;
            scope 36 (inlined #[track_caller] <RangeTo<usize> as SliceIndex<[u8]>>::index) {
                debug ((self: RangeTo<usize>).0: usize) => _17;
                debug slice => _31;
                scope 37 (inlined #[track_caller] <std::ops::Range<usize> as SliceIndex<[u8]>>::index) {
                    debug ((self: std::ops::Range<usize>).0: usize) => const 0_usize;
                    debug ((self: std::ops::Range<usize>).1: usize) => _17;
                    debug slice => _31;
                    debug new_len => _17;
                    let mut _62: bool;
                    let mut _63: usize;
                    let _64: *const [u8];
                    let mut _65: *const [u8];
                    let mut _66: !;
                    let mut _67: usize;
                    scope 38 (inlined core::num::<impl usize>::checked_sub) {
                        debug self => _17;
                        debug rhs => const 0_usize;
                        let mut _68: bool;
                    }
                    scope 39 (inlined core::slice::index::get_offset_len_noubcheck::<u8>) {
                        debug ptr => _31;
                        debug offset => const 0_usize;
                        debug len => _17;
                        let mut _69: *const u8;
                        scope 40 {
                            scope 41 {
                            }
                        }
                    }
                }
            }
        }
        scope 42 (inlined core::slice::<impl [u8]>::iter) {
            debug self => _64;
            scope 43 (inlined std::slice::Iter::<'_, u8>::new) {
                debug slice => _64;
                let mut _71: std::ptr::NonNull<[u8]>;
                let mut _73: *mut u8;
                let mut _74: *mut u8;
                scope 44 {
                    debug len => _17;
                    let _70: std::ptr::NonNull<u8>;
                    scope 45 {
                        debug ptr => _70;
                        let _72: *const u8;
                        scope 46 {
                            debug end_or_len => _72;
                        }
                        scope 50 (inlined std::ptr::without_provenance::<u8>) {
                            debug addr => _17;
                            scope 51 (inlined without_provenance_mut::<u8>) {
                            }
                        }
                        scope 52 (inlined NonNull::<u8>::as_ptr) {
                            debug self => _70;
                        }
                        scope 53 (inlined #[track_caller] std::ptr::mut_ptr::<impl *mut u8>::add) {
                            debug self => _74;
                            debug count => _17;
                        }
                    }
                    scope 47 (inlined NonNull::<[u8]>::from_ref) {
                        debug r => _64;
                        let mut _75: *const [u8];
                    }
                    scope 48 (inlined NonNull::<[u8]>::cast::<u8>) {
                        debug self => _71;
                        let mut _76: *mut u8;
                        let mut _77: *mut [u8];
                        scope 49 (inlined NonNull::<[u8]>::as_ptr) {
                        }
                    }
                }
            }
        }
    }
}

I have a feeling that this might have something to do with how match guards are generated, although this is genuinely very weird. Will bring up to some compiler folks on Zulip and see if they have any insights.

Contributor

Also, the Zulip thread in case you want to participate: #t-compiler/performance > Bindings change dramatically affecting generated MIR

For now, I would say to obviously go with whichever one gets better optimised, but it would be really interesting to figure out why this is being compiled so differently.

Contributor Author

The slower case looks so horrendous; it really is strange to see how reusing the Components<'_> iterator field leads to this mess of generated code when it's just due to how you extract the back indices of each Components<'_> (via variables vs from the struct directly). I'm curious what goes on in match guard code generation.

I'll definitely keep my eyes on the Zulip thread, appreciate you linking it here!

@clarfonthey

Contributor

Might as well:

r? @clarfonthey

For now since I have more or less agreed to review this. Will hand over to someone else if there are any additional things that need to be resolved that I can't/shouldn't handle.

Comment thread library/std/src/path.rs Outdated
fn prefix_verbatim(&self) -> bool {
if !HAS_PREFIXES {
return false;
fn consume_first_component(&mut self, dir_front: bool) -> Option<Component<'a>> {

@clarfonthey clarfonthey May 24, 2026

Contributor

So, specifically since the first component appears to already be eagerly evaluated, I do wonder if this method is really necessary or if we should simply make the first component store Option<Component<'a>> directly. It does feel like a bit of extra work that could be cut out for simplicity, but I might be misreading.


Contributor

Note that my general opinion on eager evaluation for this kind of iterator is that if eager evaluation dramatically simplifies a majority of the cases that use this code, we can afford a little bit of eager evaluation as a treat, even if there are a few cases where something might be done that is ultimately discarded later.

@asder8215 asder8215 May 25, 2026

Contributor Author

We could also inline this function into Components::next and Components::next_back (which would also get rid of the dir_front argument). I think I originally wrote it this way to share code more neatly (although it is not exactly shared, since certain match conditions have different effects depending on whether dir_front is true or false).

I'll change this and put the code directly inside Components::next and Components::next_back, then benchmark again to see if that affects anything.

@asder8215 asder8215 May 25, 2026

Contributor Author

With storing an Option<Component<'a>>, my concern was increasing the memory size of Components<'_> and whether that would be worth it. I'm pretty sure one of the Component enum variants contains a Prefix, and that enum takes up 40 bytes. I also wanted to reduce the size of Components<'_> with this front and back index approach, since I know cloning occurs in Components<'_> comparison (for equality or ordering).

I think that's one of the benefits I was aiming for with this front and back index approach: it avoids the size penalty we pay for storing the Prefix enum in Components<'_> (why store a Prefix enum that takes up 40 bytes when we could use a usize as an index marking where the Prefix ends?).

The other thing is that while FirstComponent::Absolute and FirstComponent::Prefix are already evaluated via has_root or parse_prefix, the first component of a relative path is not eagerly evaluated. I wasn't too worried about FirstComponent::Absolute, though it is unfortunate to see FirstComponent::Prefix get evaluated again (especially with a PrefixVerbatim that is a pretty big component).

I'm okay with storing an Option<Component<'a>> here, but do we find the increase in the Components<'_> iterator size acceptable?

Contributor

I actually hadn't fully taken in how big the Prefix enum is, but… yeah.

That said, given the complexity of the operation, I still think that it might be better to still do everything ahead of time, minus maybe parsing a Prefix. This basically means that I think it might be better to effectively convert the iterator into a monomorphised version of Chain<MaybePrefix, Back>, where even if MaybePrefix does some extra parsing at runtime to fully expand the prefix, the iterator starts out in a form where you have definitively separated out the prefix and don't need any special casing besides either (doing whatever logic is required to output the prefix) or (doing whatever logic is required for everything else). Right now, because the front index is doing double-duty for both keeping track of the location of the prefix, and keeping track of the iteration position in the rest of the path, it might be better to instead duplicate the extra index if needed in the Option being stored just to simplify the logic.
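One possible reading of the `Chain<MaybePrefix, Back>` idea, as a toy sketch only: the prefix is split off once when the iterator is built, and the rest is iterated with no prefix special-casing. The function name and the prefix rule (a leading `X:` drive-like marker, with `/` as the only separator) are assumptions made for illustration and do not reflect std's real prefix parsing.

```rust
/// Splits off a toy "X:" prefix up front, then chains it with the body.
/// (`split_prefixed` and its prefix rule are illustrative assumptions.)
fn split_prefixed<'a>(path: &'a str) -> impl DoubleEndedIterator<Item = &'a str> {
    let (prefix, body) = match path.as_bytes() {
        // An ASCII letter followed by ':' counts as a prefix in this sketch.
        [d, b':', ..] if d.is_ascii_alphabetic() => (Some(&path[..2]), &path[2..]),
        _ => (None, path),
    };
    // `Option` acts as the zero-or-one-item "maybe prefix" iterator.
    prefix
        .into_iter()
        .chain(body.split('/').filter(|s| !s.is_empty()))
}
```

After construction, neither half needs to know about the other: front iteration drains the optional prefix first, and back iteration reaches it last.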

Also, as far as size goes… unless it actively messes with codegen, I would say we should basically be assuming in all cases that people will be storing a Path, not a Components iterator, at least modulo any iterator adapters. If we still end up in the case where the iteration can't be inlined, it makes sense to optimise for this size, but hopefully these changes fix that.

@asder8215 asder8215 May 25, 2026

Contributor Author

That said, given the complexity of the operation, I still think that it might be better to still do everything ahead of time, minus maybe parsing a Prefix.

I agree with you there.

This basically means that I think it might be better to effectively convert the iterator into a monomorphised version of Chain<MaybePrefix, Back>, where even if MaybePrefix does some extra parsing at runtime to fully expand the prefix, the iterator starts out in a form where you have definitively separated out the prefix and don't need any special casing besides either.

I feel like I might need a bit of clarity on what you mean here.

However, I was thinking about something a bit simpler. I was thinking back to your point on storing an Option<Component<'a>> in Components<'_>, and instead of an Option<Component<'a>>, I was thinking we can add the PrefixComponent<'a> from Component<'a> as a field of FirstComponent::Prefix. What's eagerly evaluated when creating a Components<'_> iterator is the Prefix and whether we have an absolute path or not (the latter should be trivial to compute, and in both cases, this would be trivial to compute on unix platforms since Prefix doesn't exist). The first component of a relative path is not eagerly evaluated and I don't think we need to do that if it's not necessary.

The benefit of just adding PrefixComponent<'a> into FirstComponent::Prefix instead of using Option<Component<'a>> is just that the size of the FirstComponent enum will be smaller as it doesn't add the Normal(&'a OsStr) enum member (and, a lesser issue, all the other enum members) that Component<'a> has (note: size argument isn't true, it's just less convoluted). This would mitigate the issue of re-parsing the Prefix occurring in this function (and elsewhere) as we can just take and move it out from Option<FirstComponent::Prefix> into Component::Prefix.

Right now, because the front index is doing double-duty for both keeping track of the location of the prefix, and keeping track of the iteration position in the rest of the path, it might be better to instead duplicate the extra index if needed in the Option being stored just to simplify the logic.

I think the front index is fine doing double-duty. By my previous suggestion, after parsing the Prefix and storing it inside FirstComponent::Prefix, we can still have the front index start at the length of the Prefix for the next component(s) it needs to parse.

@asder8215

Contributor Author

@clarfonthey I tried out incorporating the FirstComponent::Prefix(PrefixComponent) in the benchmark code first. Every measurement seems to run fine, except that the Path ordering measurements run into the 2x performance degradation. Could you verify if that's what you see on your end?

I didn't see much of a difference between the MIR code without storing a PrefixComponent and with storing a PrefixComponent. I think the only difference I see is in these lines:

With PrefixComponent:

let mut _33: std::option::Option<FirstComponent<'_>>;
...
let mut _36: std::option::Option<FirstComponent<'_>>;

Without PrefixComponent:

let mut _33: std::option::Option<FirstComponent>;
...
let mut _36: std::option::Option<FirstComponent>;

Does this have to do with the size of the Components<'_> struct growing to 96 bytes when storing a PrefixComponent (up from the original 40 bytes)?

Godbolt links attached:

Also note: I'm running on Fedora Linux. I have not benchmarked the code on Windows.
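One way to sanity-check where a size jump like the one mentioned above could come from is to print the sizes of the std path types involved on the machine running the benchmark. This is only a sketch: `path_type_sizes` is a hypothetical helper, and the values are target-dependent, so no expected output is given.

```rust
use std::mem::size_of;
use std::path::{Component, Prefix, PrefixComponent};

/// Returns the sizes, in bytes, of `Prefix`, `PrefixComponent`, and
/// `Component` for the current target.
pub fn path_type_sizes() -> (usize, usize, usize) {
    (
        size_of::<Prefix<'static>>(),
        size_of::<PrefixComponent<'static>>(),
        size_of::<Component<'static>>(),
    )
}
```

A `PrefixComponent` carries the raw prefix slice in addition to the parsed `Prefix`, so storing one inline in an iterator adds at least the size of a `Prefix` to that iterator.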

@asder8215

asder8215 commented Jun 1, 2026

Contributor Author

@clarfonthey any note on the performance discrepancy with using FirstComponent::Prefix(PrefixComponent)?

@asder8215

Contributor Author

@clarfonthey I was thinking about this for a while, but do you think it'd be possible to add an internal unsafe parse_prefix_unchecked method that just constructs the prefix given a u8 slice (or &OsStr slice), two usize indices, and a discriminant tag? We'd eliminate any comparison or churn in processing the Prefix component again. I say two indices because the worst-case Prefix we could have is a VerbatimUNC/UNC prefix path (those enum variants take two OS string slices).

I think having that function would allow me to add one more usize field representing the length of the first OS string slice for VerbatimUNC/UNC and a separate PrefixTag enum that contains the same variants as Prefix without the field info; I could probably pack that into Components<'_> in 48 bytes (up from 40 bytes) as follows:

enum PrefixTag {
    Verbatim,
    VerbatimUNC,
    VerbatimDisk,
    DeviceNS,
    UNC,
    Disk
}

enum FirstComponent {
    AbsolutePath,
    RelativePath,
    Prefix(PrefixTag),
}

pub struct Components<'a> {
    path: &'a [u8],
    // Probably a better name for this field, but this is only
    // used for Prefix matching cases to construct a Prefix
    // without the parsing process (only used if our Prefix
    // is VerbatimUNC/UNC)
    first_ind: usize,
    front: usize,
    back: usize,
    has_physical_root: bool,
    first_comp: Option<FirstComponent>,
}
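The 48-byte estimate for the layout proposed above can be checked with `size_of`. The sketch below copies the proposed definitions under the hypothetical name `ComponentsSketch` and assumes a 64-bit target; Rust does not guarantee the layout of default-representation structs, so this is a check against current compilers rather than a promise.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum PrefixTag {
    Verbatim,
    VerbatimUNC,
    VerbatimDisk,
    DeviceNS,
    UNC,
    Disk,
}

#[allow(dead_code)]
enum FirstComponent {
    AbsolutePath,
    RelativePath,
    Prefix(PrefixTag),
}

/// Hypothetical stand-in for the proposed `Components<'a>` layout.
#[allow(dead_code)]
pub struct ComponentsSketch<'a> {
    path: &'a [u8],
    first_ind: usize,
    front: usize,
    back: usize,
    has_physical_root: bool,
    first_comp: Option<FirstComponent>,
}

/// Size of the sketch struct in bytes on the current target.
pub fn sketch_size() -> usize {
    size_of::<ComponentsSketch<'static>>()
}
```

On a 64-bit target the slice reference takes 16 bytes and the three indices take 24, which leaves the `bool` and the tag-only `Option<FirstComponent>` to fit in the final 8-byte slot.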

@clarfonthey

clarfonthey commented Jun 3, 2026

Contributor

So, just adding some follow-ups from the libs meeting + my general thoughts on this. Been meaning to get to this review with a bit of a finer comb but haven't yet.

In general, the team is in favour of improving Path, but we've got loads of questions about the ultimate API design and those should probably go through an ACP, since that will likely shape the form the code takes going forward.

I personally think there are lots of potential wins here that can be done without any performance regressions. So, I think probably a hard line on accepting this PR is to ensure that there is a unilateral improvement across the board, which might limit what you're able to do from an API perspective. (side note: the perf runs are testing windows too, right?)

Additionally, any local testing or benchmarking you've managed to do are extremely good to document here, and can be at least brought to the table for future ACPs/perf improvements. Folks on the libs team were particularly interested in cargo runtime benchmarks, since we know cargo does a lot of operations on paths and can build up a hefty amount of overhead, and right now I believe that perf is mostly measuring crate performance, not the performance of cargo itself.

Also got some additional clarification from @the8472 that we shouldn't be running any branches for Windows prefixes on non-Windows platforms, but that I don't actually think this is the case. Going to poke around the docs later to see if we formally guarantee this, since I think we should, but don't actually think we do.

@asder8215

asder8215 commented Jun 3, 2026

Contributor Author

In general, the team is in favour of improving Path, but we've got loads of questions about the ultimate API design and those should probably go through an ACP, since that will likely shape the form the code takes going forward.

I'm down to create an ACP for this. It may take me some time to write up a clean ACP draft, but I think the general idea from this is to switch strategy from mutating/subslicing path field directly to using an indexing strategy to grab a subslice of path.

I personally think there are lots of potential wins here that can be done without any performance regressions. So, I think probably a hard line on accepting this PR is to ensure that there is a unilateral improvement across the board, which might limit what you're able to do from an API perspective. (side note: the perf runs are testing windows too, right?)

I'll see what else I can do to get rid of perf regressions, particularly on Components::next (Components::next_back and Components::as_path already perform better than the current implementation of Components). I think the hard part is that there are bits of extra logic in Components::next that Components::next_back doesn't have (because the front index is 0-based while the back index is 1-based). For example:

impl<'a> DoubleEndedIterator for Components<'a> {
    fn next_back(&mut self) -> Option<Component<'a>> {
        // We reach here when we no longer have anymore paths
        // to consume, or we need to output Prefix component
        // (anything else falls through this conditional)
        if self.back <= self.front {
            return self.consume_first_component(false);
        }

        self.parse_next_back_component()
    }
}

impl<'a> Iterator for Components<'a> {
    type Item = Component<'a>;

    fn next(&mut self) -> Option<Component<'a>> {
        // We reach this case when we no longer have anymore paths
        // to consume (return `None`), or if our front idx was initially
        // equal to back idx (e.g. if we had `C:`, `.`, `/`), or if we
        // had a front component initially
        if self.front >= self.back || self.first_comp.is_some() {
            return self.consume_first_component(true);
        }
        self.parse_next_component()
    }
}

There's an extra check in Components::next to see if we have a first component to consume. It's necessary that we have an Option<FirstComponent> because both Components::next and Components::next_back stop trying to iterate when our front index has reached our back index (and vice versa); because we normalize the separators away in both directions, that causes concern for the root directory (if our back index reached our front index at 0 due to normalizing the path, note that our back still hasn't "consumed" the root directory component).
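The front/back indexing idea and the pending first component can be illustrated with a toy double-ended iterator. This is a sketch under stated assumptions, not the PR's code: `Piece` and `Pieces` are hypothetical names, only `/`-separated paths are handled, there are no prefixes, and `.`/`..` are treated as ordinary names. The `path` field is never re-sliced, so the unconsumed remainder can be borrowed from it without cloning the iterator.

```rust
/// Hypothetical component type for this sketch.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Piece<'a> {
    Root,
    Normal(&'a str),
}

pub struct Pieces<'a> {
    path: &'a str,      // never mutated or re-sliced
    front: usize,       // index of the first unconsumed byte
    back: usize,        // one past the last unconsumed byte
    root_pending: bool, // the root has not been yielded yet
}

impl<'a> Pieces<'a> {
    pub fn new(path: &'a str) -> Self {
        let root = path.starts_with('/');
        Pieces { path, front: usize::from(root), back: path.len(), root_pending: root }
    }

    /// Unconsumed remainder, borrowed from the original path.
    pub fn as_str(&self) -> &'a str {
        if self.root_pending {
            let rest = self.path[..self.back].trim_end_matches('/');
            if rest.is_empty() { "/" } else { rest }
        } else {
            self.path[self.front..self.back].trim_matches('/')
        }
    }
}

impl<'a> Iterator for Pieces<'a> {
    type Item = Piece<'a>;

    fn next(&mut self) -> Option<Piece<'a>> {
        if self.root_pending {
            self.root_pending = false;
            return Some(Piece::Root);
        }
        let path: &'a str = self.path;
        let bytes = path.as_bytes();
        // Skip separators, then scan one name.
        while self.front < self.back && bytes[self.front] == b'/' {
            self.front += 1;
        }
        if self.front >= self.back {
            return None;
        }
        let start = self.front;
        while self.front < self.back && bytes[self.front] != b'/' {
            self.front += 1;
        }
        Some(Piece::Normal(&path[start..self.front]))
    }
}

impl<'a> DoubleEndedIterator for Pieces<'a> {
    fn next_back(&mut self) -> Option<Piece<'a>> {
        let path: &'a str = self.path;
        let bytes = path.as_bytes();
        while self.back > self.front && bytes[self.back - 1] == b'/' {
            self.back -= 1;
        }
        if self.back <= self.front {
            // The indices have met; only a pending root can remain.
            return if std::mem::take(&mut self.root_pending) { Some(Piece::Root) } else { None };
        }
        let end = self.back;
        while self.back > self.front && bytes[self.back - 1] != b'/' {
            self.back -= 1;
        }
        Some(Piece::Normal(&path[self.back..end]))
    }
}
```

Note how `next` has to test the pending flag before anything else, while `next_back` only consults it once the indices meet; that asymmetry mirrors the extra check described above.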

In an earlier comment, I mentioned that I ran the benchmarks on Fedora Linux. I could try to run the same benchmark tests on a Windows VM; however, I haven't tested how it works with paths with prefixes. I could also logically copy what parse_prefix is doing and test it on my current OS as well (though the former will be more accurate to benchmark how it works with an actual path with prefix). I suspect that paths with prefix might receive a slowdown due to one redundant parse_prefix call in the consume_first_component call, but I'll have to get back to you on that (also hence why I suggested a parse_prefix_unchecked function).

Additionally, any local testing or benchmarking you've managed to do are extremely good to document here, and can be at least brought to the table for future ACPs/perf improvements.

Noted. I could write that down in the ACP for sure.

Folks on the libs team were particularly interested in cargo runtime benchmarks, since we know cargo does a lot of operations on paths and can build up a hefty amount of overhead, and right now I believe that perf is mostly measuring crate performance, not the performance of cargo itself.

I can try to run cargo runtime benchmarks, but how do I go about doing that? Are there benchmark tests within the rust repo that I can run, or should I be cloning cargo and running benchmarks there?

Also got some additional clarification from @the8472 that we shouldn't be running any branches for Windows prefixes on non-Windows platforms, but that I don't actually think this is the case. Going to poke around the docs later to see if we formally guarantee this, since I think we should, but don't actually think we do.

I agree with @the8472, and I've been thinking about whether cfg gating the Prefix portion of the code within Components to Windows would be okay to do.
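A minimal sketch of what cfg-gating the prefix handling could look like, assuming a hypothetical helper `prefix_len` that is not part of std and, for brevity, only recognizes a drive prefix such as `C:` on Windows. On other targets the function compiles down to a constant, so no prefix branches run there.

```rust
/// Hypothetical helper: byte length of a leading prefix.
/// Windows build: only a drive prefix such as `C:` is recognized here.
#[cfg(windows)]
fn prefix_len(path: &[u8]) -> usize {
    match path {
        [drive, b':', ..] if drive.is_ascii_alphabetic() => 2,
        _ => 0,
    }
}

/// Non-Windows build: paths never have a prefix, so nothing is inspected.
#[cfg(not(windows))]
#[inline]
fn prefix_len(_path: &[u8]) -> usize {
    0
}

/// Index at which the non-prefix part of the path starts.
pub fn body_start(path: &[u8]) -> usize {
    prefix_len(path)
}
```

Whether std can gate its real prefix parsing this way depends on the documented guarantees discussed in this thread; the sketch only shows the mechanism.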

@clarfonthey

clarfonthey commented Jun 3, 2026

Contributor

So to be clear, we genuinely have no benchmarks that just run cargo, so, this is mostly just a "this is what we've been thinking about" thing. If you want to try building cargo and running it on a large project before/after your change as a sniff test that could be maybe useful, but I'm expecting that to not really be very fruitful. Not sure how rustc-timing works but maybe that helps.

Also to be clear on the ACP thing: specifically, ACPs are for API changes, so the idea would be to potentially propose some of the internal types you've been using like "Path known to be a Prefix, but unparsed" as part of public APIs to potentially be usable in other areas. There has also been some general sentiment in favour of having separate PosixPath / WindowsPath types like Python has right now. I also proposed having a RelativePath type which would remove prefixes entirely.

I want to propose RelativePath and maybe a dedicated PrefixedPath (note: C:dir is not absolute ;.;) or AbsolutePath type, but that's separate and really the main thing is about what public APIs would be involved that justify implementation changes.

@asder8215

asder8215 commented Jun 3, 2026

Contributor Author

So to be clear, we genuinely have no benchmarks that just run cargo, so, this is mostly just a "this is what we've been thinking about" thing. If you want to try building cargo and running it on a large project before/after your change as a sniff test that could be maybe useful, but I'm expecting that to not really be very fruitful. Not sure how rustc-timing works but maybe that helps.

Got you. I'm pretty unfamiliar with accurately benchmarking a whole build, but I'll see what I can do.

On an adjacent note, I've been thinking about the performance degradation on Components::next, and I'm not actually sure that the perf diff is a real degradation. I'm rerunning things with cargo bench in the components_redesign repository I made (albeit with a small optimization that splits consume_first_component into two different versions for front and back, so I don't have that boolean argument) and it's now showing me that my implementation of Components::next is running slightly faster than the current std implementation of Components::next. I can't tell if the perf degradation before was due to some caching discrepancies or something.

Also to be clear on the ACP thing: specifically, ACPs are for API changes, so the idea would be to potentially propose some of the internal types you've been using like "Path known to be a Prefix, but unparsed" as part of public APIs to potentially be usable in other areas. There has also been some general sentiment in favour of having separate PosixPath / WindowsPath types like Python has right now. I also proposed having a RelativePath type which would remove prefixes entirely.

I want to propose RelativePath and maybe a dedicated PrefixedPath (note: C:dir is not absolute ;.;) or AbsolutePath type, but that's separate and really the main thing is about what public APIs would be involved that justify implementation changes.

That sounds good to me; in that case, I'll leave those ACP proposals to you because I think this PR isn't really introducing any new API changes and just modifying the implementation of an existing API.

@clarfonthey

clarfonthey commented Jun 3, 2026

Contributor

It's entirely possible that your latest version of the code got rid of the regressions, and based upon your work, I'm pretty hopeful it'll be possible to do the refactor without any regressions, although I do want to take a deeper look at the code and compare to the original before requesting another perf run. I also was going to take a look at some of the documented guarantees on Windows paths being parsed on Linux before I can fully clarify the status on the cfg'd code. My understanding is that Path is always only valid on the platform it's defined on, but that kind of runs in line with the idea of joining Components together. I need to do more reading.

There is technically the chance for a brute-force approach, where you create a separate version of Components that genuinely returns a separate type where the variants are tailored for the target, then wrap Components in that iterator and just do the conversion. Then you can call that version of the iterator in places like PartialEq and PartialOrd where it won't be exposed for easier optimisation. I think maybe this isn't necessary right now, but mostly proposing it since I have a feeling it'll be part of a longer-term API proposal, e.g. having some extension traits where you return components specific to that target.

@asder8215

Contributor Author

All good, take your time in looking at this and finding out what else could be done here. I pushed the changes I made in components_redesign that separate consume_first_component into 2 separate functions. I also introduced some aggressive inlining because I noticed the iterations that Criterion reported were higher (237k vs 232k), but the actual time measurements don't seem to be different, so I can't really tell if aggressive inlining helped or not.

…nt_front and consume_first_component_back. Also introduced aggressive inlining.
@asder8215 asder8215 force-pushed the components_rewrite branch from 8585ad0 to 8111c2c Compare June 3, 2026 19:14
…comparison (normalizing paths if needed) and return Ordering Equal/Greater/Less if possible before needing to fall back on Iterator::cmp
@asder8215

Contributor Author

Pushed another commit to optimize compare_components. This is basically the closest (if not the same) perf I have gotten relative to the current implementation of compare_components. In the fast path I compare byte by byte and check whether the conflicting character can be used to return Ordering::Greater/Ordering::Less (or, if no conflict is seen, return Ordering::Greater/Ordering::Less/Ordering::Equal depending on the remaining length of both the left and right components); it also does normalization so we don't have to lean on Iterator::cmp all the time for Components::next to normalize components for us. Some areas of the code may look redundant, but they use different subslice ranges of the left and right Components<'_> paths.

Benchmarks are as follows:

This PR implementation:

Compare Comps Rewrite   time:   [13.774 µs 13.779 µs 13.785 µs]
Compare Uneq Comps Rewrite time:   [13.891 µs 13.894 µs 13.898 µs]
Compare Uneq 2 Comps Rewrite time:   [6.5223 ns 6.5372 ns 6.5611 ns]

The current std implementation:

Std Compare Comps       time:   [13.473 µs 13.479 µs 13.487 µs]
Std Compare Uneq Comps  time:   [13.504 µs 13.515 µs 13.528 µs]
Std Compare Uneq 2 Comps time:   [40.526 ns 40.873 ns 41.297 ns]

Do note that running cargo bench multiple times can sometimes show this PR's implementation of compare_components running faster than the current std compare_components. It is consistently faster in the case where we have a path a that is "/a0..a64/a0..a64/..." and a path b that is "/b/{path_a}", most likely because this PR's implementation of compare_components can return Ordering::Greater/Ordering::Less on the differing byte for path a vs path b instead of falling back to Iterator::cmp as the current implementation does.
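The general shape of a byte-wise fast path for path comparison can be sketched with public APIs only. This is not the PR's compare_components: `cmp_skipping_shared_bytes` is a hypothetical function that assumes Unix-style paths without prefixes, skips the bytes both paths share, backs up to the last separator before the first mismatch, and lets std's component iterator handle normalization from that point on.

```rust
use std::cmp::Ordering;
use std::path::Path;

/// Compares two `/`-separated paths component-wise, but only runs the
/// component iterator on the part after the last shared separator.
pub fn cmp_skipping_shared_bytes(a: &str, b: &str) -> Ordering {
    let (ab, bb) = (a.as_bytes(), b.as_bytes());

    // Index of the first byte where the two paths differ.
    let first_diff = match ab.iter().zip(bb).position(|(x, y)| x != y) {
        None if ab.len() == bb.len() => return Ordering::Equal,
        None => ab.len().min(bb.len()),
        Some(i) => i,
    };

    match ab[..first_diff].iter().rposition(|&c| c == b'/') {
        // Keep the shared separator so both remainders start with a root;
        // that way a leading `.` in the remainder is normalized away
        // exactly as it would be in the middle of the full path.
        Some(sep) => Path::new(&a[sep..])
            .components()
            .cmp(Path::new(&b[sep..]).components()),
        // No shared separator: compare the full paths.
        None => Path::new(a).components().cmp(Path::new(b).components()),
    }
}
```

Returning an ordering directly from the differing byte, as the commit describes, needs extra care around separators and `.` components; this sketch sidesteps that by delegating the final decision to the component iterator.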

@asder8215

asder8215 commented Jun 6, 2026

Contributor Author

@clarfonthey I tried building cargo via ./x build src/tools/cargo --stage 2 and dropped it in with the rustc binary built in stage 2, but it builds tokio in 17-18s. However, when I build the main rust repo rustc binary + cargo, it also builds tokio in 17-18s.

The thing that concerns me is that when I use my native cargo binary from my nightly rust toolchain, it builds tokio in 6-7s. Is there a certain optimized config to build cargo that's not made in --stage 2 builds?

@clarfonthey

clarfonthey commented Jun 6, 2026

Contributor

./x build --stage 2 still builds in the debug profile by default like regular cargo build, so, you have to add --release to actually get the optimisations you're looking for.

The idea is to mostly treat ./x as closely to normal cargo as possible, although the addition of stuff like --stage means it's not quite that, and just a best-effort thing. But stuff like --release is necessary here.

@asder8215

Contributor Author

I built the library components with the release flag turned on, and it decreased the cargo build time on tokio from 17-18s -> 10-11s. However, I'm also not sure if this perfectly replicates the optimizations that my cargo (which is linked to rustup run cargo for me) does. I say this because when I clean the target folder with a debug build of tokio, it removes 350 MiB of data in there while my rustup cargo removes 400 MiB of data. On a release build of tokio, they both clean up 102.4 MiB; it's just kind of strange to see the 50 MiB difference from cargo clean.

Are there any differences to optimization settings between rustup cargo binaries and cargo binary built from the rust repo? If not, then I think a lot of concern might be towards Components::next being a core contributor to the slowdown.

@asder8215

Contributor Author

I'll build the rust repo main branch (without my Components<'_> changes) in release mode with the same settings alongside cargo later today just to verify that it's not a problem with this PR. If they both produce the same timing (or even if the rust repo build produces a higher timing), then all I can say is that it's some other optimization config that I'm missing.

@asder8215

asder8215 commented Jun 6, 2026

Contributor Author

@clarfonthey Yeah, I feel like I'm missing something. I just built the main branch of the rust repo on stage 2 with release profile (same thing with cargo) and it gives me the same/similar 10-11s timing results with cargo build with that toolchain.


Labels

I-libs-nominated Nominated for discussion during a libs team meeting. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.


5 participants