seq: Performance optimization #9557

Open
FidelSch wants to merge 4 commits into uutils:main from FidelSch:seq-optimization

@FidelSch FidelSch commented Dec 3, 2025

Contributor

While reading #6182, I noticed that most of the time spent running `cargo run seq 4e4000003 4e4000003` goes to `BigUint::to_string()`.

Reading the code, I found that it is called twice: once to get the string representation of the first number, and again on the last number, but only to get its length; the resulting string is then discarded.
This is fine for fairly small numbers, but the cost grows significantly for larger ones.

On my machine, this change gave a ~2x speedup on the case above, and what appears to be a marginal improvement on smaller cases.

$ ~/coreutils$ hyperfine -L seq target/release/coreutils,target/release/coreutils_old "{seq} seq 4e4000003 4e4000003" 
Benchmark 1: target/release/coreutils seq 4e4000003 4e4000003
  Time (mean ± σ):     26.009 s ±  0.113 s    [User: 25.992 s, System: 0.015 s]
  Range (min … max):   25.909 s … 26.294 s    10 runs
 
Benchmark 2: target/release/coreutils_old seq 4e4000003 4e4000003
  Time (mean ± σ):     52.372 s ±  0.446 s    [User: 52.352 s, System: 0.017 s]
  Range (min … max):   51.815 s … 53.017 s    10 runs
 
Summary
  'target/release/coreutils seq 4e4000003 4e4000003' ran
    2.01 ± 0.02 times faster than 'target/release/coreutils_old seq 4e4000003 4e4000003'

@github-actions

github-actions Bot commented Dec 3, 2025


GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

@ChrisDryden

Collaborator

Would be great to add your example to the benchmarks

@sylvestre

Contributor

> Would be great to add your example to the benchmarks

In a separate pr please :)

@anastygnome anastygnome left a comment

Contributor

Any reason for not using

n.checked_ilog10().unwrap_or(0) + 1

Which should be available?
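
For reference, a minimal sketch of what the suggested expression computes on a primitive integer. `checked_ilog10` is a standard-library method on Rust's primitive integer types; the `u64` type and the helper name `decimal_digits_u64` are assumptions for illustration, and whether the value in the PR's code path exposes the same method is not shown in this thread.

```rust
/// Decimal digit count of a `u64` (hypothetical helper, not taken from the PR).
fn decimal_digits_u64(n: u64) -> u32 {
    // checked_ilog10 returns None for 0, which is treated here as a single digit.
    n.checked_ilog10().unwrap_or(0) + 1
}
```

The `unwrap_or(0)` covers the zero case, since the logarithm of zero is undefined.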

@FidelSch

FidelSch commented Dec 4, 2025

Contributor Author

> Any reason for not using
>
> n.checked_ilog10().unwrap_or(0) + 1

Seemed unnecessary given it is just a constant. If there is any benefit to this alternative I am happy to refactor.

@codspeed-hq

codspeed-hq Bot commented Dec 6, 2025


Merging this PR will improve performance by 58.82%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 2 improved benchmarks
✅ 281 untouched benchmarks
⏩ 38 skipped benchmarks (see footnote 1)

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Simulation | seq_large_integers | 2.1 ms | 1.4 ms | +58.82% |
| Memory | seq_large_integers | 52.8 KB | 46.1 KB | +14.49% |

Comparing FidelSch:seq-optimization (43b0ca5) with main (bb91a5b)

Footnotes

  1. 38 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@sylvestre

Contributor

Any idea why tsort_input_parsing_heavy regressed?

@github-actions

github-actions Bot commented Dec 6, 2025


GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/tty/tty-eof is no longer failing!

@github-actions

github-actions Bot commented Dec 6, 2025


GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@anastygnome

Contributor

> > Any reason for not using
> >
> > n.checked_ilog10().unwrap_or(0) + 1
>
> Seemed unnecessary given it is just a constant. If there is any benefit to this alternative I am happy to refactor.

If I recall correctly, the performance is similar to your solution, and it's part of the language. Could you try pushing a commit with this solution instead, to trigger the benchmarks?

@sylvestre

Contributor

note, upstream fails this way - what do you think we should do here?

$ LANG=C /usr/bin/seq "4e4000003" "4e4000003"
seq: invalid floating point argument: '4e4000003'
Try '/usr/bin/seq --help' for more information.

@FidelSch

Contributor Author

After some digging, the maximum value GNU seq accepts seems to be around 11e4931, which roughly matches the largest value representable by an 80-bit long double (about 1.19e4932). This supports the theory that GNU uses this type in its implementation.
By using BigDecimal we significantly extend the representable range, so for huge numbers a direct comparison with GNU seq is not viable.
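
To illustrate why a fixed-width float cannot hold the benchmark value, here is a small sketch that uses Rust's `f64` as a stand-in. This is an assumption for illustration only: GNU's use of `long double` is a theory in this thread, Rust has no 80-bit float type, and the function name `fits_in_f64` is hypothetical.

```rust
/// Returns true if `s` parses to a finite `f64` (illustration only).
fn fits_in_f64(s: &str) -> bool {
    // Input that fails to parse, or that is out of range for f64,
    // does not yield a finite value and is reported as not fitting.
    s.parse::<f64>().map(|v| v.is_finite()).unwrap_or(false)
}
```

Under this check, `4e300` fits while `4e4000003` does not, which is the same kind of range limit the comment attributes to the upstream implementation, only with a smaller exponent range.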

@sylvestre

Contributor

> After some digging, the maximum value seq accepts seems to be 11e4931, equivalent to the maximum value representable by an 80-bit long double; which supports the theory that GNU uses this type for their implementation. By using BigDecimal we are significantly extending the representable range, so it makes sense that for huge numbers a direct comparison to seq is not viable.

OK, could you please document this in docs/src/extensions.md? Thanks

@github-actions


GNU testsuite comparison:

Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)

@github-actions


GNU testsuite comparison:

GNU test failed: tests/pr/bounded-memory. tests/pr/bounded-memory is passing on 'main'. Maybe you have to rebase?

@sylvestre

Contributor

@FidelSch a few changes seem to be unrelated, no?

@FidelSch

Contributor Author

Ah, they seem to have been introduced by my markdown formatter, did not notice until now. Should I roll them back?

@sylvestre

Contributor

yes please
only the relevant changes

@FidelSch FidelSch force-pushed the seq-optimization branch 4 times, most recently from 02ac2b2 to f1c2b37 on February 11, 2026 13:37
Clarify seq output accuracy and value range limitations compared to GNU coreutils.
@github-actions


GNU testsuite comparison:

GNU test failed: tests/pr/bounded-memory. tests/pr/bounded-memory is passing on 'main'. Maybe you have to rebase?

@sylvestre sylvestre left a comment

Contributor

The optimization idea is sound — avoiding to_string() on large numbers is a real win.

However, the bits() / LOG2_10 approximation can be off by 1 for some values (e.g., exact powers of 10). Since this is used for padding width, being off by one character could produce misaligned output. Have you verified against the GNU test suite?
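
The off-by-one concern can be made concrete with a small sketch. It uses `u128` instead of a big-integer type so that it needs only the standard library. `naive_digits` is a hypothetical uncorrected estimate (the PR's exact formula is not shown in this thread), and `decimal_digits` shows one possible way to correct a bit-length estimate with a single comparison against a power of ten.

```rust
use std::f64::consts::{LOG10_2, LOG2_10};

/// Uncorrected estimate from the bit length (hypothetical, for illustration).
fn naive_digits(n: u128) -> u32 {
    let bits = 128 - n.leading_zeros(); // bit length; 0 for n == 0
    (bits as f64 / LOG2_10).floor() as u32 + 1
}

/// Digit count from the bit length, corrected by one comparison.
fn decimal_digits(n: u128) -> u32 {
    if n == 0 {
        return 1;
    }
    let bits = 128 - n.leading_zeros(); // floor(log2 n) + 1
    // n >= 2^(bits - 1), so n has at least `low` decimal digits.
    let low = ((bits - 1) as f64 * LOG10_2).floor() as u32 + 1;
    // The true count is `low` or `low + 1`; compare against 10^low to decide.
    // If 10^low overflows u128, n is necessarily smaller than it.
    match 10u128.checked_pow(low) {
        Some(p) if n >= p => low + 1,
        _ => low,
    }
}
```

With these definitions, `naive_digits(8)` returns 2 although 8 has one digit, while `decimal_digits` agrees with `to_string().len()` on boundary values such as 9, 10, 99 and 100. For a big integer the correction would need a big power of ten, which has its own cost, so this is a sketch of the idea rather than a drop-in replacement.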
