MIR InstrumentCoverage - Can spans for `TerminatorKind::Goto` be improved to avoid special cases? 

Rust's LLVM InstrProf-based source code coverage implementation instruments Rust code via the MIR pass `InstrumentCoverage`. Most criteria for identifying coverage regions and counter locations are very general, based on Control Flow Graph (CFG) analysis of the MIR, and a fairly straightforward mapping of MIR `Statement`s and `Terminator`s to their source code regions (`Span`s). 

`TerminatorKind::Goto`s are an exception, requiring special handling.

This issue is created to highlight some of the unique requirements and issues addressed in the current coverage implementation, in case someone has ideas for improving things, to reduce the reliance on the `Goto`-specific logic, either by improving `InstrumentCoverage` if something was overlooked, or improving the `Goto` representation (such as refining its `Span` representation, or providing additional context that `InstrumentCoverage` might leverage).

 ### Current State

One of the first steps in the `InstrumentCoverage` process is to extract relevant code `Span`s from the MIR `Statement`s and `Terminator`s. (These `Span`s are later combined into sets of sequential statements and with contiguous source code regions that can be counted via a single counter; i.e., if any statement in the set was executed, all statements in the same set would also have been executed.)

`bcb_to_initial_coverage_spans()` iterates through the `BasicBlock`s of the `CoverageGraph` (a subset of the MIR, essentially skipping panic/unwind paths), and their `Statement`s and `Terminator`s. Some `Statement`s and `Terminator`s are relevant to `Coverage`, and others are not. The `Statement` and `Terminator` filtering is handled by `filtered_statement_span()` and `filtered_terminator_span()`, respectively.

In almost all cases, if not filtered out, the initial coverage `Span` contributed by either a `Statement` or a `Terminator` is the `source_info.span` (within the function body) of the `Statement` or `Terminator`; because, in most cases, the source code span carried forward from the parsed source to its MIR representation is a fairly accurate mapping from intent to execution.

For example, `filtered_terminator_span()` uses the entire `source_info.span` for the following `TerminatorKind`s:

```rust
fn filtered_terminator_span(terminator: &'a Terminator<'tcx>, body_span: Span) -> Option<Span> {
    match terminator.kind {
...
        // Retain spans from all other terminators
        TerminatorKind::Resume
        | TerminatorKind::Abort
        | TerminatorKind::Return
        | TerminatorKind::Call { .. }
        | TerminatorKind::Yield { .. }
        | TerminatorKind::GeneratorDrop
        | TerminatorKind::FalseUnwind { .. }
        | TerminatorKind::InlineAsm { .. } => {
            Some(function_source_span(terminator.source_info.span, body_span))
        }
```

All other `TerminatorKind`s are filtered out, except for `Goto`.

`Goto` terminators play an important role in the control flow, so they cannot be filtered out, but their `source_info.span` typically includes the `Span`s of the statements that precede it, making the `Span` redundant, in most cases.

    One example: `Goto`s are often the targets of `SwitchInt` branches, and certain important optimizations to replace some `Counter`s with `Expression`s require a separate `BasicCoverageBlock` for each branch, to support the `Counter`, when needed.

Since a `Goto`-based `CoverageSpan` still needs a span to indicate if a region of actual source code was executed or not, the span returned from `filtered_terminator_span()`, for `Goto`s, is an empty span, positioned at the `Goto` span's last byte position:

```rust
TerminatorKind::Goto { .. } => {
    Some(function_source_span(terminator.source_info.span.shrink_to_hi(), body_span))
}
```

This byte position can--most often--be leveraged to contribute to a `CoverageSpan` for certain execution branches.

For example, an `if` block without an `else` shows the block was executed if the condition was `true`, but there would be no way to indicate coverage (or lack thereof) of the `false` branch without using the associated `Goto`s `hi()` byte position (which is expanded by one character to the left, for a non-empty `CoverageSpan`.

However, in other cases, a visible `CoverageSpan` is not wanted, but the `Goto`
block must still be counted (for example, to contribute its count to an `Expression`
that reports the execution count for some other block). In these cases, the code region
is set to `None`.

This decision (whether to include a one-character coverage span for a `Goto` or to count a `Goto` block without a code region) is handled in `inject_coverage_span_counters()`, beginning with the call to `is_code_region_redundant()`, which encapsulates the decision on how to handle these special cases.

At the time of this writing, the decision criteria is only looking for `Goto` terminators with spans that end at the last byte position in the file, because these `Goto` spans--if present--are redundant with the spans from every function's final `Return` terminator. When they are present, they can cause the function's last line to appear to have been executed twice, when it was only executed once.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MIR InstrumentCoverage - Can spans for `TerminatorKind::Goto` be improved to avoid special cases? #78542

Current State

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

MIR InstrumentCoverage - Can spans for TerminatorKind::Goto be improved to avoid special cases? #78542

Description

Current State

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

MIR InstrumentCoverage - Can spans for `TerminatorKind::Goto` be improved to avoid special cases? #78542