Skip to content

fix(lexer): use is_asscii_whitespace to find first not space pos#108403

Closed
bvanjoi wants to merge 1 commit into
rust-lang:masterfrom
bvanjoi:fix-issue-108275
Closed

fix(lexer): use is_asscii_whitespace to find first not space pos#108403
bvanjoi wants to merge 1 commit into
rust-lang:masterfrom
bvanjoi:fix-issue-108275

Conversation

@bvanjoi

@bvanjoi bvanjoi commented Feb 23, 2023

Copy link
Copy Markdown
Contributor

close #108275

@rustbot

rustbot commented Feb 23, 2023

Copy link
Copy Markdown
Collaborator

r? @eholk

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 23, 2023

@WaffleLapkin WaffleLapkin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #108275 is about the diagnostic output and not the fact that not all ascii whitespace characters are skipped. Are we sure we want to skip more characters?

@bvanjoi

bvanjoi commented Feb 24, 2023

Copy link
Copy Markdown
Contributor Author

I think #108275 is about the diagnostic output and not the fact that not all ascii whitespace characters are skipped. Are we sure we want to skip more characters?

Maybe the \0xc character should be skipped, at least from the naming of the function skip_ascii_whitespace, the change is as expected.

@eholk

eholk commented Mar 8, 2023

Copy link
Copy Markdown
Contributor

This change looks reasonable to me, but I'd love to get someone from @rust-lang/lang to weigh in. It's not clear whether we've implemented the behavior we want and described it imprecisely, or described the behavior we want but implemented it incorrectly. I'd lean towards the latter (and that's the position this PR takes), but I don't feel qualified to declare that on my own.

@bvanjoi

bvanjoi commented Mar 8, 2023

Copy link
Copy Markdown
Contributor Author

Seeing that it is the code implemented in #60261, perhaps we can ask @matklad or @petrochenkov to give some advice

@matklad

matklad commented Mar 8, 2023

Copy link
Copy Markdown
Contributor

My gut feeling is that only the error message is wrong here. |b| b != b' ' && b != b'\t' && b != b'\n' && b != b'\r' is the definition of "whitespace in strings", this is documented in the reference (https://doc.rust-lang.org/reference/tokens.html#string-literals).

It might be argued that the language definition is wrong here, and that we should adjust that. It indeed feels wrong that we use different definitions of whitespace when lexing tokens and when processing innards of a string. But, given that changing this would be a backwards incompatible change, I am 0.7 sure that the outcome here would be "this is a documented peculiarity".

@matklad

matklad commented Mar 8, 2023

Copy link
Copy Markdown
Contributor

This change looks reasonable to me
or described the behavior we want but implemented it incorrectly.

FWIW, the high order bit here is that this changes the visible semantics of the language. At this point in Rust's lifetime, "changes behavior" should take precedence over "reasonableness".

@bvanjoi

bvanjoi commented Mar 8, 2023

Copy link
Copy Markdown
Contributor Author

Very great explanation, I will give a PR with only the error message modified in the near future

@WaffleLapkin

Copy link
Copy Markdown
Member

Closing, since the consensus seems to be that we only need to change the diagnostic.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 10, 2023
fix(lexer): print whitespace warning for \x0c

- close rust-lang#108275
- discussion: rust-lang#108403
@bvanjoi bvanjoi deleted the fix-issue-108275 branch April 3, 2023 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

string continue non-ASCII whitespace confusing as \x0C is ASCII whitespace

5 participants