Skip to content

libsyntax: be more accepting of whitespace in lexer#29734

Merged
bors merged 3 commits into
rust-lang:masterfrom
Ryman:whitespace_consistency
Mar 8, 2016
Merged

libsyntax: be more accepting of whitespace in lexer#29734
bors merged 3 commits into
rust-lang:masterfrom
Ryman:whitespace_consistency

Conversation

@Ryman

@Ryman Ryman commented Nov 10, 2015

Copy link
Copy Markdown
Contributor

Fixes #29590.

Perhaps this may need more thorough testing?

r? @Aatch

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should leave the rest of this comment in. The function is more unicode-aware now, but presumably still doesn't do any normalisation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// This function is relatively Unicode-ignorant; fortunately, the careful design
/// of UTF-8 mitigates this ignorance. It doesn't do NKF-normalization(?).

Would be ok for you? Or just the normalization sentence?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks fine to me.

@Aatch

Aatch commented Nov 10, 2015

Copy link
Copy Markdown
Contributor

Test looks fine to me. r=me with the nits addressed.

@Ryman Ryman force-pushed the whitespace_consistency branch from 0680cc9 to de89895 Compare November 10, 2015 03:18
@Aatch

Aatch commented Nov 10, 2015

Copy link
Copy Markdown
Contributor

@bors r+ de89895

@eefriedman

Copy link
Copy Markdown
Contributor

This is technically a new feature... probably doesn't need a full RFC, but shouldn't it at least be discussed a bit?

@Ryman

Ryman commented Nov 11, 2015

Copy link
Copy Markdown
Contributor Author

Should we r- bors if there are concerns?

cc @nikomatsakis (who suggested accepting all whitespace in #29590)

@eefriedman

Copy link
Copy Markdown
Contributor

I guess I don't really have any specific concerns; it's not like we could interpret a character of class whitespace as anything other than whitespace or an error. We could end up with a sort of confusing parse with ZWNJs and other characters like that, but I don't see any reason not to allow users to inflict that on themselves.

@eefriedman

Copy link
Copy Markdown
Contributor

Err, actually, correction, I do have a specific objection: UAX #31 specifically recommends not doing this. We should be using the Pattern_White_Space property, not White_Space.

@Aatch

Aatch commented Nov 11, 2015

Copy link
Copy Markdown
Contributor

@bors r-

@Aatch

Aatch commented Nov 11, 2015

Copy link
Copy Markdown
Contributor

Sorry, I guess I got a little too zealous there.

@nikomatsakis

Copy link
Copy Markdown
Contributor

My bad. I should have known better than to think anything about unicode was a black-and-white decision!

@Ryman Ryman force-pushed the whitespace_consistency branch from de89895 to 23d4302 Compare November 12, 2015 04:25
@Ryman

Ryman commented Nov 12, 2015

Copy link
Copy Markdown
Contributor Author

I've updated the branch to accept only PATTERN_WHITE_SPACE, I've assumed exposing more from librustc_unicode is fine due to it being unstable.

The github diff doesn't seem to display the testfile too well, it shows as the below in my local git which shows the test more clearly:

-    assert_eq!(4 +  7 * 2
-
+assert_eq!(4^L+

-/ 3 * 2 , 4 + 7 * 2 / 3 * 2);
+7   * 2<U+0085>/<U+200E>3<U+200F>*<U+2028>2<U+2029>, 4 + 7 * 2 / 3 * 2);
 }```

@bors

bors commented Dec 6, 2015

Copy link
Copy Markdown
Collaborator

☔ The latest upstream changes (presumably #30187) made this pull request unmergeable. Please resolve the merge conflicts.

@steveklabnik

Copy link
Copy Markdown
Contributor

What's the status of this PR?

@Ryman

Ryman commented Jan 1, 2016

Copy link
Copy Markdown
Contributor Author

@steveklabnik Needs some feedback on if the changes are now ok, perhaps this change still needs more discussion though? Can rebase after that!

Comment thread src/libsyntax/util/parser_testing.rs Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the closure needed? Wouldn't a_iter.all(is_pattern_whitespace) work just fine?

ranma42 added a commit to ranma42/rust that referenced this pull request Jan 4, 2016
As mentioned in rust-lang#29734, the range comparison closure can be improved.

The LLVM IR and the assembly from the new version are much simpler and
unfortunately we cannot rely on the compiler to optimise this much, as
it would need to know that `lo <= hi`.

Besides from simpler code, there might also be a performance
advantage, although it is unlikely to appear on benchmarks, as we are
doing a binary search, which should always involve few comparisons.

The code is available on the playpen for ease of comparison:
http://is.gd/4raMmH
@nikomatsakis

Copy link
Copy Markdown
Contributor

@Ryman so it now conforms with UAX #31 (as cited by @eefriedman)?

@Ryman

Ryman commented Jan 7, 2016

Copy link
Copy Markdown
Contributor Author

@nikomatsakis I think it partially does, from the guideline:

Pattern_White_Space and Pattern_Syntax Characters: To meet this requirement, an implementation shall use Pattern_White_Space characters as all and only those characters interpreted as whitespace in parsing, and shall use Pattern_Syntax characters as all and only those characters with syntactic use. ...

Note: When meeting this requirement, all characters except those that have the Pattern_White_Space or Pattern_Syntax properties are available for use as identifiers or literals.

This PR doesn't change anything about the acceptance of Pattern_Syntax, so I don't think the final note is true, but it does bring us into alignment with the whitespace requirements of the guideline. (unless I've mis-implemented!)

@nikomatsakis

Copy link
Copy Markdown
Contributor

@Ryman great!

bors added a commit that referenced this pull request Jan 12, 2016
and the associated update of tables.rs

The last commit is related to my comment to #29734.
This aligns with unicode recommendations and should be stable for all future
unicode releases. See http://unicode.org/reports/tr31/#R3.

This renames `libsyntax::lexer::is_whitespace` to `is_pattern_whitespace`
so potentially breaks users of libsyntax.
@Ryman Ryman force-pushed the whitespace_consistency branch from 23d4302 to 24578e0 Compare January 16, 2016 00:57
@Ryman

Ryman commented Jan 16, 2016

Copy link
Copy Markdown
Contributor Author

Rebased, I think I got all of your suggestions accounted for @ranma42 :)

@ranma42

ranma42 commented Jan 16, 2016

Copy link
Copy Markdown
Contributor

Yes, thank you! :)

@steveklabnik

Copy link
Copy Markdown
Contributor

Triage: it seems this PR has been updated with review and is still siting here almost a month later, anyone willing to r+?

@alexcrichton

Copy link
Copy Markdown
Member

r? @nikomatsakis

was this intended to be r+'d? just confirming...

@alexcrichton alexcrichton assigned nikomatsakis and unassigned Aatch Mar 8, 2016
@Aatch

Aatch commented Mar 8, 2016

Copy link
Copy Markdown
Contributor

@bors r+ 24578e0

@Aatch

Aatch commented Mar 8, 2016

Copy link
Copy Markdown
Contributor

Woops, looks like this managed to slip by.

@bors

bors commented Mar 8, 2016

Copy link
Copy Markdown
Collaborator

⌛ Testing commit 24578e0 with merge 8b7c3f2...

bors added a commit that referenced this pull request Mar 8, 2016
libsyntax: be more accepting of whitespace in lexer

Fixes #29590.

Perhaps this may need more thorough testing?

r? @Aatch
@bors bors merged commit 24578e0 into rust-lang:master Mar 8, 2016
durka added a commit to durka/rust that referenced this pull request Jun 15, 2016
steveklabnik added a commit to steveklabnik/rust that referenced this pull request Jun 27, 2016
steveklabnik added a commit to steveklabnik/rust that referenced this pull request Jun 27, 2016
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jun 28, 2016
dlrobertson pushed a commit to dlrobertson/rust that referenced this pull request Nov 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants