Escape additional classes of characters in escape_debug_ext #158286
Jules-Bertholet wants to merge 3 commits
r? @Darksonn
rustbot has assigned @Darksonn.
Is this intended to supersede #158057, or be merged before/after that one?
28c54c1 to 2c433a5
2c433a5 to f47ef3f
They are conceptually independent changes: that PR removes some characters from the list we escape, while this PR adds some. I'll need to rebase in between them, though.
(See also #158057, #155527)
This PR escapes a few additional categories of characters in `escape_debug_ext` (used to implement `char::escape_debug` and the various `Debug` impls for characters and strings):

- `NFC_Quick_Check=Maybe` characters: Depending on context, these characters may sometimes appear in NFC-normalized text, or they may be removed by the application of NFC. Most characters in this category are also grapheme extenders, which we already escape, but there are a few that are not. By escaping these last as well, we ensure that any string which is not normalized according to NFC gets escaped; that seems useful for debugging normalization-related issues. (Note: `NFC_Quick_Check=No` is equivalent to `Full_Composition_Exclusion=Yes`.)

Additionally, the second commit makes the `Deprecated` and `Full_Composition_Exclusion` data tables unstably public, behind the `unicode_discouraged` feature gate, so that users don't need to ship duplicate copies of std data tables. Libs-api can feel free to reject if unsure.

Making the `NFC_Quick_Check=Maybe` table public is left to future work, because the API for that is more complicated (`quick_check` returns a 3-valued enum instead of a boolean, do we want to ship more normalization data alongside it, etc.). However, we do leave room for it by including the whole table, instead of optimizing the implementation by including only the diff from `Grapheme_Extend`.

Adds 403 bytes total to the Unicode data tables.
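
To illustrate why a quick check needs three outcomes rather than a boolean, here is a minimal sketch modeled on the quick-check algorithm described in UAX #15. Everything in it is an assumption made for illustration: the `QuickCheck` enum, the `toy_properties` helper, and the `nfc_quick_check` function are hypothetical names, and the table is a hand-picked handful of characters, not the std data tables and not an API proposed by this PR.

```rust
// Toy 3-valued NFC quick check, following the shape of the UAX #15 algorithm.
// The property table is a tiny hand-written sample (hypothetical), so the
// results are only meaningful for the characters listed here.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuickCheck {
    Yes,
    No,
    Maybe,
}

/// Returns (canonical combining class, NFC_Quick_Check) for a few sample
/// characters. Every other character is treated as ccc = 0 and `Yes`,
/// which is a simplification made for this sketch.
fn toy_properties(c: char) -> (u8, QuickCheck) {
    match c {
        '\u{0301}' => (230, QuickCheck::Maybe), // COMBINING ACUTE ACCENT
        '\u{0323}' => (220, QuickCheck::Maybe), // COMBINING DOT BELOW
        '\u{1161}' => (0, QuickCheck::Maybe),   // HANGUL JUNGSEONG A
        '\u{0340}' => (230, QuickCheck::No),    // COMBINING GRAVE TONE MARK
        '\u{2126}' => (0, QuickCheck::No),      // OHM SIGN
        _ => (0, QuickCheck::Yes),
    }
}

fn nfc_quick_check(s: &str) -> QuickCheck {
    let mut result = QuickCheck::Yes;
    let mut last_ccc = 0u8;
    for c in s.chars() {
        let (ccc, qc) = toy_properties(c);
        // Combining marks out of canonical order mean the text is not normalized.
        if ccc != 0 && last_ccc > ccc {
            return QuickCheck::No;
        }
        match qc {
            // A single `No` character settles the answer.
            QuickCheck::No => return QuickCheck::No,
            // `Maybe` cannot be resolved without running full normalization.
            QuickCheck::Maybe => result = QuickCheck::Maybe,
            QuickCheck::Yes => {}
        }
        last_ccc = ccc;
    }
    result
}

fn main() {
    for s in ["abc", "e\u{0301}", "\u{2126}", "a\u{0301}\u{0323}"] {
        println!("{:?} -> {:?}", s, nfc_quick_check(s));
    }
}
```

A `Maybe` result means the caller still has to normalize the string and compare, which is part of why exposing this table raises more API questions than a plain boolean property lookup.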
@rustbot label A-Unicode T-libs-api needs-fcp