Rewrite parser as recursive descent by binji · Pull Request #591 · WebAssembly/wabt

binji · 2017-08-11T23:08:06Z

* Remove Bison dependency
* Remove pre-generated parser files
* Rename build config from no-re2c-bison to no-re2c
* Add a simple make_unique implementation
* Move handling of module bindings into ir.cc
* Simplify lexer
  - Remove lookahead; the parser handles this now
  - Unify Token/LexerToken; it now contains only terminal values
  - Refactor setting token type and value into one function (e.g.
    LITERAL, RETURN => RETURN_LITERAL)
* New parser
  - Uses two tokens of lookahead (via Peek() and PeekAfter())
  - Consume() consumes one token of any kind
  - Match(t) consumes the current token if it matches
  - PeekMatch(t) returns true iff the current token matches, but doesn't consume it
  - Basic error synchronization; plenty of room for improvement here
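The lookahead primitives listed above can be illustrated with a minimal, self-contained sketch. ToyLexer, ToyParser and the small TokenType enum are hypothetical stand-ins, not wabt's actual classes; the only point is that the parser, not the lexer, buffers up to two tokens.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical token kinds; wabt's real TokenType enum is much larger.
enum class TokenType { Lpar, Rpar, Module, Func, Text, Eof };

// Toy lexer: returns pre-scanned tokens one at a time, then Eof forever.
// Like the simplified lexer described above, it does no lookahead itself.
class ToyLexer {
 public:
  explicit ToyLexer(std::vector<TokenType> tokens)
      : tokens_(std::move(tokens)) {}
  TokenType GetToken() {
    return pos_ < tokens_.size() ? tokens_[pos_++] : TokenType::Eof;
  }

 private:
  std::vector<TokenType> tokens_;
  size_t pos_ = 0;
};

// Toy parser that owns a lookahead buffer of at most two tokens.
class ToyParser {
 public:
  explicit ToyParser(ToyLexer* lexer) : lexer_(lexer) {}

  // Current token, not consumed.
  TokenType Peek() { Fill(1); return buf_[0]; }
  // Token after the current one, not consumed.
  TokenType PeekAfter() { Fill(2); return buf_[1]; }
  // Consumes one token of any kind.
  void Consume() {
    Fill(1);
    buf_[0] = buf_[1];  // shift left; buf_[1] is only meaningful if count_ was 2
    --count_;
  }
  // True iff the current token is `t`; never consumes.
  bool PeekMatch(TokenType t) { return Peek() == t; }
  // Consumes the current token only if it is `t`.
  bool Match(TokenType t) {
    if (!PeekMatch(t)) return false;
    Consume();
    return true;
  }

 private:
  // Pulls tokens from the lexer until `n` are buffered.
  void Fill(size_t n) {
    assert(n <= buf_.size());
    while (count_ < n) buf_[count_++] = lexer_->GetToken();
  }

  ToyLexer* lexer_;
  std::array<TokenType, 2> buf_{};
  size_t count_ = 0;
};
```

Because the lexer never looks ahead, every buffering decision lives in one place (Fill), which is the property the PR description is after.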

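C++11 provides std::unique_ptr but not std::make_unique (it was added in C++14), which is presumably why the PR needs a simple replacement. A minimal sketch of such a helper follows; the name MakeUnique and its exact form are assumptions, and wabt's version may differ (this one, for example, does not handle array types).

```cpp
#include <memory>
#include <string>
#include <utility>

// Minimal C++11 stand-in for std::make_unique (single objects only).
// Perfect-forwards the constructor arguments and wraps the new object.
template <typename T, typename... Args>
std::unique_ptr<T> MakeUnique(Args&&... args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
```

Usage looks like `auto s = MakeUnique<std::string>(3, 'x');`, which yields a unique_ptr owning the string "xxx".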

jayphelps · 2017-08-13T03:47:15Z

 (foo bar)
 (;; STDERR ;;;
-out/test/parse/bad-toplevel.txt:2:2: error: unexpected token "foo"
+out/test/parse/bad-toplevel.txt:2:1: error: unexpected token "(", expected a module field or a command.


I know this is a first-pass conversion (and it's awesome), but this one jumped out at me as confusing. Perhaps when an invalid field is discovered, the parser could consume the Lpar before erroring, so that the more descriptive/accurate token is reported?

Good call; this is handled in a few other places but I think I must have missed it here. 👍

It's nicer to consume the ( so the error points to the unknown word instead.
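The fix described here (consume the "(" before deciding that the field is unknown) can be sketched as follows. Tok, CheckTopLevelField and the keyword list are hypothetical simplifications, not wabt's parser; the message text is modeled on the one in the diff above, and the point is only where the error location comes from.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: its spelling and 1-based column on the line.
struct Tok {
  std::string text;
  int column;
};

// Returns an error message for an unknown top-level field, or "" if the
// keyword is recognized. Because "(" is consumed first, the reported
// column is that of the word after it rather than the parenthesis.
std::string CheckTopLevelField(const std::vector<Tok>& toks, int line) {
  size_t i = 0;
  if (i < toks.size() && toks[i].text == "(") ++i;  // consume the "("
  if (i >= toks.size()) return "unexpected end of input";
  const Tok& word = toks[i];
  // Partial, illustrative keyword list.
  static const char* const kKeywords[] = {"module", "func",   "memory",
                                          "table",  "global", "import",
                                          "export", "type"};
  for (const char* k : kKeywords)
    if (word.text == k) return "";
  return std::to_string(line) + ":" + std::to_string(word.column) +
         ": error: unexpected token \"" + word.text +
         "\", expected a module field or a command.";
}
```

For the input `(foo bar)` on line 2, this reports column 2 (the `foo`) instead of column 1 (the `(`).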

KarlSchimpf

LGTM.

I have a few nits (on style/generality). However, this CL is very large (and time-consuming to review). If you want to fix the nits in a later CL, that is fine.

KarlSchimpf · 2017-08-15T17:11:27Z

-    }                                          \
+#define FILL(n)              \
+  do {                       \
+    if (Failed(Fill((n)))) { \


What if there were lookahead tokens? In that case you would return EOF too early.

Never mind. I realize now that lookahead is handled in the parser.

I moved the lookahead tokens to the parser, so the lexer can always just return the immediate value. Seemed more natural to me this way, since only the parser actually cares about being able to look ahead.

KarlSchimpf · 2017-08-15T17:12:59Z

-  Token lval_;
-};
+const char* GetTokenTypeName(TokenType token_type) {
+  const char* s_names[] = {


Shouldn't this be static (i.e. initialized only once)?
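The suggestion amounts to giving the name table static storage duration. A minimal sketch, assuming a small hypothetical TokenType enum with a Count sentinel (wabt's real enum and token names differ):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical subset of token types; Count is a sentinel for the table size.
enum class TokenType { Lpar, Rpar, Text, Eof, Count };

const char* GetTokenTypeName(TokenType token_type) {
  // `static`: the table has static storage duration, so it is set up once
  // rather than rebuilt on each call. The second `const` keeps the pointers
  // themselves immutable.
  static const char* const s_names[] = {"Lpar", "Rpar", "Text", "Eof"};
  static_assert(sizeof(s_names) / sizeof(s_names[0]) ==
                    static_cast<size_t>(TokenType::Count),
                "s_names is out of sync with TokenType");
  size_t index = static_cast<size_t>(token_type);
  assert(index < static_cast<size_t>(TokenType::Count));
  return s_names[index];
}
```

The static_assert is an extra guard, not part of the review comment: it stops compilation if a token type is added without a matching name.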

KarlSchimpf · 2017-08-15T17:22:26Z

+}
+
+bool WastParser::MatchLpar(TokenType type) {
+  if (PeekMatchLpar(type)) {


Why not a call to check if PeekAfter() == type?

That's what PeekMatchLpar does, but it also checks that Peek() is TokenType::Lpar.
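In other words, PeekMatchLpar checks a two-token pattern, "(" followed by a specific keyword, without consuming anything, and MatchLpar consumes both tokens only on success. A minimal sketch over a hypothetical token vector (Parser, Peek(n) and the enum values are illustrative, not wabt's code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

enum class TokenType { Lpar, Rpar, Func, Param, Eof };

class Parser {
 public:
  explicit Parser(std::vector<TokenType> toks) : toks_(std::move(toks)) {}

  // Type of the n-th upcoming token (Eof past the end); consumes nothing.
  TokenType Peek(size_t n = 0) const {
    size_t i = pos_ + n;
    return i < toks_.size() ? toks_[i] : TokenType::Eof;
  }
  void Consume() {
    if (pos_ < toks_.size()) ++pos_;
  }
  // True iff the next two tokens are "(" followed by `type`.
  bool PeekMatchLpar(TokenType type) const {
    return Peek() == TokenType::Lpar && Peek(1) == type;
  }
  // Consumes "(" and `type` together, but only when the pair matches.
  bool MatchLpar(TokenType type) {
    if (!PeekMatchLpar(type)) return false;
    Consume();
    Consume();
    return true;
  }

 private:
  std::vector<TokenType> toks_;
  size_t pos_ = 0;
};
```

Checking only PeekAfter() == type would also accept a keyword that is not preceded by "(", which is why both tokens are tested.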

KarlSchimpf · 2017-08-15T17:26:42Z

+Result WastParser::ParseQuotedText(std::string* text) {
+  WABT_TRACE(ParseQuotedText);
+  if (!PeekMatch(TokenType::Text))
+    return ErrorExpected({"a quoted string"}, "\"foo\"");


This is just example text. It may not be needed, but I thought it made the errors a little nicer:

error: unexpected token "(", expected a quoted string (e.g. "foo").

Verified. Ok.
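The message shape quoted above can be produced by a small formatting helper. FormatExpected is hypothetical and only mirrors the quoted output; it is not wabt's ErrorExpected implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds: unexpected token "<got>", expected A, B or C (e.g. <example>).
// The "(e.g. ...)" suffix is added only when an example is supplied.
std::string FormatExpected(const std::string& got,
                           const std::vector<std::string>& expected,
                           const std::string& example = "") {
  std::string msg = "unexpected token \"" + got + "\", expected ";
  for (size_t i = 0; i < expected.size(); ++i) {
    if (i > 0) msg += (i + 1 == expected.size()) ? " or " : ", ";
    msg += expected[i];
  }
  if (!example.empty()) msg += " (e.g. " + example + ")";
  return msg + ".";
}
```

With `FormatExpected("(", {"a quoted string"}, "\"foo\"")` this yields exactly the error line quoted in the comment.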

KarlSchimpf · 2017-08-15T17:33:31Z

+  return GetToken().loc;
+}
+
+TokenType WastParser::Peek() {


Why doesn't this check if there are available tokens?

I assume this is because you are assuming that multi-token lookahead only happens in very restricted contexts (such as "(" followed by a keyword token). If you are making this assumption, make it clearer.

Another choice would be to replace Peek() and PeekAfter() with:

  TokenType WastParser::Peek(size_t n = 0) {
    while (tokens_.size() <= n)
      tokens_.push_back(lexer_->GetToken(this));
    return tokens_.at(n).token_type;
  }

This solution would allow more general peek behavior.
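A compilable version of the generalized Peek(n) idea follows, with a counting toy lexer to show that tokens are lexed lazily, only when a caller peeks that far ahead. Lexer, WastParserSketch and the use of std::deque are assumptions made for a self-contained example, not wabt's actual code.

```cpp
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

enum class TokenType { Lpar, Func, Rpar, Eof };

struct Token {
  TokenType token_type;
};

// Toy lexer that counts how many tokens have been pulled from it.
class Lexer {
 public:
  explicit Lexer(std::vector<TokenType> types) : types_(std::move(types)) {}
  Token GetToken() {
    ++calls;
    return Token{pos_ < types_.size() ? types_[pos_++] : TokenType::Eof};
  }
  int calls = 0;

 private:
  std::vector<TokenType> types_;
  size_t pos_ = 0;
};

class WastParserSketch {
 public:
  explicit WastParserSketch(Lexer* lexer) : lexer_(lexer) {}

  // Type of the n-th upcoming token, lexing more only as needed, so no
  // call site has to assume the buffer is already primed.
  TokenType Peek(size_t n = 0) {
    while (tokens_.size() <= n)
      tokens_.push_back(lexer_->GetToken());
    return tokens_.at(n).token_type;
  }
  void Consume() {
    Peek();  // make sure there is a token to drop
    tokens_.pop_front();
  }

 private:
  Lexer* lexer_;
  std::deque<Token> tokens_;
};
```

Peek(1) on a fresh parser lexes two tokens; a following Peek() reuses the buffer without touching the lexer. The only extra cost is the size check in the loop condition.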

I thought about this, but since we only need two tokens of lookahead, I thought it would be better to make that clear by not making it general. Guess I should add some more documentation about all this.

While I agree that you only need two tokens of lookahead, the current implementation makes some larger assumptions, such as always assuming that at least one token of lookahead is being maintained (my solution removes this restriction).

I prefer a solution that doesn't have implicit assumptions, because they are ALWAYS harder to maintain. In most contexts, an extension will come along that violates them, and then it is much harder to convince yourself which places need to be fixed. I understand that the Peek(0) (or simply Peek()) that I suggest adds an additional test (to see whether a token needs to be retrieved), but I don't see that as a major performance cost.

I also like having only one "Peek" method, but that is just my preference.

OK, I'll give it a shot :-)

KarlSchimpf · 2017-08-15T17:36:19Z

+      return Result::Error; \
+  } while (0)
+
+#define EXPECT(token_type) CHECK_RESULT(Expect(TokenType::token_type))


Why not make this a function and let the compiler inline it?

Similarly, create a "check()" function

void check(Token_type ty) { CHECK_RESULT(expr); }

and

void ExpectCheck(TokenType ty) { CHECK_RESULT(Expect(ty)); }

I made this a macro mostly because of the magic automatic control flow (since CHECK_RESULT will return on failure).

Good point. I didn't read it carefully enough. The current form is acceptable.
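The point about control flow can be made concrete. In the sketch below, which uses a hypothetical Result enum and a CHECK_RESULT modeled on the macro in the diff, the macro's return exits the enclosing parse function, while the same check wrapped in a helper function only returns from the helper, so the caller keeps going and the error is lost.

```cpp
#include <vector>

enum class Result { Ok, Error };
inline bool Failed(Result r) { return r == Result::Error; }

// Macro form: the `return` expands inside the caller, so a failure
// exits the enclosing parse function immediately.
#define CHECK_RESULT(expr)  \
  do {                      \
    if (Failed(expr))       \
      return Result::Error; \
  } while (0)

// Function form: this `return` only leaves the helper itself.
inline void CheckResultFn(Result r) {
  if (Failed(r)) return;
}

// Hypothetical parse step that records that it ran, then reports `r`.
Result Step(std::vector<int>* trace, int id, Result r) {
  trace->push_back(id);
  return r;
}

Result ParseWithMacro(std::vector<int>* trace) {
  CHECK_RESULT(Step(trace, 1, Result::Error));
  CHECK_RESULT(Step(trace, 2, Result::Ok));  // never reached
  return Result::Ok;
}

Result ParseWithFunction(std::vector<int>* trace) {
  CheckResultFn(Step(trace, 1, Result::Error));
  CheckResultFn(Step(trace, 2, Result::Ok));  // still runs
  return Result::Ok;  // the error was silently dropped
}
```

An inline function could only replace the macro if every call site checked its return value explicitly, which is exactly the boilerplate the macro avoids.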


KarlSchimpf

LGTM.

I like it!

binji requested review from KarlSchimpf and sbc100 August 11, 2017 23:48

jayphelps reviewed Aug 13, 2017


binji added 3 commits August 13, 2017 09:00

Handle a top-level misspelling better

c77a9a7


Add ConsumeIfLpar, switch some errors to asserts

b31e75e

Fix memory leak in Parse{Plain,Block}Instr

9e1f6ce

KarlSchimpf approved these changes Aug 15, 2017


Make Peek function more general, remove PeekAfter

1772f81

* Add TokenTypePair for convenience when dealing with two tokens
* Some additional function documentation too

KarlSchimpf approved these changes Aug 15, 2017


binji merged commit 3d3920f into master Aug 15, 2017

binji deleted the parser branch August 15, 2017 21:36

sbc100 mentioned this pull request Dec 13, 2021

Fix syntax for assert_return typecheck test #1782

Merged

sbc100 mentioned this pull request Apr 6, 2026

Remove debug-parser option and other unused variables #2739

Merged
