Conversation
* Remove Bison dependency
* Remove pre-generated parser files
* Rename build config from no-re2c-bison to no-re2c
* Add a simple make_unique implementation
* Move handling of module bindings into ir.cc
* Simplify lexer
- Remove lookahead, the parser handles this now
- Unify Token/LexerToken, it only contains terminal values now
- Refactor setting token type and value into one function (e.g.
LITERAL, RETURN => RETURN_LITERAL)
* New Parser
- Uses two tokens of lookahead (use Peek(), PeekAfter())
- Consume() consumes one token of any kind
- Match(t) consumes the current token if it matches
- PeekMatch(t) returns true iff the token matches, but doesn't consume
- Basic error synchronization; plenty of room for improvement here
| (foo bar) | ||
| (;; STDERR ;;; | ||
| out/test/parse/bad-toplevel.txt:2:2: error: unexpected token "foo" | ||
| out/test/parse/bad-toplevel.txt:2:1: error: unexpected token "(", expected a module field or a command. |
There was a problem hiding this comment.
I know this is a first pass conversion (and it's awesome) but this one jumped out at me as confusing. Perhaps when an invalid field is discovered it consumes the Lpar before erroring so the more descriptive/accurate token is given?
There was a problem hiding this comment.
Good call; this is handled in a few other places but I think I must have missed it here. 👍
It's nicer to consume the ( so the error points to the unknown word instead.
KarlSchimpf
left a comment
There was a problem hiding this comment.
LGTM.
I have a few nits (on style/generality). However, this CL is very large (and time consuming to review). If you want to fix the nits in a later CL, that is fine.
| } \ | ||
| #define FILL(n) \ | ||
| do { \ | ||
| if (Failed(Fill((n)))) { \ |
There was a problem hiding this comment.
What if there were lookahead tokens? In such a case you would return eof too early.
There was a problem hiding this comment.
Never mind. I realize now that lookahead is handled in the parser.
There was a problem hiding this comment.
I moved the lookahead tokens to the parser, so the lexer can always just return the immediate value. Seemed more natural to me this way, since only the parser actually cares about being able to look ahead.
| Token lval_; | ||
| }; | ||
| const char* GetTokenTypeName(TokenType token_type) { | ||
| const char* s_names[] = { |
There was a problem hiding this comment.
Shouldn't this be static (i.e. initialized only once)?
| } | ||
|
|
||
| bool WastParser::MatchLpar(TokenType type) { | ||
| if (PeekMatchLpar(type)) { |
There was a problem hiding this comment.
Why not a call to check if PeekAfter() == type?
There was a problem hiding this comment.
That's what PeekMatchLpar does, but it also checks that Peek() is TokenType::Lpar
| Result WastParser::ParseQuotedText(std::string* text) { | ||
| WABT_TRACE(ParseQuotedText); | ||
| if (!PeekMatch(TokenType::Text)) | ||
| return ErrorExpected({"a quoted string"}, "\"foo\""); |
There was a problem hiding this comment.
This is just for example text. Maybe not needed, but I thought it made the errors a little nicer:
error: unexpected token "(", expected a quoted string (e.g. "foo").
| return GetToken().loc; | ||
| } | ||
|
|
||
| TokenType WastParser::Peek() { |
There was a problem hiding this comment.
Why doesn't this check if there available tokens?
I assume this is because you are assuming that multiple lookahead only happens in very restricted contexts (such as "(" "token"). If you are making this assumption, make it more clear.
Another choice would be to replace Peek() and PeekAfter() with:
TokenType WasmParser::Peek(size_t n = 0) {
while (tokens_.size() <= n)
tokens_.push_back(lexer_->GetToken(this);
return tokens_.at(n).,token_type;
}
This solution would allow more general peek behavior.
There was a problem hiding this comment.
I thought about this, but since we only need two tokens of lookahead, I thought it would be better to make that clear by not making it general. Guess I should add some more documentation about all this.
There was a problem hiding this comment.
While I agree that you only need two tokens of look-ahead, the current implementation was making some larger assumptions, such as you are always assuming that at least one token of look-ahead is being maintained (my solution removes this restriction).
I prefer a solution that doesn't have implicit assumptions because they are ALWAYS harder to maintain. In most contexts, an extension will come along that violates it, and then it is much harder to convince yourself what places need to be fixed. I understand that the Peek(0) (or simply Peek()) that I suggest does add an additional test (to see if a token needs to be retrieved), but I don't see that as a major performance cost.
I also like only having one "Peek" method, but that is my feeling.
There was a problem hiding this comment.
OK, I'll give it a shot :-)
| return Result::Error; \ | ||
| } while (0) | ||
|
|
||
| #define EXPECT(token_type) CHECK_RESULT(Expect(TokenType::token_type)) |
There was a problem hiding this comment.
Why not make this a function and let the compiler inline it?
There was a problem hiding this comment.
Similarly, create a "check()" function
void check(Token_type ty) { CHECK_RESULT(expr); }
There was a problem hiding this comment.
and
void ExpectCheck(TokenType ty) { CHECK_RESULT(Expect(ty)); }
There was a problem hiding this comment.
I made this a macro mostly because of the magic automatic control flow (since CHECK_RESULT will return on failure)
There was a problem hiding this comment.
Good point. I didn't read it carefully enough. current form is acceptable.
* Add TokenTypePair for convenience when dealing with two tokens * Some additional function documentation too
LITERAL, RETURN => RETURN_LITERAL)