LMPTHW ex33: Parser.expression() returns Production and/or Tokens

I’m doing my own implementation of the Parser.expression() grammar from ex33 of LMPTHW.

I’m noticing that, unlike all the other grammars, which only return Productions, the Parser.expression() grammar may return either a single Token (of NAME or INTEGER type) or an Expression Production. This appears to depend on whether or not the expression contains an operator (i.e., PLUS).

Understandably, the simplest expression is a single variable or number, but it might be confusing later when we have to analyze both Token and Production objects in the Parser’s output. Wouldn’t it be better to come up with an Expression Production class for consistency? I’ve never built an analyzer, so I don’t know what makes more sense. I’m just thinking ahead.
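One way to get that consistency is to wrap the single-token cases in their own Production classes, so whatever analyzes the parser's output only ever walks Production objects. This is a minimal sketch: `Production` is a stand-in for the book's base class, and `Name`, `Integer`, `Plus`, and `evaluate` are hypothetical names, not from the book.

```python
from dataclasses import dataclass


class Production:
    """Stand-in base class for every node the parser hands to the analyzer."""


@dataclass
class Name(Production):
    name: str


@dataclass
class Integer(Production):
    value: int


@dataclass
class Plus(Production):
    left: Production
    right: Production


def evaluate(node, env):
    """Walk the tree; every node is a Production, so no Token checks are needed."""
    if isinstance(node, Name):
        return env[node.name]
    if isinstance(node, Integer):
        return node.value
    if isinstance(node, Plus):
        return evaluate(node.left, env) + evaluate(node.right, env)
    raise TypeError(f"unexpected node {node!r}")


# y + 100, with y bound to 5
tree = Plus(Name("y"), Integer(100))
print(evaluate(tree, {"y": 5}))  # 105
```

The trade-off is a few more tiny classes in exchange for one uniform kind of object downstream.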

Here is the sample code from the book:

```python
def expression(tokens):
    """expression = name / plus / integer"""
    start = peek(tokens)

    if start == 'NAME':
        name = match(tokens, 'NAME')
        if peek(tokens) == 'PLUS':
            return plus(tokens, name)
        else:
            return name
    elif start == 'INTEGER':
        number = match(tokens, 'INTEGER')
        if peek(tokens) == 'PLUS':
            return plus(tokens, number)
        else:
            return number
    else:
        assert False, "Syntax error %r" % start
```

I should mention that the Parser.plus() grammar returns a Production instance.

In the Parser, none of the functions should return tokens, since their job is to craft the right Production. One thing I notice here is that you’ve written module-level functions rather than a class. With a class it’s a bit easier to work this way, since you can keep calling each method and keep the state in self.tokens.
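A minimal sketch of the class-based shape, assuming tokens are `(kind, text)` tuples; `Token`, `peek`, and `match` here are illustrative stand-ins, not the book's exact code. Because the token stream lives in `self.tokens`, each method call consumes from the same shared state instead of passing a list around:

```python
from collections import namedtuple

# Hypothetical token shape: (kind, text), e.g. ("NAME", "y").
Token = namedtuple("Token", ["kind", "text"])


class Parser:
    def __init__(self, tokens):
        # Keep the token stream on the instance so every grammar method can
        # consume from it without passing it around.
        self.tokens = list(tokens)

    def peek(self):
        """Return the kind of the next token, or None at end of input."""
        return self.tokens[0].kind if self.tokens else None

    def match(self, kind):
        """Consume and return the next token, which must be of the given kind."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        return self.tokens.pop(0)


parser = Parser([Token("NAME", "y"), Token("PLUS", "+"), Token("INTEGER", "100")])
parser.match("NAME")
print(parser.peek())  # PLUS
```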

But, I think the best clue for you is this:

Expression is recursive: it matches one side of an expression and then calls itself for the other side. So if you have x = y + 100, you call expression on y + 100. That matches the y and then the +, which tells you the right side still needs to be handled. Well, that can be another expression, so just call expression(100) and you’re done. This can then handle more complex things like y + (x - j * 100) when you get more advanced.
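Here is a minimal sketch of that recursion, assuming tokens are `(kind, text)` tuples and using hypothetical `Name`, `Integer`, and `Plus` classes in place of the book's productions. `operand()` matches one side and wraps it; if a PLUS follows, `expression()` calls itself for the right side, so it never returns a bare token:

```python
from collections import namedtuple
from dataclasses import dataclass

Token = namedtuple("Token", ["kind", "text"])  # hypothetical (kind, text) shape


class Production:
    """Stand-in base class for parse-tree nodes."""


@dataclass
class Name(Production):
    name: str


@dataclass
class Integer(Production):
    value: int


@dataclass
class Plus(Production):
    left: Production
    right: Production


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)

    def peek(self):
        return self.tokens[0].kind if self.tokens else None

    def match(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        return self.tokens.pop(0)

    def operand(self):
        """Match one side of an expression and wrap it in a Production."""
        kind = self.peek()
        if kind == "NAME":
            return Name(self.match("NAME").text)
        if kind == "INTEGER":
            return Integer(int(self.match("INTEGER").text))
        raise SyntaxError(f"unexpected {kind}")

    def expression(self):
        """expression = operand ( PLUS expression )?"""
        left = self.operand()
        if self.peek() == "PLUS":
            self.match("PLUS")
            # The right side is just another expression, so recurse.
            return Plus(left, self.expression())
        return left


tokens = [Token("NAME", "y"), Token("PLUS", "+"), Token("INTEGER", "100"),
          Token("PLUS", "+"), Token("NAME", "z")]
print(Parser(tokens).expression())
# Plus(left=Name(name='y'), right=Plus(left=Integer(value=100), right=Name(name='z')))
```

Because the right side recurses, `y + 100 + z` groups as `y + (100 + z)`. That is harmless for `+`, but for `-` you would want left-associative grouping, which is usually done with a loop instead of right recursion.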

Finally, I’d assert that expression can never return a token, since that would be a bug. Also, capitalize your class names.
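A cheap way to enforce that is to check the return value at the boundary. This is a minimal sketch with a hypothetical `returns_production` decorator; `Token` and `Production` are stand-ins for the book's classes, and `buggy_expression` only exists to trigger the check:

```python
import functools
from collections import namedtuple

Token = namedtuple("Token", ["kind", "text"])  # hypothetical token shape


class Production:
    """Stand-in base class for parse-tree nodes."""


def returns_production(method):
    """Fail loudly if a grammar method hands back anything other than a Production."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        result = method(*args, **kwargs)
        assert isinstance(result, Production), f"{method.__name__} returned {result!r}"
        return result
    return wrapper


@returns_production
def buggy_expression(tokens):
    # Mimics the bug: a bare token leaks out instead of a Production.
    return tokens.pop(0)


try:
    buggy_expression([Token("NAME", "y")])
except AssertionError as err:
    print(err)  # buggy_expression returned Token(kind='NAME', text='y')
```

Keep in mind that `assert` statements are skipped when Python runs with `-O`, so this is a development-time check rather than real error handling.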