Thinking out loud


#121

What’s missing is the edge case. Before step 1, there should be handling of E as a unary operation/expression.


#122

I got it working! No parentheses or exponents yet, but the algorithm for the expression parser is fully operational.
Here’s the sample output. I’ll post the essential part of the code too.

Expression: A * B - C + D / E
<StatementList>[
    <Statement>
        <Expression>
            <Expression>
                <Dif>
                    left_expr:
                    <Expression>
                        <Pro>
                            left_expr:
                            <Variable>id:A</Variable>
                            right_expr:
                            <Variable>id:B</Variable>
                        </Pro>
                    </Expression>
                    right_expr:
                    <Expression>
                        <Sum>
                            left_expr:
                            <Variable>id:C</Variable>
                            right_expr:
                            <Expression>
                                <Quo>
                                    left_expr:
                                    <Variable>id:D</Variable>
                                    right_expr:
                                    <Variable>id:E</Variable>
                                </Quo>
                            </Expression>
                        </Sum>
                    </Expression>
                </Dif>
            </Expression>
        </Expression>
    </Statement>]
</StatementList>

#123

The core of the algorithm is under the #RECURSION comment. the previous and current operators are compared for precedence. If the previous operator has precedence, then the current operand is grouped with the previous operator.

The idea of “inside” and “outside” expressions reflects the nesting in my diagrams.
build_op_node(left_operand, operator, right_operand) eventually returns a production for the operator in question. The recursive structure of the expression() function makes it so the operator node is built in a tree-like form (ie it contains higher-precedence expressions) before it is returned to the caller.

    def expression(self, previous_operation=None):
        """ expression ::= parentheses | multiplication | addition | assignment
        | negation | exponent | number | variable """
        # print(f">>>EXPRESSION: prevop:{previous_operation}, line:{self.current_line}")
        # grab the current operand production/node
        current_operand = self.get_operand()   
        print(">>> end_of_expression?", self.end_of_expression())
        if previous_operation: 
            previous_operand, previous_operator = previous_operation
        # EDGES
            if self.end_of_expression():
                innards = (previous_operand, previous_operator, current_operand)
                inner_expr = self.build_op_node(innards)
                return productions.Expression(inner_expr)
        elif self.end_of_expression():
            return productions.Expression(current_operand)
        else:
            current_operator = self.peek()
            self.match(current_operator)
            outtards = (current_operand, current_operator)
            outer_expr = self.expression(outtards)
            return productions.Expression(outer_expr)
        # RECURSION
        current_operator = self.peek()
        self.match(current_operator)
        # determine operator nesting according to operator precedence
        # higher precedence == deeper nesting
        print("PREV:", previous_operator, " CURR:", current_operator)
        if self.has_precedence(previous_operator, current_operator):
            print(f"{previous_operator} HAS PRECEDENCE")
            innards = (previous_operand, previous_operator, current_operand)
            inside_expr = productions.Expression(self.build_op_node(innards))
            outtards = (inside_expr, current_operator)
            return self.expression(outtards)
        else:
            print(f"{current_operator} HAS PRECEDENCE")
            innards = (current_operand, current_operator)
            inside_expr = self.expression(innards)
            outtards = (previous_operand, previous_operator, inside_expr)
            return productions.Expression(self.build_op_node(outtards))

#124

Parentheses are working with no problems. The ‘doubled expression’ nesting is a little weird. It doesn’t break anything, but I might try to fix it anyhow. I’ll see what I can do before I tackle conditional statements.

<StatementList>[
    <Statement>
        <Expression>
            <Expression>
                <Pro>
                    left_expr:
                    <Variable>id:A</Variable>
                    right_expr:
                    <Expression>
                        <Dif>
                            left_expr:
                            <Expression>
                                <Quo>
                                    left_expr:
                                    <Parentheses>
                                        <Expression>
                                            <Expression>
                                                <Sum>
                                                    left_expr:
                                                    <Variable>id:B</Variable>
                                                    right_expr:
                                                    <Variable>id:C</Variable>
                                                </Sum>
                                            </Expression>
                                        </Expression>
                                    </Parentheses>
                                    right_expr:
                                    <Variable>id:D</Variable>
                                </Quo>
                            </Expression>
                            right_expr:
                            <Variable>id:E</Variable>
                        </Dif>
                    </Expression>
                </Pro>
            </Expression>
        </Expression>
    </Statement>]
</StatementList>

#125

Here’s my GitLab folder for LMPTHW ex36, bc. You are welcome to throw garbage at this script to see how it breaks. I would do that myself, but I’d rather get to parsing conditions and loops ASAP before I start the QC process.

https://gitlab.com/lrnjrnl/lmpthw_solutions/tree/master/ex36


#126

The Expression nesting is fixed. I had to make it so the Expression production only wrapped outer expressions. Inner expressions. The exception is in the parentheses. They hold self-contained Expressions too.

<StatementList>[
    <Statement>
        <Expression>
            <Pro>
                left_expr:
                <Variable>id:A</Variable>
                right_expr:
                <Dif>
                    left_expr:
                    <Quo>
                        left_expr:
                        <Parentheses>
                            <Expression>
                                <Sum>
                                    left_expr:
                                    <Variable>id:B</Variable>
                                    right_expr:
                                    <Variable>id:C</Variable>
                                </Sum>
                            </Expression>
                        </Parentheses>
                        right_expr:
                        <Variable>id:D</Variable>
                    </Quo>
                    right_expr:
                    <Variable>id:E</Variable>
                </Dif>
            </Pro>
        </Expression>
    </Statement>]
</StatementList>

#127

Haha. I just noticed that there’s a problem with the nesting of this expression:

A * (B + C) / D - E

The <Pro> should be nested in the <Dif>.
Something broke in the operator precedence. I have a feeling it’s a problem with the edge cases.


#128

I think I’ve got it. I had to overhaul the expression parser. Here’s the code for the new expression parser:

    def expression(self):
        """ expression ::= parentheses | multiplication | addition | assignment
        | negation | exponent | number | variable """
        expression = self.expr_recurse()
        return productions.Expression(expression)

    def expr_recurse(self, prev_term=None):
        # grab the current operand production/node
        if prev_term:
            LTm = prev_term
        else:
            LTm = self.get_term()

        if self.end_of_expression():
            return LTm
        else:
            LOp = self.get_op()
            RTm = self.get_term()

        while not self.end_of_expression():
            ROp = self.peek()
            if self.has_precedence(LOp, ROp):
                # left operator has precedence
                LTm = self.op_node(LTm, LOp, RTm)
                if prev_term:
                    return LTm
                else:
                    LOp = self.get_op()
                    RTm = self.get_term()
            else:
                # right operator has precedence, so
                # send right term into recursion to 
                # parse right side as larger expression
                RTm = self.expr_recurse(RTm)

        return self.op_node(LTm, LOp, RTm)

Here’s the output for expression, A * (B + C) / D - E

    <Statement>
        <Expression>
            <Dif>
                left_expr:
                <Pro>
                    left_expr:
                    <Variable>id:A</Variable>
                    right_expr:
                    <Quo>
                        left_expr:
                        <Parentheses>
                            <Expression>
                                <Sum>
                                    left_expr:
                                    <Variable>id:B</Variable>
                                    right_expr:
                                    <Variable>id:C</Variable>
                                </Sum>
                            </Expression>
                        </Parentheses>
                        right_expr:
                        <Variable>id:D</Variable>
                    </Quo>
                </Pro>
                right_expr:
                <Variable>id:E</Variable>
            </Dif>
        </Expression>
    </Statement>]
</StatementList>

I’ll post a link to my GitHub soon.


#129

This is the GitLab page… I hope I set the permissions correctly! I was searching for a long time just to find out if it is possible to set the repo visibility to public, read-only. The language on the configuration page is confusing.

Anyhow, the new expression parser is working. Leading up to this point, I gave up trying to revise it directly in vim and instead opted for pseudocode in my math notebook. It’s about five pages of scribbles, doubts and abandoned ideas. I’ll take pictures and post it from my phone in case anyone wonders what the process was like.

Once I had some realistic pseudocode, I started typing.
A few edits later and I had a working expression parser! I would appreciate if anyone could edit the test_parser.py to put it some other arithmetic expressions. Right now, the parser only supports the following operators:

^
/
*
+
-
and parentheses 
()

#130

I think there’s something to be said for stepping away from the keyboard to code with pen and paper. It worked for me this time.


#131

And here I thought I could implement an expression parser without having to study volumes of material. Was I ever wrong! :stuck_out_tongue:

I’m hoping to come up with a way of making small modifications to my parser so it will support operator associativity… like a function that detects right-associative operators and starts a separate expression parse routine until it can return to the original expression.


#132

Here we go again! Who knew they have names for what I’ve been working on?! I’m glad I tried to figure out expression parsing on my own first. There’s still a lot to digest, but the hands-on experience makes this reading a little less mind-bending.


Despite all this theory, I think the solution to my problem is to simply modify the ‘assignment’ grammar (and any of the assignment operation grammars too) so that the grammar itself invokes the expression parser for the right side operand/expression rather than leaving that to its caller.


#133

The Simple precedence parser seems to be very similar to my expression parser, except that they’re using a stack. I haven’t yet looked deeply into the stack, but my sense tells me that it is a less abstract representation of the parse than I get with OOP. Namely, the PIVOT, SHIFT and REDUCE operations seem to be more directly linear or ‘stacky’ than the node-based model I’m using to build the parse tree. Then again, maybe they’re not so different because my implementation seems to be performing a SHIFT (‘node’ the right term with the left) and/or a REDUCE (parse the right term to enclose more of the expression ahead) of each term/operand before connecting another operator-node onto the tree.

Now, here comes the math. Prepare to be psychedelicized.

I’m approaching the deep end of the pool. Without a floatie. If you don’t hear back from me in a couple of days, call the coast guard. :ship: :wink:

Anyhow, I think PIVOT might have a connection to operator associativity. I will study it for a bit.


#134

This is a neat guide to expression parsing. No math. Just pseudo-code and some tables.
https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm

It demonstrates some of the weirdness you can achieve with assignment in bc - assignment gets evaluated right-to left. The variable to which the RHS is assigned then goes on to be incorporated into the operation preceding the assignment!
eg.
3 + x = 7 - 3
what? integer and an operator on the left side?? Yes sir.
here’s how it’s evaluated
2 + (x = (7 -3))
that is,
7 - 3 = 4
x = 4
2 + x = 2 + 4
6


#135

Heyoooo!!! I figured out an easy way to implement assignment right-associativity without breaking operator precedence rules!! I still have to do more testing with it, but here’s some pseudo-code to go in the expression parser’s while loop:

IF the Left Operator has precedence AND doesn't belong to the set of assignment operators
  'nodify' the left side into (LTerm, LOp, RTerm)
ELSE
  IF the ROp is an assignment operator
    pop the ROp token to expose the first term of remaining expression
    get RHS value by scooping up the remaining unparsed expression as RHS (right-hand side) of ROp
    'nodify' RTm into (RTm, ROp, RHS)
  ELSE the ROp is a non-assignment operator
    RTm is joined with and scoops up the remaining unparsed expression tokens

Here’s a snippet of the actual code

def expr_recurse(self, prev_term=None):
        # grab the current operand production/node
        if prev_term:
            LTm = prev_term
        else:
            LTm = self.get_term()
        if self.end_of_expression():
            return LTm
        else:
            LOp = self.get_op()
            RTm = self.get_term()

        while not self.end_of_expression():
            ROp = self.peek()
            if self.has_precedence(LOp, ROp) and ROp not in self.assigns:
                # left operator has precedence
                LTm = self.op_node(LTm, LOp, RTm)
                if prev_term:
                    return LTm
                else:
                    LOp = self.get_op()
                    RTm = self.get_term()
            else:
                # right operator has precedence, so
                # send right term into recursion to 
                # parse right side as larger expression
                if ROp in self.assigns:
                    # assigns evaluate RHS first as separate expression
                    # because of right-associativity
                    ROp = self.get_op()
                    RHS_assign = self.expr_recurse()
                    RTm = self.op_node(RTm, ROp, RHS_assign)
                else:
                    RTm = self.expr_recurse(RTm)

        return self.op_node(LTm, LOp, RTm)

And some test output for the expression A + B = C += D / E

<StatementList>[
    <Statement>
        <Expression>
            <Sum>
                left_expr:
                <Variable>id:A</Variable>
                right_expr:
                <Assign>
                    left_expr:
                    <Variable>id:B</Variable>
                    right_expr:
                    <SumAsgn>
                        left_expr:
                        <Variable>id:C</Variable>
                        right_expr:
                        <Quo>
                            left_expr:
                            <Variable>id:D</Variable>
                            right_expr:
                            <Variable>id:E</Variable>
                        </Quo>
                    </SumAsgn>
                </Assign>
            </Sum>
        </Expression>
    </Statement>]
</StatementList>

I’m hoping this will work with some more demanding tests so that I can put the parser aside for a little while to get the interpreter running, and then come back to the parser to handle conditionals. Yeha! :smiley:


#136

Also, if this thing really works, I think it deserves more thorough code commenting. It’d be great to use it for a blog post. And just to think… I didn’t have to read volumes on precedence and associativity after all! I’m sure there’s a good reason the academic explanations are so dry and confusing, but you don’t have to really know that stuff to solve parsing. Logic and intuition can go a long way.


#137

This beautifier tutorial looks very helpful. I think it could streamline my parser testing process. I’m guessing its the pygments module which makes a beautifier easy to write.


#138

I’ve got if, while and for statements parsing. Next I’ll need to parse relational expressions.

here’s the test input

x = 9
if (x) while(x) x -= 1 else for(y=5;y;y-=1) z+=3

and the test output

<StatementList>[
    <Statement>
        <Expression>
            <Assign>
                left_expr:
                <Variable>id:x</Variable>
                right_expr:
                <Number>value:9</Number>
            </Assign>
        </Expression>
    </Statement>]
</StatementList>

<StatementList>[
    <IfStatement>
        test:
        <Expression>
            <Variable>id:x</Variable>
        </Expression>
        action_true:
        <WhileStatement>
            test:
            <Expression>
                <Variable>id:x</Variable>
            </Expression>
            action:
            <Statement>
                <Expression>
                    <DifAsgn>
                        left_expr:
                        <Variable>id:x</Variable>
                        right_expr:
                        <Number>value:1</Number>
                    </DifAsgn>
                </Expression>
            </Statement>
        </WhileStatement>
        action_false:
        <ForStatement>
            start_expr:
            <Expression>
                <Assign>
                    left_expr:
                    <Variable>id:y</Variable>
                    right_expr:
                    <Number>value:5</Number>
                </Assign>
            </Expression>
            test:
            <Expression>
                <Variable>id:y</Variable>
            </Expression>
            itr_expr:
            <Expression>
                <DifAsgn>
                    left_expr:
                    <Variable>id:y</Variable>
                    right_expr:
                    <Number>value:1</Number>
                </DifAsgn>
            </Expression>
            action:
            <Statement>
                <Expression>
                    <SumAsgn>
                        left_expr:
                        <Variable>id:z</Variable>
                        right_expr:
                        <Number>value:3</Number>
                    </SumAsgn>
                </Expression>
            </Statement>
        </ForStatement>
    </IfStatement>]
</StatementList>

#139

These are great man. You really should get into language design when you go into school.


#140

BNF for operator precedence explained


This video is amazing. It explains what I should have known BEFORE I wrote my expression parser. That is, the operator precedence should come from the grammar hierarchy itself rather than a precedence table! It makes a lot of sense. I think the main challenge with it is that I’ll need to come up with names for each level of the hierarchy, of which there could be several.

However, unless implementing relational operators is impossible without this technique, I don’t think I’m going to go back and rewrite my parser just to follow this structure. I’m not using a parser-generator. So I don’t think I’ll have any problems with my current approach.

I will definitely try writing my expression grammars this way on my next project, Zed’s Little BASIC.