What’s missing is the edge case. Before step 1, there should be handling of E as a unary operation/expression.
Thinking out loud
I got it working! No parentheses or exponents yet, but the algorithm for the expression parser is fully operational.
Here’s the sample output. I’ll post the essential part of the code too.
Expression: A * B  C + D / E
<StatementList>[
<Statement>
<Expression>
<Expression>
<Dif>
left_expr:
<Expression>
<Pro>
left_expr:
<Variable>id:A</Variable>
right_expr:
<Variable>id:B</Variable>
</Pro>
</Expression>
right_expr:
<Expression>
<Sum>
left_expr:
<Variable>id:C</Variable>
right_expr:
<Expression>
<Quo>
left_expr:
<Variable>id:D</Variable>
right_expr:
<Variable>id:E</Variable>
</Quo>
</Expression>
</Sum>
</Expression>
</Dif>
</Expression>
</Expression>
</Statement>]
</StatementList>
The core of the algorithm is under the #RECURSION comment. the previous and current operators are compared for precedence. If the previous operator has precedence, then the current operand is grouped with the previous operator.
The idea of “inside” and “outside” expressions reflects the nesting in my diagrams.
build_op_node(left_operand, operator, right_operand)
eventually returns a production for the operator in question. The recursive structure of the expression() function makes it so the operator node is built in a treelike form (ie it contains higherprecedence expressions) before it is returned to the caller.
def expression(self, previous_operation=None):
""" expression ::= parentheses  multiplication  addition  assignment
 negation  exponent  number  variable """
# print(f">>>EXPRESSION: prevop:{previous_operation}, line:{self.current_line}")
# grab the current operand production/node
current_operand = self.get_operand()
print(">>> end_of_expression?", self.end_of_expression())
if previous_operation:
previous_operand, previous_operator = previous_operation
# EDGES
if self.end_of_expression():
innards = (previous_operand, previous_operator, current_operand)
inner_expr = self.build_op_node(innards)
return productions.Expression(inner_expr)
elif self.end_of_expression():
return productions.Expression(current_operand)
else:
current_operator = self.peek()
self.match(current_operator)
outtards = (current_operand, current_operator)
outer_expr = self.expression(outtards)
return productions.Expression(outer_expr)
# RECURSION
current_operator = self.peek()
self.match(current_operator)
# determine operator nesting according to operator precedence
# higher precedence == deeper nesting
print("PREV:", previous_operator, " CURR:", current_operator)
if self.has_precedence(previous_operator, current_operator):
print(f"{previous_operator} HAS PRECEDENCE")
innards = (previous_operand, previous_operator, current_operand)
inside_expr = productions.Expression(self.build_op_node(innards))
outtards = (inside_expr, current_operator)
return self.expression(outtards)
else:
print(f"{current_operator} HAS PRECEDENCE")
innards = (current_operand, current_operator)
inside_expr = self.expression(innards)
outtards = (previous_operand, previous_operator, inside_expr)
return productions.Expression(self.build_op_node(outtards))
Parentheses are working with no problems. The ‘doubled expression’ nesting is a little weird. It doesn’t break anything, but I might try to fix it anyhow. I’ll see what I can do before I tackle conditional statements.
<StatementList>[
<Statement>
<Expression>
<Expression>
<Pro>
left_expr:
<Variable>id:A</Variable>
right_expr:
<Expression>
<Dif>
left_expr:
<Expression>
<Quo>
left_expr:
<Parentheses>
<Expression>
<Expression>
<Sum>
left_expr:
<Variable>id:B</Variable>
right_expr:
<Variable>id:C</Variable>
</Sum>
</Expression>
</Expression>
</Parentheses>
right_expr:
<Variable>id:D</Variable>
</Quo>
</Expression>
right_expr:
<Variable>id:E</Variable>
</Dif>
</Expression>
</Pro>
</Expression>
</Expression>
</Statement>]
</StatementList>
Here’s my GitLab folder for LMPTHW ex36, bc. You are welcome to throw garbage at this script to see how it breaks. I would do that myself, but I’d rather get to parsing conditions and loops ASAP before I start the QC process.
https://gitlab.com/lrnjrnl/lmpthw_solutions/tree/master/ex36
The Expression nesting is fixed. I had to make it so the Expression production only wrapped outer expressions. Inner expressions. The exception is in the parentheses. They hold selfcontained Expressions too.
<StatementList>[
<Statement>
<Expression>
<Pro>
left_expr:
<Variable>id:A</Variable>
right_expr:
<Dif>
left_expr:
<Quo>
left_expr:
<Parentheses>
<Expression>
<Sum>
left_expr:
<Variable>id:B</Variable>
right_expr:
<Variable>id:C</Variable>
</Sum>
</Expression>
</Parentheses>
right_expr:
<Variable>id:D</Variable>
</Quo>
right_expr:
<Variable>id:E</Variable>
</Dif>
</Pro>
</Expression>
</Statement>]
</StatementList>
Haha. I just noticed that there’s a problem with the nesting of this expression:
A * (B + C) / D  E
The <Pro>
should be nested in the <Dif>
.
Something broke in the operator precedence. I have a feeling it’s a problem with the edge cases.
I think I’ve got it. I had to overhaul the expression parser. Here’s the code for the new expression parser:
def expression(self):
""" expression ::= parentheses  multiplication  addition  assignment
 negation  exponent  number  variable """
expression = self.expr_recurse()
return productions.Expression(expression)
def expr_recurse(self, prev_term=None):
# grab the current operand production/node
if prev_term:
LTm = prev_term
else:
LTm = self.get_term()
if self.end_of_expression():
return LTm
else:
LOp = self.get_op()
RTm = self.get_term()
while not self.end_of_expression():
ROp = self.peek()
if self.has_precedence(LOp, ROp):
# left operator has precedence
LTm = self.op_node(LTm, LOp, RTm)
if prev_term:
return LTm
else:
LOp = self.get_op()
RTm = self.get_term()
else:
# right operator has precedence, so
# send right term into recursion to
# parse right side as larger expression
RTm = self.expr_recurse(RTm)
return self.op_node(LTm, LOp, RTm)
Here’s the output for expression, A * (B + C) / D  E
<Statement>
<Expression>
<Dif>
left_expr:
<Pro>
left_expr:
<Variable>id:A</Variable>
right_expr:
<Quo>
left_expr:
<Parentheses>
<Expression>
<Sum>
left_expr:
<Variable>id:B</Variable>
right_expr:
<Variable>id:C</Variable>
</Sum>
</Expression>
</Parentheses>
right_expr:
<Variable>id:D</Variable>
</Quo>
</Pro>
right_expr:
<Variable>id:E</Variable>
</Dif>
</Expression>
</Statement>]
</StatementList>
I’ll post a link to my GitHub soon.
This is the GitLab page… I hope I set the permissions correctly! I was searching for a long time just to find out if it is possible to set the repo visibility to public, readonly. The language on the configuration page is confusing.
Anyhow, the new expression parser is working. Leading up to this point, I gave up trying to revise it directly in vim and instead opted for pseudocode in my math notebook. It’s about five pages of scribbles, doubts and abandoned ideas. I’ll take pictures and post it from my phone in case anyone wonders what the process was like.
Once I had some realistic pseudocode, I started typing.
A few edits later and I had a working expression parser! I would appreciate if anyone could edit the test_parser.py to put it some other arithmetic expressions. Right now, the parser only supports the following operators:
^
/
*
+

and parentheses
()
I think there’s something to be said for stepping away from the keyboard to code with pen and paper. It worked for me this time.
And here I thought I could implement an expression parser without having to study volumes of material. Was I ever wrong!
I’m hoping to come up with a way of making small modifications to my parser so it will support operator associativity… like a function that detects rightassociative operators and starts a separate expression parse routine until it can return to the original expression.
Here we go again! Who knew they have names for what I’ve been working on?! I’m glad I tried to figure out expression parsing on my own first. There’s still a lot to digest, but the handson experience makes this reading a little less mindbending.
Despite all this theory, I think the solution to my problem is to simply modify the ‘assignment’ grammar (and any of the assignment operation grammars too) so that the grammar itself invokes the expression parser for the right side operand/expression rather than leaving that to its caller.
The Simple precedence parser seems to be very similar to my expression parser, except that they’re using a stack. I haven’t yet looked deeply into the stack, but my sense tells me that it is a less abstract representation of the parse than I get with OOP. Namely, the PIVOT, SHIFT and REDUCE operations seem to be more directly linear or ‘stacky’ than the nodebased model I’m using to build the parse tree. Then again, maybe they’re not so different because my implementation seems to be performing a SHIFT (‘node’ the right term with the left) and/or a REDUCE (parse the right term to enclose more of the expression ahead) of each term/operand before connecting another operatornode onto the tree.
Now, here comes the math. Prepare to be psychedelicized.
I’m approaching the deep end of the pool. Without a floatie. If you don’t hear back from me in a couple of days, call the coast guard.
Anyhow, I think PIVOT might have a connection to operator associativity. I will study it for a bit.
This is a neat guide to expression parsing. No math. Just pseudocode and some tables.
https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm
It demonstrates some of the weirdness you can achieve with assignment in bc  assignment gets evaluated rightto left. The variable to which the RHS is assigned then goes on to be incorporated into the operation preceding the assignment!
eg.
3 + x = 7  3
what? integer and an operator on the left side?? Yes sir.
here’s how it’s evaluated
2 + (x = (7 3))
that is,
7  3 = 4
x = 4
2 + x = 2 + 4
6
Heyoooo!!! I figured out an easy way to implement assignment rightassociativity without breaking operator precedence rules!! I still have to do more testing with it, but here’s some pseudocode to go in the expression parser’s while loop:
IF the Left Operator has precedence AND doesn't belong to the set of assignment operators
'nodify' the left side into (LTerm, LOp, RTerm)
ELSE
IF the ROp is an assignment operator
pop the ROp token to expose the first term of remaining expression
get RHS value by scooping up the remaining unparsed expression as RHS (righthand side) of ROp
'nodify' RTm into (RTm, ROp, RHS)
ELSE the ROp is a nonassignment operator
RTm is joined with and scoops up the remaining unparsed expression tokens
Here’s a snippet of the actual code
def expr_recurse(self, prev_term=None):
# grab the current operand production/node
if prev_term:
LTm = prev_term
else:
LTm = self.get_term()
if self.end_of_expression():
return LTm
else:
LOp = self.get_op()
RTm = self.get_term()
while not self.end_of_expression():
ROp = self.peek()
if self.has_precedence(LOp, ROp) and ROp not in self.assigns:
# left operator has precedence
LTm = self.op_node(LTm, LOp, RTm)
if prev_term:
return LTm
else:
LOp = self.get_op()
RTm = self.get_term()
else:
# right operator has precedence, so
# send right term into recursion to
# parse right side as larger expression
if ROp in self.assigns:
# assigns evaluate RHS first as separate expression
# because of rightassociativity
ROp = self.get_op()
RHS_assign = self.expr_recurse()
RTm = self.op_node(RTm, ROp, RHS_assign)
else:
RTm = self.expr_recurse(RTm)
return self.op_node(LTm, LOp, RTm)
And some test output for the expression A + B = C += D / E
<StatementList>[
<Statement>
<Expression>
<Sum>
left_expr:
<Variable>id:A</Variable>
right_expr:
<Assign>
left_expr:
<Variable>id:B</Variable>
right_expr:
<SumAsgn>
left_expr:
<Variable>id:C</Variable>
right_expr:
<Quo>
left_expr:
<Variable>id:D</Variable>
right_expr:
<Variable>id:E</Variable>
</Quo>
</SumAsgn>
</Assign>
</Sum>
</Expression>
</Statement>]
</StatementList>
I’m hoping this will work with some more demanding tests so that I can put the parser aside for a little while to get the interpreter running, and then come back to the parser to handle conditionals. Yeha!
Also, if this thing really works, I think it deserves more thorough code commenting. It’d be great to use it for a blog post. And just to think… I didn’t have to read volumes on precedence and associativity after all! I’m sure there’s a good reason the academic explanations are so dry and confusing, but you don’t have to really know that stuff to solve parsing. Logic and intuition can go a long way.
This beautifier tutorial looks very helpful. I think it could streamline my parser testing process. I’m guessing its the pygments module which makes a beautifier easy to write.
I’ve got if, while and for statements parsing. Next I’ll need to parse relational expressions.
here’s the test input
x = 9
if (x) while(x) x = 1 else for(y=5;y;y=1) z+=3
and the test output
<StatementList>[
<Statement>
<Expression>
<Assign>
left_expr:
<Variable>id:x</Variable>
right_expr:
<Number>value:9</Number>
</Assign>
</Expression>
</Statement>]
</StatementList>
<StatementList>[
<IfStatement>
test:
<Expression>
<Variable>id:x</Variable>
</Expression>
action_true:
<WhileStatement>
test:
<Expression>
<Variable>id:x</Variable>
</Expression>
action:
<Statement>
<Expression>
<DifAsgn>
left_expr:
<Variable>id:x</Variable>
right_expr:
<Number>value:1</Number>
</DifAsgn>
</Expression>
</Statement>
</WhileStatement>
action_false:
<ForStatement>
start_expr:
<Expression>
<Assign>
left_expr:
<Variable>id:y</Variable>
right_expr:
<Number>value:5</Number>
</Assign>
</Expression>
test:
<Expression>
<Variable>id:y</Variable>
</Expression>
itr_expr:
<Expression>
<DifAsgn>
left_expr:
<Variable>id:y</Variable>
right_expr:
<Number>value:1</Number>
</DifAsgn>
</Expression>
action:
<Statement>
<Expression>
<SumAsgn>
left_expr:
<Variable>id:z</Variable>
right_expr:
<Number>value:3</Number>
</SumAsgn>
</Expression>
</Statement>
</ForStatement>
</IfStatement>]
</StatementList>
These are great man. You really should get into language design when you go into school.
BNF for operator precedence explained
This video is amazing. It explains what I should have known BEFORE I wrote my expression parser. That is, the operator precedence should come from the grammar hierarchy itself rather than a precedence table! It makes a lot of sense. I think the main challenge with it is that I’ll need to come up with names for each level of the hierarchy, of which there could be several.
However, unless implementing relational operators is impossible without this technique, I don’t think I’m going to go back and rewrite my parser just to follow this structure. I’m not using a parsergenerator. So I don’t think I’ll have any problems with my current approach.
I will definitely try writing my expression grammars this way on my next project, Zed’s Little BASIC.