Thinking out loud

Thinking ahead to the assignment operator…

Right now, the expression parsing is something like

  • if the current token is a variable, integer, or function call, check whether the next token is an operator;
  • if it is, call that operator’s grammar;
  • else, end the expression (rough sketch just below).
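
A minimal, self-contained sketch of that loop, just to pin it down. The names here (ExprPlus, parse_expr, the plain token list) are placeholders, not what my parser actually uses:

class ExprPlus:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def __repr__(self):
        return f"({self.left} + {self.right})"

def parse_expr(tokens, pos=0):
    # operand: a variable name or an integer literal
    left = tokens[pos]
    # is the next token an operator? then call the operator's grammar on the rest
    if pos + 1 < len(tokens) and tokens[pos + 1] == "+":
        right, pos = parse_expr(tokens, pos + 2)
        return ExprPlus(left, right), pos
    # else: end of the expression
    return left, pos

tree, _ = parse_expr(["a", "+", "b", "+", "c", "+", "d"])
print(tree)   # prints (a + (b + (c + d)))

A sketch like this puts the first operand at the top of the tree and nests the rest to its right; whether my real parser does the same is exactly the part I’m unsure about below.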

This is where I need to get the PLUS operator’s grammar to check whether there is an ASSIGN operator nested higher up (to the left in the statement). E.g., the nesting of the expression:

a + b + c + d

… would look like (d + (c + (b + a))) in the syntax tree. This means that (d + whatever) contains (c + whatever), which contains (b + a). … eep. I might have this backward, but I’m pretty sure my recursion goes from right to left. Reading the sample code, the left operand ‘a’ is bundled with the right operand ‘b’, and that pair, (a + b), is then nested as the left operand against the next right operand, ‘c’, etc.

Now let me look at variations on this nesting structure using the assignment operator.

a = b + c + d

Here’s the nesting structure I’ll need for that one: (a = (d + (c + b))).
These next two should result in syntax errors.

a + b = c + d
a + b + c = d

Now I need to figure out an algorithm to nest these statements so that the error is reliably detected.
First example nesting

(((a + b) = c) + d)

Basically, the case for the assignment operator, ‘=’, should check the object/production type of the left side of the expression. If that production is arithmetic (i.e. ‘+’, ‘*’, ‘-’ or ‘/’), the parser’s expression grammar should raise a syntax error. I think this error should be detected three expression calls deep: single expr, plus, assignment.
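
Here is a rough sketch of that check, just to pin the idea down. ExprID and ExprPlus mirror my expression productions; assign_grammar() and the ARITHMETIC tuple are made up for this sketch.

class ExprID:
    def __init__(self, name):
        self.name = name

class ExprPlus:
    def __init__(self, left, right):
        self.left, self.right = left, right

ARITHMETIC = (ExprPlus,)   # plus whatever the minus/times/divide productions end up being called

def assign_grammar(left, right):
    # 'left' is whatever production the expression grammar has already built
    if isinstance(left, ARITHMETIC):
        # e.g. 'a + b = c + d': the left side is an ExprPlus, so raise the error
        raise SyntaxError("cannot assign to an arithmetic expression")
    return (left, right)

assign_grammar(ExprID("a"), ExprPlus(ExprID("c"), ExprID("d")))    # fine: a = c + d
# assign_grammar(ExprPlus(ExprID("a"), ExprID("b")), ExprID("c"))  # SyntaxError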

Second example nesting

(((a + b) + c) = d)

This is similar to the above, but it should take four expression calls to detect the error: single expr, plus, plus, assignment.
That is how I will begin solving the assignment operator.

I did some reading in the official docs.

https://docs.python.org/3/reference/expressions.html

Assignment is not an expression; it’s an assignment statement. That means I may need to substantially rewrite my body-parsing grammar from

body = *(func_def | expr)

to

body = *(func_def | stmt)

The ‘statement’ grammar will be responsible for parsing assignments and for containing expressions.
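
To make the split concrete for myself, here is a tiny sketch of how I picture the statement production deciding between ‘NAME = expr’ and a bare expression. The token tuples and the dict it returns are placeholders, not my actual scanner output or Statement class.

def parse_statement(tokens):
    # tokens: list of (type, value) tuples for a single statement
    if len(tokens) >= 2 and tokens[0][0] == "NAME" and tokens[1][0] == "ASSIGN":
        # assignment statement: NAME '=' expression
        return {"assign_id": tokens[0][1], "expression": tokens[2:]}
    # bare expression statement
    return {"assign_id": None, "expression": tokens}

print(parse_statement([("NAME", "a"), ("ASSIGN", "="), ("INT", "5")]))
# {'assign_id': 'a', 'expression': [('INT', '5')]}
print(parse_statement([("NAME", "a"), ("PLUS", "+"), ("NAME", "b")]))
# {'assign_id': None, 'expression': [('NAME', 'a'), ('PLUS', '+'), ('NAME', 'b')]}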

You are really knocking this out of the park when it comes to parsing. Remember though that you are supposed to do a few tiny languages and kind of move on. There is a point where your handwritten scanner and parser just can’t handle what you’re working on. At that point, it’s best to use a parser and scanner generator.


I got the statement grammar and production working along with variable assignment! Here’s the output (I added some assignment statements to the sample code). Indentation courtesy of an HTML beautifier I found on DuckDuckGo.

def hello(x, y):
    print(x + y)

hello(1, 2)
hello(3, 4)
a = 5
b = 6
c = a + b
hello(a, b)

[
<Root indt:0>[
    <FuncDef name:hello>[
        <FArg name: 'x'></FArg>,
        <FArg name: 'y'></FArg>]

        <Body indt:1>[
            <Statement>
                <FuncCall print>[
                    <ExprPlus>
                        <ExprID name: 'x'></ExprID>
                        <ExprID name: 'y'></ExprID>
                    </ExprPlus>]
                </FuncCall>None
            </Statement>]
        </Body indt:1>
    </FuncDef>,
    <Statement>
        <FuncCall hello>[
            <ExprInt val: '1'></ExprInt>,
            <ExprInt val: '2'></ExprInt>]
        </FuncCall>None
    </Statement>,
    <Statement>
        <FuncCall hello>[
            <ExprInt val: '3'></ExprInt>,
            <ExprInt val: '4'></ExprInt>]
        </FuncCall>None
    </Statement>,
    <Statement>None
        <AssignID name: 'a'</AssignID>
    </Statement>,
    <Statement>
        <ExprInt val: '5'></ExprInt>None
    </Statement>,
    <Statement>None
        <AssignID name: 'b'</AssignID>
    </Statement>,
    <Statement>
        <ExprInt val: '6'></ExprInt>None
    </Statement>,
    <Statement>None
        <AssignID name: 'c'</AssignID>
    </Statement>,
    <Statement>
        <ExprPlus>
            <ExprID name: 'a'></ExprID>
            <ExprID name: 'b'></ExprID>
        </ExprPlus>None
    </Statement>,
    <Statement>
        <FuncCall hello>[
            <ExprID name: 'a'></ExprID>,
            <ExprID name: 'b'></ExprID>]
        </FuncCall>None
    </Statement>]
</Root indt:0>]

Oh, and I discovered that my addition nesting actually works from left to right (the first operand ends up at the top of the tree, with the rest of the expression nested on its right), not the other way around as I had guessed. Here’s how I found out!

a + b + c

 <Statement>
    <ExprPlus>
      <ExprID name: 'a'>
      </ExprID>
      <ExprPlus>
        <ExprID name: 'b'>
        </ExprID>
        <ExprID name: 'c'>
        </ExprID>
      </ExprPlus>
    </ExprPlus>None
  </Statement>

Argh… looking closely at the nesting of the other statements, I have clearly messed up. I need to keep my assignment and accompanying expression in the same statement production, not in separate productions! I will fix it now. Should be quick.

I fixed it! Here’s the new output. The weird part is that the expression productions are output before the assign-variable productions in the assignment statements. I’m not sure why, because I wrote the assign_id first in the Statement production’s __repr__ method…

def __repr__(self):
    return f"\n<Statement>{self.assign_id}{self.expression}\n</Statement>"

Yet the output looks like this

def hello(x, y):
    print(x + y)
hello(1, 2)
hello(3, 4)
a = 5
b = 6 + 7
c = a + b
a + b + c
hello(a, b)

[
<Root indt:0>[
  <FuncDef name:hello>[
    <FArg name:'x'>
    </FArg>, 
    <FArg name:'y'>
    </FArg>]
    <Body indt:1>[
      <Statement>
        <FuncCall print>[
          <ExprPlus>
            <ExprID name:'x'>
            </ExprID>
            <ExprID name:'y'>
            </ExprID>
          </ExprPlus>]
        </FuncCall>None
      </Statement>]
    </Body indt:1>
  </FuncDef>, 
  <Statement>
    <FuncCall hello>[
      <ExprInt val:'1'>
      </ExprInt>, 
      <ExprInt val:'2'>
      </ExprInt>]
    </FuncCall>None
  </Statement>, 
  <Statement>
    <FuncCall hello>[
      <ExprInt val:'3'>
      </ExprInt>, 
      <ExprInt val:'4'>
      </ExprInt>]
    </FuncCall>None
  </Statement>, 
  <Statement>
    <ExprInt val:'5'>
    </ExprInt>
    <AssignID name:'a'>
    </AssignID>
  </Statement>, 
  <Statement>
    <ExprPlus>
      <ExprInt val:'6'>
      </ExprInt>
      <ExprInt val:'7'>
      </ExprInt>
    </ExprPlus>
    <AssignID name:'b'>
    </AssignID>
  </Statement>, 
  <Statement>
    <ExprPlus>
      <ExprID name:'a'>
      </ExprID>
      <ExprID name:'b'>
      </ExprID>
    </ExprPlus>
    <AssignID name:'c'>
    </AssignID>
  </Statement>, 
  <Statement>
    <ExprPlus>
      <ExprID name:'a'>
      </ExprID>
      <ExprPlus>
        <ExprID name:'b'>
        </ExprID>
        <ExprID name:'c'>
        </ExprID>
      </ExprPlus>
    </ExprPlus>None
  </Statement>, 
  <Statement>
    <FuncCall hello>[
      <ExprID name:'a'>
      </ExprID>, 
      <ExprID name:'b'>
      </ExprID>]
    </FuncCall>None
  </Statement>]
</Root indt:0>]

A mystery which doesn’t really matter much to me right now; maybe it has something to do with how the __repr__ methods recurse.

Note that the ‘None’ appearing throughout the output represents an empty assignment attribute on the statement: when ‘None’ accompanies an expression in a Statement production, it isn’t an assignment statement, just a bare expression statement.

Now I have the Root production handling the built-in function definitions. I only have one builtin so far. It’s called ‘BIPrint’. It has its own production class, but I may make it a subclass of a general BuiltIn class if I ever decide to add more builtins. You can see the built-in is nested directly under the root, just like the regular function definitions and statements. I made sure to add it at the beginning so that the function definitions don’t get an unknown-function error from the analyzer when they try to call print().
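
For reference, here is roughly how I picture that wiring. BIPrint, FArg and Root match the production names in the output below; everything inside the classes is my shorthand, and the real classes carry more state.

class FArg:
    def __init__(self, name):
        self.name = name

class BIPrint:
    # built-in print; might become a subclass of a general BuiltIn class later
    name = "print"
    def __init__(self):
        self.args = [FArg("prtarg")]

class Root:
    def __init__(self):
        # builtins go in first, so the analyzer knows about print()
        # before any function definition or statement tries to call it
        self.children = [BIPrint()]

root = Root()
print([type(child).__name__ for child in root.children])   # ['BIPrint']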

def hello(x, y):
    print(x + y)
hello(1, 2)
hello(3, 4)
a = 5
b = 6 + 7
c = a + b
a + b + c
hello(a, b)
 [
<Root indt:0>[
    <BIPrint>[
        <FArg name: 'prtarg'></FArg>]
    </BIPrint>,
    <FuncDef name:hello>[
        <FArg name: 'x'></FArg>,
        <FArg name: 'y'></FArg>]

        <Body indt:1>[
            <Statement>
                <FuncCall print>[
                    <ExprPlus>
                        <ExprID name: 'x'></ExprID>
                        <ExprID name: 'y'></ExprID>
                    </ExprPlus>]
                </FuncCall>None
            </Statement>]
        </Body indt:1>
    </FuncDef>,
    <Statement>
        <FuncCall hello>[
            <ExprInt val: '1'></ExprInt>,
            <ExprInt val: '2'></ExprInt>]
        </FuncCall>None
    </Statement>,
    <Statement>
        <FuncCall hello>[
            <ExprInt val: '3'></ExprInt>,
            <ExprInt val: '4'></ExprInt>]
        </FuncCall>None
    </Statement>,
    <Statement>
        <ExprInt val: '5'></ExprInt>
        <AssignID name: 'a'></AssignID>
    </Statement>,
    <Statement>
        <ExprPlus>
            <ExprInt val: '6'></ExprInt>
            <ExprInt val: '7'></ExprInt>
        </ExprPlus>
        <AssignID name: 'b'></AssignID>
    </Statement>,
    <Statement>
        <ExprPlus>
            <ExprID name: 'a'></ExprID>
            <ExprID name: 'b'></ExprID>
        </ExprPlus>
        <AssignID name: 'c'></AssignID>
    </Statement>,
    <Statement>
        <ExprPlus>
            <ExprID name: 'a'></ExprID>
            <ExprPlus>
                <ExprID name: 'b'></ExprID>
                <ExprID name: 'c'></ExprID>
            </ExprPlus>
        </ExprPlus>None
    </Statement>,
    <Statement>
        <FuncCall hello>[
            <ExprID name: 'a'></ExprID>,
            <ExprID name: 'b'></ExprID>]
        </FuncCall>None
    </Statement>]
</Root indt:0>]

Here’s output from the analyzer test. I have functions and variables being declared within lexical scope. The World object ‘spawns’ a new child instance inside each function definition’s visit() method.

The parent instance keeps a reference to all children, and each child has a reference back to the parent. The reference to the parent allows for scope ‘masking’ during symbol lookup. That means if a symbol isn’t found in the local symbol table, the parent will be searched, and so on until either the symbol is found, or the root scope level is reached.
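
A minimal sketch of that lookup rule, assuming attribute and method names of my own choosing (the real World class is more involved):

class World:
    def __init__(self, parent=None):
        self.symbols = {}
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def spawn(self):
        # called from a function definition's visit(): new child scope
        return World(parent=self)

    def lookup(self, name):
        # search the local table, then each parent, until found or the root is reached
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"undeclared symbol: {name}")

root = World()
root.symbols["a"] = None
inner = root.spawn()
inner.symbols["x"] = None
inner.lookup("a")   # found by walking up to the root scope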

I’m still deciding whether the analyzer should start assigning values, or whether it’s sufficient for the assignments in the AST to simply perform a lookup to verify that the symbols are declared in logical order.

def hello(x, y):
    print(x + y)

hello(1, 2)
hello(3, 4)
a = 5
b = 6 + 7
c = a + b
a + b + c
hello(a, b)

FUNCTION DECLARED:  dict_keys(['print'])
SYMBOL DECLARED:  dict_keys(['prtarg'])
FUNCTION DECLARED:  dict_keys(['print', 'hello'])
SYMBOL DECLARED:  dict_keys(['x'])
SYMBOL DECLARED:  dict_keys(['x', 'y'])
SYMBOL DECLARED:  dict_keys(['a'])
SYMBOL DECLARED:  dict_keys(['a', 'b'])
SYMBOL DECLARED:  dict_keys(['a', 'b', 'c'])

<scope>[{'a': None, 'b': None, 'c': None}, dict_keys(['print', 'hello']), [
    <scope>[{'prtarg': None}, dict_keys([]), []]
    </scope>,
    <scope>[{'x': None, 'y': None}, dict_keys([]), []]
    </scope>]]
</scope>

Revisiting lexical scoping.

https://www.quora.com/What-is-meant-by-dynamic-scoped-languages-vs-static-scoped-languages?share=1

So you asked at which point a value should be assigned to a variable. Let’s say you can do it at either the analyzer or the interpreter stage (I won’t bother with compilers in this).

If you do it at the analyzer (meaning, put the value in the world), then you’re basically creating static variables. This doesn’t really work for functions too much unless you attach a world to each function definition that you then merge in. In languages that are static this works fine. Think C, C++, Java, etc.

A language like Python though is dynamic, so you actually don’t really know the value of anything until you interpret it. For speed a lot of people will do as much as they can in the analyzer step, but the simplest way is to do it when you run, and build every value on the fly as you interpret.

This really becomes sort of a defining thing about your language. If you can set values early then your stuff runs faster, but that means that it’s less dynamic. If you wait until the interpreter then it’s much easier to use but not as easy to spot errors.

Ok, so I have a question about this scenario:

greeting = input()

print(greeting)

What happens when this code is analyzed in a statically scoped language? print(greeting) can’t look up the value of greeting in the symbol table, can it? After all, even though greeting has been declared, it has no value until input() gets called at run time. How does this not break the analysis?

Maybe that wasn’t the best scenario to illustrate the question. A better scenario might be, in a language with lexical scope, what happens to the analysis when you’re performing calculations on values that can’t be known until execution?

No, that’s a good example of why you don’t really do this in the analyzer, but I was referring more to things like this:

name = 'Zed A. Shaw'

do_stuff(name)

That is an assignment to a variable you can fix at analysis time, but you run into all kinds of problems if there is something else that can change this. Like, what if you have a language that can change strings internally and do_stuff does that when it’s run?

But, if you were to compile this, you would set the value, and then later code that executed would change it. In an interpreted language like Python you can just do it when you run the interpretation stage.

Oh wait, another way to say this is you can’t really do it at the analysis stage for everything, but if you can do it for some things then you can speed up the language. Problem is, then you make the language less dynamic.

Thank you for clarifying. I’ll definitely need to go rebuild my analyzer.

I think I misunderstood how the World works. I thought I could turn it into a kind of scope tree which has a World node for each function definition. Now I’m realizing that my function calls can’t reconnect the function definition’s code/parse-tree to its ‘world-node’. It’s time for me to rethink this solution.

I found a tutorial about the call stack model.

https://www.quora.com/What-is-a-call-stack-Is-there-a-good-resource-to-get-a-thorough-understanding-of-this-subject?share=1

http://cryptroix.com/2016/10/16/journey-to-the-stack/

Oh wait… I can still use the scope tree… However, right now the child nodes for the worlds (the ones that hold the variables for their respective function definitions) are kept in a list structure. I think it would be better if I kept them in a dict with the keys corresponding to the function names. It makes the world a little more complicated, but I think it’s a more straightforward way for a function call to pass or copy the local scope to the function’s code. I might not need to make Stack and StackFrame classes unless the function call code gets hairy (e.g., handling recursion or something).
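
Something like this is what I have in mind (again, just a sketch with assumed names):

class World:
    def __init__(self, parent=None):
        self.symbols = {}
        self.parent = parent
        self.children = {}            # was a list; now keyed by function name

    def spawn(self, func_name):
        child = World(parent=self)
        self.children[func_name] = child
        return child

root = World()
root.spawn("hello")
# a call to hello() can now grab root.children["hello"] and
# pass or copy that local scope to the function's code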

The English used in this article is garbled in some places, but I’m only about halfway through and I’m already getting a much clearer picture of how function calls work than from any of my other readings. I know I just posted the link above, but this call stack tutorial is highly recommended. Here’s an excerpt.

what is return address :
A program which is a sequence of instructions uses ‘call’ instruction for function calls. On calling a function, it is important to save the address of next instruction that needs to be executed in the program. This essentially is the return address. For saving a return address, it needs to be put somewhere in the memory. The conventional method is to push in to the stack. Also, a function needs to save the parameters before executing its instructions, which will also be pushed in the stack. This forms the call stack. Now, for fetching the contents from the call stack, there will be a base pointer which will always be pointing to the start of the stack and a stack pointer which will be pointing to the last element pushed.

This explanation indicates that the function code itself is pushed onto the stack. That is the point that I missed from every other resource I’ve read on the topic. Maybe it was stated elsewhere, but it stood out to me in this particular tutorial. Now to finish reading.

EDIT: this point about the code going on the stack with the variables is incorrect. The author was discussing how the parameters get pushed onto the stack. The concept was oddly phrased.
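
To keep the corrected picture straight (the parameters and the return address go on the stack, not the function’s code), here is a toy model I sketched. The frame fields are obviously simplified.

# toy call stack: each frame holds the parameters and the 'return address'
# (here just the index of the caller's next instruction)

call_stack = []

def call(func_name, params, return_address):
    call_stack.append({"func": func_name,
                       "params": params,
                       "return_address": return_address})

def ret():
    frame = call_stack.pop()
    return frame["return_address"]   # the caller resumes at this instruction

call("hello", {"x": 1, "y": 2}, return_address=4)
print(call_stack)   # one frame: hello's parameters plus where to resume
print(ret())        # 4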

Judging by sudden changes in the writing style, I have a feeling that some of this tutorial is pulled from an uncredited textbook. Maybe the blogger rewrote the examples for Python. I might contact the blogger to find out which tutorial or textbook he/she studied. Nevertheless, even the sections in somewhat broken English are really helpful.

It appears this blog is more of a hacker’s notepad than an article, so I wouldn’t really call it plagiarism. The blogger even provides a link to an entry on Citizendium from which he/she got a lot of the material.