Thinking out loud

austen · August 9, 2018, 3:36am

Hit another snag.

I’ve got relationals parsing, but the precedence is messed up when following an assignment. That may be because of how assignments are parsed. I didn’t have a problem with assignment parsing before because assignment has lower precedence than arithmetic. Relationals, however, have lower precedence than assignment. That may be the case which breaks the assignment parsing. I’ll fix it tomorrow.

Here is current test output. The test expression is at the top. This statement should return 1 (true), but if it were to run with this nesting, it might return 2 because the relational (EqTo) is nested inside the assignment (think of the precedence as parentheses around the relational).

y=6
2 + x = 4 == y

<StatementList>[
    <Statement>
        <Expression>
            <Assign>
                left_expr:
                <Variable>id:y</Variable>
                right_expr:
                <Number>value:6</Number>
            </Assign>
        </Expression>
    </Statement>]
</StatementList>

<StatementList>[
    <Statement>
        <Expression>
            <Sum>
                left_expr:
                <Number>value:2</Number>
                right_expr:
                <Assign>
                    left_expr:
                    <Variable>id:x</Variable>
                    right_expr:
                    <EqTo>
                        left_expr:
                        <Number>value:4</Number>
                        right_expr:
                        <Variable>id:y</Variable>
                    </EqTo>
                </Assign>
            </Sum>
        </Expression>
    </Statement>]
</StatementList>```

austen · August 9, 2018, 12:50pm

Problem solved by a totally gnarly hack… and I mean that in the negative sense of ‘gnarly’!

My problem was that assignment was scooping up the entire rest of the unparsed expression for the RHS. This meant that lower-precedence operations (ie relationals) were getting nested inside higher-precedence operations (ie assignments). That is the wrong nesting for correct order of operations.

My solution was to write a modified version of the expression parser to act like a wrapper which handles an assignment-specific edge case where if the next operator has lower precedence, the assign_recurse expression parser backs out of recursion and only returns a single collected term (rather than the rest of the expression).

So why didn’t I just have the parser peek ahead a couple of tokens to check the precedence of the next operator? Because some terms in an expression are composed of multiple tokens. Namely, the parentheses. They count as a single term, but they take at least three tokens and can contain even more operators within. This makes peeking ahead more complicated than my solution.

How can this solution be improved? I can definitely refactor the expression parser. Specifically, the while loop that is in expr_recurse and assign_recurse could become its own function. Perhaps that would make it possible to roll the edge case for assignment back into expr_recurse so I can get rid of assign_recurse entirely.

Will I take that step? Maybe. I’ve spent a lot of time on expression parsing. It’s finally working, and now I’d like to move on to the semantic analyzer.

Here is the code for the assign_recurse() expression parser

def assign_recurse(self, assign_op):
        LTm = self.get_term()        
        if self.end_of_expression():
            return LTm
        next_op = self.peek()
        if self.has_precedence(assign_op, next_op):
            # back out of recursion if assignment is followed
            # by lower-precedence operator 
            return LTm
        else:
            LOp = self.get_op()
            RTm = self.get_term()
        # parse rest of expression as usual    
        while not self.end_of_expression():
            ROp = self.peek()
            if self.has_precedence(LOp, ROp) and ROp not in self.assigns:
                LTm = self.op_node(LTm, LOp, RTm)
                if prev_term: 
                    return LTm
                else:
                    LOp = self.get_op()
                    RTm = self.get_term()
            else:
                if ROp in self.assigns:
                    ROp = self.get_op()
                    RHS_assign = self.assign_recurse(ROp)
                    RTm = self.op_node(RTm, ROp, RHS_assign)
                else:
                    RTm = self.expr_recurse(RTm)

        return self.op_node(LTm, LOp, RTm)

and the test output

y = 6
2 + x = 4 == y
y == 2 + x = 4

<StatementList>[
    <Statement>
        <Expression>
            <Assign>
                left_expr:
                <Variable>id:y</Variable>
                right_expr:
                <Number>value:6</Number>
            </Assign>
        </Expression>
    </Statement>]
</StatementList>

<StatementList>[
    <Statement>
        <Expression>
            <EqTo>
                left_expr:
                <Sum>
                    left_expr:
                    <Number>value:2</Number>
                    right_expr:
                    <Assign>
                        left_expr:
                        <Variable>id:x</Variable>
                        right_expr:
                        <Number>value:4</Number>
                    </Assign>
                </Sum>
                right_expr:
                <Variable>id:y</Variable>
            </EqTo>
        </Expression>
    </Statement>]
</StatementList>

<StatementList>[
    <Statement>
        <Expression>
            <EqTo>
                left_expr:
                <Variable>id:y</Variable>
                right_expr:
                <Sum>
                    left_expr:
                    <Number>value:2</Number>
                    right_expr:
                    <Assign>
                        left_expr:
                        <Variable>id:x</Variable>
                        right_expr:
                        <Number>value:4</Number>
                    </Assign>
                </Sum>
            </EqTo>
        </Expression>
    </Statement>]
</StatementList>

austen · August 9, 2018, 6:48pm

I’ve got the analyzer working now. It has a symbol table. It checks assignments to make sure there’s a variable on the LHS. It initializes all variable with the Value of “0” (yes, the number is stored as a string). It performs symbol lookups without issue. This analyzer was much easier to build than the one in the QuasiPy because bc doesn’t use lexical scoping and I’m not implementing function calls.

Next I’m going to start working on the interpreter to get those operators and conditional statements working. I’m hoping to have this exercise done within a couple of days. Maybe even by tomorrow morning.

It would be so cool to have working if statements and loops!

BTW, bc has this weird thing where variable declaration is as simple as just invoking a variable. You don’t have to explicitly declare it. for example, if you have

x = 3
x = y   # y is unassigned and undeclared. this should throw an exception, right?
x
0       # wrong! it was implicitly initialized on line 2 with a value of 0. crazy.
        # you can't get undefined variable exceptions in bc.

austen · August 10, 2018, 10:49am

BOOM!

I’ve got the basic interpreter working. Here’s how an interactive session would look in bc. I’m adding carets for the user input.

bc
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
> y=6
> y
6
> z
0
> 2 + x = 4 == y
1
> y == 2 + x = 4
1

And here’s the test output for my baby version of bc.

TEST INPUT>>> y=6
6.0
TEST INPUT>>> z
0
TEST INPUT>>> 2 + x = 4 == y
1
TEST INPUT>>> y == 2 + x = 4
1
!!! SYMBOL TABLE {'y': 6.0, 'x': 4.0}

I’ll test conditions and loops next.

ulfen69 · August 10, 2018, 11:46am

Hi @austen.

I must say I am impressed.
Not just of the amount of work you have put down on your code.
But also updating us with your progress.
Something to read through when (if ever) I get there.
Thanks for sharing.

austen · August 10, 2018, 12:00pm

Thank you very much for the encouraging feedback @ulfen69. I’m glad you enjoy seeing me flail and struggle with LMPTHW. The truth is, typing out my understanding really does help me solve problems faster even if no one is listening. I think Zed said as much in LPTHW (can’t remember how he articulated it exactly). Try it the next time you feel stuck.

austen · August 10, 2018, 12:01pm

Here’s the test output in a for-loop statement. I’m pretty satisfied. I think I’ll move on to ex37.

TEST INPUT>>> x = -28
28.0
TEST INPUT>>> for (x = 10; x; x -= 1) y += 5
TEST INPUT>>> x
0.0
TEST INPUT>>> y
50.0
!!! SYMBOL TABLE {'x': 0.0, 'y': 50.0}

nmaloof94 · August 10, 2018, 12:43pm

I’ve been thinking of copying your idea for a while. I’m at the end of LPTHW, but I may try what you’re doing for LMPTHW.

ulfen69 · August 10, 2018, 1:13pm

I have got that advise too.
Zed said: ”e-mail yourselve about the problem”.

austen · August 10, 2018, 2:05pm

Ha. I just noticed that I forgot to write the interpreter code for the negative symbol!
Easy fix. Here’s the output of the corrected negation.

TEST INPUT>>> x = -28
-28.0
TEST INPUT>>> for (x = 10; x; x -= 1) y += 5
TEST INPUT>>> x
0.0
TEST INPUT>>> y
50.0
!!! SYMBOL TABLE {'x': 0.0, 'y': 50.0}```

austen · August 11, 2018, 9:38pm

Heheh. All this talk of thread titles had me go and review my solution to ex35 (Puny Python).

Although I got it working, the scoping is indeed wrong. I used a scoping rule from some other language. I should have used a call stack after all. I almost want to go back and write it over again, but it would be so much work!

Why do I need to go back? Because if you have an enclosed function, such as

def outside():
  x = 5
  def inside():
    print("hello from inside! x==", x)
  inside()
  x = 3
  inside()

When you call outside(), the inside() will print x==5 and then x==3 to reflect the change in the environment. From what I understand, this is not a closure. In my implementation of PunyPy/QuasiPy/whatever, the inside() will always print x==5, which is a closure.

Although first class functions can be passed to and called from any location, the environment in which the first class function was defined still persists on the stack. If there’s a change to a free variable in that particular defining environment, that change will be reflected every time inside() is called.

How did I get confused about this? Because I did a subsequent test where I had a recursive function which accumulated multiple definitions of the same variable. This looked like a bunch of closures to me because each definition of the inside function would return a free variable which was tied to each level of recursion. eg, inside[0] might return “x==3” and inside[1] might return “x==4”.

On the surface, this does indeed look like a closure, but it’s important to remember that a recursive function is very different from a loop. Each call to the recursive function, each recurse, has a distinct “frame” on the stack. When the call to the recursive function is complete, the free variables don’t just ‘go away’ when the recursive’s call frame is popped off the stack like I would have expected. Instead, they persist for as long as the inner-nested function definitions (aka 'inside(), which contain references to those free variables) exist. I don’t know what this means for how the call stack actually works under the hood in python, but this is what makes sense to me in practice.

This is why I was on the right track when I decided to implement a call stack. The structure I had imagined would have allowed for this persistence of free variables.

I feel silly about it. Kind of embarrassed, but hindsight… So would it be a big deal to go back to ex35 and try to fix this flaw? I’ll sleep on it.

austen · August 13, 2018, 8:02pm

I was ready to abandon hope and close the door on ex35 when the unthinkable happened. Hang on to your keyboards for the thrilling conclusion of…

http://lrnjrnl.com/emulating-python-s-lexical-scope-semantics

austen · August 14, 2018, 7:48pm

Starting ex37. It’s a BASIC interpreter. There are many versions of BASIC.
Here is a helpful link to a BASIC programming manual.
http://www.bitsavers.org/pdf/dartmouth/BASIC_Oct64.pdf

And here is the page where I found it.

http://www.thefullwiki.org/Dartmouth_BASIC

There’s also the wikipedia entry for Dartmouth BASIC, but the external resource links are in spanish or broken.

I’ve found an online interpreter, but it’s for Applesoft BASIC. I think I might learn from this one anyhow.

http://www.calormen.com/jsbasic/

Edit:
I found a better version of the Dartmouth BASIC manual. It’s for the 4th edition and there’s a helpful summary document too. The summary roughly outlines the grammars and expression parsing.
http://www.bitsavers.org/pdf/dartmouth/BASIC_4th_Edition_Jan68.pdf
http://www.bitsavers.org/pdf/dartmouth/BASIC_summary.pdf

It all came from this awesome computing archive.

Also, although I’m not planning on using a parser-generator, I’m reading the SLY introduction so I can write precise grammars that support a less-convoluted parser. My expression parser for ex36 was twisted because my grammars were too ambiguous.

austen · August 15, 2018, 6:53pm

Here’s an excerpt about numbers using scientific notation from the Dartmouth BASIC manual.

… We gain further flexibility by use of the letter E, which stands for “times ten to the power”. Thus, we may write .00123456789 in a form acceptable to the computer in any of several forms: .123456789E-2 or 123456789E-11 or 1234.5789E-6. We may write ten million as 1E7 (or 1E + 7) and 1965 as 1.965E-3 (or 1.965 + E3). We do not write E7 as a number, but must write 1E7 to indicate that it is 1 that is multiplied by 10^7.

I’m not sure if I want to implement this syntax feature.

austen · August 23, 2018, 2:27am

I’ve put down BASIC for a bit because I’m reading about context-free grammars from wikipedia. I want to better understand how LR parsing works (and particularly shift-reduce parsing). For example, in the ‘reduce’ operation, is the reduction nesting further and further out from the terminal? It looks as though it is, but I’m not 100% on that answer.

The SLY documentation is helpful, but I think the author assumes the reader already knows about this stuff.

austen · September 1, 2018, 12:26am

Yikes. I’ve been busy doing other stuff. It’s the last day of August and I still haven’t finished Part 4 of LMPTHW. It’s time to take a break from Python to start working on front-end stuff… html, css & javascript. I’ll do a few projects before I come back to finish LMPTHW.

austen · October 28, 2018, 5:34pm

Hi @zedshaw, I’m mostly studying JS right now, but I’m looking to complete a working BASIC interpreter Python3.

Here’s where I’m at so far:

found official documentation for Dartmouth BASIC First and Fourth Ed.
built a scanner to generate tokens from string input
started studying LR expression parsing
ran semtantic experiments using browser version of Apple II BASIC

I’m finding there are just too many versions of BASIC from which to choose, but not enough functioning interpreters for experimentation. For example, there is not a single working version of Dartmouth basic available online. That means I have no way to test and modify the code sample from LMPTHW or Wikipedia

Which of the following would you suggest?

Stick with a version of Dartmouth which can run the example code and just make a best guess about semantics and build an interpreter based on the original documentation.
Scrap the exercise code and work from a readily available and documented version of BASIC. e.g. Apple II

Also, I’m brainstorming ways of parsing and analyzing the code by feeding the grammars to a parser-generator. The problem is that the line numbers are too dependent on implementation details (such as using a dict for storing statements by number) to incorporate directly into the grammars and the the resulting AST. I wonder if I need to have a structure in my “world” object which allows the interpreter to lookup each statement in a numbered dict where gosub/goto can redirect program flow. I’m not sure of how much work I’d make for the analyzer since scope is pretty flat (no nested functions… maybe just check for declarations and datatypes?).

What’s your take on this approach?

zedshaw · November 2, 2018, 1:06pm

Hmmm, I’d say if you want to do a more formal working version of BASIC then you should definitely go with the one that has the best formal description of the language. Problem is, BASIC kind of predates formal descriptions of languages so that’s probably why you don’t find any.

austen · November 2, 2018, 7:46pm

Hey @zedshaw. Just a little FYI. After your feedback, I did some more digging with a different search engine and hit paydirt! Here are the two resources I need for my BASIC interpreter. One is an emulation of the Dartmouth Time Share System complete with BASIC interepreter. The other is a paper outlining a formal definition of Dartmouth BASIC 5th Edition and it includes the grammars in BNF! I don’t know why it took me so long to find these resources, but there you go. A pot o’ gold. If you ever do another edition of LMPTHW, these might be handy to include for the Little BASIC project concluding Part 4: Parsing Text.

I plan to get one of these emulations running and then see about formatting the grammars to ABNF for SLY.

zedshaw · November 11, 2018, 4:28pm

SWEET! I might steal this from you.

Thinking out loud

BOOM!

Windows and Mac emulation of DTSS

DTSS emulation online server (requires registration for login)

Dartmouth BASIC syntax and semantics (formal grammars in appendices)