Thinking out loud

Maybe I can use a foosball game to think of scope and context (I know very little about sports!)… here it goes.

Scope is the player’s table position (goalie, defense or offense). The table position is like the identifier’s position within the code (lexical scope).

Context is like the ball. It can move around the table, interacting with the players within reach.

Players within reach of the ball can be considered ‘in context’, while players outside reach of the ball are ‘out of context’.

This paragraph is making my head spin on the first read. https://en.wikipedia.org/wiki/Scope_(computer_science)#Overview

For entities such as variables, scope is a subset of lifetime (also known as extent) – a name can only refer to a variable that exists (possibly with undefined value), but variables that exist are not necessarily visible: a variable may exist but be inaccessible (the value is stored but not referred to within a given context), or accessible but not via the given name, in which case it is out of context (the program is “out of the scope of the name”). In other cases “lifetime” is irrelevant – a label (named position in the source code) has lifetime identical with the program (for statically compiled languages), but may be in or out of context at a given point in the program, and likewise for static variables – a static global variable is in context for the entire program, while a static local variable is only in context within a function or other local context, but both have lifetime of the entire run of the program.

Wait, what? When-how does an existing variable be inaccessible at the same time? Is it when it’s lower in the stack? e.g., maybe program execution goes out of global scope during a function call. That might make sense. I find it confusing that Python doesn’t always need ‘global identifier’ explicitly stated inside a function body. It’s like, “sometimes you need ‘global’, sometimes you don’t.” Maybe other programming languages are more explicit in that regard.

Maybe this scenario is also complicated by ‘garbage collection’ in languages like Python.

e.g., a value exists at a particular address in memory, but it doesn’t have a name bound to it, only a pointer. That’s C, right?

Most frequently, name resolution relies on an “inner-to-outer” rule, such as the Python LEGB (Local, Enclosing, Global, Built-in) rule: names implicitly resolves to the narrowest relevant context. In some cases name resolution can be explicitly specified, such as by the global and nonlocal keywords in Python; in other cases the default rules cannot be overridden.

OK, that explains a lot!

This is from the section, Function Scope.

However, some languages, such as C, also provide for static local variables, where the lifetime of the variable is the entire lifetime of the program, but the variable is only in context when inside the function. In the case of static local variables, the variable is created when the program initializes, and destroyed only when the program terminates, as with a static global variable, but is only in context within a function, like an automatic local variable.

Does this mean that the local value of the variable is persistent between function calls? I suppose this is a tidier alternative to a global variable which is modified by the function.

Here’s a quote that makes me wonder about name collision when calling recursively. If the variables of the calling function are still in context, are the identifiers for said variables in scope in the called function as well, or does the identifier only ‘rejoin’ the variable once the called function is completed and the context returns to the calling function?

By contrast, in dynamic scoping, the scope extends to the runtime context of the function: local variables stay in context when another function is called, only moving out of context when the defining function ends, and thus local variables are in context of the function in which they are defined and all called functions.

Maybe it’ll make sense once I learn a dynamically scoped language.

Wow! So in dynamic scope, each identifier is actually a stack that exists through all function calls and program flow. I suppose it’s important to use unique identifiers in each function (including counters). But recursion should be ok since each level of recursion is just pushing the current scope’s values onto the variable stacks.

With dynamic scope, a global identifier refers to the identifier associated with the most recent environment, and is uncommon in modern languages.[4] In technical terms, this means that each identifier has a global stack of bindings. Introducing a local variable with name x pushes a binding onto the global x stack (which may have been empty), which is popped off when the control flow leaves the scope. Evaluating x in any context always yields the top binding. Note that this cannot be done at compile-time because the binding stack only exists at run-time, which is why this type of scoping is called dynamic scoping.

I think I might use dynamic scoping if it really is easier to implement, and the analyzer’s ‘world’ object already has a central variable table. All I have to do is, instead of declaring the identifiers in key:value pairs, it will be key:list pairs. Each list is a stack of values for that variable. The position of each element in the list corresponds with the variable’s dynamic scope.

That might not be what Wikipedia is talking about here, but I’ll try it anyhow.

Dynamic scoping is fairly easy to implement. To find an identifier’s value, the program could traverse the runtime stack, checking each activation record (each function’s stack frame) for a value for the identifier. In practice, this is made more efficient via the use of an association list, which is a stack of name/value pairs. Pairs are pushed onto this stack whenever declarations are made, and popped whenever variables go out of scope.[11] Shallow binding is an alternative strategy that is considerably faster, making use of a central reference table, which associates each name with its own stack of meanings. This avoids a linear search during run-time to find a particular name, but care should be taken to properly maintain this table.[11] Note that both of these strategies assume a last-in-first-out (LIFO) ordering to bindings for any one variable; in practice all bindings are so ordered.

An even simpler implementation is the representation of dynamic variables with simple global variables. The local binding is performed by saving the original value in an anonymous location on the stack that is invisible to the program. When that binding scope terminates, the original value is restored from this location. In fact, dynamic scope originated in this manner. Early implementations of Lisp used this obvious strategy for implementing local variables, and the practice survives in some dialects which are still in use, such as GNU Emacs Lisp. Lexical scope was introduced into Lisp later. This is equivalent to the above shallow binding scheme, except that the central reference table is simply the global variable binding environment, in which the current meaning of the variable is its global value. Maintaining global variables isn’t complex. For instance, a symbol object can have a dedicated slot for its global value.

To Do: learn symbol tables, name binding and scope management

Studying Zed’s solution to ex34.

Next I’ll look at the solution to ex35

Ok… I think I see what Zed was talking about regarding making copies of the World object for all the function definition declared in the World.functions dict. But how does that actually work? I went to a lot of trouble to create separate parameter production types and variable types to support assignment, but now I think that I misunderstood the solution. The FuncDef productions will handle their Param productions so that it can put all the variables into the FuncDef’s own copy of the World.variable dict.

The same goes for the variable types themselves… I don’t need a separate variable type for expression and assignment. The FuncCall will handle the variable production in its own way, distinct from the FuncDef. The FuncCall will use them to set() their values. The FuncDef will use them to declare the variables in the local symbol table.

So now I think I need to scrap all that work I did on the productions and start over… again.

This language is pretty technical for me at this stage, but it is making a picture in my mind for implementing scope.

1 Like

Woah hold up. Is the world object being copied by the function definition, or by the function call?.. I think the function call is what passes the world object to the function definition… I’m so confused.

ok… thinking some more about implementing scope and the call stack.

each function call will maintain its own symbol table (world)… I need to find a way to simultaneously maintain a global symbol table on which each function call can push its variables onto their respective stacks. I think this will allow for masking. when the function returns, all the variables in the local table are also looked up in the global table and popped off their stacks.

Maybe the analyzer uses the World class strictly for variable declaration, while the call stack and scope are managed by the interpreter at runtime. So I might need a separate CallStack class along with a call frame class.

Arright… I’ve done some good reading since I reached ex34, but I’ve been on ex34 so long, I’ve lost sight of some of the implementation details of the scanner, parser and analyzer. I think this another opportunity to revise my scanner, parser and analyzer. This time I’ll have a clearer picture of how the analyzer and the interpreter fit together. I’ll keep the productions minimal and clean.

I’ll consider lexical scope as I build the production methods for function calls and function definitions. I’m also going to implement a symbol table class with pointers to child tables to represent scope. I’m not sure of how the function calls are going to share the local table with the function’s code, but I need to stop taking notes and start writing code.

Rewrote my scanner/tokenizer again! It’s even cleaner this time and it still tracks indent levels and line numbers for each token. I also rewrote the regexes so they would capture any whitespace following the lexemes. Once the tokens are assembled, their lexemes are rstripped of whitespace and the correct indent level is calculated and assigned to the indent_level attribute.

question: why did I have the regexes scoop up trailing whitespace?

answer: because the r’^def’ pattern would pick up identifiers with names like “definition” when it is only supposed to pickup the ‘def’ keyword. I considered that a bug, so I included whitespace in the patterns for all the keywords/tokens.

I also wrote the match, peek and push methods with no problems.

I’ve restarted the parser. I think I can recompose the grammars from memory.

Hopefully all this review and rewriting of the scanner and parser will make the analyzer stage much easier on the second attempt.

I’m finally done with the scanner and parser. It took me a while, but overall it’s an improvement over the last couple of versions. The added benefit is that I’ve gained a clearer picture of the structure of function calls and function definitions. Here’s the test output for reference (no indenting unfortunately):

def hello(x, y):
    print(x + y)

hello(1, 2)
hello(3, 4)

[
<Body indt:0>[
<FuncDef name:hello>[
<FArg x></FArg>,
<FArg y></FArg>]
<Body indt:1>[
<FuncCall print>[
<ExprPlus>
<ExprID 'x'></ExprID>
<ExprID 'y'></ExprID>
</ExprPlus>]
</FuncCall>]
</Body indt:1>
</FuncDef>,
<FuncCall hello>[
<ExprInt val:'1'></ExprInt>,
<ExprInt val:'2'></ExprInt>]
</FuncCall>,
<FuncCall hello>[
<ExprInt val:'3'></ExprInt>,
<ExprInt val:'4'></ExprInt>]
</FuncCall>]
</Body indt:0>]

I’ve already started the analyzer and even a bit of the interpreter, but I think there’s going to be a lot of designing, redesigning and debugging going on. This might take me another week.

I’m already realizing that my root level Body’s visit method will need to import my ‘built-in’ functions like print(). A class Root(Body) might be the way to start. I’ll modify the parse() method to start with a Root instead of a Body production.

Setting up the Root production was painless. for the parser.body() method I added a root=False flag in the parameters. That way the parser just does
root = self.body(root=True)
to create the root token, which is then put at the base of the abstract syntax tree (AST) list.

I’m not exactly sure of why I need the AST to be a list. It’s a tree after all.