LMPTHW ex32: Scanner and Token class

Questions about the ex32 scanner class:

__init__ Takes a similar list of tuples (without the re.compile) and configures the scanner.

I don’t understand why I should omit the re.compile regexes. Don’t I need those to scan/identify the tokens in each string?

You should also create a generic Token class that replaces the tuple I’m using. It should be able to track
the token found, the matched string, the beginning, and the end of where it matched in the original
string.

Is this a reference to the elements in the TOKENS list, or the match results returned by the match() function?

Here’s what I was thinking for a token class:

class Token(object):
    def __init__(self, token, match_string, start, end):
        self.token = token
        self.match_string = match_string
        self.start = start
        self.end = end
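
For what it's worth, those four fields line up with what a match object from the re module already reports, so a Token could probably be built straight from a match. This is only a sketch: it assumes the token name is a plain string like "NAME", and the helper token_from_match is a made-up name for illustration, not something from the book.

```python
import re


class Token(object):
    def __init__(self, token, match_string, start, end):
        self.token = token                # token name, e.g. "NAME"
        self.match_string = match_string  # the text that matched (the lexeme)
        self.start = start                # index where the match begins
        self.end = end                    # index just past the end of the match


def token_from_match(name, match):
    # Hypothetical helper: copy the useful parts of a re match object.
    return Token(name, match.group(), match.start(), match.end())


m = re.compile(r"[a-zA-Z_]+").match("hello(x)")
tok = token_from_match("NAME", m)
print(tok.token, tok.match_string, tok.start, tok.end)  # NAME hello 0 5
```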

My question about omitting the re.compile still stands.

The word of the day is ‘LEXEME’. Thank you internet.

This Stack Overflow thread clarifies some of the scanning/parsing vocabulary. Now I’m thinking about how to implement match, peek, and skip.

Sorry it took so long to reply. Your question really needs some code to be clear, but you have two choices when building the TOKENS list:

  1. The person writing the list just puts in the pattern strings, and your scanner runs re.compile() on each one. The advantage is that it’s easier on the person typing the list. The disadvantage is that they don’t see an error in a pattern until it gets compiled inside your scanner.
  2. The person writing the list calls re.compile() themselves and hands you a list of compiled regexes. The advantage is that your scanner is ready to go. The disadvantage is that they have to type more.
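
To make the first choice concrete, here is a rough sketch of a scanner whose __init__ does the compiling. The tuple order (pattern string first, then the token name), the example patterns, and the attribute name self.tokens are all assumptions for illustration, not the book's solution.

```python
import re


class Scanner(object):
    def __init__(self, tokens):
        # tokens is assumed to be a list of (pattern_string, name) tuples.
        # Compiling here means a bad pattern fails inside the scanner,
        # not at the place where the list was typed.
        self.tokens = [(re.compile(pattern), name) for pattern, name in tokens]


# Choice 1: the caller passes plain strings and never touches re.compile().
TOKENS = [
    (r"def", "DEF"),
    (r"[a-zA-Z_][a-zA-Z0-9_]*", "NAME"),
    (r"[0-9]+", "INTEGER"),
]

scanner = Scanner(TOKENS)
regex, name = scanner.tokens[0]
print(name, regex.match("def hello") is not None)  # DEF True

# The downside: a broken pattern only shows up when the scanner is built.
try:
    Scanner([(r"(", "BROKEN")])
    compile_failed = False
except re.error as err:
    compile_failed = True
    print("bad pattern:", err)
```

With the second choice, __init__ would just store the list as given, since every pattern would already be compiled.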

I kind of cheated, though, by writing a function called L() that just takes two strings and returns a (string, regex) pair.
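
A helper like that might look roughly like the following. The argument order (token name first, then the pattern string) and the example patterns are guesses; only the name L() and the (string, regex) return shape come from the description above.

```python
import re


def L(name, pattern):
    # Assumed order: token name first, then the pattern string.
    # Returns a (string, compiled regex) pair ready for the scanner.
    return (name, re.compile(pattern))


TOKENS = [
    L("DEF", r"def"),
    L("NAME", r"[a-zA-Z_][a-zA-Z0-9_]*"),
]

name, regex = TOKENS[1]
print(name, regex.match("hello(x)").group())  # NAME hello
```

This keeps the list short to type while still compiling each pattern right where it is written, so a bad regex raises an error on that line.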