__init__ Takes a similar list of tuples (without the re.compile) and configures the scanner.
I don’t understand why I should omit the re.compile regexes. Don’t I need those to scan/identify the tokens in each string?
You should also create a generic Token class that replaces the tuple I’m using. It should be able to track the token found, the matched string, the beginning, and the end of where it matched in the original string.
Is this a reference to the elements in the TOKENS list, or the match results returned by the match() function?
Sorry it took so long to reply. So, your question really needs some code to be clear, but you have two choices in making the TOKENS list:
The person writing the list just puts in the string, and your scanner runs re.compile() on each one. Advantage of this is it’s easier on the person typing it. Disadvantage is they don’t see an error until they are compiled in your scanner.
The person writing the list does the re.compile() and is handing you a list of regex. Advantage of this is your scanner is ready to go. Disadvantage is they have to type more.
I kind of did a cheat though by creating a function called L() that just takes two strings and returns a (string, regex) combo.