A simple interpreter from scratch in Python # 4

Original author: Jay Conrod
  • Transfer
  • Tutorial


In the previous three parts, we created the lexer, parser, and AST for our toy language IMP. We even wrote our own library of parser combinators. In this final article, we will write the last component of the interpreter - the performer.


Let's think about how programs usually execute. At any given time, there are some “control points” that indicate which expression the program is about to execute next. When the next expression is executed, it modifies the state of the program by improving the “control point” and changing the values ​​of the variables.

To execute an IMP program, we need three things:
  1. Point of control - we need to know the following expression for execution.
  2. Environment - we need to simulate a “change in program state”.
  3. Execution functions - we need to know how the state and control point should be modified for each expression.

The simplest is the control point, at least for IMP. We arranged our intermediate representation in the form of a tree structure. We will simply call the evaluation functions for the top level expressions, which will recursively call the execution functions for the expressions inside. In essence, we will use the Python control point as our own. It would not be so simple for languages ​​with more complex control structures, like functions or exceptions, but we can keep it simple for IMP.

The environment is also simple. IMP has only global variables, so we can model the environment using standard Python dictionaries. When the value changes, we will update the value of the variable in the dictionary.

Execution functions are the only thing we should think about. Each type of expression will have its own execution function, which uses the current environment and will return a value. Arithmetic expressions are returned by integers, Boolean expressions return true or false . Expressions have no side effects, so the environment will not be modified. Each type of expression also has a function of execution. Claims act by modifying the environment so that no result is returned.

Define execution functions

We will define them as methods in our AST classes. This will give each function direct access to the structure that it performs. Here are the arithmetic functions:

class IntAexp(Aexp):
    ...
    def eval(self, env):
        return self.i
class VarAexp(Aexp):
    ...
    def eval(self, env):
        if self.name in env:
            return env[self.name]
        else:
            return 0
class BinopAexp(Aexp):
    ...
    def eval(self, env):
        left_value = self.left.eval(env)
        right_value = self.right.eval(env)
        if self.op == '+':
            value = left_value + right_value
        elif self.op == '-':
            value = left_value - right_value
        elif self.op == '*':
            value = left_value * right_value
        elif self.op == '/':
            value = left_value / right_value
        else:
            raise RuntimeError('unknown operator: ' + self.op)
        return value

You can see here a little extra logic in the case where the programmer uses a variable that is not defined earlier (one that is not defined in the environment dictionary). For simplicity and in order to avoid writing a system for catching errors, we give all undefined variables 0 .

At BinopAexp, we handle the case of the “unknown operator” by throwing a RuntimeError. The parser cannot create an AST from unknown operators, so it only gets easier for us. But if someone makes their own AST, that will need to be considered there.

Here are the functions of Boolean operations:

class RelopBexp(Bexp):
    ...
    def eval(self, env):
        left_value = self.left.eval(env)
        right_value = self.right.eval(env)
        if self.op == '<':
            value = left_value < right_value
        elif self.op == '<=':
            value = left_value <= right_value
        elif self.op == '>':
            value = left_value > right_value
        elif self.op == '>=':
            value = left_value >= right_value
        elif self.op == '=':
            value = left_value == right_value
        elif self.op == '!=':
            value = left_value != right_value
        else:
            raise RuntimeError('unknown operator: ' + self.op)
        return value
class AndBexp(Bexp):
    ...
    def eval(self, env):
        left_value = self.left.eval(env)
        right_value = self.right.eval(env)
        return left_value and right_value
class OrBexp(Bexp):
    ...
    def eval(self, env):
        left_value = self.left.eval(env)
        right_value = self.right.eval(env)
        return left_value or right_value
class NotBexp(Bexp):
    ...
    def eval(self, env):
        value = self.exp.eval(env)
        return not value

It is pretty simple. We use pitony relational and logical operators.

And here are the execution functions for each type of expression:

class AssignStatement(Statement):
    ...
    def eval(self, env):
        value = self.aexp.eval(env)
        env[self.name] = value
class CompoundStatement(Statement):
    ...
    def eval(self, env):
        self.first.eval(env)
        self.second.eval(env)
class IfStatement(Statement):
    ...
    def eval(self, env):
        condition_value = self.condition.eval(env)
        if condition_value:
            self.true_stmt.eval(env)
        else:
            if self.false_stmt:
                self.false_stmt.eval(env)
class WhileStatement(Statement):
    ...
    def eval(self, env):
        condition_value = self.condition.eval(env)
        while condition_value:
            self.body.eval(env)
            condition_value = self.condition.eval(env)

AssignStatement: we just execute the arithmetic expression on the right side, and then update the environment with the resulting value. The programmer is not limited in redefining variables that have already been defined.

CompoundStatement: we execute each expression, one after another. Remember that CompoundStatement is allowed wherever an expression is allowed, so long chains of expressions are decoded as nested.

IfStatement: First, we execute the boolean condition of the expression. If true, we execute a true expression. If false and a false expression have been defined, we execute the false expression.

WhileStatement: we fulfill the condition to check if the loop body should execute once. The condition is executed every iteration of the loop to check the condition.

Putting it all together

Well, we have created the main components of our interpreter and now it remains only to write a unifying program:

#!/usr/bin/env python
import sys
from imp_parser import *
from imp_lexer import *
def usage():
    sys.stderr.write('Usage: imp filename\n')
    sys.exit(1)
if __name__ == '__main__':
    if len(sys.argv) != 2:
        usage()
    filename = sys.argv[1]
    text = open(filename).read()
    tokens = imp_lex(text)
    parse_result = imp_parse(tokens)
    if not parse_result:
        sys.stderr.write('Parse error!\n')
        sys.exit(1)
    ast = parse_result.value
    env = {}
    ast.eval(env)
    sys.stdout.write('Final variable values:\n')
    for name in env:
        sys.stdout.write('%s: %s\n' % (name, env[name]))

Only one argument is supplied to the program - the name to be interpreted. She reads the file and sends it to the lexer and parser, reporting an error (if any). Then we extract the AST from the parser result and execute it using an empty environment. Since IMP does not have any output, we simply output the entire environment to the terminal.

A canonical example of calculating factorial:

n := 5;
p := 1;
while n > 0 do
  p := p * n;
  n := n - 1
end

The execution itself:

$ ./imp.py hello.imp
Final variable values:
p: 120
n: 0

Conclusion

In the last article, we wrote an interpreter for our simple language from scratch. The language itself is of little use, but the interpreter is quite extensible and its main components can be used in something else.

I hope this material provides a good start for people to experiment with language design. A few ideas:
  • Make variables local to the namespace.
  • Add a for loop.
  • Add expressions for I / O (input / output).
  • Add features.

Also popular now: