Writing a simple Lisp translator - II

    Previous article

    Implement the assignment operator


    And now we will teach the translator to process the assignment operator. And here we are faced with a classic task - to ensure the calculation of the algebraic formula given in the usual record for us since school years. If we did the interpreter, we would need to calculate the value of the formula. In our case, the Lisp core will be calculated (at runtime). And we just need to convert the formula from the record we are used to Lisp's.
    The record we are used to is called “infix notation” (the sign of the operation is located between the operands). In Lisp, the operation sign is placed before the operands (this is called “prefix notation”). Thus, our task is to convert the infix form to the prefix form.

    This problem can be solved in different ways ...

    I propose to convert the formula to the so-called. “Reverse Polish form” (SCR). The reverse Polish notation (named after the Polish mathematician Lukaszewicz) is a single-form notation in which the operation signs are located after the operands (“postfix notation”). Translation from the postfix form to the prefix form is quite simple. It would be possible to “solve the problem in one action” - immediately implement the conversion from the infix form to the prefix one. But this solution would be somewhat more cumbersome. However, anyone can try to do it yourself.

    And we will deal with the transformation of the formula in the SCR. At the entrance we have an algebraic formula in infix notation, presented in the form of a Lisp multilevel list. For example, like this:

    (12 + x / ( y ^ 2 + z ^ 4))

    In the SCR, this expression will have the following (seemingly strange) look:

    (12 x y 2 ^ z 4 ^ + / +)

    To calculate the value of a formula in the SCR form, you need a stack (a data structure that stores and delivers data according to the principle “last arrived, first out”). The calculation is very simple. The list is read once. And for each element the following actions are performed:

    • The number (of variable values) is simply pushed onto the stack;
    • The operation is performed on the corresponding number of operands from the top of the stack.

    Please note that there are no brackets in the SCR, and the operations are performed in the order in which they are recorded (there are no more priorities, as in the infix form).

    The expression we want to translate into an SCR can contain numbers, variables, functions, and operation signs. There is a problem - how to distinguish a variable from a function? A natural way to solve this problem is to create a list of all operations and functions and check the required symbol against this list: if a symbol is found in the list, this means an operation, otherwise it is a variable.
    In addition, for each function / operation, you need to store its arity (the number of arguments). The basic list of operations may look like this:

    (setq *oplist* '((+ 2) (- 2) (* 2) (/ 2) (^ 2) (\ 2) (% 2)
                     (= 2) (== 2) (/= 2) (> 2) (>= 2) (< 2) (<= 2) 
                     (and 2) (or 2) (not 1) 
                     (sin 1) (cos 1) (abs 1) (exp 1) (log 1) (sqrt 1)))
    

    It should be noted that during the operation of the translator, this list may increase. The fact is that the functions of mini-basics return values ​​and can participate in expressions. The names of these functions and their arity must be added to the list * oplist *. This can be done in the action-proc procedure in the branch that processes the proc statement. The variable * oplist * can be created in the start procedure (and destroyed before completion). And adding a function in the action-proc code can be implemented like this:

    (cond ((eq (car stmt) 'proc) 
           (setq proc-name (nth1 stmt))
           (setq proc-parm (nth2 stmt))
           (setq *oplist* (cons (list proc-name (length proc-parm)) *oplist*)))  
    

    Now it is necessary for each operation that occurs in the function to set a certain priority. Priorities are set by the following function:

    (defun prty (OP) 
      (cond ((EQ OP 'and) 1)
            ((EQ OP 'or)  1)
            ((EQ OP '>)   2)
            ((EQ OP '>=)  2)
            ((EQ OP '<)   2)
            ((EQ OP '<=)  2)
            ((EQ OP '=)   2)
            ((EQ OP '==)  2)
            ((EQ OP '/=)  2)
            ((EQ OP '+)   3) 
            ((EQ OP '-)   3) 
            ((EQ OP '*)   4) 
            ((EQ OP '/)   4) 
            ((EQ OP '\)   4)
            ((EQ OP '%)   4)
            ((EQ OP '^)   5)
            ((member op (mapcar 'car *oplist*)) 6))) 
    

    The lowest priority is for logical operations “and” and “or”. Then there are comparison operations, then addition and subtraction, etc. The highest priority functions.

    Now we will describe the algorithm for translating an expression into an SCR. The inf2ipn function accepts one mandatory parameter (input formula) and two optional parameters (operation stack and accumulator list). The list accumulator accumulates the result. The function scans the input list and acts as follows:

    • If its next element is a number or a variable, then it is entered into the battery.
    • If the next item is a list, the function is applied to this list recursively, and its result is added to the battery.
    • If the next element is an operation, then with an empty stack of operations, the next element is entered into the stack of operations. With a non-empty stack of operations, a comparison is made between the priority of the incoming operation and the top of the stack of operations. If a higher priority operation arrives, it is pushed onto the stack of operations.
    • If an operation came with a priority not greater than the top of the stack, then the top of the operations stack is transferred to the accumulator, and the newly arrived operation is put on the operations stack.
    • If the input list is exhausted and the stack of operations is empty, then the function returns a reversed battery (terminal branch). Otherwise, the function returns a reversed battery with a pre-attached list of operations from the stack.

    The function that distinguishes the operation from the operand is very simple - it comes down to checking whether the given character falls into the global list * oplist *:

    (defun is-op (o) 
      (member o (mapcar 'car *oplist*)))
    

    And the transfer function in the IPF is:

    (defun inf2ipn (f&optional (snil) (rnil))
      (cond ((null f) 
          (if (null s) (reverse r) 
                       (inf2ipnnil (cdr s) (cons (car s) r))))
            ((listp (car f)) (inf2ipn (cdr f) s (append (reverse (inf2ipn (car f))) r))) 
            ((numberp (car f)) (inf2ipn (cdr f) s (cons (car f) r)))
            ((not (is-op (car f))) (inf2ipn (cdr f) s (cons (car f) r)))
            (t (cond ((null s) (inf2ipn (cdr f) (cons (car f) s) r))
                     ((> (prty (car f)) (prty (car s))) (inf2ipn (cdr f) (cons (car f) s) r))
                     (t (let ((a (car s)))
                             (inf2ipn (cdr f) (cons (car f) (cdr s)) (cons a r))))))))
    

    You can verify the functionality of this function by calling it directly:

    (inf2ipn '(2 + 3 * 6))
    ==> (2 3 6 * +)
    (inf2ipn '((2 + 3) * 6))
    ==> (2 3 + 6 *)
    (inf2ipn '(3 + a * sin ( 5 + x)))
    ==> (3 A 5 X + SIN * +)
    

    Get from the SCR prefix entry is very simple. The ipn2inf function accepts the expression in the SCR and the drive parameter. The function works like this:

    • If the input list is empty, the drive's head returns;
    • If the next element is a number or a variable, then this atom joins the storage ring;
    • If the next element is the operation of arity n, then a list consisting of the symbol of this operation and the reversed segment of the drive of length n is added to the drive without n first elements.

    Here is how it looks in code:

    ;; Получение арности операции
    (defun arity (o)
      (iter (for a in *oplist*) (when (eq o (car a)) (return (cadr a)))))   
    ;; Перевод из ОПЗ в префиксную
    (defun ipn2pref (f&optional (snil))
      (cond ((null f) (car s))
            ((numberp (car f)) (ipn2pref (cdr f) (cons (car f) s)))
            ((is-op (car f))
             (let ((ar (arity (car f)))) 
               (ipn2pref (cdr f) (cons (cons (car f)
                      (reverse (subseq s 0 ar))) (subseq s ar)))))
            (t (ipn2pref (cdr f) (cons (car f) s)))))
    ;; Функция-обертка, переводящая инфиксную запись в префиксную
    (defun i2p (f)
      (ipn2pref (inf2ipn f)))        
    

    Check the code performance:

    (i2p '(3 + a * sin ( 5 + x)))
    ==> (+ 3 (* A (SIN (+ 5 X))))
    (i2p '((3 + a) * sin ( 5 ) + x))
    ==> (* (+ 3 A) (+ (SIN 5) X))
    (i2p '((3 + a) * sin ( 5 ^ 2 - x ) + x))
    ==> (* (+ 3 A) (+ (SIN (- (^ 52) X)) X))
    

    Now it remains to write the handler of the assignment operator and connect it to the handler of the procedure. An assignment handler can be implemented like this:

    (defun action-set (meat)
      (let ((name-var (car meat))
            (r-value (i2p (cddr meat))))
        `(setq ,name-var ,r-value)))
    

    It is assumed that the meat parameter refers to the assignment in the form of a list:

    (имя = выражение)

    The recognition of the assignment operator is done in the action-proc function, which takes the form:

    (defun action-proc (fi)
       (let ((stmtnil)
             (proc-namenil)
             (proc-parmnil)
             (loc-varnil)
             (lvnil)
             (bodynil))
        (loop
             (setq stmt (mk-intf (getLine fi))) 
             (when (null stmt) (returnt))
             (cond ((eq (car stmt) 'proc) 
                        (setq proc-name (nth1 stmt))
                        (setq proc-parm (nth2 stmt)))                    
                   ((eq (car stmt) 'end_proc) (returnt))
                   ((eq (car stmt) 'print) 
                        (setq body (append body (list (cons 'printline (cdr stmt))))))
                   ((eq (car stmt) 'input) 
                        (setq body (append body (list (list 'setq (cadr stmt) (list 'read) )))))
                   ((eq (car stmt) 'local) 
                        (setq loc-var (append loc-var (cdr stmt))))
                   ((eq (cadr stmt) '=) 
                        (setq body (append body (list (action-set stmt)))))
                   (t (printsline (strCat"**** Оператор " (output stmt) " не распознан")) (setq *flagerr* t))))
        (iter (for a in (setof loc-var)) (collecting (list a 0) into lv))           
        `(defun ,proc-name ,proc-parm (let ,lv ,@body))))
    

    Let's check the performance of our code. Let's load the code on Lisp Wednesday, call the start function and broadcast the following procedure: Let's see what our translator generated:

    0001 proc main()
    0002 local x,y,z
    0003 x=3
    0004 y=4
    0005 z=x^2+y^2
    0006 print z
    0007 end_proc




    (getd 'main)
    ==> (EXPR NIL (LET ((X0) (Y0) (Z0)) (SETQ X 3) (SETQ Y 4) (SETQ Z (+ (^ X 2) (^ Y 2))) (PRINTLINE Z)))
    

    It seems that's right. Now let's call our procedure and get the expected result:

    (main)
    25
    ==> 25

    Our translator will handle properly and inline functions. To be convinced of this, let's execute, for example, the following code: And we get:

    0001 proc main()
    0002 local x,y,z,pi
    0003 pi=3.1415926535
    0004 x=sin(pi/6)
    0005 y=cos(pi/6)
    0006 z=x^2+y^2
    0007 print x
    0018 print y
    0019 print z
    0010 end_proc




    (main)
    0.4999999999870390.8660254037919221.0
    ==> 1.0

    Our translator comes alive before our eyes!

    However, we were very enthusiastic: aiming at the final goal, we didn’t think at all about the mistakes that the user can make (that is, the programmer using the mini-basic). In an amicable way, we had to think about mistakes right away, but we just started working, did not go too far, and it’s not too late to add tools to the error handling and diagnostics. In addition, it is absolutely obvious that “minor improvements” suggest themselves (for example, our translator requires the operator to occupy exactly one line, and this is inconvenient).

    The following article will be devoted to all these questions.
    Continued follows

    The code for this article can be downloaded here.

    Also popular now: