How to write a simple Scheme debugger
A while back, Rich Loveland asked how one might write a Scheme debugger. I realized that I’ve written many a Scheme interpreter, but I’ve never really thought about how to write a debugger. This post is a first step down that path. We’ll write what I’d consider a minimally functional debugger. It will allow you to break a running program into the debugger (albeit by invoking a primitive function in the program you’re running) and then inspect the variables that are available. As an added bonus, you’ll be able to evaluate arbitrary Scheme expressions from the debugger and even change the values of variables. For the moment, however, we will not support stepping program execution, which is admittedly an important feature of debuggers.
We’ll start by writing an interpreter for a small but interesting subset of Scheme. Then we’ll show see how to add debugging features to the interpreter. As usual, you’ll be able to see all the code on Github.
The Scheme Interpreter
We’ll start with the so-called “Three Line Interpreter” and then add a
few features to make it a bit more interesting. If you’re familiar
with Scheme interpreters, feel free to skim this section. If you want
more detail, check out the Essentials of Programming Languages. Our
interpreter, value-of
, takes two arguments. The first is the
expression to evaluate, and the second is the environment that maps
variable names onto values. Here’s the interpreter:
(define (value-of e env)
(match e
(,x
(guard (symbol? x))
(lookup x env))
((lambda (,x) ,e)
(lambda (v) (value-of e (cons (cons x v) env))))
((,[e1] ,[e2])
(e1 e2))))
It’s a few more than three lines, but we’re mainly interested in the
three match clauses. The first line is the variable reference
line. If you try to evaluate something like x
, we use lookup
to go
find the current value of x
in the environment. I’ll omit the
definition of lookup, but it’s basically the same as (cdr (assq x
env))
.
The next line is the lambda line. This creates a procedure from a
lambda expression. We cheat and use perfectly good Scheme’s built-in
lambda. Notice this procedure we create takes a value, v
, and then
calls value-of
on the body of the lambda. The important bit is the
(cons (cons x v) env)
, which adds a binding of x
to the value that
was passed in to the environment, so that the interpreter can find the
correct value when evaluating the body.
Finally, we have the application line. Basically, we use match
’s
catamorphism feature to recursively compute the value of both the
thing to be applied and the value to apply it to. Since our lambda
line evaluates to Scheme procedures, we just apply the value e1
to
its argument, e2
.
Although this doesn’t seem like a very rich language, you could use it to compute any computable function if you wanted to. You probably don’t want to though.
More Features
Now that we have the core of our interpreter, we can add features to
the language. In most cases, this is as simple as adding a few more
clauses to our match
expression. Let’s start with numbers.
(,n (guard (number? n)) n)
((,op ,[e1] ,[e2]) (guard (memq op '(+ * -)))
((eval op) e1 e2))
The first line leaves numbers as they are. There’s not really much more to do with them.
The second line lets you do things to numbers. We make sure the
operation is one of +
, *
or -
. We could add more, but these are
enough for now. I cheated once again and use eval
to convert the
symbol representing the operator into the Scheme procedure that
performs that operation. As before, we use catamorphism to evaluate
the two arguments to the operator.
To make debugging a little more interesting, let’s add some side
effects. We add set!
with the following match
clause.
((set! ,x ,[e])
(update-env! x e env))
Of course, this isn’t very interesting without knowing what
update-env!
does. Here you go!
(define (update-env! x v env)
(if (eq? x (caar env))
(set-cdr! (car env) v)
(update-env! x v (cdr env))))
Basically, we just search through the environment until we find the
variable to change and then use set-cdr!
to change its value.
Finally, let’s add begin
. We could simulate this with lambdas and
function applications, but it’s much cleaner just to add it
directly. We get begin with this clause:
((begin ,e* ... ,e)
(begin
(let loop ((e* e*))
(if (pair? e*)
(begin
(value-of (car e*) env)
(loop (cdr e*)))))
(value-of e env)))
We start by evaluating each of the first expressions for their effect, and then we return the value of the last expression.
At this point, we have enough that we can start to write some reasonably interesting programs. In particular, we can write programs that have bugs that we might want to debug. Let’s add a debugger!
The Debugger
The first thing we’re going to do is add a way to get into the
debugger. Most of the time the debugger does this by loading the
source files and letting you click on the point in the code where you
want a break point. This takes more UI work than I want to do right
now, so instead we’ll just add a special command to our language,
(debug)
, which breaks into the debugger. As usual, this is as simple
as adding another match clause:
((debug)
(debugger env))
This calls out to a function we have yet to define, called
debugger
. We pass in the environment debugger needs to see the
environment so we can inspect it.
The debugger itself is just a read-eval-print loop (REPL). It prompts the programmer for a command, then evaluates the command, prints out any results and then continues. Let’s start with a debugger that we can get out of:
(define (debugger env)
(printf "debug> ")
(let ((cmd (read))
(continue #t))
(match cmd
((continue) (set! continue #f))
(,else (printf "unknown command\n")))
(if continue
(debugger env)
(printf "running...\n"))))
We start out by printing debug>
and then using read
to read in an
S-Expression. Much like our interpreter (in fact, this is just an
interpreter for another language), we use match
to determine which
command we’re given and then evaluate it. For now we support one
command, (continue)
, which continues the execution of the
program. We do this by setting a flag that tells the debugger not to
continue it’s loop. Since we were called by value-of
, we just return
to that function and the interpreter picks up where it left off.
Here’s an example debugging session:
> (value-of '(debug) '())
debug> (continue)
running...
At the Scheme REPL, we call value-of
with a simple expression that
immediately breaks into the debugger, and we start in the empty
environment ('()
).
Most debuggers give you some kind of call stack. The closest analog to that in our language is a list of all the variables in scope, so let’s add a way to see these. Once again, we add one more match clause:
((show-env) (show-env env))
This just forwards to a procedure, show-env
, which prints out the
values in the environment. Here’s an example of how to use it:
> (value-of '(((lambda (x)
(lambda (y)
(begin
(debug)
(+ x y))))
4)
5)
'())
debug> (show-env)
0. x: 4
1. y: 5
debug> (continue)
running...
9
So now we can stop the execution, continue the execution, and see
what’s in the environment. What if we want to change things? We could
add commands to set values, but a more powerful way is to use the
target language itself to do this. Thus, we’ll add an eval
command,
which evaluates an expression in the debugger:
((eval ,e)
(printf "~s => ~s\n" e (value-of e env)))
Now, if we want to change values, we can just evaluate a set!
expression, like this:
> (value-of '(((lambda (x)
(lambda (y)
(begin
(debug)
(+ x y))))
4)
5)
'())
debug> (show-env)
0. x: 4
1. y: 5
debug> (eval (set! x 115))
(set! x 115) => #<void>
debug> (show-env)
0. x: 115
1. y: 5
debug> (continue)
running...
120
As expected, the resulting value changes to reflect the fact that we modified the environment during program execution.
So there’s a first steps towards a Scheme debugger. We were able to do this with relatively few changes to our interpreter. It seems to me that adding more advanced features would require more changes. For example, there’s no way to inspect or change the code that’s running now. To do this, we would have to keep track of this data in a form that is accessible to the debugger. Furthermore, we’re still missing a way to do finer grained execution control, such as stepping over a single statement or out of the current lambda. I suspect most if not all of these problems can be solved by writing our interpreter in continuation passing style. I hope to explore this in a later post.