Gene Callahan

Subscribe to Gene Callahan: eMailAlertsEmail Alerts
Get Gene Callahan: homepageHomepage mobileMobile rssRSS facebookFacebook twitterTwitter linkedinLinkedIn


Design Parameters in a Java Interpreter

Design Parameters in a Java Interpreter

This article describes our use of design patterns to create an interpreter in Java, and shows how it can be built in a "pure," object-oriented fashion. .The patterns we use are from Design Patterns: Elements of Reusable Object-Oriented Software by Gamma, Helm, Johnson and Vlissides, published by Addison Wesley in 1995. (We'll refer to this book henceforth as DP.)

What Did We Build?
HotScheme is an online, multiplatform interpreter of the Lisp dialect Scheme, with GUI front-end and interactive Internet capabilities. It's currently implemented as a Java applet, although the core of the interpreter is independent of the front end. At present it's only a partial implementation of ANSI Scheme, intended for use as an educational tool for learning Scheme, GUI design and interpreter implementation.

The project came into existence while we were working with Dr. Jo Anne Parikh of the Southern Connecticut State University Computer Science Department. The interpreter was originally written in C++ and later ported to Java. We found that the port was not difficult, and that for our purposes Java had several key advantages over C++. It made garbage collection a breeze, exception handling more elegant and allowed us to easily add features to the language, such as fetching the contents of a URL and creating objects from the Java AWT. And of course, its being able to run as an applet from a Web browser handled the problem of distribution. We plan to take further advantage of Java's built-in networking capabilities and make more AWT features available from Scheme.

The Patterns
Interpreter: Modeling Scheme Grammar Directly in Java Objects
In the Interpreter pattern a class represents each grammar rule in the language. The interpreter's parse tree, which is its structural representation of the program under execution, is built as a Composite (see next section) of simpler grammatical elements.

We extended the Interpreter pattern to include interpreter input, borrowing an idea from Bjarne Stroustrup's The C++ Programming Language (Addison Wesley 1993). Instead of the core logic being in a large state machine, as in a procedural interpreter, it is dispersed through a hierarchy of classes. Objects in HotScheme know how to construct themselves from an input stream, assemble themselves into a parse tree and return the value they represent. There is a trade-off here: a more traditional interpreter, like those generated by the combination of lex and yacc, will run faster, but they may be a little harder to understand. It was for that reason that we chose simplicity and clarity over performance - a pretty reasonable trade-off given that we viewed this as a tool for learning and experimentation, rather than for the purpose of creating production systems.

We modeled Scheme data types directly as Java classes. For a Lisp dialect, capturing the data types as classes means capturing the whole language, as Lisp programs are Lisp data. This simplifies understanding the interpreter; its structure reflects the definition of Scheme, and can be understood by reading a Scheme manual.

Lisp interpreters operate in an endless read-eval-print loop by first getting input from the user, evaluating it and then returning the result of the evaluation. We discuss the read phase of this loop in the Abstract Factory section below. Evaluating the parse tree is simply a matter of calling Eval() on the SchemeObject returned by the read phase. Leaf nodes in the tree (Scheme atoms, such as numbers, strings and symbols) return themselves as their value or, in the case of symbols, return the object they label. Calling Eval() on high-level nodes will trigger a recursive evaluation of all nodes lower in the tree, yielding a single return value - of a type descended from SchemeObject - for that portion of the tree.

Similarly, this result is printed by calling Print() on the object returned by Eval(). If this object is a composite, it will call Print() on its components. This makes the top-level code so simple that the body of this loop, the heart of the interpreter, is a single statement:

// term is the current terminal, global_env the lexical environment:
term.println(((SchemeObject.make(term, SchemeObject.START,

The meat of the evaluation phase is in the List.Eval()method (see Listing 1). This is because the most fundamental Scheme activity is function application. When a list is evaluated, its default behavior is to treat the first item in the list (the car of the list) as a function to be applied to the rest of the list (the cdr) - the arguments to the function. As Java lacks the varargs feature present in C and C++, having a list data structure directly available in Java greatly simplified writing Java functions that handle the variable-length argument lists required by Scheme. Every Java method that implements a Scheme function takes a list of SchemeObject elements as its single argument, and pulls apart its "actual" arguments itself.

It's important to note that List.Eval() calls Evargs() (evaluate arguments) on the rest of the list before passing these arguments to the function. (We use an auxiliary function, Evargs(), rather than Eval() itself, because a second call to Eval() would treat the first member of the rest of the list as a function - not at all what we want!) For this reason certain Scheme forms, called syntax forms, can't be handled by an ordinary function application. Consider the case of an if statement. The ANSI Scheme specification for if says that it evaluates its first argument and, if true, returns its second argument; otherwise it returns its third. But the specification also states that whichever branch is not returned must not be evaluated. If we used an ordinary function application to evaluate if, it would be too late! We'd have evaluated the arguments before even calling if. Thus these syntax forms are handled by special cases in List.Eval(). The objects created to represent these forms in the parse tree are all descendants of the HotScheme class SyntacticalForm, in keeping with our use of classes to capture grammar.

Composite: Building the Parse Tree
The intent of the Composite pattern is to allow clients to treat individual objects and composites uniformly. The parse tree that HotScheme builds to represent the command it is interpreting is an instance of this pattern.

Lisp programs are built from Lisp's main compositional structure: the list. The fact that a Lisp program processes lists and is also made up of lists lends an elegant simplicity to a well-built Lisp interpreter. Lists are a basic data type, and can generally appear in the same places as atoms such as integers and strings. This allows us to use a Composite to represent the parse tree. We represent lists as SchemeObjects that hold references to other SchemeObjects, which can themselves be lists. A client calling an object's Eval() or Print() method need not worry about whether it is dealing with an atom such as an integer or a composite such as a list - the call is made the same way by the client, with the composite recursively passing the call on to its components, if necessary. For an example of how Composite simplifies handling aggregate entities, see the implementation of List.Print() in Listing 2.

Abstract Factory: A Common Ancestor Creates All Scheme Data Types
As mentioned above, objects know how to read themselves from a tokenized input stream and assemble themselves into a parse tree. The class SchemeObject, in its role as an Abstract Factory, is the place where this knowledge resides.

The pattern DP refers to as Abstract Factory is also described in James Coplien's Advanced C++ (Addison Wesley 1992), where it is called the Exemplar idiom. An Abstract Factory allows clients to create subclasses of a class without specifying which subclass to create. To achieve this, SchemeObject itself determines which Scheme data type we are reading.

When its static make() pseudo-constructor passes a reference to a terminal for input and to an environment for interpretation, it will return a (properly subclassed) reference to whatever object type it finds waiting for it on the input stream (see Listing 3). SchemeObject asks the lexical analyzer for the next token. It looks at the type of token it receives to see which of its subclasses to instantiate. This can be done recursively so that when we find a composite object like a list on the stream, the object returned will have constructed the elements of the composite and will be holding references to them.

Abstract Factory posits that clients will deal only with the abstract interface provided by the factory class, and not call subclass-specific methods. The SchemeObject is this abstract interface in HotScheme, and is the base class for all concrete Scheme data objects. This is important, because many Scheme functions, such as predicates like list? and number?, can operate on any type of Scheme object. Also, Scheme lists and vectors are heterogeneous collections, and require a common base type to hold references to. Because of this level of abstraction, clients don't need to know about new Scheme data types as we add them to the system.

Having SchemeObject as an abstract base class also helped when it came to error handling. Because Scheme is not strongly typed, any function might pass any data type for any of its arguments. However, this doesn't mean that every function can handle any data type! Many combinations should produce a runtime error. For example, the Scheme command first makes sense only when its first argument is a list. It's meaningless to ask for the "first element" of an integer. HotScheme handles this by throwing exceptions in the base SchemeObject class for most methods. A call to SchemeObject.first(), by default, throws an exception that states that the object the method was called on is not a list. We overrode that method only in the List class. This eliminates the need to scatter type-checking code throughout the system - in this case a big gain in both simplicity and performance.

Command: Scheme Built-in Commands as "Functors"
Often, commands in an interpreter are stored in a jump table that associates function names in the source code with function addresses, or jump points. When a symbol matches a name in the table, the interpreter "jumps" to that address to execute the function there. However, when we went to implement the built-in functions, we found that there was no straightforward way to create a jump table in Java. This is because there are no global functions, and no way to get a pointer to a member function. We wound up wrapping each function in what James Coplien terms a functor - an example of the Command pattern from DP. Each built-in command has its own class. To execute the command, we call the class' Apply() method, passing it its arguments - wrapped in a Scheme list - and an environment in which interpretation will take place. (Conveniently, this is also how we execute a user-defined function.) We instantiate one object for each built-in command when we initialize the interpreter, and store this instance in the symbol table as the value associated with the symbolic name of the command. Thus the object representing the function is first stored in a hash table associated with the key "first." The code (first '(a b c)) will cause the interpreter to look up that object and call its Apply() method, passing it the list (a b c). The Apply() method returns a SchemeObject, in this case a. At the point of execution the interpreter neither knows nor cares whether it is executing a built-in or user-defined function - that knowledge is stored in the function object itself. The drawback of our solution is that it has led to a large number of classes for built-in commands. See Listing 4 for an example of one of these functors.

FaÇade: Our Terminal Interface
The FaÇade pattern hides a number of complex interfaces behind a simpler, higher-level interface. We employed it to hide I/O details within the class LispTerminal. Our terminal interface is minimal, with little coupling between the UI and the interpreter.

In addition to our GUI version we've implemented a version for a Java character terminal, and it would be trivial to make a version that, for instance, interpreted code coming in on a socket. Instead of having the interpreter attempt to deal directly with character terminal, AWT, Swing, socket, Accessibility and other interfaces, the LispTerminal class presents a few abstract operations - like reading and printing - that the interpreter needs. The interpreter is passed an object that is a descendant of LispTerminal, to which it will direct I/O requests. Thus it is the creator of the interpreter instance, not the interpreter itself, that decides how I/O will be performed. For our character terminal version, input is read straight from the terminal. GUILispTerminal buffers keyboard input from HotScheme's input field and sends it all to the interpreter once the "Evaluate" button is clicked. To perform output, the interpreter calls the Print() method of the terminal passed to it, and the different terminal types output the text correctly. LispTerminal also provides a pushback buffer when the tokenizer has to read "too far" to tell when it has completely captured a token. (For example, a tokenizer can't tell that it's done reading a number until it reads the first character that is not a digit - one too many! The tokenizer needs to "push back" this extra character onto the input stream so that the next call for a token will read it.) Since lower-level I/O objects may not provide this capability, the Faade abstraction again simplifies our interfaces. See Listing 5 for the definition of LispTerminal.

How Did Patterns Help?
Employing patterns lent shape and coherence to our high-level design. Without being able to think about the interpreter using these patterns, the complexity of its design would have expanded beyond our grasp. The patterns gave us a way to think of the interpreter as a number of very high-level constructs, the details of which we could ignore when considering the interpreter as a whole.

The use of patterns is also crucial in communicating the design. In our discussions it was immensely helpful to be able to give names to the ideas shaping our work. To say "We'll employ an Abstract Factory to create Scheme types" captured a large piece of design in a simple, succinct statement.

Finally, by densely combining patterns in a small design space, we began to glimpse the poetic quality that Christopher Alexander, in A Pattern Language (Oxford University Press 1977), asserts this "compression" of patterns can produce. As we wove these patterns into our design, the program began to surprise even its authors in the way new features effortlessly emerged from the structures we had already created. And this sense of adventure and elegance is what can make our profession a fulfilling one to pursue.

We welcome communication from anyone interested in contributing to this project, and from any computer science departments or other educators who would like to deploy HotScheme at their institution.

Resources - URLs
Patterns Home Page: HYPERLINK
Christopher Alexander: An Introduction for Object-Oriented Designers: HYPERLINK

More Stories By Gene Callahan

Gene Callahan, president of St. George Technologies, designs and implements Internet projects. He has written articles for several national and international industry publications.
Rob Dodson is a software developer who writes options-trading software in Java and C++ for OTA Limited Partnership.
Previous projects include weather analysis software, tactical programs for Navy submarines, and code for electronic shelf labels.

More Stories By Brian Clark

Brian Clark is a software engineer residing in Virginia. His current focus is on the application of design patterns on UI and middle-tier design using Java.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.