Efficient Implementation of Smalltalk Block Returns

Preface
One of the more challenging aspects of a Smalltalk implementation is the efficient implementation of blocks and in particular blocks containing method returns.  In the original Smalltalk-80 virtual machine design blocks were not reentrant and hence not true closures.  In addition,  all blocks and method activation records were heap allocated. The major commercial Smalltalk implementations of the 1990’s all corrected these problems and developed various techniques for  efficiently implementing blocks. However, these design issues carried forward into the Smalltalk-80 derived Squeak Smalltalk implementation.  Over the years there were a number of false starts  as various individuals  individual contemplated “adding block closures” to Squeak.
I originally wrote the following as a post on the squeak-dev mailing list in response to one such proposal.  It covers a number of design and implementation issues relating to Smalltalk block but the most important thing that I wanted place into the public record was my technique for avoid stack reification for blocks containing method returns. In the version of the original post below I’ve corrected a few typos and made a few minor edits in order to make it a more self-contained document. The original post included “part 1” in its title.  I never wrote “part 2” but if I had it would have talked about things like “up-level addressing” and additional ways to minimize what  state must be captured in heap allocated environments.
Allen Wirfs-Brock

May 2011
Comments on Smalltalk block closure designs, part 1
April 2001
I want to step back and give you my “big picture” view of block closure issues. My experience is that there are several fairly orthogonal architectural issues which require solutions in order to successfully implement block closures. A subset of these issues are addressed in the “classic” Smalltalk-80 MethodContext/BlockContext design but the factoring of the design solutions to these issues in the Smalltalk-80 design is not necessary optimal for supporting block closures. For this reason, I have generally found it useful to start a design by “going back to basics” rather than focusing on how to extend MethodContext/BlockContext.
So what are the principal architectural elements required to support Smalltalk method evaluation and block closure semantics? Here’s my list and some relevant comments:
1) Activation Records — An activation record records the transient state of a “function” activation. In Smalltalk this means a method activation or a block evaluation. The transient state includes control information (return address, etc.), temporary variables and arguments, etc. In principle, the control information and the data portions of an activation record could be separated but in practice this usually isn’t necessary. There is little significant difference between the activation record requirements of a Smalltalk method and a Smalltalk block. It is quite possible to design an activation record format that will work for either.
In the classic Smalltalk-80 design, MethodContexts and BlockContexts both serve as activation records. MethodContexts are a relatively “pure” implementation of the activation record abstraction encapsulating control information, arguments, temporary variables. BlockContexts are hybrids that encapsulate the control data of an activation record but depend upon an associated method context for storing arguments and temporary variables. BlockContexts also serve as the “function” object that corresponds to an evaluable block. This leads to various problems in the classic design which I will discuss below.
Traditional block structured languages stack allocate activation records. This is primarily done for efficiency of allocation and deallocation and is possible because the semantics of such languages typically restrict variable accesses to following the stack like model. The classic Smalltalk-80 model uses heap allocated activation records (MethodContexts/BlockContexts). However, high performance interpreters and particularly JITs try to avoid creating heap allocated activation records and instead typically try to use fairly conventional stack allocated activation records. This is possible because most activation records are never directly referenced (via an oop) as an object. If a program actually needs to capture an object reference to an activation record (for example, via a thiscontext reference) then the activation record (and typically all preceding activation records) must be “reified” as heap allocated objects. This can be a very expensive operation and any design that requires frequent reification of the stack will probably not perform well. Fortunately, it is possible to have a Smalltalk activation/block closure architecture that NEVER needs to reify the stack for normal execution! (you’ll have to keep reading if you want to know the trick that enables this)
2) Environments — You can think of an “environment” as the data portion of an activation record. The associated method is evaluated in the context of an environment and multiple activations (e.g. recursions) of a given method may be simultaneously active and bound to different environments. The lisp community uses terminology (coming from the Lambda calculus??) that says that when a “function” is associated with a particular environment the function is “closed” with respect to that environment and that any free (alternatively open) variables in the function are bound to  corresponding entries in the environment. Hence the origin of the term “closure”. (A free variable is one whose definition is not part of the function in question.) A closure is a function, bound to an environment. In Smalltalk, the act of evaluating a list of statments enclosed in square brackets creates a closure by binding the code (in particularly any free variables) within the brackets (the “function”) to an environment (that of the enclosing block or method). An alternative terminology that is sometimes used would say that

the environment (and its variables) are “captured” by the closure.
[A terminology note…”closure” is a highly technical term that has little meaning to most programmers. As closure semantics were added to Smalltalk-80 the term block closure was introduced and made manifest. I think it was a mistake to force this terminology on Smalltalk programmers who aren’t computer scientists. I prefer to call square brackets enclosing some Smalltalk statements a “block constructor” and to use the term “block” (or “block object”) to identify the type of object that is created by evaluating a block constructor. I think this is simpler terminology and most closely matches how programmers think about the constructs. This is the terminology that was used in writing the ANSI Smalltalk specification.]
Because blocks are first class Smalltalk objects they can be stored in any field or variable including those that have a longer lifetime then the activation record that is associated with the environment that is bound to the block. This means that an environment that can be potentially captured using a block constructor can not be managed via stack allocation. The simplest way around this problem would be to allocate all arguments and temporary variables of every activation record in a heap allocated environment object. (Each activation record would have a field that points to its associated environment object) This, however, would have performance characteristics very similar to exclusively using heap allocated activation records. It is easy to observe that the majority of arguments and temporary variables used in a program can never be captured by a block constructor. Any variable which we (or a compiler) can prove is only referenced by its defining method or block (and hence are not referenced from an nested block constructors) can be safely stack allocated in the activation record. In practice, most method contain no block constructors (for this purpose we don’t consider in-lined blocks as true block constructors) and hence all their arguments and temporaries can be safely stack allocated. Typically, for those methods that do contain real block constructors, only a small subset of the arguments and variables are actually subject to capture and only those variables need to be placed in a heap allocated environment object. The number of environment allocated variables can be further reduced by paying attention to “read only variables”. Any variable that can never be assigned during the lifetime of a blocks, for example, arguments of enclosing methods and blocks need not be placed in the environment if instead a copy of the variable is placed in the closure when it is created (is this true of Squeak?? Are arguments read only? They should be for this reason.)
3)Closures (or preferably, “block objects”) — A Block object captures an association between some executable code and an environment. A block object can be evaluated much like a function. This results in the creation of an activation record for the evaluation. The classic Smalltalk-80 design used the same object (a BlockContext) for both the “closure” and the activation record. Because of this, Smalltalk-80 blocks were not reentrant (and hence could not be recursively invoked) as there was only a single “return address slot”, stack, etc. that gets over-written by each activation of the block. Blocks (closures) and activation records have distinct roles and need to be represented by distinct objects.
In its simplest form, a block object needs only two slots, a reference to the code and a reference to the captured environment. If the environment that is capture by a block is represented directly in an activation record (this is essentially what happens via the home context reference of a Smalltalk-80 block context) then the activation record (and usually the entire stack) has to be reified. The expense of stack reification is a strong incentive to explicitly represent environments that are subject to capture as heap objects. In addition to the environment and code slots, blocks that contain up arrow (method) returns may (depending upon specifics of the overall design) need a continuation slot. If the captured continuation slot isn’t in the block object then it probably needs to be in an environment object.
4) Continuations – You can think of a continuation as the portion of an activation record that contains the return address and associated state. It specifies where execution will “continue” when control exits the code associated with the activation record. All methods and most blocks continue execution, upon return, in the activation that directly called them. In this case, control flow follows a stack discipline and the continuation linkage between activation record can be modeled implicitly with an activation record stack. The exception to this rule in Smalltalk is the up arrow return from a block. An up arrow return uses the continuation of the method activation that created (perhaps indirectly via intermediate block activations) the block object associated with the current activation rather than the current (block) activation record’s continuation.
[Some languages, most notably Scheme support “first class continuations”. In such languages, continuations are “real objects” that can be directly manipulated. In particular, a Scheme continuation can be “invoked” multiple times. This essentially means that control can be redirected to return from a particular function activation multiple times, even if  that activation has already been returned from. This permits various novel control paradigms but significantly complicates the implementation of continuations. Smalltalk-80 does not have this characteristic. A Smalltalk method continuation can be used only once and most Smalltalk implementation do not allow the use of continuations that cross (Smalltalk) process boundaries. These restrictions allow for optimizations that would otherwise be difficult to achieve.]
As with a block’s environment record, the method continuation must be captured when the block object is created. If the captured continuation is represented as a reference to the method’s activation record, this will also cause reification of the activation record and the stack which we would like to avoid.
I’ve now covered enough to explain how Smalltalk blocks with full closure semantics can be implemented in a way that never forces reification of the activation record stack:

Allocate all arguments and temps that are not “captured” by blocks within stack allocated activation records.
All variables that are subject to capture by blocks are explicitly allocated in heap allocated environment objects.  Alternatively if  the variable is “read-only” it may be copied into the block object.
Only activations that have captured variables need heap allocates environment objects, except for the following rule:
The activation of a method that contain a block with up arrow returns always allocates a heap environment object. It may be empty, in the sense that it contains no variables, but it must be allocated.
Activation records reference (“point to”) environment objects but environment objects never point to activation records. Environment objects and block objects may reference other environment objects.
A block (a closure) object references its enclosing environment(s) (but never the enclosing activation record(s))
A block object that implements an up arrow return always captures a reference to the environment associated with its method activation. This is used as the continuation marker.
An up arrow return is implemented by searching up the activation stack for an activation record whose environment reference is the same as the continuation marker of the returning bock. The activation stack is unwound (ensure/ifCurtailed blocks must be executed) and trimmed back to that activation record and a return from that activation is forced. If no activation record with that environment is found we have a #cannotReturn: error situation.