Generative Programming Basics

Previous staging approaches either work directly with strings that represent concrete program syntax or make use of quasiquoting to compose abstract syntax trees. We examine both approaches in turn, with an eye on how linguistic reuse improves productivity and safety for the multi-stage programmer.

Outline:

Program Generation with Strings

As a simple example, let us turn the power function:

def power(b: Double, n: Int): Double =
  if (n == 0) 1.0 else b * power(b, n - 1)

into a code generator:

def power(b: String, n: Int): String =
  if (n == 0) "1.0" else "(" + b + " * " + power(b, n - 1) + ")"

As result of an invocation we obtain:

power("x",4) // "(x * (x * (x * (x * 1.0)))"

However there is a problem: We can produce arbitrary strings that might not be valid code. It is very easy to make subtle mistakes:

def power(b: String, n: Int): String =
  if (n == 0) "1.0" else "b * " + power(b, n - 1) + ")"

We have accidentally omitted a parenthesis, so the result is not syntactically well formed code. Furthermore, the literal identifier b is part of the output:

power("x",4) // "b * b * b * b * 1.0)))"

This code will not compile and even if we fix the syntax, the code is no longer well scoped. The free identifier b can lead to variable capture when the code is spliced in somewhere else.

We have seen two problems, syntax correctness and scope correctness. Two other problems are type correctness and value correctness. If we cannot guarantee to generate valid programs, we can much less guarantee that programs are well-typed or compute correct results.

Program Generation with Quasi-Quotes

Strings model concrete syntax, but we can also use abstract syntax. This idea is inspired by Lisp's “code as data” model. We start with a slightly more convenient string notation, denoted by s"..." quotes:

def power(b: String, n: Int): String =
  if (n == 0) s"1.0" else s"($b * ${ power(b, n - 1) })"

The notation ${ ... } denotes a hole in the string, to be filled by the string result of evaluating the enclosed expression.

The same idea applies to abstract syntax. Let [[ ... ]] denote the AST of the enclosed expression, and let Tree be the type of AST nodes. Holes will require an expression of type Tree:

def power(b: Tree, n: Int): Tree =
  if (n == 0) [[ 1.0 ]] else [[$b * ${ power(b, n - 1) } ]]"

Now we have a program generator that assembles AST nodes.

Syntactic Correctness through Deep Reuse of Syntax

The multi-stage language compiler parses the whole program and builds ASTs for all expressions, quoted or not, at once. Thus we obtain syntax correctness. However the multi-stage language compiler must know about the syntax of the object language. This is trivial if meta-language and object language are the same. Otherwise it is slightly more difficult (*).

The Tree type can be left abstract. Some implementations hide the exact data structures to guarantee safety of optimizations on object code. Silently modifying trees with rewrites that maintain semantic but not structural equality (e.g. beta reduction) can change the behavior of programs that inspect the tree structure (*). In general, optimizations should not change the result of a program.

Scope Correctness through Deep Reuse of Scope

The multi-stage compiler can bind identifiers at the definition site of the quote. This avoids variable capture and ensures scope correctness (hygiene).

Another possible issue is scope extrusion. This happens when a variable bound in quoted code escapes through a hole:

var x: Tree;
$[[$ val y = 7; $\$${ x = y }$]]$

Scope extrusion can be prevented by appropriate type systems (*), which are beyond the scope of this thesis. Scope extrusion is a real problem for code generators that imperatively manage staged program fragments. For generators expressed in a functional style it is far less of an issue, regardless of whether the object program uses effects or not.

Type Correctness through Deep Reuse of Types

With syntax and scoping out of the way, we turn our attention to type correctness. Fortunately, type correctness falls out naturally if parametric types are available. We just replace type Tree with Tree[T]:

def power(b: Tree[Double], n: Int): Tree[Double] =

Now the type system ensures that power is only applied to AST nodes that compute Double values in the next stage.

Note that the use of parametric types alone does not prevent scope extrusion, which can also be seen as a type error in the sense of a well-typed multi- stage program “going wrong” (*). Thus we do not obtain a guarantee that all generated programs type check, but the slightly weaker assurance that all generated programs that are well-formed are also type correct.

Value Correctness is an Open Problem

The remaining big problem is what we (somewhat vaguely) call value correctness or more generally preservation of program semantics: How can we be reasonably certain that a program computes the same result after adding staging annotations? We cannot expect a strong guarantee in all cases for reasons of nontermination but what is troubling is that there are many practical cases where staging annotations change a program's behavior quite drastically. This fact is well documented in the literature (*).

The problem manifests itself both with strings and with trees. The root cause is that both approaches are based on syntactic expansion, irrespective of semantics such as order of execution.

Using the regular, unstaged power implementation:

def power(b: Double, n: Int): Double = ...

val x = computeA()       // computeA executed here
power(computeB() + x, 4) // computeB executed before calling power (cbv)

Both compute functions will be executed once, in order. Afterwards, power will be applied to the result.

Let us compare this with the staged implementation:

def power(b: Tree[Double], n: Int): Tree[Double] = ...

val x = $[[$computeA()$]]$
power($[[$computeB() + $\$$x$]]$, 4)

Result:

((computeB() + computeA()) *
 ((computeB() + computeA()) *
  ((computeB() + computeA()) *
   ((computeB() + computeA()) * 1.0))))"

In this case, the computation has been duplicated $n$ times and the order of the function calls has been reversed. Effectively we ignore all bindings and follow a call-by-name policy even though power declares its arguments as call- by-value. If either of the compute functions depends on side effects the staged function computation will produce a very different result. Imagine for example:

def computeA() = readNextInputValue()

We clearly want to read only one value, not four.

Even if both functions are pure, it will be much more expensive to compute the result. If we applied staging to obtaining better performance we have not achieved our goal.

As another example, let us switch to a better algorithm:

def power(b: Tree[Double], n: Int): Tree[Double] =
  if (n == 0) $[[$ 1.0 $]]$
  else if ((n&1) == 0) { val y = power(b, n/2); $[[ \$y$ * $\$y ]]$ }
  else $[[ \$b$ * $\$\{$ power(b, n - 1) $\} ]]$

Result:

power($[[$x$]]$) // (((x*1.0)*(x*1.0))*((x*1.0)*(x*1.0)))

Staging has turned the more efficient algorithm into a less efficient one. This effect of staging undoing binding and memoization is widely known (*).

Let Insertion as a Remedy

One way of fixing the order of staged expressions is to insert let-bindings in strategic places. This is frequently done by separate front ends. Staging effectively becomes an “assembly language” for code generation. The front end can assemble pieces of generated code using explicit side effects, or the code generators are written in monadic style or continuation passing style (CPS), in which case the monadic bind operation will insert let-bindings to maintain the desired evaluation order (*). Effectful code generators are much more likely to cause scope extrusion. Explicit monadic style or CPS complicate code generators a lot. This dilemma is described as an “agonizing trade-off”, due to which one “cannot achieve clarity, safety, and efficiency at the same time” (*). Only very recently have type-systems been devised to handle both staging and effects (*). They are not excessively restrictive but not without restrictions either. Mint (*), a multi-stage extension of Java, restricts non-local operations within escapes to final classes which excludes much of the standard Java library. Languages that support both staging and first class delimited continuations can mitigate this overhead but front ends that encapsulate the staging primitives are still needed (*).

In the partial evaluation community, specialization of effectful programs has been achieved by inserting let-bindings eagerly for each effectful statement (*), achieving on-the-fly conversion to administrative normal form (ANF, (*). As we will show below, a simplified variant of this approach naturally extends to staging with and without quasiquotes.

The LMS Way

We first show how to maintain value correctness through deep reuse of evaluation order. The key idea is similar to that employed in partial evaluation (*) and applies to both quasiquoting and LMS. Our presentation differs from the partial evaluation literature in that it is independent of any partial evaluation mechanics such as CPS conversion and expressed in a simple, purely operational way. Continuing with quasiquoting, we show how we can remove syntactic overhead and arrive at a more restricted object language by providing a typed API over staged Rep[T] values that hides the internal implementation. At this point, quasiquoting becomes an implementation detail that is no longer strictly needed because the higher level object language interface has taken over most of the staging guarantees. Staging can be implemented as a library, without specific language support. Linguistic reuse is enabled by lifting operations from type T to Rep[T]. The object language can be divided into reusable components. Since there is only a single shared Rep[T] type, no layerings or translations between components are necessary. Deep reuse of type inference enables a form of semi-automatic local BTA since method overloading will select either staged or unstaged operations depending on the types. In many cases, methods can be staged by just changing their parameter types.

Value Correctness through Deep Reuse of Evaluation Order

The key idea is to treat quoted fragments as context-sensitive statements, not context-free expressions. This means that we will need to explicitly perform a statement. We continue the description with strings as the representation type since it is the most basic. Performing a statement will register the side effects of this statement in the current context. The counterpart to perform is accumulate, which defines such a context and returns a program fragment that captures all the effects within the context. To make sure that all code fragments are treated in this way we introduce the following typings:

type Code
def perform(stm: String): Code
def accumulate(res: => Code): String

Note the by-name argument of accumulate. The Code type and the method implementations can remain abstract for the moment.

We can put perform and accumulate to use in the power example as follows:

def power(b: Code, n: Int): Code =
  if (n == 0) perform("1.0") else
  perform("(" + accumulate { b } + " * " + accumulate { power(b, n - 1) } + ")")

We define perform and accumulate in the following way to perform automatic eager let insertion. The private constructor code builds a Code object from a string:

accumulate { E[ perform("str") ] }
    $\longrightarrow$ "{ val fresh = str; " + accumulate { E[ code("fresh") ] } + "}"
accumulate { code("str") }
    $\longrightarrow$ "str"

Where E is an accumulate-free evaluation context and fresh a fresh identifier. These rules can be implemented using one piece of mutable state in a straightforward way.

We are reusing the execution order of the meta language: In the meta language we execute perform whenever we encounter an object program expression. If we force the object program to replay the order of the perform calls by inserting a let binding for each of them, we are sure to execute the performed statements in the right order. Whenever we have a hole to fill in an object program fragment, we use accumulate to gather all statements performed while computing the fragment to splice into the hole.

Perform and accumulate form a reflect/reify pair that translates between a syntactic and a semantic layer. Alternatively, perform could be called reflectEffects, accumulate reifyEffects. This hints at the view that we are embedding perform and accumulate in the (invisible) computation monad of the meta language using Filinski's notion of monadic reflection (*). Accumulate is a left inverse of perform with respect to extensional equality ($\equiv$) of the generated code:

accumulate { perform("a") } $\longrightarrow^{*}$ "{ val fresh = a; fresh }"       $\equiv$     "a"

If structural equality is desired, a simple special case can be added to the above definition to directly translate accumulate(perform("a")) to a. Within a suitable context, perform is also a left inverse of accumulate: Performing a set of accumulated statements together is the same as just performing the statements individually.

Clearly, using perform and accumulate manually is tedious. However we can incorporate them entirely inside the quasi quotation / interpolation syntax:

s" foo $\$${ bar } baz " $\longrightarrow$ " foo " + bar + " baz "  // regular interpolation
q" foo $\$${ bar } baz " $\longrightarrow$ perform(" foo " + accumulate { bar } + " baz ")

In essence, we identify quotation with perform and holes with accumulate.

We get back to a power implementation using quasiquotes. This time we use type Code, although we are still working with concrete syntax:

def power(b: Code, n: Int): Code =
  if (n == 0) q"1.0" else  q"($\$$b * $\$${ power(b, n - 1) } )"

The same mechanism can be used to implement order preserving versions of (type-safe) abstract syntax quotation $[[ … ]]$. The signatures will change from strings to trees:

type Code[T]
def perform(stm: Tree[T]): Code[T]
def accumulate(res: => Code[T]): Tree[T]

We put the modified quasiquotes to test by invoking power on the example from the previous Section:

def power(b: Code[T], n: Int): Code[T] = ...

val x = $[[$computeA()$]]$
power($[[$computeB() + $\$$x$]]$, 4)

We obtain as intermediate result before invoking power (dropping unnecessary braces and replacing semicolons with newlines):

val x1 = computeA()
val x3 = { val x2 = computeB() + x1; x2 }
power(x3,4)

And as the final result:

val x1 = computeA()
val x3 = { val x2 = computeB() + x1; x2 }
val x4 = 1.0
val x5 = x3 * x4
val x6 = x3 * x5
val x7 = x3 * x6
val x8 = x3 * x7
x8

It is easy to see that this is the correct sequencing of statements. No computation is duplicated. Likewise, if we use the improved algorithm, we actually get better performance.

We have removed the need for monadic or side-effecting front-ends (in this case, in other cases they may still be needed but never to perform let insertion). Since we have extended the core staging primitives with a controlled form of side effect, we have removed the need for uncontrolled side effects in the generator. This makes otherwise common errors such as scope extrusion much less likely.

Removing Syntactic Overhead

We have seen how we can improve staging based on quasiquotes or direct string generation. Now we turn to other approaches of delineating embedded object programs. Our aim is embedding domain specific compilers. We want object languages tailored to specific applications, with custom compiler components. The “one size fits all” approach of having the same meta and object language is not ideal for this purpose. In our case, we would have to inspect Scala ASTs and reject or possibly interpret constructs that have no correspondence in the object language (type, class or method definitions, etc).

The staged power implementations with quasi quotes look OK but they do contain a fair bit of syntactic noise. Also, we might want stronger guarantees about the staged code, for example that it does not use a particular language feature, which we know is detrimental to performance. What is more, we might want to generate code in a different language (JavaScript, CUDA, SQL).

We already hide the internal code representation from client programs. There are good reasons to also hide the full power of arbitrary program composition / quasi quoting from client programs.

Programs, such as power, use quasiquotes for two purposes: lifting primitives and operations:

def power(b: Code[Double], n: Int): Code[Double] =
  if (n == 0) q"1.0" else  q"($\$$b * $\$${ power(b, n - 1) } )"

We already identify object code via Code[T] types. Instead of quasiquotes we can employ other ways of lifting the necessary operations on type T to type Code[T]:

implicit def liftDouble(x: Double): Code[Double] = q"x"
def infix_*(x: Code[Double], y: Code[Double]): Code[Double] = q"$\$$x * $\$$y"

Now power can be implemented like this:

def power(b: Code[Double], n: Int): Code[Double] =
  if (n == 0) liftDouble(1.0) else infix_*(b, power(b, n - 1))

But we can simplify further. In fact, the Scala compiler will do most of the work for us and we can write just this:

def power(b: Code[Double], n: Int): Code[Double] =
  if (n == 0) 1.0 else b * power(b, n - 1)

Apart from the Code[_] types, we have re-engineered exactly the regular, unstaged power function! All other traces of staging annotations are gone.

We are relying on Scala's support for implicit conversions (views) and Scala- Virtualized support for infix methods. Other expressive languages provide similar features.

Staging as a Library and Modular Definition of Object Languages

With the object language definition given by method signatures we can implement staging as a library, without dedicated language support and with roughly the same guarantees as a multi-stage language with quasi quotation. Furthermore, we can easily generate code in another target language, for example emit JavaScript from staged Scala expressions. Given that the multi- stage program is in control of defining the object language we can model additional guarantees about the absence of certain operations from staged code, simply by not including these operations in the object language interface.

The core idea is to delegate correctness issues to the implementations of the lifted operations, i.e. the implementation of the object language interface. Client code can access staging only through the object language API, so if the implementation is correct, the interface ensures correctness of the client code.

We can use any representation we like for staged expressions. For the sake of simplicity we will stick to strings. Where we have used type Code[T] above, we will use Rep[T] from now on because we want to allude to thinking more about the representation of a T value in the next stage and less about composing code fragments.

Where quasiquoting allowed the full language to be staged, we now have to explicitly “white-list” all operations we want to make available. Clearly there is a tradeoff, as explicit white-listing of operations can be tedious. However we can remedy the white-listing effort to a large extent by providing libraries of reusable components that contain sets of lifted operations from which different flavors of object languages can be assembled. It is also possible to lift whole traits or classes using reflection (*).

We can define a simple object language MyStagedLanguage as follows, using private access qualifiers to ensure that the staging primitives perform and accumulate are inaccessible to client code outside of package internal:

package internal
trait Base extends EmbeddedControls {
  type Rep[T]
  private[internal] def perform[T](stm: String): Rep[T]
  private[internal] def accumulate[T](res: => Rep[T]): String
}
trait LiftPrimitives extends Base {
  implicit def liftDouble(x: Double): Rep[Double] = perform(x.toString)
}
trait Arith extends Base {
  def infix_*(x: Rep[Double], y: Rep[Double]): Rep[Double] = perform(x+"*"+y)
}
trait IfThenElse extends Base {
  def __ifThenElse[T](c: Rep[Boolean], a: =>Rep[T], a: =>Rep[T]): Rep[T] =
    perform("if (" + c + ") " + accumulate(a) + " else " + accumulate(b))
}
trait MyStagedLanguage extends LiftPrimitives with Arith with IfThenElse

Note that we invoke accumulate only for by-name parameters. All others are already object code values, so evaluating them is a no-op and cannot have side effects. In doing so we silently assume a sensible toString operation on Rep[T]. If we do not want to make this assumption then we need accumulate calls everywhere a Rep[T] value is converted to a string representation.

Client code just needs access to an object of type MyStagedLanguage to call methods on it. Common ways to achieve this include path-dependent types and imports:

val p: MyStagedLanguage = ...
import p._
def power(b: Rep[Double], n: Int): Rep[Double] = ...

In which case the desugared method signature is:

def power(b: p.Rep[Double], n: Int): p.Rep[Double] = ...

Or by structuring the client code as traits itself:

trait Power { this: MyStagedLanguage =>
  def power(b: Rep[Double], n: Int): Rep[Double] = ...
}

In the following we briefly revisit the various static guarantees and show how they are fulfilled in LMS.

Syntax correctness through Embedding as Methods

Generating syntactically well formed programs is delegated to methods implementing the object language interface. Client code never assembles pieces of code directly. If clients only use the API methods, and their implementations produce syntax correct code, overall syntax correctness follows.

Scope Correctness through Deep Reuse Of Val Bindings

The staging primitives perform eager let insertion and perform will assign a fresh identifier to each and every subexpression encountered, essentially producing on object program in administrative normal form (ANF). This removes the need for explicit val bindings in object code. Instead, programmers can just use val bindings in the meta program. This is an example of deep linguistic reuse, as the “feature” of val bindings is translated away.

As for scope correctness, we have not encountered any binders in object code so far. Below we will introduce staged functions using higher order abstract syntax (HOAS) (*):

def lambda[A,B](f: Rep[A] => Rep[B]): Rep[A=>B]
lambda { (x:Rep[Int]) => ... }  // a staged function object

The essence of HOAS is to reuse meta language bindings to implement object language bindings. Unless subverted by explicit scope extrusion, the reuse of meta language bindings ensures scope correctness of object programs.

Type Correctness through Typed Embedding (Deep Reuse of Types)

The object language API exposes only typed methods. If the implementations of these methods produce type correct code, then overall type correctness follows.

Value Correctness through Deep Reuse of Evaluation Order

The perform and accumulate abstraction has been described at length above.

Functions and Recursion

Many features can be added to the object language in a way that is analogous to what we have seen above but some require a bit more thought. In this section we will take a closer look at staged functions. Basic support for staged function definitions and function applications can be defined in terms of a simple higher-order abstract syntax (HOAS) (*) representation, similar to those of Carette et al. (*) and Hofer et al. (*).

The idea is to provide a lambda operation that transforms present-stage functions over staged values (type Rep[A] => Rep[B]) to staged function values (type Rep[A=>B]).

trait Functions extends Base {
  def lambda[A,B](f: Rep[A] => Rep[B]): Rep[A=>B]
  def infix_apply[A,B](f: Rep[A=>B], x: Rep[A]): Rep[B]
}

To give an example, the staged recursive factorial function will look like this:

def fac: Rep[Int => Int] = lambda { n =>
  if (n == 0) 1
  else n * fac(n - 1)
}

As opposed to the earlier power example, an invocation fac(m) will not inline the definition of fac but result in an actual function call in the generated code.

However the HOAS representation has the disadvantage of being opaque: there is no immediate way to “look into” a Scala function object. If we want to treat functions in the same way as other program constructs, we need a way to transform the HOAS encoding into our string representation. We can implement lambda(f) to call

accumulate { f(fresh[A]) }

which will unfold the function definition into a block that represents the entire computation defined by the function (assuming that fresh[A] creates a fresh symbol of type A). But eagerly expanding function definitions is problematic. For recursive functions, the result would be infinite, i.e. the computation will not terminate. What we would like to do instead is to detect recursion and generate a finite representation that makes the recursive call explicit. However this is difficult because recursion might be very indirect:

def foo(x: Rep[Int]) = {
  val f = (x: Rep[Int]) => foo(x + 1)
  val g = lambda(f)
  g(x)
}

Each incarnation of foo creates a new function f; unfolding will thus create unboundedly many different function objects.

To detect cycles, we have to compare those functions. This, of course, is undecidable in the general case of taking equality to be defined extensionally, i.e. saying that two functions are equal if they map equal inputs to equal outputs. The standard reference equality, by contrast, is too weak for our purpose:

def adder(x:Int) = (y: Int) => x + y
adder(3) == adder(3)
$\hookrightarrow$ false

However, we can approximate extensional equality by intensional (i.e. structural) equality, which is sufficient in most cases because recursion will cycle through a well defined code path in the program text. Testing intensional equality amounts to checking if two functions are defined at the same syntactic location in the source program and whether all data referenced by their free variables is equal. Fortunately, the implementation of first- class functions as closure objects offers (at least in principle) access to a “defunctionalized” data type representation on which equality can easily be checked. A bit of care must be taken though, because the structure can be cyclic. On the JVM there is a particularly neat trick. We can serialize the function objects into a byte array and compare the serialized representations:

serialize(adder(3)) == serialize(adder(3))
$\hookrightarrow$ true

With this method of testing equality, we can implement controlled unfolding. Unfolding functions only once at the definition site and associating a fresh symbol with the function being unfolded allows us to construct a block that contains a recursive call to the symbol we created. Thus, we can create the expected representation for the factorial function above.

Semi-Automatic BTA through Deep Reuse of Type Inference

Given a method or function implementation:

def power(b: _, n: Int) =
  if (n == 0) 1.0 else b * power(b, n - 1)

Scala's type inference can determine whether the operations and the result will be staged or not. We just have to provide the binding time for parameter b. Note that staging n would require explicit use of lambda because there is no static criterion to stop the recursion.

In some cases we need to be conservative, for example for mutable objects:

var i = 0
if (c)    // c: Rep[Boolean]
  i += 1

The variable i must be lifted because writes depend on dynamic control flow. We can accomplish this by implementing the virtualized var constructor to always lift variable declarations, even if the initial right-hand side is a static value. Packaged up in a trait, it can be selectively imported:

trait MyProg { this: LiftVariables =>
  ... // all variables are lifted in this scope
}

Generating and Loading Executable Code

Code generation in LMS is an explicit operation. For the common case where generated code is to be loaded immediately into the running program, trait Compile provides a suitable interface in form of the abstract method compile:

trait Compile extends Base {
  def compile[A,B](f: Rep[A] => Rep[B]): A=>B
}

The contract of compile is to “unstage” a function from staged to staged values into a function operating on present-stage values that can be used just like any other function object in the running program. Of course this only works for functions that do not reference externally bound Rep[T] values, otherwise the generate code will not compile due to free identifiers. The given encoding into Scala's type system does not prevent this kind of error.

For generating Scala code, an implementation of the compilation interface is provided by trait CompileScala:

trait CompileScala extends Compile {
  def compile[A,B](f: Rep[A] => Rep[B]) = {
    val x = fresh[A]
    val y = accumulate { f(x) }
    // emit header
    emitBlock(y)
    // emit footer
    // invoke compiler
    // load generated class file
    // instantiate object of that class
  }
}

The overall compilation logic of CompileScala is relatively simple: emit a class and apply-method declaration header, emit instructions for each definition node according to the schedule, close the source file, invoke the Scala compiler, load the generated class file and return a newly instantiated object of that class.