My Problem With Code Generation

Although I often rely on code generation for parts of my projects, I’d like to use it less. Here is why.

Let’s start with an example. Imagine that we write a Web application for a shop. We have an HTTP interface to manipulate the shop items. Here is how we can define the endpoints to create, read, update and delete an item, using the Play routing DSL:

POST    /items           myshop.Items.create
GET     /items/:id       myshop.Items.get(id)
PUT     /items/:id       myshop.Items.update(id)
DELETE  /items/:id       myshop.Items.delete(id)

This code uses a so-called “external” DSL, from which Play generates the code that invokes the application code according to the incoming requests.
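To make this concrete, here is a hand-written sketch of the kind of dispatching logic that the four route lines describe. It is not the code Play actually generates. `Request`, `Items`, and `dispatch` are hypothetical names, and handlers return plain strings instead of Play `Action`s so that the example runs with the Scala standard library alone.

```scala
object GeneratedDispatchSketch {
  // Hypothetical, simplified request: an HTTP verb and a path
  final case class Request(method: String, path: String)

  // Hypothetical stand-in for the myshop.Items controller; each handler returns a response body
  object Items {
    def create: String = "created item"
    def get(id: String): String = s"item $id"
    def update(id: String): String = s"updated item $id"
    def delete(id: String): String = s"deleted item $id"
  }

  // Roughly what the routes file describes: match the verb and the path segments,
  // extract the :id segment, and call the corresponding controller member
  def dispatch(request: Request): Option[String] = {
    val segments = request.path.split('/').filter(_.nonEmpty).toList
    (request.method, segments) match {
      case ("POST", List("items"))       => Some(Items.create)
      case ("GET", List("items", id))    => Some(Items.get(id))
      case ("PUT", List("items", id))    => Some(Items.update(id))
      case ("DELETE", List("items", id)) => Some(Items.delete(id))
      case _                             => None // no route matches
    }
  }

  def main(args: Array[String]): Unit = {
    println(dispatch(Request("GET", "/items/42")))   // Some(item 42)
    println(dispatch(Request("PATCH", "/items/42"))) // None
  }
}
```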

Interestingly, Play also supports an embedded DSL to define routes. Let’s see how it compares with the external DSL:

import play.api.routing.Router
import play.api.routing.sird._
import myshop.Items
val router =
  Router.from {
    case POST(p"/items")       => Items.create
    case GET(p"/items/$id")    => Items.get(id)
    case PUT(p"/items/$id")    => Items.update(id)
    case DELETE(p"/items/$id") => Items.delete(id)
  }

So the external DSL takes 4 lines of code, versus 10 lines of Scala code with the embedded DSL. Note, though, that the comparison is not fair: the code generated from the external DSL does more than handle incoming requests. It can also build URLs according to your endpoint definitions, on both the client side and the server side.
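To illustrate the URL-building part, the sketch below shows what we would otherwise have to maintain by hand next to an embedded-DSL router: one URL builder per endpoint. It uses only the Scala standard library; `Call` is a simplified, hypothetical stand-in loosely modeled on Play’s class of the same name, and `ItemsUrls` is a hypothetical name, not the reverse router that Play generates.

```scala
object ReverseRoutingSketch {
  // Hypothetical description of an endpoint to call: HTTP verb plus URL
  final case class Call(method: String, url: String)

  // Hand-written URL builders mirroring the four item endpoints.
  // They must be kept in sync with the router manually, which is
  // what a code generator does for free.
  object ItemsUrls {
    def create: Call = Call("POST", "/items")
    def get(id: String): Call = Call("GET", s"/items/$id")
    def update(id: String): Call = Call("PUT", s"/items/$id")
    def delete(id: String): Call = Call("DELETE", s"/items/$id")
  }

  def main(args: Array[String]): Unit =
    println(ItemsUrls.get("42")) // Call(GET,/items/42)
}
```

The drawback is duplication: each path appears once in the router and once in its URL builder, and nothing checks that the two agree. Deriving both from a single definition is precisely what the generator offers.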

The benefit of using the external DSL is that the syntax is … specific to the purpose of defining HTTP endpoints. It is thus supposed to be more concise and free of syntactic noise.

However, code generation also has limitations.

Limits of code generation

In general, I see two main issues with DSLs that generate code: integration and expressive power.

Integration

The code written in the routing DSL needs to be integrated with the rest of the project. This problem has two facets: (1) how to refer to the “outside code” from the DSL and (2) how to refer to the generated code from the outside code.

Referring to the outside code from the DSL

In my example, the DSL lets me write the fully qualified names of the values I want to refer to. For example, myshop.Items.get refers to the get member of the myshop.Items value.

We can immediately see a first (minor) issue: we don’t have the same facilities to import names as we usually have in general-purpose languages. Indeed, the routing DSL has no “import” directive, hence the repetition of the fully qualified name of the myshop.Items value.

But the real issue is that the value myshop.Items needs to be statically bound. What if we want to refer to a value that will be dynamically computed, at run-time? Such a situation is quite common, actually. For instance, the behaviour of the controller might depend on some information read from a Web service. In such a case, it would not be possible to define myshop.Items as a singleton object.

In general, external DSLs cannot easily be scoped to a specific part of the outside code.

Note that the Play routing DSL supports the following special case: you can make myshop.Items a top-level class (possibly taking constructor parameters) and define, in the outside code, how to supply such a value. So your controller does not have to be a singleton object, but it must be a top-level class. Though limited, this solution supports most use cases.
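The following stdlib-only sketch shows the idea: the controller is an ordinary top-level class taking a constructor parameter, the router is a class that receives the controller, and only the outside code decides how to build both. `ShopInfo`, `Items`, and `ItemsRoutes` are hypothetical; `ItemsRoutes` merely stands in for the class a generator would produce from the routes file, and handlers return strings rather than Play `Action`s.

```scala
object ConstructorWiringSketch {
  final case class Request(method: String, path: String)

  // Hypothetical run-time information the controller depends on
  final case class ShopInfo(currency: String)

  // The controller is a top-level class, not a singleton object,
  // so it can depend on values that are only known at run time
  class Items(info: ShopInfo) {
    def get(id: String): String = s"item $id (prices in ${info.currency})"
  }

  // Stand-in for a generated router class: its constructor parameters
  // are the controllers referenced from the routes file
  class ItemsRoutes(items: Items) {
    def handle(request: Request): Option[String] =
      (request.method, request.path.split('/').filter(_.nonEmpty).toList) match {
        case ("GET", List("items", id)) => Some(items.get(id))
        case _                          => None
      }
  }

  // Outside code: supply the dependencies, in the order the router class expects
  def wire(info: ShopInfo): ItemsRoutes = new ItemsRoutes(new Items(info))

  def main(args: Array[String]): Unit = {
    val routes = wire(ShopInfo("EUR"))
    println(routes.handle(Request("GET", "/items/7"))) // Some(item 7 (prices in EUR))
  }
}
```

Here `wire` is trivial, but with a real generated class the outside code must know which constructor parameters it expects and in which order.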

The embedded DSL approach does not suffer from this problem at all: routing definitions are first-class values and thus can be written at any place in the code.

// Get useful information from an external Web service
getInfoFromExternalWebService().map { info =>
  // Then, make a controller using this information
  val items = makeController(info)

  // Eventually define a router that uses the controller
  Router.from {
    case POST(p"/items")       => items.create
    case GET(p"/items/$id")    => items.get(id)
    case PUT(p"/items/$id")    => items.update(id)
    case DELETE(p"/items/$id") => items.delete(id)
  }
}

Referring to the generated code from the outside code

To refer to generated code, we need to know the names of the generated types and values. Usually, code generators follow some sort of naming conventions so that we have a systematic way to know which names have been generated.

The Play routing code generator produces either a singleton object or a class, named according to the routes file name. You can then refer to the generated code, from the outside code, by naming this object or class.

In simple cases this works well, but in the long run you often have to hack a bit, e.g. to find out how to split your routes files across different packages (see here) or different sbt modules (see here). But the most painful situation, when you ask the code generator to produce a class, is finding out which constructor parameters you have to pass and in which order.

More generally, the shape of the code produced by the code generator is more or less fixed: you don’t have full control over the code generation process unless you rewrite the code generator (but that might defeat the purpose of having a code generator). For instance, even though you could define a type RequestHandler capturing the concept of handling requests:

import play.api.mvc.{Handler, RequestHeader}

trait RequestHandler {
  def handle: PartialFunction[RequestHeader, Handler]
}

The code generator is unable to produce a router that implements this type.

This limitation does not exist with the embedded DSL:

val requestHandler: RequestHandler = new RequestHandler {
  // The explicit type gives the pattern-matching anonymous function the expected type it needs
  val handle: PartialFunction[RequestHeader, Handler] = {
    case POST(p"/items")       => items.create
    case GET(p"/items/$id")    => items.get(id)
    case PUT(p"/items/$id")    => items.update(id)
    case DELETE(p"/items/$id") => items.delete(id)
  }
}
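The same idea can be tried without Play. In this stdlib-only sketch, `Request`, `Route`, `ItemsController`, and `RequestHandler` are hypothetical simplifications (handlers return strings rather than Play `Handler`s, and `Route` plays the role of the sird extractors). The point is that an embedded DSL is just ordinary Scala, so the routing table can implement any type we choose.

```scala
object RequestHandlerSketch {
  // Hypothetical, simplified request
  final case class Request(method: String, path: String)

  // Extractor exposing the verb and the non-empty path segments
  object Route {
    def unapply(request: Request): Some[(String, List[String])] =
      Some((request.method, request.path.split('/').filter(_.nonEmpty).toList))
  }

  // Our own abstraction for "something that handles requests"
  trait RequestHandler {
    def handle: PartialFunction[Request, String]
  }

  // Hypothetical controller
  class ItemsController {
    def create: String = "created item"
    def get(id: String): String = s"item $id"
    def update(id: String): String = s"updated item $id"
    def delete(id: String): String = s"deleted item $id"
  }

  // The routing table directly implements RequestHandler
  def itemsHandler(items: ItemsController): RequestHandler = new RequestHandler {
    val handle: PartialFunction[Request, String] = {
      case Route("POST", List("items"))       => items.create
      case Route("GET", List("items", id))    => items.get(id)
      case Route("PUT", List("items", id))    => items.update(id)
      case Route("DELETE", List("items", id)) => items.delete(id)
    }
  }
}
```

For example, `itemsHandler(new ItemsController).handle(Request("GET", "/items/42"))` evaluates to `"item 42"`, while a request for `/suppliers/1` is outside the handler’s domain (`isDefinedAt` returns false).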

As we have seen, relying on an external DSL is a double-edged sword: on one hand it provides a nice and concise syntax; on the other hand, its integration with the rest of the code often leads to friction.

Expressive power

The more expressive power a language has, the more useful it is. Expressive power is the cornerstone of mastering complexity.

Whatever the purpose of the code I write, I eventually end up needing at least name binding and, even better, means of generalization.

Name binding

Name binding allows developers to give a name to a part of a program. Not only does it help break down a complex program into simpler ones, it also gives each part a meaningful name and makes it possible to reuse parts several times.

In our example it would be interesting, for instance, to shorten the fully qualified name myshop.Items to just items. But this is not possible because the routing DSL does not support name binding.

This is something that we can naturally achieve with the embedded DSL, though:

val items = myshop.Items
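Name binding goes beyond a single alias. In this stdlib-only sketch (all names hypothetical; `myshop` is a local stand-in for the package, and handlers return strings), we bind names both to the controller and to groups of routes, then reuse one group in two routers. `orElse` is the standard `PartialFunction` combinator.

```scala
object NameBindingSketch {
  final case class Request(method: String, path: String)
  type Routes = PartialFunction[Request, String]

  // Extractor exposing the verb and the non-empty path segments
  object Route {
    def unapply(r: Request): Some[(String, List[String])] =
      Some((r.method, r.path.split('/').filter(_.nonEmpty).toList))
  }

  // Stand-in for the fully qualified controller myshop.Items
  object myshop {
    object Items { def get(id: String): String = s"item $id" }
  }

  // Bind a short name instead of repeating the fully qualified one
  val items = myshop.Items

  // Bind names to groups of routes...
  val itemRoutes: Routes = {
    case Route("GET", List("items", id)) => items.get(id)
  }
  val healthRoutes: Routes = {
    case Route("GET", List("health")) => "ok"
  }

  // ...and reuse the item routes in two different routers
  val publicRouter: Routes = itemRoutes
  val adminRouter: Routes = itemRoutes orElse healthRoutes
}
```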

Means of generalization

First, let’s motivate my point with an example.

Suppose that we want to define HTTP endpoints to deal with suppliers in addition to the already defined HTTP endpoints for items manipulation. Our routes file would look like the following:

POST    /items               myshop.Items.create
GET     /items/:id           myshop.Items.get(id)
PUT     /items/:id           myshop.Items.update(id)
DELETE  /items/:id           myshop.Items.delete(id)

POST    /suppliers           myshop.Suppliers.create
GET     /suppliers/:id       myshop.Suppliers.get(id)
PUT     /suppliers/:id       myshop.Suppliers.update(id)
DELETE  /suppliers/:id       myshop.Suppliers.delete(id)

It turns out that the endpoints for manipulating suppliers are very similar to the ones for manipulating items. For example, the endpoints for creating an item and for creating a supplier both use the verb POST and have a one-segment request path; the endpoints for getting an item and for getting a supplier both use the verb GET and have a two-segment request path (the name of the resource and its id); and so on.

We can advantageously factor out these similarities by introducing the more general concept of a resource: both items and suppliers are resources. Thinking in terms of resources instead of in terms of four separate endpoints reduces the cognitive load of reasoning about the project.

Being able to directly translate the more general concept of resources into code would reduce the gap between our thoughts and the code and thus would make it easier to reason about the code.

Unfortunately, generalization requires some sort of language support, and most external DSLs lack it.

By contrast, general-purpose languages are designed to support means of generalization. We can thus capture the concept of resource endpoints in Scala with a method definition, as follows:

def resourceRouter(prefix: String, ctl: ResourceController): Router =
  Router.from {
    case POST(p"/")      => ctl.create
    case GET(p"/$id")    => ctl.get(id)
    case PUT(p"/$id")    => ctl.update(id)
    case DELETE(p"/$id") => ctl.delete(id)
  }.withPrefix(prefix)

(For the sake of brevity, I assume that a type ResourceController, defined elsewhere, provides the four Actions corresponding to the four endpoints.)

We can then specialize our resourceRouter for items and suppliers, and define the whole router in terms of them:

val itemsRouter = resourceRouter("/items", items)
val suppliersRouter = resourceRouter("/suppliers", suppliers)

val router =
  Router.from(itemsRouter.routes orElse suppliersRouter.routes)
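To see the generalization run end to end without Play, here is a stdlib-only sketch. `ResourceController` is spelled out as a trait (one possible shape for the type the post leaves undefined), handlers return strings instead of Play `Action`s, and `withPrefix` is a hypothetical helper that strips a path prefix, loosely imitating Play’s `Router.withPrefix`.

```scala
object ResourceRouterSketch {
  final case class Request(method: String, path: String)
  type Routes = PartialFunction[Request, String]

  // Extractor exposing the verb and the non-empty path segments
  object Route {
    def unapply(r: Request): Some[(String, List[String])] =
      Some((r.method, r.path.split('/').filter(_.nonEmpty).toList))
  }

  // One possible shape for the ResourceController the post assumes
  trait ResourceController {
    def create: String
    def get(id: String): String
    def update(id: String): String
    def delete(id: String): String
  }

  // Hypothetical helper: accept only requests under `prefix` and pass the
  // remaining path to `routes` ("/items" -> "/", "/items/42" -> "/42")
  def withPrefix(prefix: String, routes: Routes): Routes = {
    def strip(r: Request): Option[Request] =
      if (r.path == prefix) Some(r.copy(path = "/"))
      else if (r.path.startsWith(prefix + "/")) Some(r.copy(path = r.path.drop(prefix.length)))
      else None
    Function.unlift((r: Request) => strip(r).flatMap(routes.lift))
  }

  // The general concept: the four endpoints of any resource
  def resourceRouter(prefix: String, ctl: ResourceController): Routes =
    withPrefix(prefix, {
      case Route("POST", Nil)        => ctl.create
      case Route("GET", List(id))    => ctl.get(id)
      case Route("PUT", List(id))    => ctl.update(id)
      case Route("DELETE", List(id)) => ctl.delete(id)
    })

  // Hypothetical controller parameterized by the resource name
  class InMemoryController(name: String) extends ResourceController {
    def create: String = s"created $name"
    def get(id: String): String = s"$name $id"
    def update(id: String): String = s"updated $name $id"
    def delete(id: String): String = s"deleted $name $id"
  }

  val itemsRouter: Routes = resourceRouter("/items", new InMemoryController("item"))
  val suppliersRouter: Routes = resourceRouter("/suppliers", new InMemoryController("supplier"))

  // Same composition as above: try the item routes, then the supplier routes
  val router: Routes = itemsRouter orElse suppliersRouter
}
```

A `GET /suppliers/9` request is rejected by `itemsRouter` (wrong prefix) and handled by `suppliersRouter`, which yields `"supplier 9"`. Adding a third resource is one more call to `resourceRouter`.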

Conclusion

In this post I gave an overview of the kinds of issues that make me think twice before relying on code generation for part of a program. First, the boundaries of the generated code are often a source of friction with the rest of the code. Second, there is rarely a DSL at the right level of abstraction for my project, since every project is different; that’s why I want to rely on means of generalization, usually provided only by general-purpose languages, to adapt the language to the level of abstraction the project requires.