A few days ago my companion Hendy Irawan shared with me his thoughts about accelerating the creation of code generators.
The common way of writing a code generator (e.g. based on Xtext) is:
- Writing a prototype of (parts of) your program.
- Identifying the parts of your code to be generated. This yields the domain, i.e. the set of inputs for which the generator (seen as a function) is defined. Identifying the domain requires some abstraction in most cases.
- Creating your domain-specific language (DSL), a formal (= computer readable) language which semantically contains the essence of the code to be generated.
- Creating generator templates (based on your prototype), which are (more or less) strings with variable parts. The generator interprets an input (= model) formulated in the DSL (= meta model) in such a way that the variable template parts are substituted with properties of the input.
- Writing code (models) in the DSL and feeding it to your generator. The result should look like your prototype.
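The steps above can be sketched in a few lines. This is only an illustration, not Xtext: Python's `string.Template` stands in for a real template language, a plain dictionary stands in for a model written in the DSL, and the names `CLASS_TEMPLATE`, `FIELD_TEMPLATE`, `generate` and the `Person` model are made up for this example.

```python
from string import Template

# Hypothetical templates, derived from a hand-written prototype class.
CLASS_TEMPLATE = Template("public class $name {\n$fields}\n")
FIELD_TEMPLATE = Template("    private $type $name;\n")


def generate(model: dict) -> str:
    """Substitute the variable template parts with properties of the model."""
    fields = "".join(
        FIELD_TEMPLATE.substitute(type=field["type"], name=field["name"])
        for field in model["fields"]
    )
    return CLASS_TEMPLATE.substitute(name=model["name"], fields=fields)


# The "model": in a real setup this would be parsed from a DSL file.
model = {
    "name": "Person",
    "fields": [
        {"type": "String", "name": "firstName"},
        {"type": "int", "name": "age"},
    ],
}

print(generate(model))
```

Note that the field template is called once per field, so its text shows up several times in the output.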
Refining a code generator means repeating the steps above and typically includes
- Enhancing the meta model / DSL.
- Adjusting the models and the generator templates.
The generator has to be run to verify the result.
There is one special case regarding the refinement of a generator: not touching the DSL and the models. The generated code is only as good as the prototype used to derive the generator templates. Therefore one common use case is refining the prototype while keeping the DSL and the models unchanged. But stepping through the round-trip mentioned above is kind of cumbersome because it takes a significant amount of time compared to just modifying the generated code.
Our question is whether it is possible to write a ‘reverse generator’ which merges modified generated code back into existing generator templates. The benefit would be a fast round-trip between generator and generated code.
We can think of (default) generators as projections in a geometric sense. As a (stupid but simple) example, code generation is like a linear transformation that maps a 2D object (model + templates) to a 1D object (generated code), dropping one dimension (the knowledge about the model). In other words: the code generator has more detailed knowledge (= a greater dimension) about the domain (in a meta sense) than the generated code.
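To make the projection idea a bit more concrete, here is a small sketch (again using Python's `string.Template` as a stand-in template language; the `generate` helper is hypothetical). Two different combinations of template and model are projected onto the very same generated code, so the generated code alone cannot tell us which part came from the model and which from the template.

```python
from string import Template


def generate(template: str, model: dict) -> str:
    """Project (template, model) onto plain generated code."""
    return Template(template).substitute(model)


# Case A: the class name is a property of the model.
code_a = generate("class $name {}", {"name": "Person"})

# Case B: the class name is hard-coded in the template, the model is empty.
code_b = generate("class Person {}", {})

# Both inputs collapse to the same output: the projection is not invertible.
print(code_a == code_b)
```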
A reverse generator must have the ability to merge generated code back into the templates. Theoretically, the simplest way to achieve this is by augmenting the generated code with extra information about its origin (within a generator template). Then the following conditions have to be satisfied:
- Information derived from the model is fixed and should not be touched.
- Protected regions should not be touched.
- The remaining generated code has to be sliced and merged back into the template methods where it belongs.
The first two points are trivial. The last point is the interesting case because one template method is potentially called n times. Having n occurrences of similar code makes it hard to unambiguously map it back to its origin when one or more of the occurrences are modified. This case has to be further investigated…
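A rough sketch of the idea, under several simplifying assumptions: the generated code is kept as a list of segments that remember their origin, edits preserve that segmentation one-to-one, and protected regions are left out entirely. `Segment`, `TEMPLATES`, `generate` and `merge_back` are names invented for this example and not part of any existing tool.

```python
from dataclasses import dataclass, replace


@dataclass
class Segment:
    template: str     # template method that emitted this text
    part: int         # index of the part within that template
    from_model: bool  # True if the text was derived from the model
    text: str


# Hypothetical template store: a part is literal text or ("model", key).
TEMPLATES = {"field": ["private String ", ("model", "name"), ";\n"]}


def generate(template: str, models: list[dict]) -> list[Segment]:
    """Call one template once per model element and record each origin."""
    out = []
    for model in models:
        for i, part in enumerate(TEMPLATES[template]):
            if isinstance(part, tuple):
                out.append(Segment(template, i, True, model[part[1]]))
            else:
                out.append(Segment(template, i, False, part))
    return out


def merge_back(original: list[Segment], edited: list[Segment]):
    """Return (template updates, ambiguous origins) for edited static text."""
    proposals = {}
    for old, new in zip(original, edited):
        if old.from_model:
            if new.text != old.text:
                raise ValueError("model-derived text must not be edited")
            continue
        proposals.setdefault((old.template, old.part), set()).add(new.text)
    updates, conflicts = {}, []
    for origin, texts in proposals.items():
        if len(texts) > 1:
            conflicts.append(origin)  # the n occurrences disagree
            continue
        text = next(iter(texts))
        if text != TEMPLATES[origin[0]][origin[1]]:
            updates[origin] = text
    return updates, conflicts


original = generate("field", [{"name": "firstName"}, {"name": "lastName"}])

# Every occurrence of part 0 is edited the same way: safe to merge back.
consistent = [
    replace(s, text="private final String ") if s.part == 0 else s
    for s in original
]
print(merge_back(original, consistent))

# Only the first occurrence of part 2 is edited: the origin is ambiguous.
partial = list(original)
partial[2] = replace(original[2], text=";  // reviewed\n")
print(merge_back(original, partial))
```

In this sketch a change is only merged back when all n occurrences of a template part agree. As soon as they diverge, the origin is merely reported as ambiguous, which is exactly the open case described above.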