It uses lambda calculus as a target (core) language. The language grammar has only a few terms: constants, variables (symbols used as identifiers), lambda-expressions as procedures of one variable (lexically scoped) and applications. The source (macro) language is similar, but is a language for syntax trees. It adds macro expressions, distinguished from applications by the presence of a macro token as first component of the syntax tree. An expression of the form (m s) is not an application if m is a macro.
Terminology
In a macro form like (macro-token stree1... streeN), the trees stree1 ... streeN are called the syntactic scope of the syntactic extension.
A syntactic extension or lambda-term (target language) occurs in a syntax tree if it is a subtree that is not nested within the syntactic scope of another syntactic extension. "Since the expansion of a syntactic expression involves rearrangements of (parts of) syntax trees in its syntactic scope, we can only be sure about the interpretation of an expression when it is not embedded in a syntactic extension".
A syntactic transform function (STF) "is a macro function which is defined by the macro writer, and which expands a particular class of syntactic extensions" (e.g. and, or, cond).
"The result of applying a transform function to an occurence of a syntactic extension is called a transcription. A transcription step is the one-step expansion of a syntactic extension. Symbols which are introduced during a transcription step are referred to as generated symbols".
"The set of macros used during an expansion is a syntax table. Applied to an occurence of a syntactic extension it produces its transcription. It serves as a dispatcher and applies the appropriate transform function to its argument".
The expansion "of a syntax tree with respect to a syntax table is the tree in which all occurences of syntactic extensions are replaced by the expansions of their transcriptions. If the expansion process halts, the result of the expansion is a lambda-term, i.e. an expression of the core language."
Problems
Naive macro expansion captures free user identifiers. The transformation (or e1 e2) => (let v e1 (if v v e2)) does not work as intended for (or nil v).
"The example may suggest that one can simply rename all generated identifiers after each transcription step. But this impression is too simplistic.
Generated identifiers may act as free variables to be captured by the user-supplied program context. They must not be renamed. On the other hand, since macro expansion is possibly pyramided it may also not be quite obvious after a transcription step which identifier is to be free and which one is to be bound.
A final difficulty is that sometimes capturing is desired." (e.g. loop variables)
Hygienic Macro Expansion
"With a few exceptions, we do not want generated binding instances created by one transcription step to capture user-supplied variables or variables from some other transcription step."
Hygiene Condition for Macro Expansion
Generated identifiers that become binding instances in the completely expanded program must only bind variables that are generated at the same transcription step.
(HC/ME)
The hygiene condition can be achieved through a number of alpha-conversions. It cannot be done at each transcription step because "one cannot know in advance which macro-generated identifier will end up in a binding position" (different syntactic scopes).
The origin of identifiers is tracked by time-stamping identifiers with clock values for each step.
The algorithm
The algorithm has four steps:
- Transform the user supplied s-tree into a time-stamped s-tree. All identifiers leaves are stamped with 0. Clock value is set to 1.
- As with the naive algorithm, the expander parses though ts-tree expressions that are also lambda-terms. When it reaches a syntactic extension it generates the appropriate transcription. Before continuing, all generated identifiers are time-stamped with the current clock value, which is then increased.
The result is a time-stamped lambda-term. Wherever the naive result contained a variable, the modified result contains a time-stamped variable. E.g instead of (lambda x (lambda x ((f x) x))) we get (lambda x:0 (lambda (x:1 ((f:1 x:0) x:1))) - Replace all bound, timestamped identifiers by unstamped identifiers. Tokens with different timestamps came from different transcription steps and the difference must be preserved. This is the alpha-conversion step. (lambda a (lambda b ((f:1 a) b)))
- Parse the tree and remove all remaining timestamps (without changing the identifiers otherwise). The result is a pure lambda-term.
Conclusions
Exceptions to the HC/ME rule: "certain free identifiers in a parameter to a transform function are captured by generated binding instances". Only identifiers that survive as free identifiers until the input is completely expanded should be captured, and only identifiers with time-stamp 0 (user supplied ones). If we allow capturing of other timestamps, there could be interaction between the various syntactic extensions.
There is no scoping for transform functions, they are all at the toplevel.
The paper also gives a scheme implementation for an expander, which seems to be using derived syntax itself (cond, and, let, etc). I will need to implement these special forms in such a way that those symbols can be re-bound, so I will do the same trick as I used for eval and apply (expose fake primitives with the same names and treat them as special forms if their actual binding is the primitive binding).
Lambda calculus is hard, let's go shopping.
Niciun comentariu:
Trimiteți un comentariu