IMPLEMENTATION OF THE FACTOR COMPILER

Compilation of Factor is a messy business, driven by heuristics and not
formal theory. The compiler is inherently limited -- some expressions
cannot be compiled by definition. The programmer must take care to
ensure that performance-critical sections of code are written such that
they can be compiled.

=== Introduction

==== The problem

The Factor interpreter introduces a lot of overhead:

- Execution of a quotation involves iteration down a linked list.

- Stack access is not as fast as local variables, since Java
  bound-checks all array accesses.

- At the lowest level, everything is expressed as Java reflection calls
  to the Factor and Java platform libraries. Java reflection is not as
  fast as statically-compiled Java calls.

- Since Factor is dynamically-typed, intermediate values on the stack
  are all stored as java.lang.Object types, so type checks and
  possibly coercions must be done at each step of the computation.

==== The solution

The following optimizations naturally suggest themselves, and lead to
the implementation of the Factor compiler:

- Compiling Factor code down to Java platform bytecode.

- Using virtual machine local variables instead of an array stack to
  store intermediate values.

- Statically compiling in Java calls where the class, method and
  variable names are known ahead of time.

- Type inference and soft typing to eliminate unnecessary type checks.
  (At the time of writing, this is in progress and is not documented in
  this paper.)

=== Preliminaries: interpreter internals

A word object is essentially a property list. The one property we are
concerned with here is "def", which holds a FactorWordDefinition object.

The accessor word "worddef" pushes the "def" slot of a given word name
or word object:

    0] "+" worddef .
    #<factor.FactorCompoundDefinition: +>

Generally, the word definition is an opaque object; however, there are
various ways to deconstruct it, which will not be covered here (see the
worddef>list word if you are interested).

When a word object is being executed, the eval() method of its
definition is invoked. The eval() method takes one parameter, which is
the FactorInterpreter instance. The interpreter instance provides access
to the stacks, global namespace, vocabularies, and so on.

(In this article, we will use the terms "word" and "word definition"
somewhat interchangeably; this does not cause any confusion. If a "word"
is mentioned where one would expect a definition, simply assume the
"def" slot of the word is being accessed.)

The class FactorWordDefinition is abstract; a number of subclasses
exist:

- FactorCompoundDefinition: a standard colon definition consisting of
  a quotation; for example, : sq dup * ; is syntax for a compound
  definition named "sq" with quotation [ dup * ].

  Of course, its eval() method simply pushes the quotation on the
  interpreter's callstack.

- FactorShuffleDefinition: a stack rearrangement word, whose syntax is
  described in detail in parser.txt. For example,
  ~<< swap a b -- b a >>~ is syntax for a shuffle definition named
  "swap" that exchanges the top two values on the data stack.

- FactorPrimitiveDefinition: primitive word definitions are written in
  Java. Various concrete subclasses of this class in the
  factor.primitives package provide implementations of eval().

When a word definition is compiled, the compiler dynamically generates a
new class, creates a new instance, and replaces the "def" slot of the
word in question with the instance of the compiled class.

So the compiler's primary job is to generate appropriate Java bytecode
for the eval() method.

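The relationship between a word, its "def" slot and eval() can be
modelled in a few lines. The following Python sketch is only an
illustration, not the Java implementation: the class names loosely
mirror those in the text, while Interpreter.call, CompiledSq and the
dictionary used as a property list are assumptions made for the example.

```python
DONE = object()  # sentinel: the current quotation is exhausted


class Word:
    def __init__(self, name, definition):
        self.name = name
        self.props = {"def": definition}  # the property list


class Interpreter:
    """Hypothetical stand-in for FactorInterpreter: two stacks only."""

    def __init__(self):
        self.datastack = []
        self.callstack = []

    def call(self, quotation):
        self.callstack.append(iter(quotation))
        while self.callstack:
            item = next(self.callstack[-1], DONE)
            if item is DONE:
                self.callstack.pop()
            elif isinstance(item, Word):
                item.props["def"].eval(self)  # dispatch through "def"
            else:
                self.datastack.append(item)   # literals push themselves


class PrimitiveDefinition:
    def __init__(self, fn):
        self.fn = fn

    def eval(self, interp):
        self.fn(interp.datastack)


class CompoundDefinition:
    def __init__(self, quotation):
        self.quotation = quotation

    def eval(self, interp):
        # A compound definition only pushes its quotation on the callstack.
        interp.callstack.append(iter(self.quotation))


class CompiledSq:
    """What a generated class for 'sq' might boil down to."""

    def eval(self, interp):
        x = interp.datastack.pop()
        interp.datastack.append(x * x)


dup = Word("dup", PrimitiveDefinition(lambda s: s.append(s[-1])))
mul = Word("*", PrimitiveDefinition(lambda s: s.append(s.pop() * s.pop())))
sq = Word("sq", CompoundDefinition([dup, mul]))


def run(quotation):
    interp = Interpreter()
    interp.call(quotation)
    return interp.datastack


interpreted = run([5, sq])
sq.props["def"] = CompiledSq()  # "compiling" replaces the def slot
compiled = run([5, sq])
```

Callers of 'sq' are unaffected by the swap, since they always go through
the "def" slot.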
=== Preliminaries: the specimen

Consider the following (naive) implementation of the Fibonacci sequence:

    : fib ( n -- nth fibonacci number )
        dup 1 <= [
            drop 1
        ] [
            pred dup fib swap pred fib +
        ] ifte ;

A quick overview of the words used here:

- dup: a shuffle word that duplicates the top of the stack.

- <=: compare the top two numbers on the stack.

- drop: remove the top of the stack.

- pred: decrement the top of the stack by one. Indeed, it is defined as
  simply : pred 1 - ;.

- swap: exchange the top two stack elements.

- +: add the top two stack elements.

- ifte: execute one of two given quotations, depending on the condition
  on the stack.

=== Java reflection

The biggest performance improvement comes from the transformation of
Java reflection calls into static bytecode.

Indeed, when the compiler was first written, the only words it could
compile were simple expressions that interfaced with Java and nothing
else.

In the above definition of "fib", the three key words are <=, - and +
(note that - is not referenced directly, but rather is a factor of the
word pred). All three of these words are implemented as Java calls into
the Factor math library:

    : <= ( a b -- boolean )
        [
            "java.lang.Number" "java.lang.Number"
        ] "factor.math.FactorMath" "lessEqual" jinvoke-static ;

    : - ( a b -- a-b )
        [
            "java.lang.Number" "java.lang.Number"
        ] "factor.math.FactorMath" "subtract" jinvoke-static ;

    : + ( a b -- a+b )
        [
            "java.lang.Number" "java.lang.Number"
        ] "factor.math.FactorMath" "add" jinvoke-static ;

During interpretation, the execution of one of these words involves a
lot of overhead. First, the argument list is transformed into a Java
Class[] array; then the Class object corresponding to the containing
class is looked up; then the appropriate Method object defined in this
class is looked up; then the method is invoked, by passing it an
Object[] array consisting of arguments from the stack.

As one might guess, this is horribly inefficient. Indeed, look at the
time taken to compute the 25th Fibonacci number using pure
interpretation (of course, depending on your hardware, results might
vary):

    0] [ 25 fib ] time
    24538

One quickly notices that in fact, all the overhead from the reflection
API is unnecessary; the containing class, method name and argument types
are, after all, known ahead of time.

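The difference between looking everything up by name on each call and
resolving the names once can be shown with a loose Python analogy. This
is not the Java reflection API; importlib and getattr merely play the
role of the class and method lookups, and both function names below are
made up for the illustration.

```python
import importlib


def invoke_by_name(module_name, function_name, *args):
    """Resolve the module and function on every call, as the
    interpreter does with Class and Method objects."""
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    return function(*args)


def bind_once(module_name, function_name):
    """Resolve the names a single time and hand back the function,
    which is roughly what static compilation achieves."""
    return getattr(importlib.import_module(module_name), function_name)


less_equal = bind_once("operator", "le")

slow = invoke_by_name("operator", "le", 3, 4)  # lookups on each call
fast = less_equal(3, 4)                        # plain call, no lookups
```

Both calls compute the same result; only the second avoids repeating
work whose outcome never changes.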
For instance, the word "<=" might be compiled into the following
pseudo-bytecode (the details are a bit more complex in reality; we'll
get to it later):

    MOVE datastack[top - 2] to JVM stack // get operands in right order
    CHECKCAST java/lang/Number
    MOVE datastack[top - 1] to JVM stack
    CHECKCAST java/lang/Number
    DECREMENT datastack.top 2            // pop the operands
    INVOKESTATIC                         // invoke the method
        "factor/math/FactorMath"
        "lessEqual"
        "(Ljava/lang/Number;Ljava/lang/Number;)Ljava/lang/Number;"
    MOVE JVM stack top to datastack      // push return value

Notice that no dynamic class or method lookups are done, and no arrays
are constructed; in fact, a modern Java virtual machine with a native
code compiler should be able to transform an INVOKESTATIC into a simple
subroutine call.

So how much overhead is eliminated in practice? It is easy to find
out:

    5] [ + - <= ] [ compile ] each
    1] [ 25 fib ] time
    937

This is still quite slow -- however, already we've gained a 26x speed
improvement!

Words consisting entirely of literal parameters to Java primitives such
as jinvoke, jnew, jvar-get/set, or jvar-get/set-static are compiled in a
similar manner; there is nothing new there.

=== First attempt at compiling compound definitions

Now consider the problem of compiling a word that does not directly call
Java primitives, but instead calls other words, which have already been
compiled.

For instance, consider the following word (recall that (...) is a comment!):

    : mag2 ( x y -- sqrt[x*x+y*y] )
        swap dup * swap dup * + sqrt ;

Let's assume that 'swap', 'dup', '*' and '+' are defined as before, and
that 'sqrt' is an already-compiled word that calls into the math
library.

Assume that the pseudo-bytecode INVOKEWORD <word> invokes the "eval"
method of a FactorWordDefinition instance.

(In reality, it is a bit more complex:

    GETFIELD ... some field that stores a FactorWordDefinition instance ...
    ALOAD 0 // push interpreter parameter to eval() on the stack
    INVOKEVIRTUAL
        "factor/FactorWordDefinition"
        "eval"
        "(Lfactor/FactorInterpreter;)V"

However, the above takes up more space and adds no extra information
over the INVOKEWORD notation.)

Now, we have the tools necessary to try compiling "mag2" as follows:

    INVOKEWORD swap
    INVOKEWORD dup
    INVOKEWORD *
    INVOKEWORD swap
    INVOKEWORD dup
    INVOKEWORD *
    INVOKEWORD +
    INVOKEWORD sqrt

In other words, the words still shuffle values back and forth on the
interpreter data stack as before; however, instead of the interpreter
iterating down a word thread, compiled bytecode invokes words directly.

This might seem like the obvious approach; however, it turns out it
brings very little performance benefit over simply iterating down a
linked list representing a quotation!

What we would like to do is eliminate use of the interpreter's stack
for intermediate values altogether, loading the inputs from it at the
beginning and storing the outputs to it at the end.

=== Avoiding the interpreter stack

The JVM is a stack machine; however, its semantics are so different that
a direct mapping of interpreter stack use to stack bytecode would not
be feasible:

- No arbitrary stack access is allowed in Java; only a few, fixed stack
  bytecodes like POP, DUP, SWAP are provided.

- A Java function receives input parameters in local variables, not in
  the JVM stack.

In fact, the second point suggests that it is a better idea to use
JVM *local variables* for temporary storage in compiled definitions.

Since no indirect addressing of locals is permitted, stack positions
used in computations must be known ahead of time. Determining them is
known as "stack effect deduction", and is the key concept of the Factor
compiler.

=== Fundamental idea: eval/core split

Earlier, we showed pseudo-bytecode for the word <=; however, it was
noted that the reality is a bit more complicated.

Recall that FactorWordDefinition.eval() takes an interpreter instance.
It is the responsibility of this method to marshal and unmarshal
values on the interpreter stack before and after the word performs any
computation on the values.

In fact, compiled word definitions have a second method named
core(). Instead of accessing the interpreter data stack directly, this
method takes inputs from formal parameters passed to the method, in the
natural stack order.

So, let's look at possible disassembly for the eval() and core() methods
of the word <=:

    void eval(FactorInterpreter interp)

        ALOAD 0 // push interpreter instance on JVM stack
        MOVE datastack[top - 2] to JVM stack // get operands in right order
        CHECKCAST java/lang/Number
        MOVE datastack[top - 1] to JVM stack
        CHECKCAST java/lang/Number
        DECREMENT datastack.top 2 // pop the operands
        INVOKESTATIC // invoke the method
            ... compiled definition class name ...
            "core"
            "(Lfactor/FactorInterpreter;Ljava/lang/Object;Ljava/lang/Object;)
            Ljava/lang/Object;"
        MOVE JVM stack top to datastack // push return value

    Object core(FactorInterpreter interp, Object x, Object y)

        ALOAD 0 // push formal parameters
        ALOAD 1
        ALOAD 2
        INVOKESTATIC // invoke the actual method
            "factor/math/FactorMath"
            "lessEqual"
            "(Ljava/lang/Number;Ljava/lang/Number;)Ljava/lang/Number;"
        ARETURN // pass return value up to eval()

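The division of labour between eval() and core() can be sketched in
Python as follows. This is a simplified model under stated assumptions,
not generated code: the Interpreter class holds nothing but a data
stack, checkcast() stands in for the CHECKCAST instruction, and a Python
bool stands in for whatever lessEqual really returns.

```python
from numbers import Number


class Interpreter:
    """Hypothetical interpreter holding only a data stack."""

    def __init__(self, *values):
        self.datastack = list(values)


def checkcast(value):
    # Stand-in for CHECKCAST java/lang/Number.
    if not isinstance(value, Number):
        raise TypeError(f"not a number: {value!r}")
    return value


class CompiledLessEqual:
    @staticmethod
    def core(interp, x, y):
        # Works purely on formal parameters; never touches the data stack.
        return x <= y

    def eval(self, interp):
        # Unmarshal the operands in the right order ...
        y = checkcast(interp.datastack.pop())
        x = checkcast(interp.datastack.pop())
        # ... delegate to core(), then marshal the result back.
        interp.datastack.append(self.core(interp, x, y))


interp = Interpreter(3, 4)
CompiledLessEqual().eval(interp)
```

All knowledge of the interpreter data stack is confined to eval(), so
core() can be called by any code that already holds the operands.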
==== Using the JVM stack and locals for intermediates

At first glance it seems nothing was achieved with the eval/core split,
except for an extra layer of overhead.

However, the key revelation here is that compiled word definitions can
call each other's core methods *directly*, passing in the parameters
through JVM local variables, without the interpreter data stack being
involved!

Instead of pseudo-bytecode, from now on we will consider a very
abstract, high level "register transfer language". The extra verbosity
of bytecode will only distract from the key ideas.

Tentatively, we would like to compile the word 'mag2' as follows:

    r0 * r0 -> r0
    r1 * r1 -> r1
    r0 + r1 -> r0
    sqrt r0 -> r0
    return r0

However, this looks very different from the original, RPN definition; in
particular, we have named values, and the stack operations are gone!

As it turns out, there is an automatic way to transform the stack
program 'mag2' into the register transfer program above (the reverse is
also possible, but will not be discussed here).

==== Stack effect deduction

Consider the following quotation:

    [ swap dup * swap dup * + sqrt ]

The transformation of the above stack code into register code consists
of two passes.

(A one-pass approach is also possible; however, because of the design of
the assembler used by the compiler, making the transformation described
here single-pass would require an extra pass elsewhere.)

The first pass is simply to determine the total number of input and
output parameters of the quotation (its "stack effect"). We proceed as
follows.

1. Create a 'simulated' datastack. It does not contain actual values,
   but rather markers.

   Set the input parameter count to zero.

2. Iterate through each element of the quotation, and act as follows:

   - If the element is a literal, allocate a simulated stack entry.

   - If the element is a word, ensure that the stack has at least as
     many items as the word's input parameter count.

     If the stack does not have enough items, increment the input
     parameter count by the difference between the word's expected
     input parameter count and the stack item count, and add that many
     markers to the stack.

     Decrement the stack pointer by the word's input parameter count.

     Increment the stack pointer by the word's output parameter count,
     filling the new entries with markers.

3. When the end of the quotation is reached, the output parameter count
   is the number of items on the simulated stack. The input parameter
   count is the value of the counter initialized in step 1.

Note that this algorithm is recursive -- to determine the stack effect
of a word, the stack effects of all its factors must be known. For now,
assume the stack effects of words that use the Java primitives are
"trivially" known.

A brief walkthrough of the above algorithm for the quotation
[ swap dup * swap dup * + sqrt ]:

    swap - the simulated stack is empty but swap expects two parameters,
           so the input parameter count becomes 2.

           two empty markers are pushed on the simulated stack:

           # #

    dup  - requires one parameter, which is already present.
           another empty marker is pushed on the simulated stack:

           # # #

    *    - requires two parameters, and returns one parameter, so the
           simulated stack is now:

           # #

    swap - requires and returns two parameters.

           # #

    dup  - requires one, returns two parameters.

           # # #

    *    - requires two, and returns one parameter.

           # #

    +    - requires two, and returns one parameter.

           #

    sqrt - requires one, and returns one parameter.

           #

So the input parameter count is two, and the output parameter count is
one (since at the end of the quotation the simulated datastack contains
one item marker).

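The first pass described above is short enough to write out. Here is a
minimal Python sketch, assuming a quotation is a list in which strings
are words and anything else is a literal, and assuming the stack effects
of the factors are supplied in a table (the EFFECTS dictionary is made
up for this example).

```python
# Hypothetical table of known stack effects: word -> (inputs, outputs).
EFFECTS = {
    "swap": (2, 2),
    "dup": (1, 2),
    "*": (2, 1),
    "+": (2, 1),
    "sqrt": (1, 1),
}


def stack_effect(quotation, effects=EFFECTS):
    """Return (input count, output count) for a quotation."""
    inputs = 0   # step 1: input parameter count starts at zero
    stack = []   # step 1: simulated stack of markers
    for element in quotation:                    # step 2
        if not isinstance(element, str):
            stack.append("#")                    # literal: one new entry
            continue
        needed, produced = effects[element]
        missing = needed - len(stack)
        if missing > 0:
            # Not enough items: they must come from the caller.
            inputs += missing
            stack.extend("#" * missing)
        del stack[len(stack) - needed:]          # consume the inputs
        stack.extend("#" * produced)             # mark the outputs
    return inputs, len(stack)                    # step 3


mag2 = "swap dup * swap dup * + sqrt".split()
effect = stack_effect(mag2)
```

For the quotation from the walkthrough this gives two inputs and one
output, matching the result derived by hand.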
==== The dataflow algorithm

The second pass of the compiler algorithm relies on the stack effect
already being known. It consists of these steps:

1. Create a new simulated stack. For each input parameter, a new entry
   is allocated. This time, entries are not blank markers, but rather
   register numbers.

2. Iterate through each element of the quotation, and act as follows:

   - If the element is a literal, allocate a simulated stack entry.
     This time, allocation finds an unused register number by checking
     each stack entry.

   - If the element is a shuffle word, apply the shuffle to the
     simulated stack *and do not emit any code!*

   - If the element is another word, pop the appropriate number of
     register numbers from the simulated stack, and emit assembly code
     for invoking the word with parameters stored in these registers.

     Decrement the simulated stack pointer by the word's input parameter
     count.

     Increment the simulated stack pointer by the word's output
     parameter count, filling the new entries with newly-allocated
     register numbers.

     Emit assembly code for moving the return values of the word into
     the newly allocated registers.

Voila! The 'simulated stack' is a compile time only notion, and the
resulting emitted code does not explicitly reference any stacks at all;
in fact, applying this algorithm to the following quotation:

    [ swap dup * swap dup * + sqrt ]

yields the following output:

    r0 * r0 -> r0
    r1 * r1 -> r1
    r0 + r1 -> r0
    sqrt r0 -> r0
    return r0

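The second pass can be sketched in Python along the same lines. This is
an illustration under assumptions, not the real code generator: it emits
the register transfer notation used in this article as strings, handles
only the two shuffle words needed here, leaves out literals, and takes
"allocation" to mean the lowest register number not on the simulated
stack.

```python
EFFECTS = {"*": (2, 1), "+": (2, 1), "sqrt": (1, 1)}

# Shuffle words only rearrange the simulated stack; they emit no code.
SHUFFLES = {
    "swap": lambda s: s[:-2] + [s[-1], s[-2]],
    "dup": lambda s: s + [s[-1]],
}


def free_register(stack):
    """Lowest register number not referenced by any stack entry."""
    n = 0
    while n in stack:
        n += 1
    return n


def render(word, args, results):
    names = [f"r{a}" for a in args]
    if len(names) == 2:
        call = f"{names[0]} {word} {names[1]}"   # infix for binary words
    else:
        call = " ".join([word] + names)
    return call + " -> " + " ".join(f"r{r}" for r in results)


def compile_quotation(quotation, input_count):
    stack = list(range(input_count))   # step 1: one register per input
    code = []
    for word in quotation:             # step 2
        if word in SHUFFLES:
            stack = SHUFFLES[word](stack)
            continue
        needed, produced = EFFECTS[word]
        args = stack[len(stack) - needed:]
        del stack[len(stack) - needed:]
        results = []
        for _ in range(produced):
            register = free_register(stack)
            stack.append(register)
            results.append(register)
        code.append(render(word, args, results))
    code.append("return " + " ".join(f"r{r}" for r in stack))
    return code


listing = compile_quotation("swap dup * swap dup * + sqrt".split(), 2)
```

Registers are reused as soon as no stack entry refers to them, which is
why the whole computation fits in r0 and r1.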
==== Multiple return values

A minor implementation detail is multiple return values. Java does not
support them directly, but a Factor word can return any number of
values. This is implemented by temporarily using the interpreter data
stack to return multiple values. This is the only time the interpreter
data stack is used.

==== The call stack

Sometimes Factor code uses the call stack as an 'extra hand' for
temporary storage:

    dup >r + r> *

The dataflow algorithm can be trivially generalized with two simulated
stacks; there is nothing more to be said about this.

=== Questioning assumptions

The dataflow compilation algorithm gives us another nice performance
improvement. However, the algorithm assumes that the stack effect of
each word is known a priori, or can be deduced using the algorithm.

The algorithm falls down when faced with the following more complicated
expressions:

- Combinators calling the 'call' and 'ifte' primitives

- Recursive words

So ironically, this algorithm is unsuitable for code where it would help
the most -- complex code with a lot of branching, and tight loops and
recursions.

=== Eliminating explicit 'call'

As described above, the dataflow algorithm would break when it
encountered the 'call' primitive:

    [ 2 + ] 5 swap call

The 'call' primitive executes the quotation at the top of the stack. So
its stack effect depends on its input parameter!

The first problem we faced was compilation of Java reflection
primitives. A critical observation was that all the information to
compile them efficiently was 'already there' in the source.

Our intuition tells us that in the above code, the occurrence of
'call' *always* receives the parameter [ 2 + ]; so somehow, the
quotation can be transformed into the following, which we can already
compile:

    [ 2 + ] 5 swap drop 2 +
                   ^^^^^^^^
                   "immediate instantiation" of 'call'

Or indeed, once the unused literal [ 2 + ] is factored out, simply:

    5 2 +

==== Generalizing the 'simulated stack'

It might seem surprising that such expressions can be easily compiled,
once the 'simulated stack' is generalized such that it can hold literal
values!

The only change that needs to be made is that in both passes, when a
literal is encountered, it is pushed directly on the simulated stack.

Also, when the primitive 'call' is encountered, its stack effect is
assumed to be the stack effect of the literal quotation at the top of
the simulated stack.

(What if the top of the simulated stack is a register number? The word
cannot be compiled, since the stack effect can potentially be
arbitrary!)

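A Python sketch of the generalized first pass follows. It rests on a few
assumptions made for the example: quotations are Python lists, words are
strings, the marker "#" denotes a value known only at run time, and the
CannotCompile exception is a made-up name for the failure case described
in the parenthetical above.

```python
EFFECTS = {"+": (2, 1), "*": (2, 1)}
VALUE = "#"  # marker for a value known only at run time


class CannotCompile(Exception):
    """'call' was applied to something that is not a literal quotation."""


def require(stack, count, inputs):
    """Make sure 'count' entries exist, taking missing ones as inputs."""
    missing = count - len(stack)
    if missing > 0:
        stack[:0] = [VALUE] * missing   # inputs lie below everything else
        inputs += missing
    return inputs


def simulate(quotation, stack, inputs=0):
    for element in quotation:
        if element == "swap":
            inputs = require(stack, 2, inputs)
            stack[-2:] = [stack[-1], stack[-2]]
        elif element == "call":
            inputs = require(stack, 1, inputs)
            target = stack.pop()
            if not isinstance(target, list):
                raise CannotCompile("stack effect of 'call' is arbitrary")
            # The effect of 'call' is the effect of the literal quotation.
            inputs = simulate(target, stack, inputs)
        elif isinstance(element, str):
            needed, produced = EFFECTS[element]
            inputs = require(stack, needed, inputs)
            del stack[len(stack) - needed:]
            stack.extend([VALUE] * produced)
        else:
            stack.append(element)       # literals are pushed directly
    return inputs


def stack_effect(quotation):
    stack = []
    inputs = simulate(quotation, stack)
    return inputs, len(stack)


effect = stack_effect([[2, "+"], 5, "swap", "call"])

try:
    stack_effect(["call"])              # quotation would come from a caller
    failed = False
except CannotCompile:
    failed = True
```

The first quotation takes no inputs and leaves one value, exactly like
5 2 +, while a bare 'call' cannot be analysed on its own.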
Being able to compile 'call' whose parameters are literals from the
same word definition doesn't really add anything new.

A real breakthrough would be compiling "combinators": words that take
parameters that are themselves quotations.

As it turns out, combinators themselves are not compiled -- however,
specific *instances* of combinators in other word definitions are.

For example, we can rewrite our word 'mag2' as follows:

    : mag2 ( x y -- sqrt[x*x+y*y] )
        [ sq ] 2apply + sqrt ;

Where 2apply is defined as follows:

    : 2apply ( x y [ code ] -- )
        2dup 2>r nip call 2r> call ;

How can we compile this new, equivalent, form of 'mag2'?

==== Inline words

Normally, when the dataflow algorithm encounters a word as an element
of a quotation, a call to that word's core() method is emitted. However,
if the word is compiled 'immediately' (that is, inline), its definition
is substituted in.

Assume for a second that in the new form of 'mag2', the word '2apply' is
compiled inline (ignoring the specifics of how this decision is made).
In other words, it is as if 'mag2' was defined as follows:

    : mag2 ( x y -- sqrt[x*x+y*y] )
        [ sq ] 2dup 2>r nip call 2r> call + sqrt ;

However, we already have a way of compiling the above code; in fact it
is compiled into the equivalent of:

    : mag2 ( x y -- sqrt[x*x+y*y] )
        [ sq ] 2dup 2>r nip drop sq 2r> drop sq + sqrt ;
                            ^^^^^^^     ^^^^^^^
                            immediate instantiation of 'call'

As an aside, recall that the stack words 2dup, 2>r, nip, drop, and 2r>
do not emit any code, and the 'drop' of the literal [ sq ] ensures that
it never makes it to the compiled definition. The end result is that the
register-transfer code is identical to the earlier definition of 'mag2'
which did not involve 2apply:

    r0 * r0 -> r0
    r1 * r1 -> r1
    r0 + r1 -> r0
    sqrt r0 -> r0
    return r0

So, how is the decision made to compile a word inline, or not? It is
quite simple. If the word has a deducible stack effect on the simulated
stack of the current compilation, but it does *not* have a deducible
stack effect on an empty simulated stack, it is compiled inline.

For example, the following word has a deducible stack effect, regardless
of the values of any literals on the simulated stack:

    : sq ( x -- x^2 )
        dup * ;

So the word 'sq' is always compiled normally.

However, the '2apply' word we saw earlier does not have a deducible
stack effect unless there is a literal quotation at the top of the
simulated stack:

    : 2apply ( x y [ code ] -- )
        2dup 2>r nip call 2r> call ;

So it is compiled inline.

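The inlining rule can be tried out with a small Python model. Everything
here is a sketch under assumptions: the Sim class, the WORDS table and
the Undeducible exception are made up for the example, a word counts as
"deducible on an empty stack" when simulating its body from scratch
succeeds, and the two simulated stacks (data and call) are plain lists.

```python
UNKNOWN = object()  # marker for a value known only at run time

PRIMS = {"*": (2, 1), "+": (2, 1), "sqrt": (1, 1)}
WORDS = {
    "sq": ["dup", "*"],
    "2apply": ["2dup", "2>r", "nip", "call", "2r>", "call"],
    "mag2": [["sq"], "2apply", "+", "sqrt"],
}


class Undeducible(Exception):
    """'call' met a value that is not a literal quotation."""


class Sim:
    def __init__(self):
        self.data, self.calls = [], []   # the two simulated stacks
        self.inputs = 0
        self.inlined = []                # words that had to be inlined

    def pop(self, n):
        missing = n - len(self.data)
        if missing > 0:                  # missing entries become inputs
            self.data[:0] = [UNKNOWN] * missing
            self.inputs += missing
        taken = self.data[len(self.data) - n:]
        del self.data[len(self.data) - n:]
        return taken

    def unstash(self, n):
        taken = self.calls[len(self.calls) - n:]
        del self.calls[len(self.calls) - n:]
        return taken


SHUFFLES = {
    "dup": lambda s: s.data.extend(s.pop(1) * 2),
    "2dup": lambda s: s.data.extend(s.pop(2) * 2),
    "nip": lambda s: s.data.extend(s.pop(2)[1:]),
    "2>r": lambda s: s.calls.extend(s.pop(2)),
    "2r>": lambda s: s.data.extend(s.unstash(2)),
}


def apply_effect(sim, effect):
    needed, produced = effect
    sim.pop(needed)
    sim.data.extend([UNKNOWN] * produced)


def run(quotation, sim):
    for item in quotation:
        if not isinstance(item, str):
            sim.data.append(item)                    # literal
        elif item in SHUFFLES:
            SHUFFLES[item](sim)
        elif item == "call":
            (target,) = sim.pop(1)
            if not isinstance(target, list):
                raise Undeducible("'call' on a run-time value")
            run(target, sim)
        elif item in PRIMS:
            apply_effect(sim, PRIMS[item])
        else:
            try:
                effect = stack_effect(WORDS[item])   # empty-stack attempt
            except Undeducible:
                sim.inlined.append(item)             # substitute the body
                run(WORDS[item], sim)
            else:
                apply_effect(sim, effect)            # ordinary call


def stack_effect(quotation):
    sim = Sim()
    run(quotation, sim)
    return sim.inputs, len(sim.data)


def analyze(name):
    sim = Sim()
    run(WORDS[name], sim)
    return (sim.inputs, len(sim.data)), sim.inlined


mag2_effect, mag2_inlined = analyze("mag2")
```

Under these assumptions 'mag2' comes out as two inputs and one output,
with '2apply' inlined and 'sq' treated as an ordinary call.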
Sometimes it is desirable to have short non-combinator words inlined.
While this is not necessary (whereas non-inlined combinators do not
compile), it can increase performance, especially if the word returns
multiple values (and without inlining, the interpreter datastack will
need to be used).

To mark a word for inline compilation, use the word 'inline' like so:

    : sq ( x -- x^2 )
        dup * ; inline

The word 'inline' sets the inline slot of the most recently defined word
object.

(Indeed, to push a reference to the most recently defined word object,
use the word 'word').

=== Branching

The only branching primitive supported by Factor is 'ifte'. The syntax
is as follows:

    2 2 + 4 = ( condition that leaves boolean on the stack )
    [
        ( code to execute if condition is true )
    ] [
        ( code to execute if condition is false )
    ] ifte

Note that the different components might be spread between words, and
affected by stack operations in transit. Due to the dataflow algorithm
and inlining, all useful cases can be handled correctly.

==== Not all branching forms have a deducible stack effect

The first observation we gain is that if the two branches leave the
stack in inconsistent states, then stack positions used by subsequent
code will depend on the outcome of the branch.

This practice is discouraged anyway -- it leads to hard-to-understand
code -- so it is not supported by the compiler. If you must do it, the
words will always run in the interpreter.

Attempting to compile or balance an expression with such a branch raises
an error:

    9] : bad-ifte 3 = [ 1 2 3 ] [ 2 2 + ] ifte ;
    10] word effect .
    break called.

    :r prints the callstack.
    :j prints the Java stack.
    :x returns to top level.
    :s returns to top level, retaining the data stack.
    :g continues execution (but expect another error).

    ERROR: Stack effect of [ 1 2 3 ] ( java.lang.Object -- java.lang.Object
    java.lang.Object java.lang.Object ) is inconsistent with [ 2 2 + ] (
    java.lang.Object -- java.lang.Object )
    Head is ( java.lang.Object -- )
    Recursive state:
    [ #<ifte,base=null,effect=( java.lang.Object -- boolean java.lang.Object
    java.lang.Object ); null.null()> #<bad-ifte,base=null,effect=( -- );
    null.null()> ]

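The consistency check behind this error can be sketched in Python. The
sketch makes simplifying assumptions: branches are lists of words
(strings) and literals, two branches count as consistent when they
change the stack height by the same amount, and the reported effect of
the whole 'ifte' counts the condition but not the two literal
quotations. InconsistentBranches is a made-up exception name.

```python
EFFECTS = {"+": (2, 1), "swap": (2, 2), "drop": (1, 0)}


class InconsistentBranches(Exception):
    pass


def stack_effect(quotation):
    """(inputs, outputs) of a branch, tracking only the stack height."""
    inputs = height = 0
    for item in quotation:
        needed, produced = EFFECTS[item] if isinstance(item, str) else (0, 1)
        if height < needed:
            inputs += needed - height
            height = needed
        height += produced - needed
    return inputs, height


def ifte_effect(true_branch, false_branch):
    t_in, t_out = stack_effect(true_branch)
    f_in, f_out = stack_effect(false_branch)
    if t_out - t_in != f_out - f_in:
        raise InconsistentBranches(
            f"( {t_in} -- {t_out} ) is inconsistent with ( {f_in} -- {f_out} )"
        )
    inputs = max(t_in, f_in)
    # One extra input for the condition consumed by 'ifte' itself.
    return inputs + 1, inputs + (t_out - t_in)


try:
    ifte_effect([1, 2, 3], [2, 2, "+"])   # the branches of 'bad-ifte'
    rejected = False
except InconsistentBranches:
    rejected = True

balanced = ifte_effect(["swap"], [])
```

The branches of 'bad-ifte' leave three values and one value
respectively, so they are rejected, while a 'swap' paired with an empty
branch is accepted.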
==== Merging

Let's return to our register transfer language, and add a branching
notation:

- two-instruction sequence to branch to <label> if <register> is null
    ALOAD <register>
    IFNULL <label>

- unconditional goto to <label>
    GOTO <label>

So a simple conditional

    rot [
        (true)
    ] [
        (false)
    ] ifte

will be compiled as follows, where the inputs are in registers 1, 2, 3:

    1 ALOAD 1
    2 IFNULL 5
    3 (true)
    4 GOTO 6
    5 (false)
    6 RETURN

However, the question arises: what becomes of the simulated stack after
the branches are done?

For example, consider this snippet:

    random-int random-int random-boolean [
        swap
    ] [

    ] ifte

The first three words followed by the branch itself are compiled like
so:

    1 1 <- random-int
    2 2 <- random-int
    3 3 <- random-boolean
    4 ALOAD 3
    5 IFNULL 8

However, a problem arises because if the true branch is taken, the
simulated stack contains register 1 at the top, and register 2 below;
but if the false branch is taken, it is the opposite!

The solution is to "merge" the stacks at the end of each branch. So
the remainder of our code might be compiled as follows:

    6 1 <-> 2 // new notation: exchange registers 1 and 2
    7 GOTO 8
    8 RETURN

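One way to compute such merge code is sketched below in Python. It is a
guess at the idea rather than the compiler's actual procedure, and it
assumes that each layout lists the register holding every stack slot
from bottom to top, that both layouts use the same set of registers,
and that no register appears twice in a layout.

```python
def merge_exchanges(branch_layout, target_layout):
    """Exchanges that turn one branch's register layout into the target.

    Layouts list, bottom to top, the register holding each stack slot.
    """
    if sorted(branch_layout) != sorted(target_layout):
        raise ValueError("layouts must use the same registers")
    if len(set(branch_layout)) != len(branch_layout):
        raise ValueError("a register appears twice in a layout")

    holder = list(branch_layout)  # holder[slot] = register holding it now
    code = []
    for slot, wanted in enumerate(target_layout):
        current = holder[slot]
        if current == wanted:
            continue
        low, high = sorted((current, wanted))
        code.append(f"{low} <-> {high}")
        # After the exchange, 'wanted' holds this slot's value and
        # 'current' holds whatever 'wanted' held before.
        other = holder.index(wanted)
        holder[slot], holder[other] = wanted, current
    return code


# True branch ran 'swap': register 2 is below, register 1 on top.
# False branch did nothing: register 1 is below, register 2 on top.
merge = merge_exchanges([2, 1], [1, 2])
```

For the snippet above this produces the single exchange of registers 1
and 2 at the end of the true branch.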
=== Recursion

Consider our old friend 'fib':

    : fib ( n -- nth fibonacci number )
        dup 1 <= [
            drop 1
        ] [
            pred dup fib swap pred fib +
        ] ifte ;

Using the tools we have, we cannot deduce its stack effect yet, since
the false branch of the 'ifte' refers to the word 'fib' itself.

A critical observation is that if the word is to complete, eventually
the test will succeed and 'drop 1' will be executed.

Note that this implies that when given a parameter of 0 or 1, the
stack effect of 'fib' is ( X -- X ).

==== What is the stack effect?

To see how to deduce the stack effect of the recursive case, it is
necessary to make a mental leap. Consider the case where the parameter
to fib is 2. The word recurses twice, and in each case, the parameter
to the recursive call is <= 1, so 'drop 1' is executed.

So when the parameter is 2, the stack effect is also ( X -- X )!

In fact it is not hard to see that if the stack effect of 'fib' with
parameters n-1 and n-2 is ( X -- X ), then the stack effect of 'fib'
with parameter n is also ( X -- X ).

Therefore by induction, for any input, 'fib' has stack effect
( X -- X ).

Once the stack effect is known, it is easy enough to compile; just treat
the two recursive calls like calls to any other word with stack effect
( X -- X ).

==== Not all recursive forms have a deducible stack effect

Consider the following word:

    : push ( list -- ... )
        dup [
            uncons push
        ] unless ;

If the top of the stack is null, the word returns. So the base case is
( X -- X ).

However, if the top of the stack is a list of one element, the word has
stack effect ( X -- X X ), since 'uncons' has stack effect ( X -- X X )
and the base case is ( X -- X ).

If we proceed, we find that if the top of the stack is a list of two
elements, the stack effect of the word is ( X -- X X X ).

The stack positions used for intermediate values can no longer be
determined ahead of time.

A word whose stack effect depends on input is said to 'diverge'. Since
it is generally good practice to only write converging recursive words,
it is not a big loss that the compiler does not support diverging ones.
Of course, such words still work in the interpreter.

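The inductive argument for 'fib' and the failure for 'push' can both be
reproduced with a short Python sketch. It takes a simplified view that
is an assumption of this example: a recursive word is given as a head
followed by the branches of one conditional, the base case is the first
branch that does not mention the word, and the pseudo-word "<cond>"
stands for the conditional consuming its boolean. The effect of the base
case is assumed for the recursive calls, and every path must then
reproduce it.

```python
EFFECTS = {
    "dup": (1, 2), "drop": (1, 0), "swap": (2, 2), "pred": (1, 1),
    "<=": (2, 1), "+": (2, 1), "uncons": (1, 2),
    "<cond>": (1, 0),   # the conditional popping its condition
}


class Diverges(Exception):
    pass


def infer(quotation, effects):
    """(inputs, outputs) of a quotation, tracking only stack height."""
    inputs = height = 0
    for item in quotation:
        needed, produced = effects[item] if isinstance(item, str) else (0, 1)
        if height < needed:
            inputs += needed - height
            height = needed
        height += produced - needed
    return inputs, height


def recursive_effect(name, head, branches, effects=EFFECTS):
    paths = [head + ["<cond>"] + branch for branch in branches]
    base_paths = [p for p, b in zip(paths, branches) if name not in b]
    if not base_paths:
        raise Diverges(f"{name} has no base case")
    assumed = infer(base_paths[0], effects)     # effect of the base case
    known = {**effects, name: assumed}          # induction hypothesis
    for path in paths:
        if infer(path, known) != assumed:
            raise Diverges(f"{name} does not have a fixed stack effect")
    return assumed


fib = recursive_effect(
    "fib",
    ["dup", 1, "<="],
    [["drop", 1], ["pred", "dup", "fib", "swap", "pred", "fib", "+"]],
)

try:
    recursive_effect("push", ["dup"], [[], ["uncons", "push"]])
    push_diverges = False
except Diverges:
    push_diverges = True
```

With ( X -- X ) assumed for the recursive calls, the recursive branch of
'fib' yields ( X -- X ) again, whereas the recursive branch of 'push'
yields ( X -- X X ) and is rejected.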
==== Auxiliary methods

So far, we can compile recursive words such as 'fib' and tail-recursive
words such as 'list?'. Now, let's try applying our techniques to a word
that calls a recursive combinator:

    : reverse ( list -- list )
        [ ] swap [ swons ] each ;

Recall that 'swons' creates a cons cell with stack effect
( cdr car -- [ car , cdr ] ) -- the opposite order of 'cons', which has
stack effect ( car cdr -- [ car , cdr ] ).

The combinator 'each' is defined as follows:

    : each ( [ list ] [ quotation ] -- )
        over [
            >r uncons r> tuck 2>r call 2r> each
        ] [
            2drop
        ] ifte ;

If we apply our previous inlining technique, however, the end result is
absurd, since the recursive call to 'each' remains:

    : reverse ( list -- list )
        [ ] swap [ swons ] over [
            >r uncons r> tuck 2>r call 2r> each
        ] [
            2drop
        ] ifte ;

However, if the recursive call is changed to 'reverse', then the result
is also incorrect, since '[ ] swap' would be executed on each iteration.

The solution is to place instances of recursive combinators in an
'auxiliary method' in the same class as the definition being compiled.

So in fact, 'reverse' is compiled as three methods, eval(), core(), and
aux_each_0().

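The resulting three-method shape can be pictured with a Python sketch.
It is an illustration only, not what the compiler emits: cons cells are
modelled as (car, cdr) tuples with None as the empty list, eval() takes
a bare data stack instead of an interpreter, and the tail call in the
auxiliary method is written as a loop.

```python
class CompiledReverse:
    """Hypothetical shape of the class generated for 'reverse'."""

    def eval(self, datastack):
        # Marshal from and to the interpreter data stack.
        datastack.append(self.core(datastack.pop()))

    @staticmethod
    def core(lst):
        # '[ ] swap' runs exactly once, then the combinator instance.
        return CompiledReverse.aux_each_0(None, lst)

    @staticmethod
    def aux_each_0(acc, lst):
        # 'each' specialised for the literal quotation [ swons ].
        # The recursive call targets this method, not core().
        while lst is not None:
            car, cdr = lst
            acc = (car, acc)   # swons: cons the element onto the result
            lst = cdr
        return acc


numbers = (1, (2, (3, None)))
reversed_numbers = CompiledReverse.core(numbers)

stack = [numbers]
CompiledReverse().eval(stack)
```

Because the recursion stays inside aux_each_0(), the initialisation in
core() is not repeated on each iteration.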
==== Wrapping up

There are two implementation details not covered here; they are not
really 'interesting' and best described by the source code anyway:

- tail-recursive words are compiled with a GOTO, not a method
  invocation, at the end of the recursive case.

- some extra steps are needed to normalize the stack after recursive
  calls, and when auxiliary methods are being generated.

=== Conclusion

Finally, let's see what kind of improvement we get over naive
interpretation when our old friend the 'fib' word is compiled using all
the techniques mentioned above:

    3] "fib" compile
    4] [ 25 fib ] time
    123

That's right -- a 200x improvement over pure interpretation.