bump version number, add docs to cvs

2004-08-04 03:44:43 +00:00 · 2004-08-04 03:44:43 +00:00 · 4d876fdc00
parent 1da7004915
commit 4d876fdc00
6 changed files with 5200 additions and 3 deletions
--- a/TODO.FACTOR.txt
+++ b/TODO.FACTOR.txt
@ -7,9 +7,8 @@ ERROR: I/O error: [ "primitive_read_line_fd_8" "Resource temporarily unavailable
 - decide if overflow is a fatal error
 - f >n: crashes
 - typecases: type error reporting bad
 - image output
 - floats
- {...} vectors
+- {...} vectors in java factor
 - parsing should be parsing
 - describe-word
 - clone-sbuf
--- a/doc/compiler-impl.txt
+++ b/doc/compiler-impl.txt
@ -0,0 +1,898 @@
 IMPLEMENTATION OF THE FACTOR COMPILER
 Compilation of Factor is a messy business, driven by heuristics and not
 formal theory. The compiler is inherently limited -- some expressions
 cannot be compiled by definition. The programmer must take care to
 ensure that performance-critical sections of code are written such that
 they can be compiled.
 === Introduction
 ==== The problem
 The Factor interpreter introduces a lot of overhead:
 - Execution of a quotation involves iteration down a linked list.
 - Stack access is not as fast as local variables, since Java
  bound-checks all array accesses.
 - At the lowest level, everything is expressed as Java reflection calls
  to the Factor and Java platform libraries. Java reflection is not as
  fast as statically-compiled Java calls.
 - Since Factor is dynamically-typed, intermediate values on the stack
  are all stored as java.lang.Object types, so type checks and
  possibly coercions must be done at each step of the computation.
 ==== The solution
 The following optimizations naturally suggest themselves, and lead to
 the implementation of the Factor compiler:
 - Compiling Factor code down to Java platform bytecode.
 - Using virtual machine local variables instead of an array stack to
  store intermediate values.
 - Statically compiling in Java calls where the class, method and
  variable names are known ahead of time.
 - Type inference and soft typing to eliminate unnecessary type checks.
  (At the time of writing, this is in progress and is not documented in
  this paper.)
 === Preliminaries: interpreter internals
 A word object is essentially a property list. The one property we are
 concerned with here is "def", which holds a FactorWordDefinition object.
 The accessor word "worddef" pushes the "def" slot of a given word name
 or word object:
    0] "+" worddef .
 #<factor.FactorCompoundDefinition: +>
 Generally, the word definition is an opaque object, however there are
 various ways to deconstruct it, which will not be convered here (see the
 worddef>list word if you are interested).
 When a word object is being executed, the eval() method of its
 definition is invoked. The eval() method takes one parameter, which is
 the FactorInterpreter instance. The interpreter instance provides access
 to the stacks, global namespace, vocabularies, and so on.
 (In this article, we will use the term "word" and "word definition"
 somewhat interchangably; this does not cause any confusion. If a "word"
 is mentioned where one would expect a definition, simply assume the
 "def" slot of the word is being accessed.)
 The class FactorWordDefinition is abstract; a number of subclasses
 exist:
 - FactorCompoundDefinition: a standard colon definition consisting of
  a quotation; for example, : sq dup * ; is syntax for a compound
  definition named "sq" with quotation [ dup * ].
  Of course, its eval() method simply pushes the quotation on the
  interpreter's callstack.
 - FactorShuffleDefinition: a stack rearrangement word, whose syntax is
  described in detail in parser.txt. For example,
  ~<< swap a b -- b a >>~ is syntax for a shuffle definition named
  "swap" that exchanges the top two values on the data stack.
 - FactorPrimitiveDefinition: primitive word definitions are written in
  Java. Various concrete subclasses of this class in the
  factor.primitives package provide implementations of eval().
 When a word definition is compiled, the compiler dynamically generates a
 new class, creates a new instance, and replaces the "def" slot of the
 word in question with the instance of the compiled class.
 So the compiler's primary job is to generate appropriate Java bytecode
 for the eval() method.
 === Preliminaries: the specimen
 Consider the following (naive) implementation of the Fibonacci sequence:
 : fib ( n -- nth fibonacci number )
    dup 1 <= [
        drop 1 
    ] [
        pred dup fib swap pred fib + 
    ] ifte ;
 A quick overview of the words used here:
 - dup: a shuffle word that duplicates the top of the stack.
 - <=: compare the top two numbers on the stack.
 - drop: remove the top of the stack.
 - pred: decrement the top of the stack by one. Indeed, it is defined as
  simply : pred 1 - ;.
 - swap: exchange the top two stack elements.
 - +: add the top two stack elements.
 - ifte: execute one of two given quotations, depending on the condition
  on the stack.
 === Java reflection
 The biggest performance improvement comes from the transformation of
 Java reflection calls into static bytecode.
 Indeed, when the compiler was first written, the only type of word it
 could compile were such simple expressions that interfaced with Java and
 nothing else.
 In the above definition of "fib", the three key words <= - and + (note
 that - is not referenced directly, but rather is a factor of the word
 pred). All three of these words are implemented as Java calls into the
 Factor math library:
 : <= ( a b -- boolean )
    [
        "java.lang.Number" "java.lang.Number" 
    ] "factor.math.FactorMath" "lessEqual" jinvoke-static ;
 : - ( a b -- a-b )
    [
        "java.lang.Number" "java.lang.Number" 
    ] "factor.math.FactorMath" "subtract" jinvoke-static ;
 : + ( a b -- a+b )
    [
        "java.lang.Number" "java.lang.Number" 
    ] "factor.math.FactorMath" "add" jinvoke-static ;
 During interpretation, the execution of one of these words involves a
 lot of overhead. First, the argument list is transformed into a Java
 Class[] array; then the Class object corresponding to the containing
 class is looked up; then the appropriate Method object defined in this
 class is looked up; then the method is invoked, by passing it an
 Object[] array consisting of arguments from the stack.
 As one might guess, this is horribly inefficient. Indeed, look at the
 time taken to compute the 25th Fibonacci number using pure
 interpretation (of course depending on your hardware, results might
 vary):
    0] [ 25 fib ] time
 24538
 One quickly notices that in fact, all the overhead from the reflection
 API is unnecessary; the containing class, method name and argument types
 are, after all, known ahead of time.
 For instance, the word "<=" might be compiled into the following
 pseudo-bytecode (the details are a bit more complex in reality; we'll
 get to it later):
 MOVE datastack[top - 2] to JVM stack // get operands in right order
 CHECKCAST java/lang/Number
 MOVE datastack[top - 1] to JVM stack
 CHECKCAST java/lang/Number
 DECREMENT datastack.top 2            // pop the operands
 INVOKESTATIC                         // invoke the method
 	"factor/FactorMath"
 	"lessEqual"
 	"(Ljava/lang/Number;Ljava/lang/Number;)Ljava/lang/Number;"
 MOVE JVM stack top to datastack      // push return value
 Notice that no dynamic class or method lookups are done, and no arrays
 are constructed; in fact, a modern Java virtual machine with a native
 code compiler should be able to transform an INVOKESTATIC into a simple
 subroutine call.
 So what how much overhead is eliminated in practice? It is easy to find
 out:
    5] [ + - <= ] [ compile ] each
    1] [ 25 fib ] time
 937
 This is still quite slow -- however, already we've gained a 26x speed
 improvement!
 Words consisting entirely of literal parameters to Java primitives such
 as jinvoke, jnew, jvar-get/set, or jvar-get/set-static are compiled in a
 similar manner; there is nothing new there.
 === First attempt at compiling compound definitions
 Now consider the problem of compiling a word that does not directly call
 Java primitives, but instead calls other words, which are already been
 compiled.
 For instance, consider the following word (recall that (...) is a comment!):
 : mag2 ( x y -- sqrt[x*x+y*y] )
    swap dup * swap dup * + sqrt ;
 Lets assume that 'swap', 'dup', '*' and '+' are defined as before, and
 that 'sqrt' is an already-compiled word that calls into the math
 library.
 Assume that the pseudo-bytecode INVOKEWORD <word> invokes the "eval"
 method of a FactorWordDefinition instance.
 (In reality, it is a bit more complex:
 GETFIELD ... some field that stores a FactorWordDefinition instance ...
 ALOAD 0 // push interpreter parameter to eval() on the stack
 INVOKEVIRTUAL
 	"factor/FactorWordDefinition"
 	"eval"
 	"(Lfactor/FactorInterpreter;)V"
 However the above takes up more space and adds no extra information over
 the INVOKE notation.)
 Now, we have the tools necessary to try compiling "mag2" as follows:
 INVOKEWORD swap
 INVOKEWORD dup
 INVOKEWORD *
 INVOKEWORD swap
 INVOKEWORD dup
 INVOKEWORD *
 INVOKEWORD +
 INVOKEWORD sqrt
 In other words, the words still shuffle values back and forth on the
 interpreter data stack as before; however, instead of the interpreter
 iterating down a word thread, compiled bytecode invokes words directly.
 This might seem like the obvious approach; however, it turns out it
 brings very little performance benefit over simply iterating down a
 linked list representing a quotation!
 What we would like to do is just eliminate use of the interpreter's
 stack for intermediate values altogether, and just loading the inputs at
 the beginning and storing them at the end.
 === Avoiding the interpreter stack
 The JVM is a stack machine, however its semantics are so different that
 a direct mapping of interpreter stack use to stack bytecode would not
 be feasable:
 - No arbitrary stack access is allowed in Java; only a few, fixed stack
  bytecodes like POP, DUP, SWAP are provided.
 - A Java function receives input parameters in local variables, not in
  the JVM stack.
 In fact, the second point suggests that it is a better idea is to use
 JVM *local variables* for temporary storage in compiled definitions.
 Since no indirect addressing of locals is permitted, stack positions
 used in computations must be known ahead of time. This process is known
 as "stack effect deduction", and is the key concept of the Factor
 compiler.
 === Fundamental idea: eval/core split
 Earlier, we showed pseudo-bytecode for the word <=, however it was noted
 that the reality is a bit more complicated.
 Recall that FactorWordDefinition.eval() takes an interpreter instance.
 It is the responsibility of this method to marshall and unmarshall
 values on the interpreter stack before and after the word performs any
 computation on the values.
 In actual fact, compiled word definitions have a second method named
 core(). Instead of accessing the interpreter data stack directly, this
 method takes inputs from formal parameters passed to the method, in the
 natural stack order.
 So, lets look at possible disassembly for the eval() and core() methods
 of the word <=:
 void eval(FactorInterpreter interp)
 ALOAD 0 // push interpreter instance on JVM stack
 MOVE datastack[top - 2] to JVM stack // get operands in right order
 CHECKCAST java/lang/Number
 MOVE datastack[top - 1] to JVM stack
 CHECKCAST java/lang/Number
 DECREMENT datastack.top 2            // pop the operands
 INVOKESTATIC                         // invoke the method
 	... compiled definition class name ...
 	"core"
 	"(Lfactor/FactorInterpreter;Ljava/lang/Object;Ljava/lang/Object;)
 	 Ljava/lang/Object;"
 MOVE JVM stack top to datastack      // push return value
 Object core(FactorInterpreter interp, Object x, Object y)
 ALOAD 0                              // push formal parameters
 ALOAD 1
 ALOAD 2
 INVOKESTATIC                         // invoke the actual method
 	"factor/FactorMath"
 	"lessEqual"
 	"(Ljava/lang/Number;Ljava/lang/Number;)Ljava/lang/Number;"
 ARETURN                              // pass return value up to eval()
 ==== Using the JVM stack and locals for intermediates
 At first glance it seems nothing was achieved with the eval/core split,
 excepting an extra layer of overhead.
 However, the new revalation here is that compiled word definitions can
 call each other's core methods *directly*, passing in the parameters
 through JVM local variables, without the interpreter data stack being
 involved!
 Instead of pseudo-bytecode, from now on we will consider a very
 abstract, high level "register transfer language". The extra verbosity
 of bytecode will only distract from the key ideas.
 Tentatively, we would like to compile the word 'mag2' as follows:
 r0 * r0 -> r0
 r1 * r1 -> r1
 r0 + r1 -> r0
 sqrt r0 -> r0
 return r0
 However this looks very different from the original, RPN definition; in
 particular, we have named values, and the stack operations are gone!
 As it turns out, there is a automatic way to transform the stack program
 'mag2' into the register transfer program above (the reverse is also
 possible, but will not be discussed here).
 ==== Stack effect deduction
 Consider the following quotation:
 [ swap dup * swap dup * + sqrt ]
 The transformation of the above stack code into register code consists
 of two passes.
 (A one-pass approach is also possible; however because of the design of
 the assembler used by the compiler, an extra pass will be required
 elsewhere if this transformation described here is single-pass).
 The first pass is simply to determine the total number of input and
 output parameters of the quotation (its "stack effect"). We proceed as
 follows.
 1. Create a 'simulated' datastack. It does not contain actual values,
   but rather markers.
   Set the input parameter count to zero.
 2. Iterate through each element of the quotation, and act as follows:
   - If the element is a literal, allocate a simulated stack entry.
   - If the element is a word, ensure that the stack has at least as
     many items as the word's input parameter count.
     If the stack does not have enough items, increment the input
     parameter count by the difference between the stack item count and
     the word's expected input parameter count, and fill the stack with
     the difference.
     Decrement the stack pointer by the word's input parameter count.
     Increment the stack pointer by the word's output parameter count,
     filling the new entries with markers.
 3. When the end of the quotation is reached, the output parameter count
   is the number of items on the simulated stack. The input parameter
   count is the value of the intermediate parameter created in step 1.
 Note that this algorithm is recursive -- to determine the stack effect
 of a word, the stack effects of all its factors must be known. For now,
 assume the stack effects of words that use the Java primitives are
 "trivially" known.
 A brief walkthrough of the above algorithm for the quotation
 [ swap dup * swap dup * + sqrt ]:
 swap - the simulated stack is empty but swap expects two parameters,
       so the input parameter count becomes 2.
       two empty markers are pushed on the simulated stack:
       # #
 dup  - requires one parameter, which is already present.
       another empty marker is pushed on the simulated stack:
       # # #
 *    - requires two parameters, and returns one parameter, so the
       simulated stack is now:
       # #
 swap - requires and returns two parameters.
       # #
 dup  - requires one, returns two parameters.
       # # #
 *    - requires two, and returns one parameter.
       # #
 +    - requires two, and returns one parameter.
       #
 sqrt - requires one, and returns one parameter.
       #
 So the input parameter count is two, and the output parameter count is
 one (since at the end of the quotation the simulated datastack contains
 one item marker).
 ==== The dataflow algorithm
 The second pass of the compiler algorithm relies on the stack effect
 already being known. It consists of these steps:
 1. Create a new simulated stack. For each input parameter, a new entry
   is allocated. This time, entries are not blank markers, but rather
   register numbers.
 2. Iterate through each element of the quotation, and act as follows:
   - If the element is a literal, allocate a simulated stack entry.
     This time, allocation finds an unused register number by checking
     each stack entry.
   - If the element is a shuffle word, apply the shuffle to the
     simulated stack *and do not emit any code!*
   - If the element is another word, pop the appropriate number of
     register numbers from the simulated stack, and emit assembly code
     for invoking the word with parameters stored in these registers.
     Decrement the simulated stack pointer by the word's input parameter
     count.
     Increment the simulated stack pointer by the word's output
     parameter count, filling the new entries with newly-allocated
     register numbers.
     Emit assembly code for moving the return values of the word into
     the newly allocated registers.
 Voila! The 'simulated stack' is a compile time only notion, and the
 resulting emitted code does not explicitly reference any stacks at all;
 in fact, applying this algorithm to the following quotation:
 [ swap dup * swap dup * + sqrt ]
 Yields the following output:
 r0 * r0 -> r0
 r1 * r1 -> r1
 r0 + r1 -> r0
 sqrt r0 -> r0
 return r0
 ==== Multiple return values
 A minor implementation detail is multiple return values. Java does not
 support them directly, but a Factor word can return any number of
 values. This is implemented by temporarily using the interpreter data
 stack to return multiple values. This is the only time the interpreter
 data stack is used.
 ==== The call stack
 Sometimes Factor code uses the call stack as an 'extra hand' for
 temporary storage:
 dup >r + r> *
 The dataflow algorithm can be trivially generalized with two simulated
 stacks; there is nothing more to be said about this.
 === Questioning assumptions
 The dataflow compilation algorithm gives us another nice performance
 improvement. However, the algorithm assumes that the stack effect of
 each word is known a priori, or can be deduced using the algorithm.
 The algorithm falls down when faced with the following more complicated 
 expressions:
 - Combinators calling the 'call' and 'ifte' primitives
 - Recursive words
 So ironically, this algorithm is unsuitable for code where it would help
 the most -- complex code with a lot of branching, and tight loops and
 recursions.
 === Eliminating explicit 'call':
 As described above, the dataflow algorithm would break when it
 encountered the 'call' primitive:
 [ 2 + ] 5 swap call
 The 'call' primitive executes the quotation at the top of the stack. So
 its stack effect depends on its input parameter!
 The first problem we faced was compilation of Java reflection
 primitives. A critical observation was that all the information to
 compile them efficiently was 'already there' in the source.
 Our intuitition tells us that in the above code, the occurrence of
 'call' *always* receives the parameter of [ 2 + ]; so somehow, the
 quotation can be transformed into the following, which we can already
 compile:
 [ 2 + ] 5 swap drop 2 +
               ^^^^^^^^
 	       "immediate instantiation" of 'call'
 Or indeed, once the unused literal [ 2 + ] is factored out, simply:
 5 2 +
 ==== Generalizing the 'simulated stack'
 It might seem surprising that such expressions can be easily compiled,
 once the 'simulated stack' is generalized such that it can hold literal
 values!
 The only change that needs to be made, is that in both passes, when a
 literal is encountered, it is pushed directly on the simulated stack.
 Also, when the primitive 'call' is encountered, its stack effect is
 assumed to be the stack effect of the literal quotation at the top of
 the simulated stack.
 (What if the top of the simulated stack is a register number? The word
 cannot be compiled, since the stack effect can potentially be
 arbitrary!)
 Being able to compile 'call' whose parameters are literals from the
 same word definition doesn't really add nothing new.
 A real breakthrough would be compiling "combinators"; words that take
 parameters that are themselves quotations.
 As it turns out, combinators themselves are not compiled -- however,
 specific *instances* of combinators in other word definitions are.
 For example, we can rewrite our word 'mag2' as follows:
 : mag2 ( x y -- sqrt[x*x+y*y] )
    [ sq ] 2apply + sqrt ;
 Where 2apply is defined as follows:
 : 2apply ( x y [ code ] -- )
    2dup 2>r nip call 2r> call ;
 How can we compile this new, equivalent, form of 'mag2'?
 ==== Inline words
 Normally, when the dataflow algorithm encounters a word as an element
 of a quotation, a call to that word's core() method is emitted. However,
 if the word is compiled 'immediately', its definition is substituted in.
 Assume for a second that in the new form of 'mag2', the word '2apply' is
 compiled inline (ignoring the specifics of how this decision is made).
 In other words, it is as if 'mag2' was defined as follows:
 : mag2 ( x y -- sqrt[x*x+y*y] )
    [ sq ] 2dup 2>r nip call 2r> call + sqrt ;
 However, we already have a way of compiling the above code; in fact it
 is compiled into the equivalent of:
 : mag2 ( x y -- sqrt[x*x+y*y] )
    [ sq ] 2dup 2>r nip drop sq 2r> drop sq + sqrt ;
                        ^^^^^^^     ^^^^^^^
 			immediate instantiation of 'call'
 As an aside, recall that the stack words 2dup, 2>r, nip, drop, and 2r>
 do not emit any code, and the 'drop' of the literal [ sq ] ensures that
 it never makes it to the compiled definition. The end-result is that the
 register-transfer code is identical to the earlier definition of 'mag2'
 which did not involve 2apply:
 r0 * r0 -> r0
 r1 * r1 -> r1
 r0 + r1 -> r0
 sqrt r0 -> r0
 return r0
 So, how is the decision made to compile a word inline, or not? It is
 quite simple. If the word has a deducable stack effect on the simulated
 stack of the current compilation, but it does *not* have a deducable
 stack effect on an empty simulated stack, it is compiled immediate.
 For example, the following word has a deducable stack effect, regardless
 of the values of any literals on the simulated stack:
 : sq ( x -- x^2 )
    dup * ;
 So the word 'sq' is always compiled normally.
 However, the '2apply' word we saw earlier does not have a deducable
 stack effect unless there is a literal quotation at the top of the
 simulated stack:
 : 2apply ( x y [ code ] -- )
    2dup 2>r nip call 2r> call ;
 So it is compiled inline.
 Sometimes it is desirable to have short non-combinator words inlined.
 While this is not necessary (whereas non-inlined combinators do not
 compile), it can increase performance, especially if the word returns
 multiple values (and without inlining, the interpreter datastack will
 need to be used).
 To mark a word for inline compilation, use the word 'inline' like so:
 : sq ( x -- x^2 )
    dup * ; inline
 The word 'inline' sets the inline slot of the most recently defined word
 object.
 (Indeed, to push a reference to the most recently defined word object,
 use the word 'word').
 === Branching
 The only branching primitive supported by factor is 'ifte'. The syntax
 is as follows:
 2 2 + 4 = ( condition that leaves boolean on the stack )
 [
    ( code to execute if condition is true )
 ] [
    ( code to execute if condition is false )
 ] ifte
 Note that the different components might be spread between words, and
 affected by stack operations in transit. Due to the dataflow algorithm
 and inlining, all useful cases can be handled correctly.
 ==== Not all branching forms have a deducable stack effect
 The first observation we gain is that if the two branches leave the
 stack in inconsistent states, then stack positions used by subsequent
 code will depend on the outcome of the branch.
 This practice is discouraged anyway -- it leads to hard-to-understand
 code -- so it is not supported by the compiler. If you must do it, the
 words will always run in the interpreter.
 Attempting to compile or balance an expression with such a branch raises
 an error:
    9] : bad-ifte 3 = [ 1 2 3 ] [ 2 2 + ] ifte ;
    10] word effect .
 break called.
 :r prints the callstack.
 :j prints the Java stack.
 :x returns to top level.
 :s returns to top level, retaining the data stack.
 :g continues execution (but expect another error).
 ERROR: Stack effect of [ 1 2 3 ] ( java.lang.Object -- java.lang.Object
 java.lang.Object java.lang.Object ) is inconsistent with [ 2 2 + ] (
 java.lang.Object -- java.lang.Object )
 Head is ( java.lang.Object -- )
 Recursive state:
 [ #<ifte,base=null,effect=( java.lang.Object -- boolean java.lang.Object
 java.lang.Object ); null.null()> #<bad-ifte,base=null,effect=( -- );
 null.null()> ]
 ==== Merging
 Lets return to our register transfer language, and add a branching
 notation:
 - two-instruction sequence to branch to <label> if <register> is null
  ALOAD <register>
  IFNULL <label>
 - unconditional goto to <label>
  GOTO <label>
 So a simple conditional
 rot [
    (true)
 ] [
    (false)
 ] ifte
 Will be compiled as follows, where the inputs are in registers 1, 2, 3
 1	ALOAD 1
 2	IFNULL 5
 3	(true)
 4	GOTO 6
 5	(false)
 6	RETURN
 However the question arises, what becomes of the simulated stack after
 the branches are done.
 For example, consider this snippet:
 random-int random-int random-boolean [
    swap
 ] [
 ] ifte
 The first three words followed by the branch itself are compiled like
 so:
 1	1 <- random-int
 2	2 <- random-int
 3	3 <- random-boolean
 4	ALOAD 3
 5	IFNULL 8
 However, a problem arises because if the true branch is taken, the
 simulated stack contains register 1 at the top, and register 2 below;
 but if the false branch is taken, it is the opposite!
 The solution is to "merge" the stacks at the end of each branch. So
 the remainder of our code might be compiled as follows:
 6	1 <-> 2 // new notation: exchange registers 1 and 2
 7	GOTO 8
 8	RETURN
 === Recursion
 Consider our old friend 'fib':
 : fib ( n -- nth fibonacci number )
    dup 1 <= [
        drop 1 
    ] [
        pred dup fib swap pred fib + 
    ] ifte ;
 Using the tools we have, we cannot deduce its stack effect yet, since
 the false branch of the 'ifte' refers to the word 'fib' itself.
 A critical observation is if the word is to complete, eventually, the
 test will fail and 'drop 1' will be executed.
 Note that this implies that when given a parameter of 0 or 1, the
 stack effect of 'fib' is ( X -- X ).
 ==== What is the stack effect?
 To see how to deduce the stack effect of the recursive case, it is
 necessary to make a mental leap. Consider the case where the parameter
 to fib is 2. The word recurses twice, and in each case, the parameter
 to the recursive call is <= 1, so 'drop 1' is executed.
 So when the parameter is 2, the stack effect is also ( X -- X )!
 In fact it is not hard to usee that if the stack effect of 'fib' with
 parameter n-1 and n-2 is ( X -- X ), then the stack effect of 'fib' with
 parameter n is also ( X -- X ).
 Therefore by induction, for any input, 'fib' has stack effect
 ( X -- X ).
 Once the stack effect is known, it is easy enough to compile; just treat
 the two recursive calls like calls to any other word with stack effect
 ( X -- X ).
 ==== Not all recursive forms have a deducable stack effect
 Consider the following word:
 : push ( list -- ... )
    dup [
        uncons push
    ] unless ;
 If the top of the stack is null, the word returns. So the base case is (
 X -- X ).
 However if the top of the stack is a list of one element, the word has
 stack effect ( X -- X X ), since 'uncons' has stack effect ( X -- X X )
 and the base case is ( X -- X ).
 If we proceed, we find that if the top of the stack is a list of two
 elements, the stack effect of the word is ( X -- X X X ).
 The stack positions used for intermediate values can no longer be
 determined ahead of time.
 A word whose stack effect depends on input is said to 'diverge'. Since
 it is generally good practice to only write converging recursive words,
 it is not a big loss that the compiler does not support them. Of course,
 such words still work in the interpreter.
 ==== Auxiliary methods
 So far, we can compile recursive words such as 'fib' and tail-recursive
 words such as 'list?'. Now, lets try applying our techniques to a word
 that calls a recursive combinator:
 : reverse ( list -- list )
    [ ] swap [ swons ] each ;
 Recall that 'swons' creates a cons cell with stack effect
 ( cdr car -- [ car , cdr ] ) -- the opposite order of 'cons', which has stack effect ( car cdr -- [ car , cdr ] ).
 The combinator 'each' is defined as follows:
 : each ( [ list ] [ quotation ] -- )
    over [
        >r uncons r> tuck 2>r call 2r> each
    ] [
        2drop
    ] ifte ;
 If we apply our previous inling technique, however, the end result is
 absurd, since the recursive call to 'each' remains:
 : reverse ( list -- list )
    f swap [ swons ] over [
        >r uncons r> tuck 2>r call 2r> each
    ] [
        2drop
    ] ifte ;
 However, if the recursive call is changed to 'reverse', then the result
 is also incorrect, since '[ ] swap' would be executed on each iteration.
 The solution is to place instances of recursive combinators in an
 'auxiliary method' in the same class as the definition being compiled.
 So in fact, 'reverse' is compiled as three methods, eval(), core(), and 
 aux_each_0().
 ==== Wrapping up
 There are two implementation details not covered here; they are not
 really 'interesting' and best described by the source code anyway:
 - tail-recursive words are compiled with a GOTO not a method invocation
  at the end of the recursive case.
 - some extra steps are needed to normalize the stack after recursive
  calls, and when auxiliary methods are being generated.
 === Conclusion
 Finally, lets see what kind of improvement we get over naive
 interpretation when our old friend the 'fib' word is compiled using all
 the techniques mentioned above:
    3] "fib" compile
    4] [ 25 fib ] time
 123
 That's right -- a 200x improvement over pure interpretation.
--- a/doc/devel-guide.lyx
+++ b/doc/devel-guide.lyx
--- a/doc/math.txt
+++ b/doc/math.txt
@ -0,0 +1,307 @@
 FACTOR MATH WORDS
 === Basics
 The following expressions demonstrate basic arithmetic in Factor.
    0] 2 2 + .
 4
    1] 10 4.5 - .
 5.5
    2] 12.5 3 * .
 37.5
    3] 6 20 / .
 3/10
    4] 6 20 /f .
 0.3
    5] 0.354 neg .
 -0.354
    6] 5 recip .
 0.2
 Arithmetic operators appear after their oprands, and intermediate values
 are stored on a stack -- this is called postfix syntax.
 The word . prints the value at the top of the stack.
 There are no operator precedence levels, and expressions can always be
 reprepsented unambiguously without parantheses, unlike traditional
 algebraic syntax.
 For example, (3 + 2) * (1 - 6) is written as:
 3 2 + 1 6 - *
 However, 3 + (2 * 1) - 6 is written as:
 3 2 1 * + 6 -
 === The number tower
 Factor supports operations with many types of numbers, transparently
 converting results from one type to another. The following informal
 diagram can be helpful in understanding the various types of numbers.
 +--------+ +--------+
 |=fixnum=| |=bignum=|
 +--------+ +--------+
 +-------------------+ +-------+
 |      integer      | |=ratio=|
 +-------------------+ +-------+
 +-----------------------------+ +-------+
 |           rational          | |=float=|
 +-----------------------------+ +-------+
 +---------------------------------------+ +---------+
 |                  real                 | |=complex=|
 +---------------------------------------+ +---------+
 +---------------------------------------------------+
 |                      number                       |
 +---------------------------------------------------+
 Types on the same row are disjoint.
 Each type is a subtype of all types directly below.
 Types whose boxes are marked with '=' are disjoint concrete types.
 Type upgrades are performed through the concrete types, from the top
 left down to the bottom right.
 Ratios and complex numbers are compound types; ratios consist of a pair
 of integers, complex numbers consist of a pair of real numbers.
 === Types of numbers
 All number entry is in base 10.
 The predicate word number? tests if the top of the stack is a number.
 Numbers are partitioned into two disjoint subsets; real numbers and
 complex numbers. (In math, the reals are a subset of the complex
 numbers. In Factor, a number whose imaginary part is zero is *not* a
 complex number).
 Real numbers are partitioned into three disjoint subsets: integers,
 ratios and floats.
 ==== Integers: 12 -100 340282366920938463463374607431768211456
 The predicate word integer? tests if the top of the stack is an integer.
 The integers are partitioned into two disjoint types:
 - signed 32-bit fixnums (predicate: fixnum?)
 - signed arbitrary precision bignums (predicate: bignum?)
 Fixnums are automatically upgraded as necessary to bignums.
 For example:
    8] 1073741824 fixnum? .
 t
    9] 128 fixnum? .
 t
    10] 1073741824 128 * .    
 137438953472
    11] 1073741824 128 * bignum? .
 t
 In the above example, the result of multiply those two fixnums exceeds
 2^31-1, and the result is upgraded to a bignum.
 When given integer operands, + - and * always return integers.
 ==== Ratios: 1/10 -37/78 10/3
 A ratio is the result of a division of two integers where the
 denimonator is not a multiple of the numerator.
 The predicate word ratio? tests if the top of the stack is a ratio.
 Ratios are always reduced to lowest terms, and the denominator is always
 positive. The numerator never equals zero since dividing zero by a
 non-zero integer always results in the integer zero.
 The accessor words numerator and denominator deconstruct a ratio. Given
 an integer, numerator is a no-op and denominator always returns 1.
    14] 100 -30 / numerator .
 -10
    15] 100 -30 / denominator .
 3
    16] 12 numerator .
 12
 The numerator and denominator are integers, and hence either fixnums or
 bignums (there is no requirement for them to be of the same type).
 The result of dividing two integers as a floating point number can be
 obtained using the word /f. For example:
    17] 1 3 /f .
 0.3333333333333333
 When arithmetic operators are given a ratio and an integer as
 parameters, the result is also a ratio or an integer.
 ==== Floats: -1.3 1.5e-6 0.003
 Floats are entered as double-precision. Single-precision floats can be
 constructed via coercion. They are converted to double-precision by
 arithmetic operands.
 The predicate word float? tests if the top of the stack is a single or
 double precision float.
 When at least one of the parameters to an arithmetic operator is a
 float, the result is always a (double precision) float.
 ==== Complex numbers: #{ 2 2.5 } #{ 1/2 1/3 }
 A complex number has a real and imaginary part. The syntax is to write
 #{ followed by the real part, followed by the imaginary part, and
 finally terminated with }. Each token must be separted with whitespace.
 The predicate word complex? tests if the top of the stack is a single or
 double precision float.
 For example, what is commonly written as 2-3.5i in textbooks is
 expressed as #{ 2 2.5 } in Factor.
 The real and imaginary parts can be either integers, ratios or floats.
 There is no requirement for them to be of the same type.
 The accessor words real and imaginary deconstruct a complex number. The
 real part followed by the imaginary part can both be pushed at once
 using the word >rect, and a new complex number can be constructed from a
 real and imaginary part using the word rect>.
    4] -i sqrt >rect .s
 -0.7071067811865475
 0.7071067811865476
    5] 1 2 rect> .
 #{ 1 2 }
 A complex number with an imaginary component of zero is automatically
 downgraded to an integer, a ratio or a float (depending on the type of
 its real component.)
    6] 10 0 rect> .
 10
    7] #{ 5 -10 } #{ 2 10 } + .
 7
 Complex numbers never arise as results of arithmetic operators with real
 operands. However, various irrational functions return complex values
 for some real inputs.
 === Mathematical functions
 ==== Square root, squaring, arbitrary powers
 These are pretty much self-explanatory.
    10] 36 sq sqrt .
 36.0
    11] -2 sqrt sq .
 -2.0000000000000004
    12] 10 15 ^ .
 1.0E15
    13] e pi i * ^ .
 #{ -1.0 1.2246467991473532E-16 }
 ==== Exponential, logarithm
 The function e^x and its inverse.
    15] e .
 2.718281828459045
    16] 2 exp .
 7.38905609893065
    17] 5 log 2 log - exp .
 2.5
 Note that the complex logarithm is infinitely-valued. The principle
 value is chosen such that the complex part is in the interval (-pi,pi].
    18] -10 log .
 #{ 2.302585092994046 3.141592653589793 }
 ==== Trigonometric and hyperbolic functions
 The full complement of trigonometric and hyperbolic functions and their
 inverses is provided:
 sin    cos    tan
 asin   acos   atan
 cosec  sec    cot
 acosec asec   acot
 sinh    cosh    tanh
 asinh   acosh   atanh
 cosech  sech    coth
 acosech asech   acoth
 Complex arguments are supported. The specific branch cuts used by the
 inverse functions are undocumented by can be deduced from the
 definitions of those functions, and the branch cuts taken by 'log' and
 'sqrt'.
 ==== Polar co-ordinates
 Complex numbers can be converted to/from polar co-ordinate
 representations using the words >polar and polar>.
    41] #{ 1 1 } >polar .s
 0.7853981633974483
 1.4142135623730951
    42] -5 pi 3 / polar> .
 #{ -2.5000000000000004 -4.330127018922193 }
 ==== Miscellaneous integer functions
 Factorial, fibonacci sequence, harmonic numbers.
    33] 128 2^ .
 340282366920938463463374607431768211456
    34] 30 fib .
 1346269
    35] 100 harmonic >float 100 log - .
 0.5822073316515288
    36] 1000 fac .
 402387260077093773543702433923003985719374864210714632543799910429938512
 398629020592044208486969404800479988610197196058631666872994808558901323
 829669944590997424504087073759918823627727188732519779505950995276120874
 975462497043601418278094646496291056393887437886487337119181045825783647
 849977012476632889835955735432513185323958463075557409114262417474349347
 553428646576611667797396668820291207379143853719588249808126867838374559
 731746136085379534524221586593201928090878297308431392844403281231558611
 036976801357304216168747609675871348312025478589320767169132448426236131
 412508780208000261683151027341827977704784635868170164365024153691398281
 264810213092761244896359928705114964975419909342221566832572080821333186
 116811553615836546984046708975602900950537616475847728421889679646244945
 160765353408198901385442487984959953319101723355556602139450399736280750
 137837615307127761926849034352625200015888535147331611702103968175921510
 907788019393178114194545257223865541461062892187960223838971476088506276
 862967146674697562911234082439208160153780889893964518263243671616762179
 168909779911903754031274622289988005195444414282012187361745992642956581
 746628302955570299024324153181617210465832036786906117260158783520751516
 284225540265170483304226143974286933061690897968482590125458327168226458
 066526769958652682272807075781391858178889652208164348344825993266043367
 660176999612831860788386150279465955131156552036093988180612138558600301
 435694527224206344631797460594682573103790084024432438465657245014402821
 885252470935190620929023136493273497565513958720559654228749774011413346
 962715422845862377387538230483865688976461927383814900140767310446640259
 899490222221765904339901886018566526485061799702356193897017860040811889
 729918311021171229845901641921068884387121855646124960798722908519296819
 372388642614839657382291123125024186649353143970137428531926649875337218
 940694281434118520158014123344828015051399694290153483077644569099073152
 433278288269864602789864321139083506217095002597389863554277196742822248
 757586765752344220207573630569498825087968928162753848863396909959826280
 956121450994871701244516461260379029309120889086942028510640182154399457
 156805941872748998094254742173582401063677404595741785160829230135358081
 840096996372524230560855903700624271243416909004153690105933983835777939
 410970027753472000000000000000000000000000000000000000000000000000000000
 000000000000000000000000000000000000000000000000000000000000000000000000
 000000000000000000000000000000000000000000000000000000000000000000000000
 000000000000000000000000000000000000000000000000
--- a/doc/naming.txt
+++ b/doc/naming.txt
@ -0,0 +1,66 @@
 FACTOR CODING CONVENTIONS.
 === Naming words
 foo.     - perform "foo", but instead of pushing the result on the
           stack, print it in a human-readable form suitable for
 	   interactive use.
 	   Eg: words. vocabs.
 .X       - three words to print the contents of the stacks:
           .s - data stack
 	   .r - call stack
 	   .n - name stack
 foo*     - a variation of "foo" that takes more parameters.
           Eg: index-of* parse* random-element*
         - a lower-level word used in the implementation of "foo".
 	   Eg: compile* prettyprint*
         - a word that is a variation on "foo", but is more specialized
 	   and less frequently used.
 	   Eg: last* get*
 >into    - convert object to type "into".
           Eg: >str >lower >upper >fixnum >realnum
 one>two  - convert object of type "one" to "two".
           Eg: stream>str stack>list worddef>list
         - transfer values between stacks.
 	   Eg: >r r> 2>r 2r> >n
 <type>   - create an object of "type".
           Eg: <namespace> <sbuf> <stream>
 foo@     - get the value of a variable at the top of the stack;
           operate on the value with "foo"; store the value back in the
 	   variable.
 	   Eg: +@ *@ -@ /@ cons@ append@
 foo-iter - a tail-recursive word used in the implementatin of "foo".
           Eg: nreverse-iter partition-iter
 nfoo     - on lists, a destructive version of "foo".
           Eg: nappend nreverse
 2foo     - like foo but with two operands taken from stack.
           Eg: 2drop 2dup 2each
 [foo]    - apply "foo" to each element of a list; sometimes with a bit
           more logic than just [ foo ] each.
 	   Eg: [+] [.]
--- a/factor/FactorInterpreter.java
+++ b/factor/FactorInterpreter.java
@ -35,7 +35,7 @@ import java.io.*;
 public class FactorInterpreter implements FactorObject, Runnable
 {
-	public static final String VERSION = "0.60.8";
+	public static final String VERSION = "0.60.9";
 	// command line arguments are stored here.
 	public Cons args;