Factor is \emph{interactive}. This means it is possible to run a Factor interpreter that reads from the keyboard, and immediately executes expressions as they are entered. This allows words to be defined and tested one at a time.
Factor is \emph{dynamic}. This means that all objects in the language are fully reflective at run time, and that new definitions can be entered without restarting the interpreter. Factor code can be used interchangably as data, meaning that sophisticated language extensions can be realized as libraries of words.
Factor is \emph{safe}. This means all code executes in a virtual machine that provides
garbage collection and prohibits direct pointer arithmetic. There is no way to get a dangling reference by deallocating a live object, and it is not possible to corrupt memory by overwriting the bounds of an array.
When examples of interpreter interactions are given in this guide, the input is in a roman font, and any
The implementation of \texttt{km/hour} is a bit more complex. We would like it to convert kilometers per hour to meters per second, to be consistent with the other units we defined. To get the desired result, we first have to convert to kilometers per second, then divide this by the number of seconds in one hour.
\texttt{dupd ( x y -{}- x x y )} Duplicate the second stack element.
\texttt{swapd ( x y z -{}- y x z )} Swap the second and third stack elements.
\texttt{transp ( x y z -{}- z y x )} Swap the first and third stack elements.
\texttt{2drop ( x y -{}- )} Discard the top two stack elements.
\texttt{2dup ( x y -{}- x y x y )} Duplicate the top two stack elements. A frequent use for this word is when two values have to be compared using something like \texttt{=} or \texttt{<} before being passed to another word.
\texttt{2swap ( x y z t -{}- z t x y )} Swap the top two stack elements.
You should try all these words out and become familiar with them. Push some numbers on the stack,
execute a shuffle word, and look at how the stack contents was changed using
more words. Each word should take at most a couple of sentences to describe. Effective factoring is like riding a bicycle -- once you ``get it'', it becomes second nature.
When an expression is parsed, each token in turn is looked up in the dictionary. If there is no dictionary entry, the token is parsed as a number instead.
The dictionary of words is structured as a set of named \emph{vocabularies}. Each vocabulary is a list
Words that return a boolean truth value are known as \emph{predicates}. Predicates are usually used to decide what to execute next at branch points. In Factor, there is no special boolean data type
-- instead, a special object \texttt{f} is the only object with a
``false'' boolean value. Every other object is a boolean ``true''.
The special object \texttt{t} is the ``canonical'' truth value. Note that words that return booleans don't return \texttt{t} as a rule; any object that is not equal to \texttt{f} can be returned as the true value.
The usual boolean operations are found in the \texttt{logic} vocabulary. Note that these are not integer bitwise operations; bitwise operations are described in the next chapter.
\texttt{?~( cond~true false -{}- obj~)} returns the second argument if the first argument is true, and returns the third argument if the first argument is false.
Compare the order of parameters here with the order of parameters in
the stack effect of \texttt{ifte}.
The stack effects of the two \texttt{ifte} branches should be
the same. If they differ, the word becomes harder to document and
debug.
\texttt{when}\texttt{( cond true -{}- )} and \texttt{unless}\texttt{( cond false -{}- )} are variations of \texttt{ifte} with only one active branch. The branches should produce as many values as they consume; this ensures that the stack effect of the entire \texttt{when} or \texttt{unless} expression is consistent regardless of which branch was taken.
Factor provides a rich set of math words. Factor numbers more closely model the mathematical concept of a number than other languages. Where possible, exact answers are given -- for example, adding or multiplying two integers never results in overflow, and dividing two integers yields a fraction rather than a truncated result. Complex numbers are supported, allowing many functions to be computed with parameters that would raise errors or return ``not a number'' in other languages.
The simplest type of number is the integer. Integers come in two varieties -- \emph{fixnums} and \emph{bignums}. As their names suggest, a fixnum is a fixed-width quantity\footnote{Fixnums range in size from $-2^{w-3}-1$ to $2^{w-3}$, where $w$ is the word size of your processor (for example, 32 bits). Because fixnums automatically grow to bignums, usually you do not have to worry about details like this.}, and is a bit quicker to manipulate than an arbitrary-precision bignum.
The predicate word \texttt{integer?} tests if the top of the stack is an integer. If this returns true, then exactly one of \texttt{fixnum?} or \texttt{bignum?} would return true for that object. Usually, your code does not have to worry if it is dealing with fixnums or bignums.
Unlike some languages where the programmer has to declare storage size explicitly and worry about overflow, integer operations automatically return bignums if the result would be too big to fit in a fixnum. Here is an example where multiplying two fixnums returns a bignum:
\begin{alltt}
134217728 fixnum? .
\emph{t}
128 fixnum? .
\emph{t}
134217728 128 * .
\emph{17179869184}
134217728 128 * bignum? .
\emph{t}
\end{alltt}
Integers can be entered using a different base. By default, all number entry is in base 10, however this can be changed by prefixing integer literals with one of the parsing words \texttt{BIN:}, \texttt{OCT:}, or \texttt{HEX:}. For example:
The word \texttt{.} prints numbers in decimal, regardless of how they were input. A set of words in the \texttt{prettyprint} vocabulary is provided for print integers using another base.
If we add, subtract or multiply any two integers, the result is always an integer. However, this is not the case with division. When dividing a numerator by a denominator where the numerator is not a integer multiple of the denominator, a ratio is returned instead.
Ratios are printed and can be input literally in the form of the second example. Ratios are always reduced to lowest terms by factoring out the greatest common divisor of the numerator and denominator. A ratio with a denominator of 1 becomes an integer. Trying to create a ratio with a denominator of 0 raises an error.
The predicate word \texttt{ratio?} tests if the top of the stack is a ratio. The predicate word \texttt{rational?} returns true if and only if one of \texttt{integer?} or \texttt{ratio?} would return true for that object. So in Factor terms, a ``ratio'' is a rational number whose denominator is not equal to 1.
Ratios behave just like any other number -- all numerical operations work as expected, and in fact they use the formulas for adding, subtracting and multiplying fractions that you learned in high school.
\begin{alltt}
1/2 1/3 + .
\emph{5/6}
100 6 / 3 * .
\emph{50}
\end{alltt}
Ratios can be deconstructed into their numerator and denominator components using the \texttt{numerator} and \texttt{denominator} words. The numerator and denominator are both integers, and furthermore the denominator is always positive. When applied to integers, the numerator is the integer itself, and the denominator is 1.
Rational numbers represent \emph{exact} quantities. On the other hand, a floating point number is an \emph{approximation}. While rationals can grow to any required precision, floating point numbers are fixed-width, and manipulating them is usually faster than manipulating ratios or bignums (but slower than manipulating fixnums). Floating point literals are often used to represent irrational numbers, which have no exact representation as a ratio of two integers. Floating point literals are input with a decimal point.
\begin{alltt}
1.23 1.5 + .
\emph{1.73}
\end{alltt}
The predicate word \texttt{float?} tests if the top of the stack is a floating point number. The predicate word \texttt{real?} returns true if and only if one of \texttt{rational?} or \texttt{float?} would return true for that object.
Floating point numbers are \emph{contagious} -- introducing a floating point number in a computation ensures the result is also floating point.
Apart from contaigion, there are two ways of obtaining a floating point result from a computation; the word \texttt{>float ( n -{}- f )} converts a rational number into its floating point approximation, and the word \texttt{/f ( x y -{}- x/y )} returns the floating point approximation of a quotient of two numbers.
Indeed, the word \texttt{/f} could be defined as follows:
\begin{alltt}
: /f / >float ;
\end{alltt}
However, the actual definition is slightly more efficient, since it computes the floating point result directly.
\subsection{Complex numbers}
Just like we had to widen the integers to the rationals in order to divide, we have to widen the real numbers to the set of \emph{complex numbers} to solve certain kinds of equations. For example, the equation $x^2+1=0$ has no solution for real $x$, because there is no real number that is a square root of -1. This is so because the real numbers are not \emph{algebraically complete}.
Complex numbers, however, are algebraically complete, and Factor will find one solution to this equation\footnote{The other, of course being \texttt{\#\{ 0 -1 \}}.}:
\begin{alltt}
-1 sqrt .
\emph{\#\{ 0 1 \}}
\end{alltt}
The literal syntax for a complex number is \texttt{\#\{ re im \}}, where \texttt{re} is the real part and \texttt{im} is the imaginary part. For example, the literal \texttt{\#\{ 1/2 1/3 \}} corresponds to the complex number $1/2+1/3i$.
The words \texttt{i} an \texttt{-i} push the literals \texttt{\#\{ 0 1 \}} and \texttt{\#\{ 0 -1 \}}, respectively.
The predicate word \texttt{complex?} tests if the top of the stack is a complex number. Note that unlike math, where all real numbers are also complex numbers, Factor only considers a number to be a complex number if its imaginary part is non-zero.
Complex numbers can be deconstructed into their real and imaginary components using the \texttt{real} and \texttt{imaginary} words. Both components can be pushed at once using the word \texttt{>rect ( z -{}- re im )}.
A complex number can be constructed from a real and imaginary component on the stack using the word \texttt{rect> ( re im -{}- z )}.
\begin{alltt}
1/3 5 rect> .
\emph{\#\{ 1/3 5 \}}
\end{alltt}
Complex numbers are stored in \emph{rectangular form} as a real/imaginary component pair (this is where the names \texttt{>rect} and \texttt{rect>} come from). An alternative complex number representation is \emph{polar form}, consisting of an absolute value and argument. The absolute value and argument can be computed using the words \texttt{abs} and \texttt{arg}, and both can be pushed at once using \texttt{>polar ( z -{}- abs arg )}.
\begin{alltt}
5.3 abs .
\emph{5.3}
i arg .
\emph{1.570796326794897}
\#\{ 4 5 \} >polar .s
\emph{\{ 6.403124237432849 0.8960553845713439 \}}
\end{alltt}
A new complex number can be created from an absolute value and argument using \texttt{polar> ( abs arg -{}- z )}.
\begin{alltt}
1 pi polar> .
\emph{\#\{ -1.0 1.224606353822377e-16 \}}
\end{alltt}
\subsection{Transcedential functions}
The \texttt{math} vocabulary provides a rich library of mathematical functions that covers exponentiation, logarithms, trigonometry, and hyperbolic functions. All functions accept and return complex number arguments where appropriate. These functions all return floating point values, or complex numbers whose real and imaginary components are floating point values.
\texttt{\^{} ( x y -- x\^{}y )} raises \texttt{x} to the power of \texttt{y}. In the cases of \texttt{y} being equal to $1/2$, -1, or 2, respectively, the words \texttt{sqrt}, \texttt{recip} and \texttt{sq} can be used instead.
All remaining functions have a stack effect \texttt{( x -{}- y )}, it won't be repeated for brevity.
\texttt{exp} raises the number $e$ to a specified power. The number $e$ can be pushed on the stack with the \texttt{e} word, so \texttt{exp} could have been defined as follows:
However, it is actually defined otherwise, for efficiency.\footnote{In fact, the word \texttt{\^{}} is actually defined in terms of \texttt{exp}, to correctly handle complex number arguments.}
\texttt{sin}, \texttt{cos} and \texttt{tan} are the familiar trigonometric functions, and \texttt{asin}, \texttt{acos} and \texttt{atan} are their inverses.
The reciprocals of the sine, cosine and tangent are defined as \texttt{sec}, \texttt{cosec} and \texttt{cot}, respectively. Their inverses are \texttt{asec}, \texttt{acosec} and \texttt{acot}.
Similarly, the reciprocals of the hyperbolic functions are defined as \texttt{sech}, \texttt{cosech} and \texttt{coth}, respectively. Their inverses are \texttt{asech}, \texttt{acosech} and \texttt{acoth}.
\subsection{Modular arithmetic}
In addition to the standard division operator \texttt{/}, there are a few related functions that are useful when working with integers.
However, the actual definition is a bit more efficient than that.
\texttt{mod ( x y -{}- x\%y )} computes the remainder of dividing \texttt{x} by \texttt{y}. If the result is 0, then \texttt{x} is a multiple of \texttt{y}.
\texttt{/mod ( x y -{}- x/y x\%y )} pushes both the quotient and remainder.
\texttt{gcd ( x y -{}- z )} pushes the greatest common divisor of two integers; that is, the largest number that both integers could be divided by and still yield integers as results. This word is used behind the scenes to reduce rational numbers to lowest terms when doing ratio arithmetic.
There are two ways of looking at an integer -- as a mathematical entity, or as a string of bits. The latter representation faciliates the so-called \emph{bitwise operations}.
\texttt{bitand ( x y -{}- x\&y )} returns a new integer where each bit is set if and only if the corresponding bit is set in both $x$ and $y$. If you're considering an integer as a sequence of bit flags, taking the bitwise-and with a mask switches off all flags that are not explicitly set in the mask.
\texttt{bitor ( x y -{}- x|y )} returns a new integer where each bit is set if and only if the corresponding bit is set in at least one of $x$ or $y$. If you're considering an integer as a sequence of bit flags, taking the bitwise-or with a mask switches on all flags that are set in the mask.
\texttt{bitxor ( x y -{}- x\^{}y )} returns a new integer where each bit is set if and only if the corresponding bit is set in exactly one of $x$ or $y$. If you're considering an integer as a sequence of bit flags, taking the bitwise-xor with a mask toggles on all flags that are set in the mask.
\texttt{shift ( x n -{}- y )} returns a new integer consisting of the bits of the first integer, shifted to the left by $n$ positions. If $n$ is negative, the bits are shifted to the right instead, and bits that ``fall off'' are discarded.
The attentive reader will notice that shifting to the left is equivalent to multiplying by a power of two, and shifting to the right is equivalent to performing a truncating division by a power of two.
The word \texttt{=} is found in the \texttt{kernel} vocabulary, and the words \texttt{2dup} and \texttt{2drop} are found in the \texttt{stack} vocabulary. Since \texttt{=}
consumes both its inputs, we must first duplicate the \texttt{actual} and \texttt{guess} parameters using \texttt{2dup}. The word \texttt{correct} does not need to do anything with these two numbers, so they are popped off the stack using \texttt{2drop}. Try evaluating the following
Factor supports two primary types for storing sequential data; lists and vectors.
Lists are stored in a linked manner, with each node of the list holding an
element and a reference to the next node. Vectors, on the other hand, are contiguous sets of cells in memory, with each cell holding an element. Strings and string buffers can be considered as vectors specialized to holding characters, with the additional restriction that strings are immutable.
A \emph{proper list} (or often, just a \emph{list}) is a cons cell whose car is the first element, and the cdr is the \emph{rest of the list}. The car of the last cons cell in the list is the last element, and the cdr is \texttt{f}.
It is worth mentioning a few words closely related to and defined in terms of \texttt{cons}, \texttt{car} and \texttt{cdr}.
\texttt{swons ( cdr car -{}- cons )} constructs a cons cell, with the argument order reversed. Usually, it is considered bad practice to define two words that only differ by parameter order, however cons cells are constructed about equally frequently with both orders. Of course, \texttt{swons} is defined as follows:
\begin{alltt}
: swons swap cons ;
\end{alltt}
\texttt{uncons ( cons -{}- car cdr )} pushes both constituents of a cons cell. It is defined as thus:
while \texttt{add} has to copy the entire list first! If you need to add to the end of a sequence frequently, consider either using a vector, or adding to the beginning of a list and reversing the list when done.
\texttt{unique ( elem list -{}- list )} Return a new list containing the new element. If the list already contains the element, the same list is returned, otherwise the element is consed onto the list. This word executes in linear time, so its use in loops can be a potential performance bottleneck.
Recall the syntax for a conditional statement from the first chapter:
\begin{alltt}
1 2 < {[} "1 is less than 2." print {]}{[} "bug!" print {]} ifte
\end{alltt}
The syntax for the quotations there looks an aweful lot like the syntax for literal lists. In fact, code quotations \emph{are} lists. Factor code is data, and vice versa.
Essentially, the interpreter iterates through code quotations, pushing literals and executing words. When a word is executed, one of two things happen -- either the word has a colon definition, and the interpreter is invoked recursively on the definition, or the word is primitive, and it is executed by the underlying virtual machine.
\subsection{The call stack}
So far, we have seen what we called ``the stack'' store intermediate values between computations. In fact Factor maintains a number of other stacks, and the formal name for the stack we've been dealing with so far is the \emph{data stack}.
Another fundamental stack is the \emph{call stack}. When invoking an inner colon definition, the interpreter pushes the current execution state on the call stack so that it can be restored later.
The call stack also serves a dual purpose as a temporary storage area. Sometimes, juggling values on the data stack becomes ackward, and in that case \texttt{>r} and \texttt{r>} can be used to move a value from the data stack to the call stack, and vice versa, respectively.
The idea of \emph{recursion} is key to understanding Factor. A \emph{recursive} word definition is one that refers to itself, usually in one branch of a conditional. The general form of a recursive word looks as follows:
The recursive case contains one or more calls to the original word.
There are a few things worth noting about the stack flow inside a recursive word. The condition must take care to preserve any input parameters needed for the base case and recursive case. The base case must consume all inputs, and leave the final return values on the stack. The recursive case should somehow reduce one of the parameters. This could mean incrementing or decrementing an integer, taking the \texttt{cdr} of a list, and so on. Parameters must eventually reduce to a state where the condition returns \texttt{f}, to avoid an infinite recursion.
The recursive case should also be coded such that the stack effect of the total definition is the same regardless of how many iterations are preformed; words that consume or produce different numbers of paramters depending on circumstances are very hard to debug.
In a programming language such as Java\footnote{Although by the time you read this, Java implementations might be doing tail-call optimization.}, using recursion to iterate through a long list is highly undesirable because it risks overflowing the (finite) call stack depth. However, Factor performs \emph{tail call optimization}, which is based on the observation that if the recursive call is made at a point right before the word being defined would return, there is \emph{actually nothing to save} on the call stack, so recursion call nesting can occur to arbitrary depth. Such recursion is known as \emph{tail recursion}.
The reason it is not tail recursive is that after the recursive call to \texttt{factorial} returns, the \texttt{*} word must still be called.\footnote{
It is possible to rewrite \texttt{factorial} to be tail-recursive by splitting it into two words, the second of which takes an accumulator which is multiplied at each iteration. Of course, none of this is relevant, since the built-in library already provides a word \texttt{fac} that uses a tail-recursive combinator.}
The following definition is tail-recursive, due to the placement of the recursive call to \texttt{contains?}, as well as the \texttt{ifte} forms that surround it:
Recall from \ref{sub:Conditionals} that a quotation is simply a list containing executable words and literals. A \emph{combinator} is a word that takes quotations from the stack and executes them according to some pattern. We've already seen the \texttt{ifte} combinator, which is the basis of all conditional branching. Another very simple combinator is \texttt{call}.
\texttt{call}\texttt{( quot -{}- )} executes the quotation at the
A pair of combinators for iterating over vectors are provided in the \texttt{vectors} vocabulary. The first is the \texttt{vector-each} word that does nothing other than applying a quotation to each element. The second is the \texttt{vector-map} word that also collects the return values of the quotation into a new vector.
\texttt{vector-each ( vector quot -{}- )} pushes each element of the vector in turn, and executes the quotation. The quotation should have a stack effect of \texttt{( obj -- )}. The vector and the quotation are not on the stack when the quotation is executed. This allows the quotation to use values below the vector for accumilation and so on.
The \texttt{stack>list} word makes use of \texttt{vector-each} to construct a list containing all elements of a given vector, in reverse order. In fact, its definition looks exactly like that of \texttt{reverse} except the \texttt{vector-each} combinator is used in place of \texttt{each}:
The \texttt{vector>list} word is defined as first creating a list of all elements in the vector in reverse order using \texttt{stack>list}, and then reversing this list:
\texttt{vector-map ( vector quot -{}- str )} is similar to \texttt{vector-each}, except after each iteration the return value of the quotation is collected into a new vector. The quotation should have a stack effect of \texttt{( obj -- obj )}.
The \texttt{clone-vector} word is implemented as a degenerate case of \texttt{vector-map} -- the elements of the original vector are copied into a new vector without any modification:
A pair of combinators for iterating over strings are provided in the \texttt{strings} vocabulary. The first is the \texttt{str-each} word that does nothing other than applying a quotation to each character. The second is the \texttt{str-map} word that also collects the return values of the quotation into a new string.
\texttt{str-each ( str quot -{}- )} pushes each character of the string in turn, and executes the quotation. The quotation should have a stack effect of \texttt{( ch -{}- )}. The string and the quotation are not on the stack when the quotation is executed. This allows the quotation to use values below the string for accumilation and so on. The following example counts the number of occurrences of the letter ``a'' in a string:
\texttt{str-map ( str quot -{}- str )} is similar to \texttt{str-each}, except after each iteration the return value of the quotation is collected into a new string. The quotation should have a stack effect of \texttt{( ch -- str/ch )}. The following example replaces all occurrences of the space character in the string with \texttt{+}:
\begin{alltt}
"We do not like spaces" {[} CHAR: \textbackslash{}s CHAR: + replace {]} str-map .
For the second practical example, we will code a small program that tracks how long you spend working on tasks. It will provide two primary functions, one for adding a new task and measuring how long you spend working on it, and another to print out the timesheet. A typical interaction looks like this:
Enter a letter between ( ) to execute that action.}
a
\emph{Start work on the task now. Press ENTER when done.
Please enter a description:
Working on the Factor HTTP server
(E)xit
(A)dd entry
(P)rint timesheet
Enter a letter between ( ) to execute that action.}
a
\emph{Start work on the task now. Press ENTER when done.
Please enter a description:}
Writing a kick-ass web app
\emph{
(E)xit
(A)dd entry
(P)rint timesheet
Enter a letter between ( ) to execute that action.}
p
\emph{TIMESHEET:
Working on the Factor HTTP server 0:25
Writing a kick-ass web app 1:03
(E)xit
(A)dd entry
(P)rint timesheet
Enter a letter between ( ) to execute that action.}
x
\end{alltt}
Once you have finished working your way through this tutorial, you might want to try extending the program -- for example, it could print the total hours, prompt for an hourly rate, then print the amount of money that should be billed.
Enter a letter between ( ) to execute that action.
\end{alltt}
We will represent the menu as an association list. Recall that an association list is a list of pairs, where the car of each pair is a key, and the cdr is a value. Our keys will literally be keyboard keys (``e'', ``a'' and ``p''), and the values will themselves be pairs consisting of a menu item label and a quotation.
The first word we will code is \texttt{print-menu}. It takes an association list, and prints the second element of each pair's value. Note that \texttt{terpri} simply prints a blank line:
\begin{alltt}
: print-menu ( menu -{}- )
terpri {[} cdr car print {]} each terpri
"Enter a letter between ( ) to execute that action." print ;
\end{alltt}
You can test \texttt{print-menu} with a short association list:
Enter a letter between ( ) to execute that action.}
\end{alltt}
The next step is to write a \texttt{menu-prompt} word that takes the same association list, reads a line of input from the keyboard, and executes the quotation associated with that line. Recall that the \texttt{assoc} word returns \texttt{f} if the specified key could not be found in the association list. The below definition makes use of a conditional to signal an error in that case:
\begin{alltt}
: menu-prompt ( menu -{}- )
read swap assoc dup {[}
cdr call
{]}{[}
"Invalid input: " swap unparse cat2 throw
{]} ifte ;
\end{alltt}
Try applying the new \texttt{menu-prompt} word to the association list we used to test \texttt{print-menu}. You should verify that entering \texttt{x} causes the quotation \texttt{{[} 2 2 + . {]}} to be executed:
Finally, we want a \texttt{menu} word that first prints a menu, then prompts for and acts on input:
\begin{alltt}
: menu ( menu -{}- )
dup print-menu menu-prompt ;
\end{alltt}
Considering the stack effects of \texttt{print-menu} and \texttt{menu-prompt}, it should be obvious why the \texttt{dup} is needed.
\subsection{Finishing off}
We now need a \texttt{main-menu} word. It takes the timesheet vector from the stack, and recursively calls itself until the user requests that the timesheet application exits:
Note that unless the first option is selected, the timesheet vector is eventually passed into the recursive \texttt{main-menu} call.
All that remains now is the ``main word'' that runs the program with an empty timesheet vector. Note that the initial capacity of the vector is 10 elements, however this is not a limit -- adding more than 10 elements will grow the vector:
What does it mean for two objects to be ``equal''? In actual fact, there are two ways of comparing objects. Two object references can be compared for \emph{identity} using the \texttt{eq? ( obj obj -{}- ? )} word. This only returns true if both references point to the same object. A weaker form of comparison is the \texttt{= ( obj obj -{}- ? )} word, which checks if two objects ``have the same shape''.
If two objects are \texttt{eq?}, they will also be \texttt{=}.
For example, two literal objects with the same printed representation are as a general rule not always \texttt{eq?}, however they are \texttt{=}:
\begin{alltt}
{[} 1 2 3 {]}{[} 1 2 3 {]} eq? .
\emph{f}
{[} 1 2 3 {]}{[} 1 2 3 {]} = .
\emph{t}
\end{alltt}
On the other hand, duplicating an object reference on the stack using \texttt{dup} or similar, will give two references which are \texttt{eq?}:
\begin{alltt}
"Hello" dup eq? .
\emph{t}
\end{alltt}
An object can be cloned using \texttt{clone ( obj -{}- obj )}. The clone will no longer be \texttt{eq?} to the original (unless the original is immutable, in which case cloning is a no-op); however clones are always \texttt{=}.
An \emph{association list} is one where every element is a cons. The
car of each cons is a name, the cdr is a value. The literal notation
is suggestive:
\begin{alltt}
{[}
{[} "Jill" | "CEO" {]}
{[} "Jeff" | "manager" {]}
{[} "James" | "lowly web designer" {]}
{]}
\end{alltt}
\texttt{assoc? ( obj -{}- ? )} returns \texttt{t} if the object is
a list whose every element is a cons; otherwise it returns \texttt{f}.
\texttt{assoc ( key alist -{}- value )} looks for a pair with this
key in the list, and pushes the cdr of the pair. Pushes f if no pair
with this key is present. Note that \texttt{assoc} cannot differentiate between
a key that is not present at all, or a key with a value of \texttt{f}.
\texttt{assoc{*} ( key alist -{}- {[} key | value {]} )} looks for
a pair with this key, and pushes the pair itself. Unlike \texttt{assoc},
\texttt{assoc{*}} returns different values in the cases of a value
set to \texttt{f}, or an undefined value.
\texttt{set-assoc ( value key alist -{}- alist )} removes any existing
occurrence of a key from the list, and adds a new pair. This creates
a new list, the original is unaffected.
\texttt{acons ( value key alist -{}- alist )} is slightly faster
than \texttt{set-assoc} since it simply conses a new pair onto the
list. However, if used repeatedly, the list will grow to contain a
lot of {}``shadowed'' pairs.
The following pair of word definitions from the \texttt{html} vocabulary demonstrates the usage of association lists. It implements a mapping of special characters to their HTML entity names. Note the usage of \texttt{?} to return the original character if the association lookup yields \texttt{f}:
\begin{alltt}
: html-entities ( -- alist )
{[}
{[} CHAR: < | "\<" {]}
{[} CHAR: > | "\>" {]}
{[} CHAR: \& | "\&" {]}
{[} CHAR: ' | "\'" {]}
{[} CHAR: \" | "\"" {]}
{]} ;
: char>entity ( ch -- str )
dup >r html-entities assoc dup r> ? ;
\end{alltt}
Searching association lists incurs a linear time cost, so they should
only be used for small mappings -- a typical use is a mapping of half
a dozen entries or so, specified literally in source. Hashtables offer
A hashtable, much like an association list, stores key/value pairs, and offers lookup by key. However, whereas an association list must be searched linearly to locate keys, a hashtable uses a more sophisticated method. Key/value pairs are sorted into \emph{buckets} using a \emph{hash function}. If two objects are equal, then they must have the same hash code; but not necessarily vice versa. To look up the value associated with a key, only the bucket corresponding to the key has to be searched. A hashtable is simply a vector of buckets, where each bucket is an association list.
\texttt{<hashtable> ( capacity -{}- hash )} creates a new hashtable with the specified number of buckets. A hashtable with one bucket is basically an association list. Right now, a ``large enough'' capacity must be specified, and performance degrades if there are too many key/value pairs per bucket. In a future implementation, hashtables will grow as needed as the number of key/value pairs increases.
\texttt{hash ( key hash -{}- value )} looks up the value associated with a key in the hashtable. Pushes \texttt{f} if no pair with this key is present. Note that \texttt{hash} cannot differentiate between a key that is not present at all, or a key with a value of \texttt{f}.
\texttt{hash* ( key hash -{}- {[} key | value {]})} looks for
a pair with this key, and pushes the pair itself. Unlike \texttt{hash},
\texttt{hash{*}} returns different values in the cases of a value
set to \texttt{f}, or an undefined value.
\texttt{set-hash ( value key hash -{}- )} stores a key/value pair in a hashtable.
Notice that until now, all the code except a handful of examples has only used the stack for storage. You can also use variables to store temporary data, much like in other languages, however their use is not so prevalent. This is not a coincidence -- Fator was designed this way, and mastery of the stack is essential. Using variables where the stack is more appropriate leads to ugly, unreusable code.
Variables are typically used for longer-term storage of data, and for temporary storage of objects that are being constructed, where using the stack would be ackward. Another use for variables is compound data structures, realized as nested namespaces of variables. This concept should be instantly familiar to anybody who's used an object-oriented programming language.
The words \texttt{get ( name -{}- value )} and \texttt{set ( value name -{}- )} retreive and store variable values, respectively. Variable names are strings, and they do not have to be declared before use. For example:
Only having one list of variable name/value bindings would make the language terribly inflexible. Instead, a variable has any number of potential values, one per namespace. There is a notion of a ``current namespace''; the \texttt{set} word always stores variables in the current namespace. On the other hand, \texttt{get} traverses up the stack of namespace bindings until it finds a variable with the specified name.
\texttt{bind ( namespace quot -{}- )} executes a quotation in the dynamic scope of a namespace. For example, the following sets the value of \texttt{x} to 5 in the global namespace, regardless of the current namespace at the time the word was called.
\begin{alltt}
: global-example ( -- )
global {[} 5 "x" set {]} bind ;
\end{alltt}
\texttt{<namespace> ( -{}- namespace )} creates a new namespace object. Actually, a namespace is just a hashtable, with a default capacity.
\texttt{with-scope ( quot -{}- )} combines \texttt{<namespace>} with \texttt{bind} by executing a quotation in a new namespace.
The list construction words provide an alternative way to build up a list. Instead of passing a partial list around on the stack as it is built, they store the partial list in a variable. This reduces the number
The string construction words provide an alternative way to build up a string. Instead of passing a string buffer around on the stack, they store the string buffer in a variable. This reduces the number
Compare the following two examples -- both define a word that concatenates together all elements of a list of strings. The first one uses a string buffer stored on the stack, the second uses string construction words:
The scope created by \texttt{<\%} and \texttt{\%>} is \emph{dynamic}; that is, all code executed between two words is part of the scope. This allows the call to \texttt{\%} to occur in a nested word. For example, here is a pair of definitions that turn an association list of strings into a string of the form \texttt{key1=value1 key2=value2 ...}:
The \texttt{bind} combinator creates dynamic scope by pushing and popping namespaces on the so-called \emph{name stack}. Its definition is simpler than one would expect:
The words \texttt{>n} and \texttt{n>} push and pop the name stack, respectively. Observe the stack flow in the definition of \texttt{bind}; the namespace goes on the name stack, the quotation is called, and the name space is popped and discarded.