applications. Factor borrows heavily from Forth, Joy and Lisp. Programmers familiar with these languages will recognize many similarities with Factor.
Factor is \emph{interactive}. This means it is possible to run a Factor interpreter that reads from the keyboard, and immediately executes expressions as they are entered. This allows words to be defined and tested one at a time.
Factor is \emph{dynamic}. This means that all objects in the language are fully reflective at run time, and that new definitions can be entered without restarting the interpreter. Factor code can be used interchangeably as data, meaning that sophisticated language extensions can be realized as libraries of words.
Factor is \emph{safe}. This means all code executes in a virtual machine that provides
garbage collection and prohibits direct pointer arithmetic. There is no way to get a dangling reference by deallocating a live object, and it is not possible to corrupt memory by writing past the bounds of an array.
When examples of interpreter interactions are given in this guide, the input is in a roman font, and any
The implementation of \texttt{km/hour} is a bit more complex. To convert from kilometers per hour to our ``canonical'' meters per second, we first convert the distance to meters, giving meters per hour, then divide this by the number of seconds in one hour to get the desired result:
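As a rough numerical check of this conversion, assuming the usual factors of 1000 meters per kilometer and 3600 seconds per hour, a speed of 90 kilometers per hour works out to:
\[
90\ \mathrm{km/h} = \frac{90 \times 1000\ \mathrm{m}}{3600\ \mathrm{s}} = 25\ \mathrm{m/s}
\]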
\texttt{dupd ( x y -{}- x x y )} Duplicate the second stack element.
\texttt{swapd ( x y z -{}- y x z )} Swap the second and third stack elements.
\texttt{transp ( x y z -{}- z y x )} Swap the first and third stack elements.
\texttt{2drop ( x y -{}- )} Discard the top two stack elements.
\texttt{2dup ( x y -{}- x y x y )} Duplicate the top two stack elements. A frequent use for this word is when two values have to be compared using something like \texttt{=} or \texttt{<} before being passed to another word.
\texttt{2swap ( x y z t -{}- z t x y )} Swap the top two pairs of stack elements.
You should try all these words out and become familiar with them. Push some numbers on the stack,
execute a shuffle word, and look at how the stack contents were changed using
more words. Each word should take at most a couple of sentences to describe. Effective factoring is like riding a bicycle -- once you ``get it'', it becomes second nature.
When an expression is parsed, each token in turn is looked up in the dictionary. If there is no dictionary entry, the token is parsed as a number instead.
The dictionary of words is structured as a set of named \emph{vocabularies}. Each vocabulary is a list
Words that return a boolean truth value are known as \emph{predicates}. Predicates are usually used to decide what to execute next at branch points. In Factor, there is no special boolean data type
-- instead, a special object \texttt{f} is the only object with a
``false'' boolean value. Every other object is a boolean ``true''.
The special object \texttt{t} is the ``canonical'' truth value. Note that words that return booleans don't return \texttt{t} as a rule; any object that is not equal to \texttt{f} can be returned as the true value.
The usual boolean operations are found in the \texttt{logic} vocabulary. Note that these are not integer bitwise operations; bitwise operations are described in the next chapter.
\texttt{>boolean ( ? -{}- ? )} returns \texttt{t} if the top of stack is anything except \texttt{f}, and \texttt{f} otherwise. So it does not change the boolean value of an object, but rather it converts it to canonical form. This word is rarely used.
\texttt{not ( ? -{}- ? )} returns \texttt{t} if the top of stack is \texttt{f}, and \texttt{f} otherwise.
\texttt{and ( ? ? -{}- ? )} returns a true value if both input parameters are true.
\texttt{or ( ? ? -{}- ? )} returns a true value if at least one of the input parameters is true.
\texttt{xor ( ? ? -{}- ? )} returns a true value if exactly one of the input parameters is true.
\begin{alltt}
t t and .
\emph{t}
5 f and .
\emph{f}
f "hi" or .
\emph{"hi"}
f f or .
\emph{f}
t t xor .
\emph{f}
t f xor .
\emph{t}
\end{alltt}
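The words \texttt{not} and \texttt{>boolean} can be tried in the same way. The following is a small sketch whose outputs follow from the descriptions above, rather than a recorded session:
\begin{alltt}
f not .
\emph{t}
5 not .
\emph{f}
5 >boolean .
\emph{t}
f >boolean .
\emph{f}
\end{alltt}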
\subsection{Combinators}
A quotation is a list of objects that can be executed. Words that execute quotations are called \emph{combinators}. Quotations are input
using the following syntax:
\begin{alltt}
{[} 2 3 + . {]}
\end{alltt}
When input, a quotation is not executed immediately -- rather, it
is pushed on the stack. Try evaluating the following:
\begin{alltt}
{[} 1 2 3 + {*} {]} .s
\emph{\{ {[} 1 2 3 + {*} {]} \}}
call .s
\emph{\{ 5 \}}
\end{alltt}
\texttt{call ( quot -{}- )} executes the quotation at the
top of the stack. Using \texttt{call} with a literal quotation is
useless; writing out the elements of the quotation has the same effect.
However, the \texttt{call} combinator is a building block of more
powerful combinators, since quotations can be passed around arbitrarily
and even modified before being called.
\texttt{ifte ( cond true false -{}- )} executes either the
\texttt{true} or \texttt{false} quotations, depending on the boolean
value of \texttt{cond}. Here is an example of \texttt{ifte} usage:
\begin{alltt}
1 2 < {[} "1 is less than 2." print {]} {[} "bug!" print {]} ifte
\end{alltt}
Compare the order of parameters here with the order of parameters in
the stack effect of \texttt{ifte}.
The stack effects of the two \texttt{ifte} branches should be
the same. If they differ, the word becomes harder to document and
debug.
\texttt{when ( cond true -{}- )} and \texttt{unless ( cond false -{}- )} are variations of \texttt{ifte} with only one active branch. The branches should produce as many values as they consume; this ensures that the stack effect of the entire \texttt{when} or \texttt{unless} expression is consistent regardless of which branch was taken.
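As an illustrative sketch of the one-branch conditionals, the following interaction uses only words described so far. The outputs are what the stack effects above imply, not a recorded session. Note that the second line prints nothing, because its condition is true and \texttt{unless} only runs its quotation when the condition is \texttt{f}:
\begin{alltt}
-3 0 < {[} "negative" print {]} when
\emph{negative}
-3 0 < {[} "not negative" print {]} unless
3 0 < {[} "not negative" print {]} unless
\emph{not negative}
\end{alltt}
In each case the quotation consumes and produces no stack values, so the stack is left unchanged whether or not the branch runs.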
\texttt{times ( num quot -{}- )} executes a quotation a number of
times. It is good style to have the quotation always consume as many
values from the stack as it produces. This ensures the stack effect
of the entire \texttt{times} expression stays constant regardless
of the number of iterations.
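A short sketch of \texttt{times} in use follows. The outputs are derived from the description above rather than copied from a session. The first quotation has no stack effect at all; the second consumes one value and produces one, so the value below it on the stack is doubled three times:
\begin{alltt}
3 {[} "Hello" print {]} times
\emph{Hello
Hello
Hello}
1 3 {[} 2 {*} {]} times .
\emph{8}
\end{alltt}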
More combinators will be introduced in later sections.
\subsection{Recursion}
The idea of \emph{recursion} is key to understanding Factor. A \emph{recursive} word definition is one that refers to itself, usually in one branch of a conditional.
Factor provides a rich set of math words. Factor numbers more closely model the mathematical concept of a number than other languages. Where possible, exact answers are given -- for example, adding or multiplying two integers never results in overflow, and dividing two integers yields a fraction rather than a truncated result. Complex numbers are supported, allowing many functions to be computed with parameters that would raise errors or return ``not a number'' in other languages.
\subsection{Integers}
The simplest type of number is the integer. Integers come in two varieties -- \emph{fixnums} and \emph{bignums}. As their names suggest, a fixnum is a fixed-width quantity\footnote{Fixnums range in size from $-2^{w-3}-1$ to $2^{w-3}$, where $w$ is the word size of your processor (for example, 32 bits). Usually, you do not have to worry about details like this.}, and is a bit quicker to manipulate than an arbitrary-precision bignum.
The predicate word \texttt{integer?} tests if the top of the stack is an integer. If this returns true, then exactly one of \texttt{fixnum?} or \texttt{bignum?} would return true for that object. Usually, your code does not have to worry if it is dealing with fixnums or bignums.
Unlike some languages where the programmer has to declare storage size explicitly and worry about overflow, integer operations automatically return bignums if the result would be too big to fit in a fixnum. Here is an example where multiplying two fixnums returns a bignum:
\begin{alltt}
134217728 fixnum? .
\emph{t}
128 fixnum? .
\emph{t}
134217728 128 * .
\emph{17179869184}
134217728 128 * bignum? .
\emph{t}
\end{alltt}
Integers can be entered using a different base. By default, all number entry is in base 10; however, this can be changed by prefixing integer literals with one of the parsing words \texttt{BIN:}, \texttt{OCT:}, or \texttt{HEX:}. For example:
\begin{alltt}
BIN: 1110 BIN: 1 + .
\emph{15}
HEX: deadbeef 2 * .
\emph{7471857118}
\end{alltt}
The word \texttt{.} prints numbers in decimal, regardless of how they were input. A set of words in the \texttt{unparser} vocabulary is provided for turning integers into string representations in another base. These strings can then be printed using \texttt{print} from the \texttt{stdio} vocabulary.
\begin{alltt}
1234 >hex print
\emph{4d2}
1234 >bin print
\emph{10011010010}
\end{alltt}
\subsection{Rational numbers}
If we add, subtract or multiply any two integers, the result is always an integer. However, this is not the case with division. When dividing a numerator by a denominator where the numerator is not an integer multiple of the denominator, a ratio is returned instead.
\begin{alltt}
1210 11 / .
\emph{110}
100 330 / .
\emph{10/33}
\end{alltt}
Ratios are printed and can be input literally in the form of the second example. Ratios are always reduced to lowest terms by factoring out the \emph{greatest common divisor} of the numerator and denominator. A ratio with a denominator of 1 becomes an integer. Trying to create a ratio with a denominator of 0 raises an error.
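For instance, the second example above reduces as follows, since $\gcd(100, 330) = 10$:
\[
\frac{100}{330} = \frac{100 / 10}{330 / 10} = \frac{10}{33}
\]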
The predicate word \texttt{ratio?} tests if the top of the stack is a ratio. The predicate word \texttt{rational?} returns true if and only if one of \texttt{integer?} or \texttt{ratio?} would return true for that object. So in Factor terms, a ``ratio'' is a rational number whose denominator is not equal to 1.
Ratios behave just like any other number -- all numerical operations work as expected, and in fact they use the formulas for adding, subtracting and multiplying fractions that you learned in high school.
\begin{alltt}
1/2 1/3 + .
\emph{5/6}
100 6 / 3 * .
\emph{50}
\end{alltt}
Ratios can be deconstructed into their numerator and denominator components using the \texttt{numerator} and \texttt{denominator} words. The numerator and denominator are both integers, and furthermore the denominator is always positive. When applied to integers, the numerator is the integer itself, and the denominator is 1.
\begin{alltt}
75/33 numerator .
\emph{25}
75/33 denominator .
\emph{11}
12 numerator .
\emph{12}
\end{alltt}
\subsection{Real numbers}
Rational numbers represent \emph{exact} quantities. On the other hand, a floating point number is an \emph{approximation}. While rationals can grow to any required precision, floating point numbers are fixed-width, and manipulating them is usually faster than manipulating ratios or bignums (but slower than manipulating fixnums). Floating point numbers are often used to approximate irrational numbers, which have no exact representation as a ratio of two integers. Floating point literals are input with a decimal point.
\begin{alltt}
1.23 1.5 + .
\emph{2.73}
\end{alltt}
The predicate word \texttt{float?} tests if the top of the stack is a floating point number. The predicate word \texttt{real?} returns true if and only if one of \texttt{rational?} or \texttt{float?} would return true for that object.
Floating point numbers are \emph{contagious} -- introducing a floating point number in a computation ensures the result is also floating point.
\begin{alltt}
5/4 1/2 + .
\emph{7/4}
5/4 0.5 + .
\emph{1.75}
\end{alltt}
Apart from contagion, there are two ways of obtaining a floating point result from a computation: the word \texttt{>float ( n -{}- f )} converts a rational number into its floating point approximation, and the word \texttt{/f ( x y -{}- x/y )} returns the floating point approximation of a quotient of two numbers.
\begin{alltt}
7 4 / >float .
\emph{1.75}
7 4 /f .
\emph{1.75}
\end{alltt}
Indeed, the word \texttt{/f} could be defined as follows:
\begin{alltt}
: /f / >float ;
\end{alltt}
However, the actual definition is slightly more efficient, since it computes the floating point result directly.
\subsection{Complex numbers}
Just like we had to widen the integers to the rationals in order to divide, we have to widen the real numbers to the set of \emph{complex numbers} to solve certain kinds of equations. For example, the equation $x^2+1=0$ has no solution for real $x$, because there is no real number that is a square root of -1. This is so because the real numbers are not \emph{algebraically closed}.
Complex numbers, however, are algebraically closed, and Factor will find one solution to this equation\footnote{The other, of course, being \texttt{\#\{ 0 -1 \}}.}:
\begin{alltt}
-1 sqrt .
\emph{\#\{ 0 1 \}}
\end{alltt}
The literal syntax for a complex number is \texttt{\#\{ re im \}}, where \texttt{re} is the real part and \texttt{im} is the imaginary part. For example, the literal \texttt{\#\{ 1/2 1/3 \}} corresponds to the complex number $\frac{1}{2}+\frac{1}{3}i$.
The words \texttt{i} and \texttt{-i} push the literals \texttt{\#\{ 0 1 \}} and \texttt{\#\{ 0 -1 \}}, respectively.
The predicate word \texttt{complex?} tests if the top of the stack is a complex number. Note that unlike in mathematics, where all real numbers are also complex numbers, Factor only considers a number to be a complex number if its imaginary part is non-zero.
Complex numbers can be deconstructed into their real and imaginary components using the \texttt{real} and \texttt{imaginary} words. Both components can be pushed at once using the word \texttt{>rect ( z -{}- re im )}.
A complex number can be constructed from a real and imaginary component on the stack using the word \texttt{rect> ( re im -{}- z )}.
\begin{alltt}
1/3 5 rect> .
\emph{\#\{ 1/3 5 \}}
\end{alltt}
Complex numbers are stored in \emph{rectangular form} as a real/imaginary component pair (this is where the names \texttt{>rect} and \texttt{rect>} come from). An alternative complex number representation is \emph{polar form}, consisting of an absolute value and argument. The absolute value and argument can be computed using the words \texttt{abs} and \texttt{arg}, and both can be pushed at once using \texttt{>polar ( z -{}- abs arg )}.
\begin{alltt}
5.3 abs .
\emph{5.3}
i arg .
\emph{1.570796326794897}
\#\{ 4 5 \} >polar .s
\emph{\{ 6.403124237432849 0.8960553845713439 \}}
\end{alltt}
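As a check on the last example, the absolute value and argument of $4+5i$ follow from the usual formulas, with the results rounded here to four decimal places:
\[
|4+5i| = \sqrt{4^2+5^2} = \sqrt{41} \approx 6.4031, \qquad \arg(4+5i) = \arctan\frac{5}{4} \approx 0.8961
\]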
A new complex number can be created from an absolute value and argument using \texttt{polar> ( abs arg -{}- z )}.
\begin{alltt}
1 pi polar> .
\emph{\#\{ -1.0 1.224606353822377e-16 \}}
\end{alltt}
\subsection{Transcendental functions}
The \texttt{math} vocabulary provides a rich library of mathematical functions that covers exponentiation, logarithms, trigonometry, and hyperbolic functions. All functions accept and return complex number arguments where appropriate. These functions all return floating point values, or complex numbers whose real and imaginary components are floating point values.
\texttt{\^{} ( x y -{}- x\^{}y )} raises \texttt{x} to the power of \texttt{y}. In the cases of \texttt{y} being equal to $1/2$, -1, or 2, respectively, the words \texttt{sqrt}, \texttt{recip} and \texttt{sq} can be used instead.
\begin{alltt}
2 4 \^{} .
\emph{16.0}
i i \^{} .
\emph{0.2078795763507619}
\end{alltt}
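The second result may look surprising, but it follows from the principal value of the complex logarithm, $\log i = i\pi/2$:
\[
i^i = e^{i \log i} = e^{i \cdot i\pi/2} = e^{-\pi/2} \approx 0.2079
\]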
\texttt{exp ( x -{}- e\^{}x )} raises the number $e$ to a specified power. The number $e$ can be pushed on the stack with the \texttt{e} word, so \texttt{exp} could have been defined as follows:
\begin{alltt}
: exp ( x -{}- e\^{}x ) e swap \^{} ;
\end{alltt}
However, it is actually defined otherwise, for efficiency.\footnote{In fact, the word \texttt{\^{}} is actually defined in terms of \texttt{exp}, to correctly handle complex number arguments.}
\texttt{log ( x -- y )} computes the natural (base $e$) logarithm. This is the inverse of the \texttt{exp} function.
\begin{alltt}
-1 log .
\emph{\#\{ 0.0 3.141592653589793 \}}
e log .
\emph{1.0}
\end{alltt}
\texttt{sin ( x -- y )}, \texttt{cos ( x -- y )} and \texttt{tan ( x -- y )} are the familiar trigonometric functions, and \texttt{asin ( x -- y )}, \texttt{acos ( x -- y )} and \texttt{atan ( x -- y )} are their inverses.
The reciprocals of the sine, cosine and tangent are defined as \texttt{cosec}, \texttt{sec} and \texttt{cot}, respectively. Their inverses are \texttt{acosec}, \texttt{asec} and \texttt{acot}.
\texttt{sinh ( x -- y )}, \texttt{cosh ( x -- y )} and \texttt{tanh ( x -- y )} are the hyperbolic functions, and \texttt{asinh ( x -- y )}, \texttt{acosh ( x -- y )} and \texttt{atanh ( x -- y )} are their inverses.
Similarly, the reciprocals of the hyperbolic sine, cosine and tangent are defined as \texttt{cosech}, \texttt{sech} and \texttt{coth}, respectively. Their inverses are \texttt{acosech}, \texttt{asech} and \texttt{acoth}.
\subsection{Modular arithmetic}
In addition to the standard division operator \texttt{/}, there are a few related functions that are useful when working with integers.
\texttt{/i ( x y -{}- x/y )} performs a truncating integer division. It could have been defined as follows:
\begin{alltt}
: /i / >integer ;
\end{alltt}
However, the actual definition is a bit more efficient than that.
\texttt{mod ( x y -{}- x\%y )} computes the remainder of dividing \texttt{x} by \texttt{y}. If the result is 0, then \texttt{x} is a multiple of \texttt{y}.
\texttt{/mod ( x y -{}- x/y x\%y )} pushes both the quotient and remainder.
\begin{alltt}
100 3 mod .
\emph{1}
-546 34 mod .
\emph{-2}
\end{alltt}
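Both results above are consistent with the usual identity relating the truncated quotient $q$ and the remainder $r$, namely $x = qy + r$. As the second example suggests, the remainder takes the sign of $x$:
\[
100 = 33 \times 3 + 1, \qquad -546 = (-16) \times 34 + (-2)
\]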
\texttt{gcd ( x y -{}- z )} pushes the greatest common divisor of two integers; that is, the largest integer that divides both of them without leaving a remainder. This word is used behind the scenes to reduce rational numbers to lowest terms when doing ratio arithmetic.
\subsection{Bitwise operations}
There are two ways of looking at an integer -- as a mathematical entity, or as a string of bits. The latter representation facilitates the so-called \emph{bitwise operations}.
\texttt{bitand ( x y -{}- x\&y )} returns a new integer where each bit is set if and only if the corresponding bit is set in both $x$ and $y$. If you're considering an integer as a sequence of bit flags, taking the bitwise-and with a mask switches off all flags that are not explicitly set in the mask.
\begin{alltt}
BIN: 101 BIN: 10 bitand >bin print
\emph{0}
BIN: 110 BIN: 10 bitand >bin print
\emph{10}
\end{alltt}
\texttt{bitor ( x y -{}- x|y )} returns a new integer where each bit is set if and only if the corresponding bit is set in at least one of $x$ or $y$. If you're considering an integer as a sequence of bit flags, taking the bitwise-or with a mask switches on all flags that are set in the mask.
\begin{alltt}
BIN: 101 BIN: 10 bitor >bin print
\emph{111}
BIN: 110 BIN: 10 bitor >bin print
\emph{110}
\end{alltt}
\texttt{bitxor ( x y -{}- x\^{}y )} returns a new integer where each bit is set if and only if the corresponding bit is set in exactly one of $x$ or $y$. If you're considering an integer as a sequence of bit flags, taking the bitwise-xor with a mask toggles all flags that are set in the mask.
\begin{alltt}
BIN: 101 BIN: 10 bitxor >bin print
\emph{111}
BIN: 110 BIN: 10 bitxor >bin print
\emph{100}
\end{alltt}
\texttt{shift ( x n -{}- y )} returns a new integer consisting of the bits of the first integer, shifted to the left by $n$ positions. If $n$ is negative, the bits are shifted to the right instead, and bits that ``fall off'' are discarded.
\begin{alltt}
BIN: 101 5 shift >bin print
\emph{10100000}
BIN: 11111 -2 shift >bin print
\emph{111}
\end{alltt}
The attentive reader will notice that shifting to the left is equivalent to multiplying by a power of two, and shifting to the right is equivalent to performing a truncating division by a power of two.
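The two shift examples above can be checked against this equivalence, writing binary numbers with a subscript 2:
\[
101_2 \cdot 2^5 = 5 \cdot 32 = 160 = 10100000_2, \qquad \left\lfloor \frac{11111_2}{2^2} \right\rfloor = \left\lfloor \frac{31}{4} \right\rfloor = 7 = 111_2
\]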
The word \texttt{=} is found in the \texttt{kernel} vocabulary, and the words \texttt{2dup} and \texttt{2drop} are found in the \texttt{stack} vocabulary. Since \texttt{=}
consumes both its parameters, we must first duplicate them with \texttt{2dup}. The word \texttt{correct} does not need to do anything with these two numbers, so they are popped off the stack using \texttt{2drop}. Try evaluating the following
A \emph{proper list} is a set of cons cells linked by their cdr, where the last cons cell has a cdr set to \texttt{f}. Also, the object \texttt{f} by itself
It is worth mentioning a few words closely related to and defined in terms of \texttt{cons}, \texttt{car} and \texttt{cdr}.
\texttt{swons ( cdr car -{}- cons )} constructs a cons cell, with the argument order reversed. Usually, it is considered bad practice to define two words that only differ by parameter order; however, cons cells are constructed about equally frequently with both orders. Of course, \texttt{swons} is defined as follows:
\begin{alltt}
: swons swap cons ;
\end{alltt}
\texttt{uncons ( cons -{}- car cdr )} pushes both constituents of a cons cell. It is defined thus:
\begin{alltt}
: uncons dup car swap cdr ;
\end{alltt}
\texttt{unswons ( cons -{}- cdr car )} is just a swapped version of \texttt{uncons}. It is defined thus:
while \texttt{add} has to copy the entire list first! If you need to add to the end of a sequence frequently, consider either using a vector, or adding to the beginning of a list and reversing the list when done. For information about vectors, see \ref{sub:Vectors}.
The list construction words provide an alternative way to build up a list. Instead of passing a partial list around on the stack as it is built, they store the partial list in a variable. This reduces the number
A pair of combinators for iterating over vectors are provided in the \texttt{vectors} vocabulary. The first is the \texttt{vector-each} word that does nothing other than applying a quotation to each element. The second is the \texttt{vector-map} word that also collects the return values of the quotation into a new vector.
\texttt{vector-each ( vector quot -{}- )} pushes each element of the vector in turn, and executes the quotation. The quotation should have a stack effect of \texttt{( obj -{}- )}. The vector and the quotation are not on the stack when the quotation is executed. This allows the quotation to use values below the vector for accumulation and so on.
The \texttt{stack>list} word makes use of \texttt{vector-each} to construct a list containing all elements of a given vector, in reverse order. In fact, its definition looks exactly like that of \texttt{reverse} except the \texttt{vector-each} combinator is used in place of \texttt{each}:
The \texttt{vector>list} word is defined as first creating a list of all elements in the vector in reverse order using \texttt{stack>list}, and then reversing this list:
\texttt{vector-map ( vector quot -{}- vector )} is similar to \texttt{vector-each}, except after each iteration the return value of the quotation is collected into a new vector. The quotation should have a stack effect of \texttt{( obj -{}- obj )}.
The \texttt{clone-vector} word is implemented as a degenerate case of \texttt{vector-map} -- the elements of the original vector are copied into a new vector without any modification:
The string construction words provide an alternative way to build up a string. Instead of passing a string buffer around on the stack, they store the string buffer in a variable. This reduces the number
Compare the following two examples -- both define a word that concatenates together all elements of a list of strings. The first one uses a string buffer stored on the stack, the second uses string construction words:
The scope created by \texttt{<\%} and \texttt{\%>} is \emph{dynamic}; that is, all code executed between the two words is part of the scope. This allows the call to \texttt{\%} to occur in a nested word. For example, here is a pair of definitions that turn an association list of strings into a string of the form \texttt{key1=value1 key2=value2 ...}:
A pair of combinators for iterating over strings are provided in the \texttt{strings} vocabulary. The first is the \texttt{str-each} word that does nothing other than applying a quotation to each character. The second is the \texttt{str-map} word that also collects the return values of the quotation into a new string.
\texttt{str-each ( str quot -{}- )} pushes each character of the string in turn, and executes the quotation. The quotation should have a stack effect of \texttt{( ch -{}- )}. The string and the quotation are not on the stack when the quotation is executed. This allows the quotation to use values below the string for accumulation and so on. The following example counts the number of occurrences of the letter ``a'' in a string:
\texttt{str-map ( str quot -{}- str )} is similar to \texttt{str-each}, except after each iteration the return value of the quotation is collected into a new string. The quotation should have a stack effect of \texttt{( ch -- str/ch )}. The following example replaces all occurrences of the space character in the string with \texttt{+}:
For the second practical example, we will code a small program that tracks how long you spend working on tasks. It will provide two primary functions, one for adding a new task and measuring how long you spend working on it, and another to print out the timesheet. A typical interaction looks like this:
Enter a letter between ( ) to execute that action.}
a
\emph{Start work on the task now. Press ENTER when done.
Please enter a description:}
Working on the Factor HTTP server
\emph{
(E)xit
(A)dd entry
(P)rint timesheet
Enter a letter between ( ) to execute that action.}
a
\emph{Start work on the task now. Press ENTER when done.
Please enter a description:}
Writing a kick-ass web app
\emph{
(E)xit
(A)dd entry
(P)rint timesheet
Enter a letter between ( ) to execute that action.}
p
\emph{TIMESHEET:
Working on the Factor HTTP server 0:25
Writing a kick-ass web app 1:03
(E)xit
(A)dd entry
(P)rint timesheet
Enter a letter between ( ) to execute that action.}
e
\end{alltt}
Once you have finished working your way through this tutorial, you might want to try extending the program -- for example, it could print the total hours, prompt for an hourly rate, then print the amount of money that should be billed.
Enter a letter between ( ) to execute that action.
\end{alltt}
We will represent the menu as an association list. Recall that an association list is a list of pairs, where the car of each pair is a key, and the cdr is a value. Our keys will literally be keyboard keys (``e'', ``a'' and ``p''), and the values will themselves be pairs consisting of a menu item label and a quotation.
The first word we will code is \texttt{print-menu}. It takes an association list, and prints the menu item label, which is the first element of each pair's value. Note that \texttt{terpri} simply prints a blank line:
\begin{alltt}
: print-menu ( menu -{}- )
terpri {[} cdr car print {]} each terpri
"Enter a letter between ( ) to execute that action." print ;
\end{alltt}
You can test \texttt{print-menu} with a short association list:
Enter a letter between ( ) to execute that action.}
\end{alltt}
The next step is to write a \texttt{menu-prompt} word that takes the same association list, reads a line of input from the keyboard, and executes the quotation associated with that line. Recall that the \texttt{assoc} word returns \texttt{f} if the specified key could not be found in the association list. The below definition makes use of a conditional to signal an error in that case:
\begin{alltt}
: menu-prompt ( menu -{}- )
read swap assoc dup {[}
cdr call
{]} {[}
"Invalid input: " swap unparse cat2 throw
{]} ifte ;
\end{alltt}
Try applying the new \texttt{menu-prompt} word to the association list we used to test \texttt{print-menu}. You should verify that entering \texttt{x} causes the quotation \texttt{{[} 2 2 + . {]}} to be executed:
Finally, we want a \texttt{menu} word that first prints a menu, then prompts for and acts on input:
\begin{alltt}
: menu ( menu -{}- )
dup print-menu menu-prompt ;
\end{alltt}
Considering the stack effects of \texttt{print-menu} and \texttt{menu-prompt}, it should be obvious why the \texttt{dup} is needed.
\subsection{Finishing off}
We now need a \texttt{main-menu} word. It takes the timesheet vector from the stack, and recursively calls itself until the user requests that the timesheet application exits:
Note that unless the first option is selected, the timesheet vector is eventually passed into the recursive \texttt{main-menu} call.
All that remains now is the ``main word'' that runs the program with an empty timesheet vector. Note that the initial capacity of the vector is 10 elements; however, this is not a limit -- adding more than 10 elements will grow the vector:
The previously-mentioned \texttt{=} word in the \texttt{kernel} vocabulary, as well as the \texttt{assoc}, \texttt{contains} and \texttt{unique} words in the \texttt{lists} vocabulary all rely on object equality as part of their operation.
What does it mean for two objects to be ``equal''? In actual fact, there are two ways of comparing objects. Two object references can be compared for \emph{identity} using the \texttt{eq? ( obj obj -{}- ? )} word. This only returns true if both references point to the same object. A weaker form of comparison is the \texttt{= ( obj obj -{}- ? )} word, which checks if two objects ``have the same shape''.
If two objects are \texttt{eq?}, they will also be \texttt{=}.
For example, two literal objects with the same printed representation are not necessarily \texttt{eq?}; however, they are \texttt{=}:
\begin{alltt}
{[} 1 2 3 {]} {[} 1 2 3 {]} eq? .
\emph{f}
{[} 1 2 3 {]} {[} 1 2 3 {]} = .
\emph{t}
\end{alltt}
On the other hand, duplicating an object reference on the stack using \texttt{dup} or similar, will give two references which are \texttt{eq?}:
\begin{alltt}
"Hello" dup eq? .
\emph{t}
\end{alltt}
An object can be cloned using \texttt{clone ( obj -{}- obj )}. The clone will no longer be \texttt{eq?} to the original (unless the original is immutable, in which case cloning is a no-op); however clones are always \texttt{=}.
A hashtable, much like an association list, stores key/value pairs, and offers lookup by key. However, whereas an association list must be searched linearly to locate keys, a hashtable uses a more sophisticated method. Key/value pairs are sorted into \emph{buckets} using a \emph{hash function}. If two objects are equal, then they must have the same hash code; but not necessarily vice versa. To look up the value associated with a key, only the bucket corresponding to the key has to be searched. A hashtable is simply a vector of buckets, where each bucket is an association list.
\texttt{<hashtable> ( capacity -{}- hash )} creates a new hashtable with the specified number of buckets. A hashtable with one bucket is basically an association list. Right now, a ``large enough'' capacity must be specified, and performance degrades if there are too many key/value pairs per bucket. In a future implementation, hashtables will grow as needed as the number of key/value pairs increases.
\texttt{hash ( key hash -{}- value )} looks up the value associated with a key in the hashtable. Pushes \texttt{f} if no pair with this key is present. Note that \texttt{hash} cannot differentiate between a key that is not present at all, or a key with a value of \texttt{f}.
\texttt{hash* ( key hash -{}- {[} key | value {]})} looks for
a pair with this key, and pushes the pair itself. Unlike \texttt{hash},
\texttt{hash{*}} returns different values in the cases of a value
set to \texttt{f}, or an undefined value.
\texttt{set-hash ( value key hash -{}- )} stores a key/value pair in a hashtable.
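The following sketch stores one pair in a fresh hashtable and looks it up again. It assumes a shuffle word \texttt{rot ( x y z -{}- y z x )} is available to move the hashtable above the value and key; the output shown is what the stack effects above imply, not a recorded session:
\begin{alltt}
10 <hashtable> dup "red" "apple" rot set-hash
"apple" swap hash .
\emph{"red"}
\end{alltt}
The \texttt{dup} keeps a second reference to the hashtable on the stack, since \texttt{set-hash} consumes the one it is given.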
Notice that until now, all the code except a handful of examples has only used the stack for storage. You can also use variables to store temporary data, much like in other languages, but their use is not so prevalent. This is not a coincidence -- Factor was designed this way, and mastery of the stack is essential. Using variables where the stack is more appropriate leads to ugly, unreusable code.
Variables are typically used for longer-term storage of data, and for temporary storage of objects that are being constructed, where using the stack would be awkward. Another use for variables is compound data structures, realized as nested namespaces of variables. This concept should be instantly familiar to anybody who has used an object-oriented programming language.
The words \texttt{get ( name -{}- value )} and \texttt{set ( value name -{}- )} retrieve and store variable values, respectively. For example:
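A minimal sketch of such an interaction, assuming the two words behave as described and are entered at the interactive interpreter; the variable names \texttt{x} and \texttt{y} are chosen purely for illustration:
\begin{alltt}
5 "x" set
"x" get .
\emph{5}
"x" get 2 * "y" set
"y" get .
\emph{10}
\end{alltt}
Note the argument order of \texttt{set}: the value is pushed first and the variable name second.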
So far, we have seen what we called ``the stack'' store intermediate values between computations. In fact Factor maintains a number of other stacks, and the formal name for the stack we've been dealing with so far is the \emph{data stack}.
Another stack is the \emph{call stack}. When a colon definition is invoked, the position within the current colon definition is pushed on the call stack. This ensures that calling words return to the caller, just as in any other language with subroutines.\footnote{Factor supports a variety of structures for implementing non-local word exits, such as exceptions, co-routines, continuations, and so on. They all rely on manipulating the call stack and are described in later sections.}
The \emph{name stack} is the focus of this section. The \texttt{bind} combinator creates dynamic scope by pushing and popping namespaces on the name stack. Its definition is simpler than one would expect:
\begin{alltt}
: bind ( namespace quot -- )
swap >n call n> drop ;
\end{alltt}
The words \texttt{>n} and \texttt{n>} push and pop the name stack, respectively. Observe the stack flow in the definition of \texttt{bind}: the namespace goes on the name stack, the quotation is called, and the namespace is popped and discarded.
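A short usage sketch of \texttt{bind} follows. It assumes a constructor word \texttt{<namespace> ( -{}- namespace )} that creates a fresh, empty namespace; this word is not introduced in the text above, so treat it as an assumption. The variable name \texttt{x} is illustrative.
\begin{alltt}
<namespace> {[} 5 "x" set "x" get . {]} bind
\emph{5}
\end{alltt}
Inside the quotation, \texttt{set} and \texttt{get} operate on the namespace that \texttt{bind} pushed on the name stack. Once the quotation returns, that namespace is popped, so the binding of \texttt{x} made here is no longer in scope.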
The name stack is really just a vector. The words \texttt{>n} and \texttt{n>} are implemented as follows:
Recall that code quotations are in fact just linked lists. Factor code is data, and vice versa. Essentially, the interpreter iterates through code quotations, pushing literals and executing words. When a word is executed, one of two things happens -- either the word has a colon definition, and the interpreter is invoked recursively on the definition, or the word is primitive, and it is executed by the underlying virtual machine. A word is itself a first-class object.
It is the job of the parser to transform source code denoting literals and words into their internal representations. This is done using a vocabulary of \emph{parsing words}. The prettyprinter does the converse, by printing out data structures in a form that can be read by both humans and the Factor parser. Because code is data, the text representation of source code doubles as a way to serialize almost any Factor object.
What happened here? Instead of being executed, a ``naked'', unquoted word was pushed on the stack. The predicate \texttt{word? ( obj -{}- ? )} from the \texttt{words} vocabulary tests if the top of the stack is a word. Another way to get a word on the stack is to do a vocabulary search using a word name and a list of vocabularies to search in:
\begin{alltt}
"car" {[} "lists" {]} search .s
\emph{\{ car \}}
\end{alltt}
The \texttt{search} word will push \texttt{f} if the word is not defined. A new word can be created in a specified vocabulary explicitly:
\begin{alltt}
"start-server" "user" create .s
\emph{\{ start-server \}}
\end{alltt}
Two words are only ever equal under the \texttt{=} operator if they identify the same underlying object. Word objects are composed of three slots, named as follows.
\begin{tabular}{|r|l|}
\hline
Slot&
Description\tabularnewline
\hline
\hline
Primitive&
A number identifying a virtual machine operation.\tabularnewline
\hline
Parameter&
An object parameter for the virtual machine operation.\tabularnewline
\hline
Property list&
An association list of name/value pairs.\tabularnewline
\hline
\end{tabular}
If the primitive number is set to 1, the word is a colon definition and the parameter must be a quotation. Any other primitive number denotes a function of the virtual machine, and the parameter is ignored. Do not rely on primitive numbers in your code; instead, use the \texttt{compound? ( obj -{}- ? )} and \texttt{primitive? ( obj -{}- ? )} predicates.
The word \texttt{define ( word quot -{}- )} defines a word to have the specified colon definition. Note that \texttt{create} and \texttt{define} perform an action somewhat analogous to the \texttt{: ... ;} notation for colon definitions, except that they act at run time, whereas the \texttt{: ... ;} notation takes effect at parse time.
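As a sketch of how \texttt{create} and \texttt{define} fit together, the following defines a word from a quotation built as data. The word name \texttt{square} is hypothetical, and the example assumes that the \texttt{user} vocabulary is in the interpreter's search path, so that the new word can be found when the second line is parsed.
\begin{alltt}
"square" "user" create {[} dup * {]} define
5 square .
\end{alltt}
Under those assumptions the first line has roughly the same effect as entering \texttt{: square dup * ;}, and the second line calls the newly defined word and prints its result. The two lines must be entered separately, since \texttt{square} has to exist before any line that mentions it is parsed.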
\subsection{The prettyprinter}
We've already seen the word \texttt{.} which prints the top of the stack in a form that may be read back in. The word \texttt{prettyprint} is similar, except the output is in an indented, multiple-line format. Both words are in the \texttt{prettyprint} vocabulary. Here is an example: