\documentclass[english]{article}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\usepackage{alltt}
\pagestyle{headings}
\setcounter{tocdepth}{2}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}

\makeatletter

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
%% Because html converters don't know tabularnewline
\providecommand{\tabularnewline}{\\}

\usepackage{babel}
\makeatother
\begin{document}

\title{Factor Developer's Guide}

\author{Slava Pestov}

\maketitle
\tableofcontents{}

\newpage
\section*{Introduction}

Factor is an imperative programming language with functional and object-oriented
influences. Its primary focus is the development of web-based server-side
applications. Factor is run by a virtual machine that provides
garbage collection and prohibits pointer arithmetic.%
\footnote{Two releases of Factor are available -- a virtual machine written
in C, and an interpreter written in Java that runs on the Java virtual
machine. This guide targets the C version of Factor.%
}

Factor borrows heavily from Forth, Joy and Lisp. From Forth it inherits
a flexible syntax defined in terms of ``parsing words'' and an
execution model based on a data stack and call stack. From Joy and
Lisp it inherits a virtual machine prohibiting direct pointer arithmetic,
and the use of ``cons cells'' to represent code and data structures.

\section{Fundamentals}

A ``word'' is the main unit of program organization
in Factor -- it corresponds to a ``function'', ``procedure''
or ``method'' in other languages.

When code examples are given, the input is in a roman font, and any
output from the interpreter is in italics:

\begin{alltt}
"Hello, world!" print
\emph{Hello, world!}
\end{alltt}

\subsection{The stack}

The stack is used to exchange data between words. When a number is
executed, it is pushed on the stack. When a word is executed, it receives
input parameters by removing successive elements from the top of the
stack. Results are then pushed back to the top of the stack.

The word \texttt{.s} prints the contents of the stack, leaving the
contents of the stack unaffected. The top of the stack is the rightmost
element in the printout:

\begin{alltt}
2 3 .s
\emph{\{ 2 3 \}}
\end{alltt}

The word \texttt{.} removes the object at the top of the stack, and
prints it:

\begin{alltt}
1 2 3 . . .
\emph{3}
\emph{2}
\emph{1}
\end{alltt}

The word \texttt{clear} removes all entries from the stack. It should only ever be used interactively, not from a definition!

\begin{alltt}
"hey ho" "merry christmas" .s
\emph{\{ "hey ho" "merry christmas" \}}
clear .s
\emph{\{ \}}
\end{alltt}

The usual arithmetic operators \texttt{+ - {*} /} all take two parameters
from the stack, and push one result back. Where the order of operands
matters (\texttt{-} and \texttt{/}), the operands are taken in the natural order. For example:

\begin{alltt}
10 17 + .
\emph{27}
111 234 - .
\emph{-123}
333 3 / .
\emph{111}
\end{alltt}

This type of arithmetic is called \emph{postfix}, because the operator
follows the operands. Contrast this with \emph{infix} notation used
in many other languages, so-called because the operator is in-between
the two operands.

More complicated infix expressions can be translated into postfix
by translating the inner-most parts first. Grouping parentheses are
never necessary:

\begin{alltt}
! Postfix equivalent of (2 + 3) {*} 6
2 3 + 6 {*} .
\emph{30}
! Postfix equivalent of 2 + (3 {*} 6)
2 3 6 {*} + .
\emph{20}
\end{alltt}
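
Translating from the inside out also works for deeper nesting. As a small illustrative sketch (the expression is made up for this example, and the outputs follow from the operand order described above), here is $(10 - 4) / (1 + 2)$, with the two inner results inspected before the final division:

\begin{alltt}
! Inner-most parts first: 10 - 4, then 1 + 2
clear 10 4 - 1 2 + .s
\emph{\{ 6 3 \}}
/ .
\emph{2}
\end{alltt}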

\subsection{Factoring}

New words can be defined in terms of existing words using the \emph{colon
definition} syntax:

\begin{alltt}
: \emph{name} ( \emph{inputs} -{}- \emph{outputs} )
! \emph{Description}
\emph{factors ...} ;
\end{alltt}

When the new word is executed, each one of its factors gets executed,
in turn. The stack effect comment delimited by \texttt{(} and \texttt{)},
as well as the documentation comment starting with \texttt{!}, are
both optional, and can be placed anywhere in the source code, not
just in colon definitions. The interpreter ignores comments, but you should not.

Note that in a source file, a word definition can span multiple lines.
However, the interactive interpreter expects each line of input to
be ``complete'', so colon definitions that are input interactively must not contain line breaks.

For example, say we are designing some aircraft
navigation software. Suppose we need a word that takes the flight time, the aircraft
velocity, and the tailwind velocity, and returns the distance travelled.
If the parameters are given on the stack in that order, all we do
is add the top two elements (aircraft velocity, tailwind velocity)
and multiply the sum by the element underneath (flight time). So the definition
looks like this:

\begin{alltt}
: distance ( time aircraft tailwind -{}- distance ) + {*} ;
2 900 36 distance .
\emph{1872}
\end{alltt}

Note that we are not using any distance or time units here. To extend this example to work with units, first assume that internally, all distances are
in meters, and all time intervals are in seconds. We can define words
for converting from kilometers to meters, and hours and minutes to
seconds:

\begin{alltt}
: kilometers 1000 {*} ;
: minutes 60 {*} ;
: hours 60 {*} 60 {*} ;
2 kilometers .
\emph{2000}
10 minutes .
\emph{600}
2 hours .
\emph{7200}
\end{alltt}

The implementation of \texttt{km/hour} is a bit more complex -- to convert from kilometers per hour to our ``canonical'' meters per second, we first convert to meters per hour, then divide this by the number of seconds in one hour to get the desired result:

\begin{alltt}
: km/hour kilometers 1 hours / ;
2 hours 900 km/hour 36 km/hour distance .
\emph{1872000}
\end{alltt}
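
To see what \texttt{km/hour} does on its own, the following sketch applies it to the two velocities from the example above. It assumes the definitions of \texttt{kilometers}, \texttt{hours} and \texttt{km/hour} given in this section have already been entered:

\begin{alltt}
! 900 km/h is 900000 meters per 3600 seconds
900 km/hour .
\emph{250}
36 km/hour .
\emph{10}
\end{alltt}

These intermediate values account for the final result: $7200 \times (250 + 10) = 1872000$ meters.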

\subsection{Stack effects}

A stack effect comment contains a description of inputs to the left
of \texttt{-{}-}, and a description of outputs to the right. As always,
the top of the stack is on the right side. Let's try writing a word
to compute the cube of a number.

Three numbers on the stack can be multiplied together using \texttt{{*} {*}}:

\begin{alltt}
2 4 8 {*} {*} .
\emph{64}
\end{alltt}
However, the stack effect of \texttt{{*} {*}} is \texttt{( a b c -{}- a{*}b{*}c )}. We would like to write a word that takes \emph{one} input
only. To achieve this, we need to be able to duplicate the top stack
element twice. As it happens, there is a word \texttt{dup ( x -{}- x x )} for precisely this purpose. Now, we are able to define the
\texttt{cube} word:

\begin{alltt}
: cube ( n -{}- n{*}n{*}n ) dup dup {*} {*} ;
10 cube .
\emph{1000}
-2 cube .
\emph{-8}
\end{alltt}
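
The intermediate stack contents can be inspected with \texttt{.s}. The following sketch, which uses \texttt{clear} to start from an empty stack, shows what the factors of \texttt{cube} do to an input of 3, one step at a time:

\begin{alltt}
clear 3 dup dup .s
\emph{\{ 3 3 3 \}}
{*} .s
\emph{\{ 3 9 \}}
{*} .s
\emph{\{ 27 \}}
\end{alltt}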
It is quite often the case that we want to compose two factors in
a colon definition, but their stack effects don't {}``match up''.

There is a set of \emph{shuffle words} for solving precisely this
problem. These words are so-called because they simply rearrange stack
elements in some fashion, without modifying them in any way. Let's
take a look at the most frequently-used shuffle words:

\texttt{drop ( x -{}- )} Discard the top stack element. Used when
a word returns a value that is not needed.

\texttt{dup ( x -{}- x x )} Duplicate the top stack element. Used
when a value is required as input for more than one word.

\texttt{swap ( x y -{}- y x )} Swap top two stack elements. Used when
a word expects parameters in a different order.

\texttt{rot ( x y z -{}- y z x )} Rotate top three stack elements
to the left.

\texttt{-rot ( x y z -{}- z x y )} Rotate top three stack elements
to the right.

\texttt{over ( x y -{}- x y x )} Bring the second stack element {}``over''
the top element.

\texttt{nip ( x y -{}- y )} Remove the second stack element.

\texttt{tuck ( x y -{}- y x y )} Tuck the top stack element under
the second stack element.

\texttt{dupd ( x y -{}- x x y )} Duplicate the second stack element.

\texttt{swapd ( x y z -{}- y x z )} Swap the second and third stack elements.

\texttt{transp ( x y z -{}- z y x )} Swap the first and third stack elements.

\texttt{2drop ( x y -{}- )} Discard the top two stack elements.

\texttt{2dup ( x y -{}- x y x y )} Duplicate the top two stack elements. A frequent use for this word is when two values have to be compared using something like \texttt{=} or \texttt{<} before being passed to another word.

\texttt{2swap ( x y z t -{}- z t x y )} Swap the top two pairs of stack elements.

You should try all these words out and become familiar with them. Push some numbers on the stack,
execute a shuffle word, and look at how the stack contents were changed using
\texttt{.s}. Compare the stack contents with the stack effects above.
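
For instance, a session along the following lines exercises four of the shuffle words. This is only a sketch: the outputs shown are derived from the stack effects listed above, and \texttt{clear} is used to start each line from an empty stack:

\begin{alltt}
clear 1 2 swap .s
\emph{\{ 2 1 \}}
clear 1 2 over .s
\emph{\{ 1 2 1 \}}
clear 1 2 3 rot .s
\emph{\{ 2 3 1 \}}
clear 1 2 tuck .s
\emph{\{ 2 1 2 \}}
\end{alltt}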

Note the order of the shuffle word descriptions above. The ones at
the top are used most often because they are easy to understand. The
more complex ones such as \texttt{rot} and \texttt{2swap} should be avoided unless absolutely necessary, because
they make the flow of data in a word definition harder to understand.

If you find yourself using too many shuffle words, or you're writing
a stack effect comment in the middle of a colon definition, it is
a good sign that the word should probably be factored into two or
more words. Each word should take at most a couple of sentences to describe. Effective factoring is like riding a bicycle -- once you ``get it'', it becomes second nature.


\subsection{Combinators}

A quotation is a list of objects that can be executed. Words that execute quotations are called \emph{combinators}. Quotations are input
using the following syntax:

\begin{alltt}
{[} 2 3 + . {]}
\end{alltt}
When input, a quotation is not executed immediately -- rather, it
is pushed on the stack. Try evaluating the following:

\begin{alltt}
{[} 1 2 3 + {*} {]} .s
\emph{\{ {[} 1 2 3 + {*} {]} \}}
call .s
\emph{\{ 5 \}}
\end{alltt}
\texttt{call} \texttt{( quot -{}- )} executes the quotation at the
top of the stack. Using \texttt{call} with a literal quotation is
useless; writing out the elements of the quotation has the same effect.
However, the \texttt{call} combinator is a building block of more
powerful combinators, since quotations can be passed around arbitrarily
and even modified before being called.

\texttt{ifte} \texttt{( cond true false -{}- )} executes either the
\texttt{true} or \texttt{false} quotations, depending on the boolean
value of \texttt{cond}. In Factor, there is no real boolean data type
-- instead, a special object \texttt{f} is the only object with a
{}``false'' boolean value. Every other object is a boolean {}``true''.
The special object \texttt{t} is the {}``canonical'' truth value.

Here is an example of \texttt{ifte} usage:

\begin{alltt}
1 2 < {[} "1 is less than 2." print {]} {[} "bug!" print {]} ifte
\emph{1 is less than 2.}
\end{alltt}
Compare the order of parameters here with the order of parameters in
the stack effect of \texttt{ifte}.

Note that the stack effects of the two \texttt{ifte} branches should be
the same. If they differ, the word becomes harder to document and
debug.
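
As an illustration, both branches of the following hypothetical word (\texttt{sign-message} is a name invented for this sketch, not part of Factor) consume nothing and produce nothing, so the word as a whole has the simple stack effect \texttt{( n -{}- )}:

\begin{alltt}
: sign-message ( n -{}- ) 0 < {[} "negative" print {]} {[} "not negative" print {]} ifte ;
-5 sign-message
\emph{negative}
7 sign-message
\emph{not negative}
\end{alltt}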

\texttt{times ( num quot -{}- )} executes a quotation a number of
times. It is good style to have the quotation always consume as many
values from the stack as it produces. This ensures the stack effect
of the entire \texttt{times} expression stays constant regardless
of the number of iterations.
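
A minimal sketch of \texttt{times}, assuming it behaves exactly as described above. The quotation takes nothing from the stack and leaves nothing behind, so the whole expression has no net stack effect:

\begin{alltt}
3 {[} "Hello" print {]} times
\emph{Hello}
\emph{Hello}
\emph{Hello}
\end{alltt}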

More combinators will be introduced in later sections.


\subsection{Vocabularies}

When an expression is parsed, each token in turn is looked up in the dictionary. If there is no dictionary entry, the token is parsed as a number instead.
The dictionary of words is structured as a set of named \emph{vocabularies}. Each vocabulary is a list
of related words -- for example, the {}``lists''
vocabulary contains words for working with linked lists.

When a word is read by the parser, the \emph{vocabulary search path}
determines which vocabularies to search. In the interactive interpreter,
the default search path contains a large number of vocabularies. Contrast
this to the situation when a file is being parsed -- the search path
has a minimal set of vocabularies containing basic parsing words.%
\footnote{The rationale here is that the interactive interpreter should have
a large number of words available for convenience, whereas
source files should specify their external dependencies explicitly.%
}

New vocabularies are added to the search path using the \texttt{USE:}
parsing word. For example:

\begin{alltt}
"/home/slava/.factor-rc" exists? .
\emph{ERROR: <interactive>:1: Undefined: exists?}
USE: streams
"/home/slava/.factor-rc" exists? .
\emph{t}
\end{alltt}
How do you know which vocabulary contains a word? The words in a vocabulary can
be listed with \texttt{words.}, and ``apropos'' searches can be performed with \texttt{apropos.}:

\begin{alltt}
"init" words.
\emph{{[} ?run-file boot cli-arg cli-param init-environment}
\emph{init-gc init-interpreter init-scratchpad init-search-path}
\emph{init-stdio init-toplevel parse-command-line parse-switches}
\emph{run-files run-user-init stdin stdout {]} }

"map" apropos.
\emph{IN: lists}
\emph{map}
\emph{IN: strings}
\emph{str-map}
\emph{IN: vectors}
\emph{(vector-map)}
\emph{(vector-map-step)}
\emph{vector-map }
\end{alltt}
New words are defined in the \emph{input vocabulary}. The input vocabulary
can be changed at the interactive prompt, or in a source file, using
the \texttt{IN:} parsing word. For example:

\begin{alltt}
IN: music-database
: random-playlist ... ;
\end{alltt}
It is a convention (although it is not enforced by the parser) that
the \texttt{IN:} directive is the first statement in a source file,
and all \texttt{USE:} directives follow, before any other definitions.

Here is an example of a typical series of vocabulary declarations:

\begin{alltt}
IN: todo-list
USE: arithmetic
USE: kernel
USE: lists
USE: strings
\end{alltt}

\section{PRACTICAL: Numbers game}

In this section, basic input/output and flow control are introduced.
We construct a program that repeatedly prompts the user to guess a
number -- they are informed if their guess is correct, too low, or
too high. The game ends on a correct guess.

\begin{alltt}
numbers-game
\emph{I'm thinking of a number between 0 and 100.}
\emph{Enter your guess:} 25
\emph{Too low}
\emph{Enter your guess:} 38
\emph{Too high}
\emph{Enter your guess:} 31
\emph{Correct - you win!}
\end{alltt}

\subsection{Development methodology}

A typical Factor development session involves a text editor and Factor
interpreter running side by side. Instead of the edit/compile/run
cycle, the development process becomes an {}``edit cycle'' -- you
make some changes to the source file and reload it in the interpreter
using a command like this:

\begin{alltt}
"numbers-game.factor" run-file
\end{alltt}
Then the changes can be tested, either by hand, or using a test harness.
There is no need to compile anything, or to lose interpreter state
by restarting. Additionally, words with {}``throw-away'' definitions
that you do not intend to keep can be entered directly at the
interpreter prompt.

Each word should do one useful task. New words can be defined in terms
of existing, already-tested words. You design a set of reusable words
that model the problem domain. Then, the problem is solved in terms
of a \emph{domain-specific vocabulary}. This is called \emph{bottom-up
design.}

The jEdit text editor makes Factor development much more pleasant.
The Factor plugin for jEdit provides an {}``integrated development
environment'' with many time-saving features. See the documentation
for the plugin itself for details.


\subsection{Getting started}

Start a text editor and create a file named \texttt{numbers-game.factor}.

Write a short comment at the top of the file. Factor supports two commenting styles:

\begin{alltt}
! Numbers game.
( The great numbers game )
\end{alltt}

It is always a good idea to comment your code. Try to write simple
code that does not need detailed comments to describe; similarly,
avoid redundant comments. These two principles are hard to quantify
in a concrete way, and will become more clear as your skills with
Factor increase.

We will be defining new words in the \texttt{numbers-game} vocabulary; add
an \texttt{IN:} statement at the top of the source file:

\begin{alltt}
IN: numbers-game
\end{alltt}
Also, in order to be able to test the words, issue a \texttt{USE:}
statement in the interactive interpreter:

\begin{alltt}
USE: numbers-game
\end{alltt}
This section will develop the numbers game in an incremental fashion.
After each addition, issue a command like the following to load the
source file into the Factor interpreter:

\begin{alltt}
"numbers-game.factor" run-file
\end{alltt}

\subsection{Reading a number from the keyboard}

A fundamental operation required for the numbers game is to be able
to read a number from the keyboard. The \texttt{read} word \texttt{( -{}- str )} reads a line of input and pushes it on the stack.
The \texttt{parse-number} word \texttt{( str -{}- n )} turns a decimal
string representation of an integer into the integer itself. These
two words can be combined into a single colon definition:

\begin{alltt}
: read-number ( -{}- n ) read parse-number ;
\end{alltt}
You should add this definition to the source file, and try loading
the file into the interpreter. As you will soon see, this raises an
error! The problem is that the two words \texttt{read} and \texttt{parse-number}
are not part of the default, minimal, vocabulary search path used
when reading files. The solution is to use \texttt{apropos.} to find
out which vocabularies contain those words, and add the appropriate
\texttt{USE:} statements to the source file:

\begin{alltt}
USE: parser
USE: stdio
\end{alltt}
After adding the above two statements, the file should now parse,
and testing should confirm that the \texttt{read-number} word works correctly.%
\footnote{There is the possibility of an invalid number being entered at the
keyboard. In this case, \texttt{parse-number} returns \texttt{f},
the boolean false value. For the sake of simplicity, we ignore this
case in the numbers game example. However, proper error handling is
an essential part of any large program and is covered later.%
}


\subsection{Printing some messages}

Now we need to make some words for printing various messages. They
are given here without further ado:

\begin{alltt}
: guess-banner
"I'm thinking of a number between 0 and 100." print ;
: guess-prompt "Enter your guess: " write ;
: too-high "Too high" print ;
: too-low "Too low" print ;
: correct "Correct - you win!" print ;
\end{alltt}
Note that in the above, stack effect comments are omitted, since they
are obvious from context. You should ensure the words work correctly
after loading the source file into the interpreter.


\subsection{Taking action based on a guess}

The next logical step is to write a word \texttt{judge-guess} that
takes the user's guess along with the actual number to be guessed,
and prints one of the messages \texttt{too-high}, \texttt{too-low},
or \texttt{correct}. This word will also push a boolean flag, indicating
if the game should continue or not -- in the case of a correct guess,
the game does not continue.

This description of \texttt{judge-guess} is a mouthful -- and it suggests that
it may be best to split it into two words. So the first word we write
handles the more specific case of an \emph{inexact} guess -- so it
prints either \texttt{too-low} or \texttt{too-high}.

\begin{alltt}
: inexact-guess ( actual guess -{}- )
< {[} too-high {]} {[} too-low {]} ifte ;
\end{alltt}
Note that the word gives incorrect output if the two parameters are
equal. However, it will never be called this way.

With this out of the way, the implementation of \texttt{judge-guess} is an
easy task to tackle. Using the words \texttt{inexact-guess}, \texttt{=},
\texttt{2dup} and \texttt{2drop}, we can write:

\begin{alltt}
: judge-guess ( actual guess -{}- ? )
2dup = {[}
2drop correct f
{]} {[}
inexact-guess t
{]} ifte ;
\end{alltt}

The word \texttt{=} is found in the \texttt{kernel} vocabulary, and the word \texttt{2dup} is found in the \texttt{stack} vocabulary. Since \texttt{=}
consumes both its parameters, we must first duplicate them with \texttt{2dup} so that they are still available
to \texttt{inexact-guess} afterwards. The \texttt{correct} word takes no parameters, so in that branch the two leftover values are discarded with \texttt{2drop}; this keeps the stack effects of the two branches the same. Try evaluating the following
in the interpreter to see what's going on:

\begin{alltt}
clear 1 2 2dup = .s
\emph{\{ 1 2 f \}}
clear 4 4 2dup = .s
\emph{\{ 4 4 t \}}
\end{alltt}
Test \texttt{judge-guess} with a few inputs. Remember that the actual number comes first and the guess second:

\begin{alltt}
1 10 judge-guess .
\emph{Too high}
\emph{t}
89 43 judge-guess .
\emph{Too low}
\emph{t}
64 64 judge-guess .
\emph{Correct - you win!}
\emph{f}
\end{alltt}

\subsection{Generating random numbers}

The \texttt{random-int} word \texttt{( min max -{}- n )} pushes a
random number in a specified range. The range is inclusive, so both
the minimum and maximum values are candidate random numbers. Use
\texttt{apropos.} to determine that this word is in the \texttt{random}
vocabulary. For the purposes of this game, random numbers will be
in the range of 0 to 100, so we can define a word that generates
such a number:

\begin{alltt}
: number-to-guess ( -{}- n ) 0 100 random-int ;
\end{alltt}
Add the word definition to the source file, along with the appropriate
\texttt{USE:} statement. Load the source file in the interpreter,
and confirm that the word functions correctly, and that its stack
effect comment is accurate.


\subsection{The game loop}

The game loop consists of repeated calls to \texttt{guess-prompt},
\texttt{read-number} and \texttt{judge-guess}. If \texttt{judge-guess}
pushes \texttt{f}, the loop stops, otherwise it continues. This is
realized with a recursive implementation:

\begin{alltt}
: numbers-game-loop ( actual -{}- )
dup guess-prompt read-number judge-guess {[}
numbers-game-loop
{]} {[}
drop
{]} ifte ;
\end{alltt}
In Factor, tail-recursive words consume a bounded amount of call stack
space. This means you are free to pick recursion or iteration based
on their own merits when solving a problem. In many other languages,
the usefulness of recursion is severely limited by the lack of tail-recursive
call optimization.
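
As a further sketch of the same pattern, here is a hypothetical \texttt{countdown} word (invented for this illustration, not part of Factor) that expects a non-negative integer. The recursive call is the last thing the word does, so by the reasoning above it should run in bounded call stack space no matter how large the starting number is. The definition is written on one line so that it can be entered at the interactive prompt:

\begin{alltt}
: countdown ( n -{}- ) dup 0 = {[} drop "Liftoff!" print {]} {[} dup . 1 - countdown {]} ifte ;
3 countdown
\emph{3}
\emph{2}
\emph{1}
\emph{Liftoff!}
\end{alltt}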


\subsection{Finishing off}

The last task is to combine everything into the main \texttt{numbers-game}
word. This is easier than it seems: print the banner, pick the number to guess, and enter the game loop.

\begin{alltt}
: numbers-game guess-banner number-to-guess numbers-game-loop ;
\end{alltt}
Try it out! Simply invoke the \texttt{numbers-game} word in the interpreter.
It should work flawlessly, assuming you tested each component of this
design incrementally!


\subsection{The complete program}

\begin{verbatim}
! Numbers game example

IN: numbers-game
USE: arithmetic
USE: kernel
USE: parser
USE: random
USE: stdio
USE: stack

: read-number ( -- n ) read parse-number ;

: guess-banner
"I'm thinking of a number between 0 and 100." print ;
: guess-prompt "Enter your guess: " write ;
: too-high "Too high" print ;
: too-low "Too low" print ;
: correct "Correct - you win!" print ;

: inexact-guess ( actual guess -- )
< [ too-high ] [ too-low ] ifte ;

: judge-guess ( actual guess -- ? )
2dup = [
2drop correct f
] [
inexact-guess t
] ifte ;

: number-to-guess ( -- n ) 0 100 random-int ;

: numbers-game-loop ( actual -- )
dup guess-prompt read-number judge-guess [
numbers-game-loop
] [
drop
] ifte ;

: numbers-game guess-banner number-to-guess numbers-game-loop ;
\end{verbatim}

\section{Lists}

A list is composed of a set of pairs; each pair holds a list element,
and a reference to the next pair. Lists have the following literal
syntax:

\begin{alltt}
{[} "CEO" 5 "CFO" -4 f {]}
\end{alltt}
Before we continue, it is important to understand the role of data
types in Factor. Let's make a distinction between two categories of
data types:

\begin{itemize}
\item Representational type -- this refers to the form of the data in the
interpreter. Representational types include integers, strings, and
vectors. Representational types are checked at run time -- attempting
to multiply two strings, for example, will yield an error.
\item Intentional type -- this refers to the meaning of the data within
the problem domain. This could be a length measured in inches, or
a string naming a file, or a list of objects in a room in a game.
It is up to the programmer to check intentional types -- Factor won't
prevent you from adding two integers representing a distance and a
time, even though the result is meaningless.
\end{itemize}

\subsection{Cons cells}

It may surprise you that in Factor, \emph{lists are intentional types}.
This means that they are not an inherent feature of the interpreter;
rather, they are built from a simpler data type, the \emph{cons cell}.

A cons cell is an object that holds a reference to two other objects.
The order of the two objects matters -- the first is called the \emph{car},
the second is called the \emph{cdr}.

All words relating to cons cells and lists are found in the \texttt{lists}
vocabulary. The words \texttt{cons}, \texttt{car} and \texttt{cdr}%
\footnote{These infamous names originate from the Lisp language. Originally,
{}``Lisp'' stood for {}``List Processing''.%
} construct and deconstruct cons cells:

\begin{alltt}
1 2 cons .
\emph{{[} 1 | 2 {]}}
3 4 cons car .
\emph{3}
5 6 cons cdr .
\emph{6}
\end{alltt}
The output of the first expression suggests a literal syntax for cons
cells:

\begin{alltt}
{[} 10 | 20 {]} cdr .
\emph{20}
{[} "first" | {[} "second" | f {]} {]} car .
\emph{"first"}
{[} "first" | {[} "second" | f {]} {]} cdr car .
\emph{"second"}
\end{alltt}
The last two examples make it clear how nested cons cells represent
a list. Since this {}``nested cons cell'' syntax is extremely cumbersome,
the parser provides an easier way:

\begin{alltt}
{[} 1 2 3 4 {]} cdr cdr car .
\emph{3}
\end{alltt}
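
To make the correspondence concrete, the following sketch builds a list by hand from \texttt{cons} and \texttt{f}, using only words introduced above. Each \texttt{cons} takes the element beneath it as the car and the list built so far as the cdr:

\begin{alltt}
1 2 3 f cons cons cons .
\emph{{[} 1 2 3 {]}}
\end{alltt}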
A \emph{proper list} is a chain of cons cells linked by their cdr, where the last cons cell has a cdr set to \texttt{f}. Also, the object \texttt{f} by itself
is a proper list, and in fact it is equivalent to the empty list \texttt{{[} {]}}. An \emph{improper list} is a chain of cons cells that does not terminate with \texttt{f}. Improper lists are input with the following syntax:

\begin{verbatim}
[ 1 2 3 | 4 ]
\end{verbatim}

The \texttt{list?} word tests if the object at the top of the stack
is a proper list:

\begin{alltt}
"hello" list? .
\emph{f}
{[} "first" "second" | "third" {]} list? .
\emph{f}
{[} "first" "second" "third" {]} list? .
\emph{t}
\end{alltt}

It is worth mentioning a few words closely related to and defined in terms of \texttt{cons}, \texttt{car} and \texttt{cdr}.

\texttt{swons ( cdr car -{}- cons )} constructs a cons cell, with the argument order reversed. Usually, it is considered bad practice to define two words that only differ by parameter order; however, cons cells are constructed about equally frequently with both orders. Of course, \texttt{swons} is defined as follows:

\begin{alltt}
: swons swap cons ;
\end{alltt}

\texttt{uncons ( cons -{}- car cdr )} pushes both constituents of a cons cell. It is defined thus:

\begin{alltt}
: uncons dup car swap cdr ;
\end{alltt}

\texttt{unswons ( cons -{}- cdr car )} is just a swapped version of \texttt{uncons}. It is defined thus:

\begin{alltt}
: unswons dup cdr swap car ;
\end{alltt}
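
The two words can be compared side by side. In this sketch the outputs follow from the stack effects above, and \texttt{clear} empties the stack before each example:

\begin{alltt}
clear {[} 1 | 2 {]} uncons .s
\emph{\{ 1 2 \}}
clear {[} 1 | 2 {]} unswons .s
\emph{\{ 2 1 \}}
\end{alltt}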

\subsection{Working with lists}

Unless otherwise documented, list manipulation words expect proper
lists as arguments. Given an improper list, they will either raise
an error, or disregard the hanging cdr at the end of the list.

Also unless otherwise documented, list manipulation words return newly-created
lists only. The original parameters are not modified. This may seem
inefficient; however, the absence of side effects makes code much easier
to test and debug.%
\footnote{Side effect-free code is the fundamental idea underlying functional
programming languages. While Factor allows side effects and is not
a functional programming language, for a lot of problems, coding in
a functional style gives the most maintainable and readable results.%
} Where performance is important, a set of {}``destructive'' words
is provided. They are documented in \ref{sub:Destructively-modifying-lists}.

\texttt{add ( list obj -{}- list )} Create a new list consisting of
the original list, and a new element added at the end:

\begin{alltt}
{[} 1 2 3 {]} 4 add .
\emph{{[} 1 2 3 4 {]}}
1 {[} 2 3 4 {]} cons .
\emph{{[} 1 2 3 4 {]}}
\end{alltt}
While \texttt{cons} and \texttt{add} appear to have similar effects,
they are quite different -- \texttt{cons} is a very cheap operation,
while \texttt{add} has to copy the entire list first! If you need to add to the end of a sequence frequently, consider either using a vector, or adding to the beginning of a list and reversing the list when done. For information about vectors, see \ref{sub:Vectors}.
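
Adding to the beginning of a list can be sketched with \texttt{swons}, starting from the empty list \texttt{f}. Each step is a cheap \texttt{cons}, but the elements end up in the opposite order to that in which they were added, which is why the list has to be reversed afterwards:

\begin{alltt}
f 1 swons 2 swons 3 swons .
\emph{{[} 3 2 1 {]}}
\end{alltt}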
|
|
|
|
\texttt{append ( list list -{}- list )} Append two lists at the
|
|
top of the stack:
|
|
|
|
\begin{alltt}
|
|
{[} 1 2 3 {]} {[} 4 5 6 {]} append .
|
|
\emph{{[} 1 2 3 4 5 6 {]}}
|
|
{[} 1 2 3 {]} dup {[} 4 5 6 {]} append .s
|
|
\emph{\{ {[} 1 2 3 {]} {[} 1 2 3 4 5 6 {]} \}}
|
|
\end{alltt}
|
|
The first list is copied, and the cdr of its last cons cell is set
|
|
to point to the second list. The second example above shows that the original
|
|
parameter was not modified. Interestingly, if the second parameter
|
|
is not a proper list, \texttt{append} returns an improper list:
|
|
|
|
\begin{alltt}
|
|
{[} 1 2 3 {]} 4 append .
|
|
\emph{{[} 1 2 3 | 4 {]}}
|
|
\end{alltt}
|
|
\texttt{length ( list -{}- n )} Iterate down the cdr of the list until
|
|
it reaches \texttt{f}, counting the number of elements in the list:
|
|
|
|
\begin{alltt}
|
|
{[} {[} 1 2 {]} {[} 3 4 {]} 5 {]} length .
|
|
\emph{3}
|
|
{[} {[} {[} "Hey" {]} {]} 5 {]} length .
|
|
\emph{2}
|
|
\end{alltt}
|
|
\texttt{nth ( index list -{}- obj )} Look up an element specified
|
|
by a zero-based index, by successively iterating down the cdr of the
|
|
list:
|
|
|
|
\begin{alltt}
|
|
1 {[} "Hamster" "Bagpipe" "Beam" {]} nth .
|
|
\emph{"Bagpipe"}
|
|
\end{alltt}
|
|
This word runs in linear time proportional to the list index. If you
|
|
need constant time lookups, use a vector instead.
|
|
|
|
\texttt{set-nth ( value index list -{}- list )} Create a new list,
|
|
identical to the original list except the element at the specified
|
|
index is replaced:
|
|
|
|
\begin{alltt}
|
|
"Done" 0 {[} "Not started" "Incomplete" {]} set-nth .
\emph{{[} "Done" "Incomplete" {]}}
|
|
\end{alltt}
|
|
\texttt{remove ( obj list -{}- list )} Push a new list, with all occurrences
|
|
of the object removed. All other elements are in the same order:
|
|
|
|
\begin{alltt}
|
|
: australia- ( list -- list ) "Australia" swap remove ;
|
|
{[} "Canada" "New Zealand" "Australia" "Russia" {]} australia- .
|
|
\emph{{[} "Canada" "New Zealand" "Russia" {]}}
|
|
\end{alltt}
|
|
\texttt{remove-nth ( index list -{}- list )} Push a new list, with
|
|
an index removed:
|
|
|
|
\begin{alltt}
|
|
: remove-1 ( list -- list ) 1 swap remove-nth ;
|
|
{[} "Canada" "New Zealand" "Australia" "Russia" {]} remove-1 .
|
|
\emph{{[} "Canada" "Australia" "Russia" {]}}
|
|
\end{alltt}
|
|
\texttt{reverse ( list -{}- list )} Push a new list which has the
|
|
same elements as the original one, but in reverse order:
|
|
|
|
\begin{alltt}
|
|
{[} 4 3 2 1 {]} reverse .
|
|
\emph{{[} 1 2 3 4 {]}}
|
|
\end{alltt}
|
|
\texttt{contains ( obj list -{}- list )} Look for an occurrence of
|
|
an object in a list. The remainder of the list starting from the first
|
|
occurrence is returned. If the object does not occur in the list,
|
|
f is returned:
|
|
|
|
\begin{alltt}
|
|
: lived-in? ( country -{}- ? )
|
|
{[}
|
|
"Canada" "New Zealand" "Australia" "Russia"
|
|
{]} contains ;
|
|
"Australia" lived-in? .
|
|
\emph{{[} "Australia" "Russia" {]}}
|
|
"Pakistan" lived-in? .
|
|
\emph{f}
|
|
\end{alltt}
|
|
For now, assume {}``occurs'' means {}``contains an object that
|
|
looks like''. The issue of object equality is covered later.
|
|
|
|
\texttt{unique ( list -{}- list )} Return a new list with all duplicate
|
|
elements removed. This word executes in quadratic time, so should
|
|
not be used with large lists. For example:
|
|
|
|
\begin{alltt}
|
|
{[} 1 2 1 4 1 8 {]} unique .
|
|
\emph{{[} 1 2 4 8 {]}}
|
|
\end{alltt}
|
|
\texttt{unit ( obj -{}- list )} Make a list of one element:
|
|
|
|
\begin{alltt}
|
|
{}``Unit 18'' unit .
|
|
\emph{{[} {}``Unit 18'' {]}}
|
|
\end{alltt}
|
|
|
|
\subsection{Association lists}
|
|
|
|
An \emph{association list} is one where every element is a cons. The
|
|
car of each cons is a name, the cdr is a value. The literal notation
|
|
is suggestive:
|
|
|
|
\begin{alltt}
|
|
{[}
|
|
{[} "Jill" | "CEO" {]}
|
|
{[} "Jeff" | "manager" {]}
|
|
{[} "James" | "lowly web designer" {]}
|
|
{]}
|
|
\end{alltt}
|
|
\texttt{assoc? ( obj -{}- ? )} returns \texttt{t} if the object is
|
|
a list whose every element is a cons; otherwise it returns \texttt{f}.
|
|
|
|
\texttt{assoc ( name alist -{}- value )} looks for a pair with this
name in the list, and pushes the cdr of the pair. It pushes \texttt{f} if no pair
with this name is present. Note that \texttt{assoc} cannot differentiate between
a name that is not present at all and a name with a value of \texttt{f}.
|
|
|
|
\texttt{assoc{*} ( name alist -{}- {[} name | value {]} )} looks for
|
|
a pair with this name, and pushes the pair itself. Unlike \texttt{assoc},
|
|
\texttt{assoc{*}} returns different values in the cases of a value
|
|
set to \texttt{f}, or an undefined value.
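
As a brief illustration of the two lookup words, here is a sketch that assumes they behave exactly as described above; the names and job titles are made up for the example:

\begin{alltt}
"Jeff" {[} {[} "Jill" | "CEO" {]} {[} "Jeff" | "manager" {]} {]} assoc .
\emph{"manager"}
"Jeff" {[} {[} "Jill" | "CEO" {]} {[} "Jeff" | "manager" {]} {]} assoc{*} .
\emph{{[} "Jeff" | "manager" {]}}
"Jane" {[} {[} "Jill" | "CEO" {]} {]} assoc .
\emph{f}
\end{alltt}
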
|
|
|
|
\texttt{set-assoc ( value name alist -{}- alist )} removes any existing
occurrence of a name from the list, and adds a new pair. This creates
a new list; the original is unaffected.
|
|
|
|
\texttt{acons ( value name alist -{}- alist )} is slightly faster
|
|
than \texttt{set-assoc} since it simply conses a new pair onto the
|
|
list. However, if used repeatedly, the list will grow to contain a
|
|
lot of {}``shadowed'' pairs.
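
The difference between the two words shows up when the name is already present. The following sketch assumes the stack effects given above; the data is made up for the example:

\begin{alltt}
"CTO" "Jeff" {[} {[} "Jeff" | "manager" {]} {]} set-assoc .
\emph{{[} {[} "Jeff" | "CTO" {]} {]}}
"CTO" "Jeff" {[} {[} "Jeff" | "manager" {]} {]} acons .
\emph{{[} {[} "Jeff" | "CTO" {]} {[} "Jeff" | "manager" {]} {]}}
\end{alltt}

In the second result the old pair is still in the list, sitting behind the newly consed one.
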
|
|
|
|
Searching association lists incurs a linear time cost, so they should
|
|
only be used for small mappings -- a typical use is a mapping of half
|
|
a dozen entries or so, specified literally in source. Hashtables offer
|
|
better performance with larger mappings.
|
|
|
|
|
|
\subsection{List combinators}
|
|
|
|
In a traditional language such as C, every iteration or collection
|
|
must be written out as a loop, with setting up and updating of indexes,
|
|
etc. Factor on the other hand relies on combinators and quotations
|
|
to avoid duplicating these loop ``design patterns'' throughout
|
|
the code.
|
|
|
|
The simplest case is iterating through each element of a list, and
|
|
printing it or otherwise consuming it from the stack.
|
|
|
|
\texttt{each ( list quot -{}- )} pushes each element of the list in
|
|
turn, and executes the quotation. The list and quotation are not on
|
|
the stack when the quotation is executed. This allows a powerful idiom
|
|
where the quotation makes a copy of a value on the stack, and consumes
|
|
it along with the list element. In fact, this idiom works with all
|
|
well-designed combinators.%
|
|
\footnote{Later, you will learn how to apply it when designing your own combinators.%
|
|
}
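
For instance, a running total can be kept below the list while \texttt{each} supplies the elements one at a time. This is a minimal sketch, assuming only the behaviour described above:

\begin{alltt}
0 {[} 1 2 3 {]} {[} + {]} each .
\emph{6}
\end{alltt}

The \texttt{0} stays on the stack underneath the list, and each call of the quotation adds one element to it.
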
|
|
|
|
The previously-mentioned \texttt{reverse} word is implemented using
|
|
\texttt{each}:
|
|
\begin{alltt}
|
|
: reverse ( list -- list ) {[} {]} swap {[} swons {]} each ;
|
|
\end{alltt}
|
|
To understand how it works, consider that each element of the original
|
|
list is consed onto the beginning of a new list, in turn. So the last
|
|
element of the original list ends up at the beginning of the new list.
|
|
|
|
\texttt{inject ( list quot -{}- list )} is similar to \texttt{each},
|
|
except after each iteration the return value of the quotation is collected into a new
|
|
list. The quotation must have stack effect
|
|
\texttt{( obj -{}- obj )} otherwise the combinator
|
|
will not function properly.
|
|
|
|
For example, suppose we have a list where each element stores the
quantity of some nutrient in 100 grams of food; we would like to
find out the total nutrients contained in 300 grams:
|
|
|
|
\begin{alltt}
|
|
: multiply-each ( n list -{}- list )
|
|
{[} dupd {*} {]} inject nip ;
|
|
3 {[} 50 450 101 {]} multiply-each .
|
|
\emph{{[} 150 1350 303 {]}}
|
|
\end{alltt}
|
|
Note the use of \texttt{dupd} to preserve the value of \texttt{n} after each iteration, and the final \texttt{nip} to discard the value of \texttt{n}.
|
|
|
|
\texttt{subset ( list quot -{}- list )} produces a new list containing
|
|
some of the elements of the original list. Which elements to collect
|
|
is determined by the quotation -- the quotation is called with each
|
|
list element on the stack in turn, and those elements for which the
|
|
quotation does not return \texttt{f} are added to the new list. The
|
|
quotation must have stack effect \texttt{( obj -{}- ?~)}.
|
|
|
|
For example, let's construct a list of all numbers between 0 and 99
such that the sum of their digits is less than 10:
|
|
|
|
\begin{alltt}
|
|
: sum-of-digits ( n -{}- n ) 10 /mod + ;
|
|
100 count {[} sum-of-digits 10 < {]} subset .
|
|
\emph{{[} 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21}
|
|
\emph{22 23 24 25 26 27 30 31 32 33 34 35 36 40 41 42 43 44}
|
|
\emph{45 50 51 52 53 54 60 61 62 63 70 71 72 80 81 90 {]} }
|
|
\end{alltt}
|
|
\texttt{all? ( list quot -{}- ?~)} returns \texttt{t} if the quotation
|
|
returns \texttt{t} for all elements of the list, otherwise it returns
|
|
\texttt{f}. In other words, if \texttt{all?} returns \texttt{t}, then
|
|
\texttt{subset} applied to the same list and quotation would return
|
|
the entire list.%
|
|
\footnote{Barring any side effects which modify the execution of the quotation.
|
|
It is best to avoid side effects when using list combinators.%
|
|
}
|
|
|
|
For example, the implementation of \texttt{assoc?} uses \texttt{all?}:
|
|
|
|
\begin{alltt}
|
|
: assoc? ( list -{}- ?~)
|
|
dup list? {[} {[} cons? {]} all? {]} {[} drop f {]} ifte ;
|
|
\end{alltt}
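
A smaller illustration of \texttt{all?}, assuming only the behaviour described above and the \texttt{<} comparison used earlier:

\begin{alltt}
{[} 1 2 3 {]} {[} 10 < {]} all? .
\emph{t}
{[} 1 20 3 {]} {[} 10 < {]} all? .
\emph{f}
\end{alltt}
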
|
|
|
|
\subsection{\label{sub:List-constructors}List constructors}
|
|
|
|
The list construction words provide an alternative way to build up a list. Instead of passing a partial list around on the stack as it is built, they store the partial list in a variable. This reduces the number
|
|
of stack elements that have to be juggled.
|
|
|
|
The word \texttt{{[}, ( -{}- )} begins list construction.
|
|
|
|
The word \texttt{, ( obj -{}- )} appends an object to the partial
|
|
list.
|
|
|
|
The word \texttt{,{]} ( -{}- list )} pushes the complete list.
|
|
|
|
While variables haven't been described yet, keep in mind that a new
scope is created between \texttt{{[},} and \texttt{,{]}}. This means
that list constructions can be nested, as long as the
number of \texttt{{[},} and \texttt{,{]}} balances out in the end. There is no
requirement that \texttt{{[},} and \texttt{,{]}} appear in the same
word; however, debugging becomes prohibitively difficult when a list
construction begins in one word and ends in another.
|
|
|
|
Here is an example of list construction using this technique:
|
|
|
|
\begin{alltt}
|
|
{[}, 1 10 {[} 2 {*} dup , {]} times drop ,{]} .
|
|
\emph{{[} 2 4 8 16 32 64 128 256 512 1024 {]}}
|
|
\end{alltt}
|
|
|
|
\subsection{\label{sub:Destructively-modifying-lists}Destructively modifying lists}
|
|
|
|
All previously discussed list modification functions always returned
|
|
newly-allocated lists. Destructive list manipulation functions on
|
|
the other hand reuse the cons cells of their input lists, and hence
|
|
avoid memory allocation.
|
|
|
|
Only ever destructively change lists you do not intend to reuse again.
|
|
You should not rely on the side effects -- they are unpredictable.
|
|
It is wrong to think that destructive words {}``modify'' the original
|
|
list -- rather, think of them as returning a new list, just like the
|
|
normal versions of the words, with the added caveat that the original
|
|
list must not be used again.
|
|
|
|
\texttt{nreverse ( list -{}- list )} reverses a list without consing.
|
|
In the following example, the return value has reused the cons cells of
|
|
the original list, and the original list has been destroyed:
|
|
|
|
\begin{alltt}
|
|
{[} 1 2 3 4 {]} dup nreverse .s
|
|
\emph{\{ {[} 1 {]} {[} 4 3 2 1 {]} \}}
|
|
\end{alltt}
|
|
Compare the second stack element (which is what remains of the original
|
|
list) and the top stack element (the list returned by \texttt{nreverse}).
|
|
|
|
The \texttt{nreverse} word is the most frequently used destructive
list manipulator. The usual idiom is a loop where a value is consed
onto the beginning of a list in each iteration, then the
list is reversed at the end. Since the original list is never used
again, \texttt{nreverse} can safely be used here.
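
The idiom can be sketched by hand with \texttt{swons}, the word used in the definition of \texttt{reverse} earlier. The sketch assumes that \texttt{f} stands for the empty list; every cons cell is freshly allocated, so nothing else refers to the list that \texttt{nreverse} destroys:

\begin{alltt}
f 1 swons 2 swons 3 swons nreverse .
\emph{{[} 1 2 3 {]}}
\end{alltt}
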
|
|
|
|
\texttt{nappend ( list list -{}- list )} sets the cdr of the last
|
|
cons cell in the first list to the second list, unless the first list
|
|
is \texttt{f}, in which case it simply returns the second list. Again,
|
|
the side effects on the first list are unpredictable -- if it is \texttt{f},
|
|
it is unchanged, otherwise, it is equal to the return value:
|
|
|
|
\begin{alltt}
|
|
{[} 1 2 {]} {[} 3 4 {]} nappend .
|
|
\emph{{[} 1 2 3 4 {]}}
|
|
\end{alltt}
|
|
Note that in the above examples, we pass literal lists to \texttt{nreverse}
and \texttt{nappend}. This is actually a very bad idea, since the same literal
list may be used more than once! For example, let's make a colon definition:
|
|
|
|
\begin{alltt}
: very-bad-idea {[} 1 2 3 4 {]} nreverse ;
very-bad-idea .
\emph{{[} 4 3 2 1 {]}}
very-bad-idea .
\emph{{[} 1 {]}}
"very-bad-idea" see
\emph{: very-bad-idea}
\emph{    {[} 1 {]} nreverse ;}
\end{alltt}
|
|
As you can see, the word definition itself was ruined!
|
|
|
|
Sometimes it is desirable to make a copy of a list, so that the copy
may safely be modified destructively later.
|
|
|
|
\texttt{clone-list ( list -{}- list )} pushes a new list containing
|
|
the exact same elements as the original. The elements themselves are
|
|
not copied.
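
One way to repair the earlier \texttt{very-bad-idea} word is sketched below, under the assumption that \texttt{clone-list} allocates fresh cons cells as described. The word name \texttt{better-idea} is made up for this example:

\begin{alltt}
: better-idea {[} 1 2 3 4 {]} clone-list nreverse ;
better-idea .
\emph{{[} 4 3 2 1 {]}}
better-idea .
\emph{{[} 4 3 2 1 {]}}
\end{alltt}

Only the copy is destroyed, so the literal inside the word definition stays intact.
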
|
|
|
|
If you want to write your own destructive list manipulation words,
|
|
you can use \texttt{set-car ( value cons -{}- )} and \texttt{set-cdr
|
|
( value cons -{}- )} to modify individual cons cells. Some words that
|
|
are not destructive on their inputs nonetheless create intermediate
|
|
lists which are operated on using these words. One example is \texttt{clone-list}
|
|
itself.
|
|
|
|
|
|
\section{\label{sub:Vectors}Vectors}
|
|
|
|
A \emph{vector} is a contiguous chunk of memory cells which hold references to arbitrary
|
|
objects. Vectors have the following literal syntax:
|
|
|
|
\begin{alltt}
|
|
\{ f f f t t f t t -6 "Hey" \}
|
|
\end{alltt}
|
|
Use of vector literals in source code is discouraged, since vector
|
|
manipulation relies on side effects rather than return values, and
|
|
hence it is very easy to mess up a literal embedded in a word definition.
|
|
|
|
Vector words are found in the \texttt{vectors} vocabulary.
|
|
|
|
\subsection{Vectors versus lists}
|
|
|
|
Vectors are applicable to a different class of problems than lists.
|
|
Compare the relative performance of common operations on vectors and
|
|
lists:
|
|
|
|
\begin{tabular}{|r|l|l|}
|
|
\hline
|
|
&
|
|
Lists&
|
|
Vectors\tabularnewline
|
|
\hline
|
|
\hline
|
|
Random access of an index&
|
|
linear time&
|
|
constant time\tabularnewline
|
|
\hline
|
|
Add new element at start&
|
|
constant time&
|
|
linear time\tabularnewline
|
|
\hline
|
|
Add new element at end&
|
|
linear time&
|
|
constant time\tabularnewline
|
|
\hline
|
|
\end{tabular}
|
|
|
|
When using vectors, you need to pass around a vector and an index
|
|
-- when working with lists, often only a list head is passed around.
|
|
For this reason, if you need a sequence for iteration only, a list
|
|
is a better choice because the list vocabulary contains a rich collection
|
|
of recursive words.
|
|
|
|
On the other hand, when you need to maintain your own {}``stack''-like
|
|
collection, a vector is the obvious choice, since most pushes and
|
|
pops can then avoid allocating memory.
|
|
|
|
Vectors and lists can be converted back and forth using the \texttt{vector>list}
|
|
word \texttt{( vector -{}- list )} and the \texttt{list>vector} word
|
|
\texttt{( list -{}- vector )}.
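
A short sketch of the two conversions, assuming the stack effects given above:

\begin{alltt}
{[} 1 2 3 {]} list>vector .
\emph{\{ 1 2 3 \}}
\{ 1 2 3 \} vector>list .
\emph{{[} 1 2 3 {]}}
\end{alltt}
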
|
|
|
|
|
|
\subsection{Working with vectors}
|
|
|
|
\texttt{<vector> ( capacity -{}- vector )} pushes a zero-length vector.
|
|
Storing more elements than the initial capacity grows the vector.
|
|
|
|
\texttt{vector-nth ( index vector -{}- obj )} pushes the object stored
|
|
at a zero-based index of a vector:
|
|
|
|
\begin{alltt}
|
|
0 \{ "zero" "one" \} vector-nth .
|
|
\emph{"zero"}
|
|
2 \{ 1 2 \} vector-nth .
|
|
\emph{ERROR: Out of bounds}
|
|
\end{alltt}
|
|
\texttt{set-vector-nth ( obj index vector -{}- )} stores a value into
|
|
a vector:%
|
|
\footnote{The words \texttt{get} and \texttt{set} used in this example will
|
|
be formally introduced later.%
|
|
}
|
|
|
|
\begin{alltt}
\{ "math" "CS" \} "v" set
"philosophy" 1 "v" get set-vector-nth
"v" get .
\emph{\{ "math" "philosophy" \}}
"CS" 4 "v" get set-vector-nth
"v" get .
\emph{\{ "math" "philosophy" f f "CS" \}}
\end{alltt}
|
|
\texttt{vector-length ( vector -{}- length )} pushes the number of
|
|
elements in a vector. As the previous two examples demonstrate, attempting
|
|
to fetch beyond the end of the vector will raise an error, while storing
|
|
beyond the end will grow the vector as necessary.
|
|
|
|
\texttt{set-vector-length ( length vector -{}- )} resizes a vector.
|
|
If the new length is larger than the current length, the vector grows
|
|
if necessary, and the new cells are filled with \texttt{f}.
|
|
|
|
\texttt{vector-push ( obj vector -{}- )} adds an object at the end
|
|
of the vector. This increments the vector's length by one.
|
|
|
|
\texttt{vector-pop ( vector -{}- obj )} removes the object at the
|
|
end of the vector and pushes it. This decrements the vector's length
|
|
by one.
|
|
|
|
The \texttt{vector-push} and \texttt{vector-pop} words can be used to implement additional stacks. For example:
|
|
|
|
\begin{alltt}
|
|
20 <vector> "state-stack" set
|
|
: push-state ( obj -- ) "state-stack" get vector-push ;
|
|
: pop-state ( -- obj ) "state-stack" get vector-pop ;
|
|
12 push-state
|
|
4 push-state
|
|
pop-state .
|
|
\emph{4}
|
|
0 push-state
|
|
pop-state .
|
|
\emph{0}
|
|
pop-state .
|
|
\emph{12}
|
|
\end{alltt}
|
|
|
|
\subsection{Vector combinators}
|
|
|
|
A pair of combinators for iterating over vectors is provided in the \texttt{vectors} vocabulary. The first is the \texttt{vector-each} word, which does nothing other than apply a quotation to each element. The second is the \texttt{vector-map} word, which also collects the return values of the quotation into a new vector.

\texttt{vector-each ( vector quot -{}- )} pushes each element of the vector in turn, and executes the quotation. The quotation should have a stack effect of \texttt{( obj -{}- )}. The vector and the quotation are not on the stack when the quotation is executed. This allows the quotation to use values below the vector for accumulation and so on.

\texttt{vector-map ( vector quot -{}- vector )} is similar to \texttt{vector-each}, except after each iteration the return value of the quotation is collected into a new vector. The quotation should have a stack effect of \texttt{( obj -{}- obj )}.
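
Two short sketches of the vector combinators, assuming they behave as described. The vector literals are used only for brevity and are not modified:

\begin{alltt}
0 \{ 1 2 3 \} {[} + {]} vector-each .
\emph{6}
\{ 1 2 3 \} {[} 2 {*} {]} vector-map .
\emph{\{ 2 4 6 \}}
\end{alltt}
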
|
|
|
|
\section{Strings}
|
|
|
|
A \emph{string} is a sequence of 16-bit Unicode characters (conventionally,
in the UTF-16 encoding). Strings are input by enclosing them in quotes:
|
|
|
|
\begin{alltt}
|
|
"GET /index.html HTTP/1.0"
|
|
\end{alltt}
|
|
String literals must not span more than one line. The following is
|
|
not valid:
|
|
|
|
\begin{alltt}
|
|
"Content-Type: text/html
|
|
Content-Length: 1280"
|
|
\end{alltt}
|
|
Instead, the newline must be represented using an escape, rather than
|
|
literally. The newline escape is \texttt{\textbackslash{}n}, so we
|
|
can write:
|
|
|
|
\begin{alltt}
|
|
"Content-Type: text/html\textbackslash{}nContent-Length: 1280"
|
|
\end{alltt}
|
|
Other special characters, such as quotes and tabs can be input in
|
|
a similar manner. Here is the full list of supported character escapes:
|
|
|
|
\begin{tabular}{|r|l|}
|
|
\hline
|
|
Character&
|
|
Escape\tabularnewline
|
|
\hline
|
|
\hline
|
|
Quote&
|
|
\texttt{\textbackslash{}"}\tabularnewline
|
|
\hline
|
|
Newline&
|
|
\texttt{\textbackslash{}n}\tabularnewline
|
|
\hline
|
|
Carriage return&
|
|
\texttt{\textbackslash{}r}\tabularnewline
|
|
\hline
|
|
Horizontal tab&
|
|
\texttt{\textbackslash{}t}\tabularnewline
|
|
\hline
|
|
Terminal escape&
|
|
\texttt{\textbackslash{}e}\tabularnewline
|
|
\hline
|
|
Zero character&
|
|
\texttt{\textbackslash{}0}\tabularnewline
|
|
\hline
|
|
Arbitrary Unicode character&
|
|
\texttt{\textbackslash{}u}\texttt{\emph{nnnn}}\tabularnewline
|
|
\hline
|
|
\end{tabular}
|
|
|
|
The last row shows a notation for inputting any possible character
|
|
using its hexadecimal value. For example, a space character can also
|
|
be input as \texttt{\textbackslash{}u0020}.
|
|
|
|
There is no specific character data type in Factor. When characters
|
|
are extracted from a string, they are pushed on the stack as integers.
|
|
It is possible to input an integer with a value equal to that of a
|
|
Unicode character using the following special notation:
|
|
|
|
\begin{alltt}
|
|
CHAR: A .
|
|
\emph{65}
|
|
CHAR: A 1 + CHAR: B = .
|
|
\emph{t}
|
|
\end{alltt}
|
|
|
|
\subsection{Working with strings}
|
|
|
|
String words are found in the \texttt{strings} vocabulary. String
|
|
manipulation words always return a new copy of a string rather than
|
|
modifying the string in-place. Notice the absence of words such as
|
|
\texttt{set-str-nth} and \texttt{set-str-length}. Unlike lists, for
|
|
which both constructive and destructive manipulation words are provided,
|
|
destructive string operations are only done with a distinct string
|
|
buffer type which is the topic of the next section.
|
|
|
|
\texttt{str-length ( str -{}- n )} pushes the length of a string:
|
|
|
|
\begin{alltt}
|
|
"Factor" str-length .
|
|
\emph{6}
|
|
\end{alltt}
|
|
\texttt{str-nth ( n str -{}- ch )} pushes the character located by
|
|
a zero-based index. A string is essentially a vector specialized for
|
|
storing one data type, the 16-bit unsigned character. These are returned
|
|
as integers, so printing will not yield the actual character:
|
|
\begin{alltt}
|
|
0 " " str-nth .
|
|
\emph{32}
|
|
\end{alltt}
|
|
\texttt{index-of ( str substr -{}- n )} searches a string for the
|
|
first occurrence of a substring or character. If an occurrence was
|
|
found, its index is pushed. Otherwise, -1 is pushed:
|
|
|
|
\begin{alltt}
"www.sun.com" CHAR: . index-of .
\emph{3}
"mailto:billg@microsoft.com" CHAR: / index-of .
\emph{-1}
"www.lispworks.com" ".com" index-of .
\emph{13}
\end{alltt}
|
|
\texttt{index-of{*} ( n str substr -{}- n )} works like \texttt{index-of},
|
|
except it takes a start index as an argument.
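
For example, to find the second dot in a host name, start the search just past the first one. This sketch assumes the argument order given in the stack effect:

\begin{alltt}
4 "www.sun.com" CHAR: . index-of{*} .
\emph{7}
\end{alltt}
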
|
|
|
|
\texttt{substring ( start end str -{}- substr )} extracts a range
|
|
of characters from a string into a new string.
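
A minimal sketch of a call, following the stack effect above. This guide does not say whether the end index is inclusive or exclusive, so no output is shown; if it is exclusive, this would push the first three characters of the string:

\begin{alltt}
0 3 "www.sun.com" substring .
\end{alltt}
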
|
|
|
|
\texttt{split ( str split -{}- list )} pushes a new list of strings
|
|
which are substrings of the original string, taken in between occurrences
|
|
of the split string:
|
|
|
|
\begin{alltt}
|
|
"fixnum bignum ratio" " " split .
|
|
\emph{{[} "fixnum" "bignum" "ratio" {]}}
|
|
"/usr/bin/X" CHAR: / split .
|
|
\emph{{[} "" "usr" "bin" "X" {]}}
|
|
\end{alltt}
|
|
If you wish to concatenate a fixed number of strings at the top of
|
|
the stack, you can use a member of the \texttt{cat} family of words
|
|
from the \texttt{strings} vocabulary. They concatenate strings in
|
|
the order that they appear in the stack effect.
|
|
|
|
\begin{tabular}{|c|c|}
|
|
\hline
|
|
Word&
|
|
Stack effect\tabularnewline
|
|
\hline
|
|
\hline
|
|
\texttt{cat2}&
|
|
\texttt{( s1 s2 -{}- str )}\tabularnewline
|
|
\hline
|
|
\texttt{cat3}&
|
|
\texttt{( s1 s2 s3 -{}- str )}\tabularnewline
|
|
\hline
|
|
\texttt{cat4}&
|
|
\texttt{( s1 s2 s3 s4 -{}- str )}\tabularnewline
|
|
\hline
|
|
\texttt{cat5}&
|
|
\texttt{( s1 s2 s3 s4 s5 -{}- str )}\tabularnewline
|
|
\hline
|
|
\end{tabular}
|
|
|
|
\texttt{cat ( list -{}- str )} is a generalization of the above words;
|
|
it concatenates each element of a list into a new string.
|
|
|
|
Some straightforward examples:
|
|
|
|
\begin{alltt}
|
|
"How are you, " "Chuck" "?" cat3 .
|
|
\emph{"How are you, Chuck?"}
|
|
"/usr/bin/X" CHAR: / split cat .
|
|
\emph{"usrbinX"}
|
|
\end{alltt}
|
|
String buffers, described in the next section, provide a more flexible
|
|
means of concatenating strings.
|
|
|
|
|
|
\subsection{String buffers}
|
|
|
|
A \emph{string buffer} is a mutable string. The canonical use for
|
|
a string buffer is to combine several strings into one. This is done
|
|
by creating a new string buffer, appending strings and characters,
|
|
and finally turning the string buffer into a string.
|
|
|
|
\texttt{<sbuf> ( capacity -{}- sbuf )} pushes a new string buffer
|
|
that is capable of holding up to the specified capacity before growing.
|
|
|
|
\texttt{sbuf-append ( str/ch sbuf -{}- )} appends a string or a character
|
|
to the end of the string buffer. If an integer is given, its least significant
|
|
16 bits are interpreted as a character value:
|
|
|
|
\begin{alltt}
|
|
100 <sbuf> "my-sbuf" set
|
|
"Testing" "my-sbuf" get sbuf-append
|
|
32 "my-sbuf" get sbuf-append
|
|
\end{alltt}
|
|
\texttt{sbuf>str ( sbuf -{}- str )} pushes a string with the same
|
|
contents as the string buffer:
|
|
|
|
\begin{alltt}
|
|
"my-sbuf" get sbuf>str .
|
|
\emph{"Testing "}
|
|
\end{alltt}
|
|
While usually string buffers are only used to concatenate a series
|
|
of strings, they also support the same operations as vectors.
|
|
|
|
\texttt{sbuf-nth ( n sbuf -{}- ch )} pushes the character stored at
|
|
a zero-based index of a string buffer:
|
|
|
|
\begin{alltt}
|
|
2 "my-sbuf" get sbuf-nth .
|
|
\emph{115}
|
|
\end{alltt}
|
|
\texttt{set-sbuf-nth ( ch n sbuf -{}- )} sets the character stored
|
|
at a zero-based index of a string buffer. Only the least significant
|
|
16 bits of the character are stored into the string buffer.
|
|
|
|
\texttt{sbuf-length ( sbuf -{}- n )} pushes the number of characters
|
|
in a string buffer. This is not the same as the capacity of the string
|
|
buffer -- the capacity is the internal storage size of the string
|
|
buffer, the length is a possibly smaller number indicating how much
|
|
storage is in use.
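
Continuing with the \texttt{my-sbuf} buffer from the examples above, which was created with a capacity of 100 and has had the string \texttt{"Testing"} and a space appended, a sketch of the difference:

\begin{alltt}
"my-sbuf" get sbuf-length .
\emph{8}
\end{alltt}
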
|
|
|
|
\texttt{set-sbuf-length ( n sbuf -{}- )} changes the length of the
|
|
string buffer. The string buffer's storage grows if necessary, and
|
|
new character positions are automatically filled with zeroes.
|
|
|
|
|
|
\subsection{String constructors}
|
|
|
|
The string construction words provide an alternative way to build up a string. Instead of passing a string buffer around on the stack, they store the string buffer in a variable. This reduces the number
|
|
of stack elements that have to be juggled.
|
|
|
|
The word \texttt{<\% ( -{}- )} begins string construction. The word
|
|
definition creates a string buffer. Instead of leaving the string
|
|
buffer on the stack, the word creates and pushes a scope on the name
|
|
stack.
|
|
|
|
The word \texttt{\% ( str/ch -{}- )} appends a string or a character
to the partial string. The word definition calls \texttt{sbuf-append}
on a string buffer located by searching the name stack.

The word \texttt{\%> ( -{}- str )} pushes the complete string. The word
definition pops the name stack and calls \texttt{sbuf>str} on the
appropriate string buffer.
|
|
|
|
Compare the following two examples -- both define a word that concatenates together all elements of a list of strings. The first one uses a string buffer stored on the stack, the second uses string construction words:
|
|
|
|
\begin{alltt}
|
|
: cat ( list -- str )
|
|
100 <sbuf> swap [ over sbuf-append ] each sbuf>str ;
|
|
|
|
: cat ( list -- str )
|
|
<\% [ \% ] each \%> ;
|
|
\end{alltt}
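
The string construction words can also be used directly at the listener. A small sketch, assuming \texttt{\%} accepts both strings and characters as stated above:

\begin{alltt}
<\% "Hello, " \% "world" \% CHAR: ! \% \%> .
\emph{"Hello, world!"}
\end{alltt}
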
|
|
|
|
\subsection{String combinators}
|
|
|
|
A pair of combinators for iterating over strings is provided in the \texttt{strings} vocabulary. The first is the \texttt{str-each} word, which does nothing other than apply a quotation to each character. The second is the \texttt{str-map} word, which also collects the return values of the quotation into a new string.
|
|
|
|
\texttt{str-each ( str quot -{}- )} pushes each character of the string in turn, and executes the quotation. The quotation should have a stack effect of \texttt{( ch -{}- )}. The string and the quotation are not on the stack when the quotation is executed. This allows the quotation to use values below the string for accumulation and so on. The following example counts the number of occurrences of the letter ``a'' in a string:
|
|
|
|
\begin{alltt}
|
|
: count-a ( str -- n )
|
|
0 swap [ CHAR: a = [ 1 + ] when ] str-each ;
|
|
|
|
"Lets just say that you may stay" count-a .
|
|
\emph{4}
|
|
\end{alltt}
|
|
|
|
\texttt{str-map ( str quot -{}- str )} is similar to \texttt{str-each}, except after each iteration the return value of the quotation is collected into a new string. The quotation should have a stack effect of \texttt{( ch -- str/ch )}. The following example replaces all occurrences of the space character in the string with \texttt{+}:
|
|
|
|
\begin{alltt}
|
|
"We do not like spaces" [ CHAR: \textbackslash{}s CHAR: + replace ] str-map .
|
|
\emph{"We+do+not+like+spaces"}
|
|
\end{alltt}
|
|
|
|
\subsection{Printing and reading strings}
|
|
|
|
The following two words from the \texttt{stdio} vocabulary output text to the terminal. They differ from \texttt{.}
|
|
in that they print strings only, without surrounding quotes, and raise
|
|
an error when given any other data type. The word \texttt{.} prints any Factor
|
|
object in a form suited for parsing, hence it quotes strings.
|
|
|
|
\texttt{write ( str -{}- )} writes a string to the standard output
|
|
device, without a terminating newline.
|
|
|
|
\texttt{print ( str -{}- )} writes a string followed by a newline
|
|
character. To print a single newline character, use \texttt{terpri (
|
|
-{}- )} instead of passing a blank string to \texttt{print}.
|
|
|
|
Input can be read from the terminal, a line at a time.
|
|
|
|
\texttt{read ( -{}- str )} reads a line of input from the standard
|
|
input device, terminated by a newline.
|
|
|
|
\begin{alltt}
"a" write "b" write
\emph{ab}
{[} "hello" "world" {]} {[} print {]} each
\emph{hello}
\emph{world}
\end{alltt}
|
|
Often a string representation of a number, usually one read from an
|
|
input source, needs to be turned into a number. Unlike some languages,
|
|
in Factor the conversion from a string such as {}``123'' into the
|
|
number 123 is not automatic. To turn a string into a number, use one
|
|
of two words in the \texttt{parser} vocabulary.
|
|
|
|
\texttt{str>number ( str -{}- n )} creates an integer, ratio or floating
|
|
point literal from its string representation. If the string does not
|
|
represent a valid number, an exception is thrown.
|
|
|
|
\texttt{parse-number ( str -{}- n/f )} pushes \texttt{f} on failure, rather
|
|
than raising an exception.
|
|
|
|
\texttt{unparse ( n -{}- str )} pushes the string representation of
|
|
a number.
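
A few sketches of these conversions, assuming the words behave as described above:

\begin{alltt}
"123" str>number 1 + .
\emph{124}
"abc" parse-number .
\emph{f}
123 unparse .
\emph{"123"}
\end{alltt}
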
|
|
|
|
|
|
\section{PRACTICAL: Contractor timesheet}
|
|
|
|
|
|
\subsection{Adding a timesheet entry}
|
|
|
|
When you begin working on a new task, you tell the timesheet you want
|
|
to add a new entry. It then measures the elapsed time until you specify
|
|
the task is done, and prompts for a task description.
|
|
|
|
The first word we will write is \texttt{measure-duration}. We measure
|
|
the time duration by using the \texttt{millis} word \texttt{( -{}-
|
|
m )} to take the time before and after a call to \texttt{read}. The
|
|
\texttt{millis} word pushes the number of milliseconds since a certain
|
|
epoch -- the epoch does not matter here since we are only interested
|
|
in the difference between two times.
|
|
|
|
A first attempt at \texttt{measure-duration} might look like this:
|
|
|
|
\begin{alltt}
|
|
: measure-duration millis read drop millis - ;
|
|
measure-duration .
|
|
\end{alltt}
|
|
This word definition has the right general idea; however, the result
is negative, because the later time is subtracted from the earlier one.
Also, we would like to measure durations in minutes,
not milliseconds:
|
|
|
|
\begin{alltt}
|
|
: measure-duration ( -{}- duration )
|
|
millis
|
|
read drop
|
|
millis swap - 1000 /i 60 /i ;
|
|
\end{alltt}
|
|
Note that the \texttt{/i} word \texttt{( x y -{}- x/y )}, from the
|
|
\texttt{arithmetic} vocabulary, performs truncating division. This
|
|
makes sense, since we are not interested in fractional parts of a
|
|
minute here.
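
As a worked example, assume a task that took 150000 milliseconds
(two and a half minutes). The two truncating divisions then give:

\begin{alltt}
150000 1000 /i .
\emph{150}
150000 1000 /i 60 /i .
\emph{2}
\end{alltt}

The remaining half minute is discarded rather than rounded up.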
|
|
|
|
Now that we can measure a time duration at the keyboard, let's write
|
|
the \texttt{add-entry-prompt} word. This word does exactly what one
|
|
would expect -- it prompts for the time duration and description,
|
|
and leaves those two values on the stack:
|
|
|
|
\begin{alltt}
|
|
: add-entry-prompt ( -{}- duration description )
|
|
"Start work on the task now. Press ENTER when done." print
|
|
measure-duration
|
|
"Please enter a description:" print
|
|
read ;
|
|
\end{alltt}
|
|
You should interactively test this word. Measure off a minute or two,
|
|
press ENTER, enter a description, and press ENTER again. The stack
|
|
should now contain two values, in the same order as the stack effect
|
|
comment.
|
|
|
|
Now, almost all the ingredients are in place. The final \texttt{add-entry}
word calls \texttt{add-entry-prompt}, then pushes the new entry on the end
of the timesheet vector:
|
|
|
|
\begin{alltt}
|
|
: add-entry ( timesheet -{}- )
|
|
add-entry-prompt cons swap vector-push ;
|
|
\end{alltt}
|
|
Recall that timesheet entries are cons cells where the car is the
|
|
duration and the cdr is the description, hence the call to \texttt{cons}.
|
|
Note that this word side-effects the timesheet vector. You can test
|
|
it interactively like so:
|
|
|
|
\begin{alltt}
|
|
10 <vector> dup add-entry
|
|
\emph{Start work on the task now. Press ENTER when done.}
|
|
\emph{Please enter a description:}
|
|
\emph{Studying Factor}
|
|
.
|
|
\emph{\{ {[} 2 | "Studying Factor" {]} \}}
|
|
\end{alltt}
|
|
\subsection{Printing the timesheet}
|
|
|
|
The hard part of printing the timesheet is turning the duration in
|
|
minutes into a nice hours/minutes string, like {}``01:15''. We would
|
|
like to make a word like the following:
|
|
|
|
\begin{alltt}
|
|
75 hh:mm .
|
|
\emph{01:15}
|
|
\end{alltt}
|
|
First, we can make a pair of words \texttt{hh} and \texttt{mm} to extract the hours
|
|
and minutes, respectively. This can be achieved using truncating division,
|
|
and the modulo operator -- also, since we would like strings to be
|
|
returned, the \texttt{unparse} word \texttt{( obj -{}- str )} from
|
|
the \texttt{unparser} vocabulary is called to turn the integers into
|
|
strings:
|
|
|
|
\begin{alltt}
|
|
: hh ( duration -{}- str ) 60 /i unparse ;
|
|
: mm ( duration -{}- str ) 60 mod unparse ;
|
|
\end{alltt}
|
|
The \texttt{hh:mm} word can then be written, concatenating the return
|
|
values of \texttt{hh} and \texttt{mm} into a single string using string
|
|
construction:
|
|
|
|
\begin{alltt}
|
|
: hh:mm ( duration -{}- str ) <\% dup hh \% ":" \% mm \% \%> ;
|
|
\end{alltt}
|
|
However, so far, these three definitions do not produce ideal output.
|
|
Try a few examples:
|
|
|
|
\begin{alltt}
|
|
120 hh:mm .
|
|
\emph{2:0}
|
|
130 hh:mm .
|
|
\emph{2:10}
|
|
\end{alltt}
|
|
Obviously, we would like the minutes to always be two digits. Luckily,
|
|
there is a \texttt{digits} word \texttt{( str n -{}- str )} in the
|
|
\texttt{format} vocabulary that adds enough zeros on the left of the
|
|
string to give it the specified length. Try it out:
|
|
|
|
\begin{alltt}
|
|
"23" 2 digits .
|
|
\emph{"23"}
|
|
"7" 2 digits .
|
|
\emph{"07"}
|
|
\end{alltt}
|
|
We can now change the definition of \texttt{mm} accordingly:
|
|
|
|
\begin{alltt}
|
|
: mm ( duration -{}- str ) 60 mod unparse 2 digits ;
|
|
\end{alltt}
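With the padded definition of \texttt{mm} in place, the earlier examples
can be repeated as a quick check. This sketch uses \texttt{print} rather
than \texttt{.}, so the strings appear without quotes. It assumes that
\texttt{hh:mm} picks up the redefinition of \texttt{mm}; if it does not,
enter the definition of \texttt{hh:mm} again first:

\begin{alltt}
120 hh:mm print
\emph{2:00}
125 hh:mm print
\emph{2:05}
130 hh:mm print
\emph{2:10}
\end{alltt}
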
|
|
Now that time duration output is done, a first attempt at a definition
|
|
of \texttt{print-timesheet} looks like this:
|
|
|
|
\begin{alltt}
|
|
: print-timesheet ( timesheet -{}- )
|
|
{[} uncons write ": " write hh:mm print {]} vector-each ;
|
|
\end{alltt}
|
|
This works, but produces ugly output:
|
|
|
|
\begin{alltt}
|
|
\{ {[} 30 | "Studying Factor" {]} {[} 65 | "Paperwork" {]} \}
|
|
print-timesheet
|
|
\emph{Studying Factor: 0:30}
|
|
\emph{Paperwork: 1:05}
|
|
\end{alltt}
|
|
It would be much nicer if the time durations lined up in the same
|
|
column. First, let's factor out the body of the \texttt{vector-each}
|
|
loop into a new \texttt{print-entry} word before it gets too long:
|
|
|
|
\begin{alltt}
|
|
: print-entry ( duration description -{}- )
|
|
write ": " write hh:mm print ;
|
|
|
|
: print-timesheet ( timesheet -{}- )
|
|
{[} uncons print-entry {]} vector-each ;
|
|
\end{alltt}
|
|
We can now make \texttt{print-entry} line up columns using the \texttt{pad-string}
|
|
word \texttt{( n str -{}- str )}, which pushes the blanks needed to pad the string out to the given width.
|
|
|
|
\begin{alltt}
|
|
: print-entry ( duration description -{}- )
|
|
dup
|
|
write
|
|
50 swap pad-string write
|
|
hh:mm print ;
|
|
\end{alltt}
|
|
In the above definition, we first print the description, then enough
|
|
blanks to move the cursor to column 50. So the description text is
|
|
left-justified. If we had interchanged the order of the second and
|
|
third line in the definition, the description text would be right-justified.
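
The stack flow of \texttt{print-entry} can be traced one line at a time.
The sketch below assumes the entry \texttt{{[} 65 | "Paperwork" {]}} and
assumes that \texttt{pad-string} pushes only the blanks, as the definition
above implies. It is an annotated trace, not code to enter:

\begin{alltt}
                  ! 65 "Paperwork"
dup               ! 65 "Paperwork" "Paperwork"
write             ! 65 "Paperwork"
50 swap           ! 65 50 "Paperwork"
pad-string        ! 65 blanks      (50 - 9 = 41 blanks, assumed)
write             ! 65
hh:mm print       ! prints 1:05; both inputs are consumed
\end{alltt}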
|
|
|
|
Try out \texttt{print-timesheet} again, and marvel at the aligned
|
|
columns:
|
|
|
|
\begin{alltt}
|
|
\{ {[} 30 | "Studying Factor" {]} {[} 65 | "Paperwork" {]} \}
|
|
print-timesheet
|
|
\emph{Studying Factor                                   0:30}
|
|
\emph{Paperwork                                         1:05}
|
|
\end{alltt}
|
|
|
|
\subsection{The main menu}
|
|
|
|
Reading a number, showing a menu
|
|
|
|
|
|
\section{Variables and namespaces}
|
|
|
|
|
|
\subsection{Hashtables}
|
|
|
|
|
|
\subsection{Namespaces}
|
|
|
|
|
|
\subsection{The name stack}
|
|
|
|
So far, we have seen what we called ``the stack'' store intermediate values between computations. In fact Factor maintains a number of other stacks, and the formal name for the stack we've been dealing with so far is the \emph{data stack}.
|
|
|
|
Another stack is the \emph{call stack}. When a colon definition is invoked, the position within the current colon definition is pushed on the stack. This ensures that calling words return to the caller, just as in any other language with subroutines.\footnote{Factor supports a variety of structures for implementing non-local word exits, such as exceptions, co-routines, continuations, and so on. They all rely on manipulating the call stack and are described in later sections.}
|
|
|
|
The \emph{name stack} is the focus of this section. The \texttt{bind} combinator creates dynamic scope by pushing and popping namespaces on the name stack. Its definition is simpler than one would expect:
|
|
|
|
\begin{alltt}
|
|
: bind ( namespace quot -{}- )
|
|
swap >n call n> drop ;
|
|
\end{alltt}
|
|
|
|
The words \texttt{>n} and \texttt{n>} push and pop the name stack, respectively. Observe the stack flow in the definition of \texttt{bind}: the namespace goes on the name stack, the quotation is called, and the namespace is popped and discarded.
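
The stack flow can be written out as an annotated trace. Here \texttt{ns} and \texttt{quot} are placeholders for the two inputs of \texttt{bind}, and each comment shows the state after the word on that line:

\begin{alltt}
                  ! data stack: ns quot    name stack: ...
swap              ! data stack: quot ns    name stack: ...
>n                ! data stack: quot       name stack: ... ns
call              ! quot runs with ns on top of the name stack
n>                ! data stack: ns         name stack: ...
drop              ! data stack: (empty)    name stack: ...
\end{alltt}

The trace assumes the quotation leaves nothing on the data stack. Any values it does leave sit underneath \texttt{ns} after \texttt{n>}, so they survive the final \texttt{drop}.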
|
|
|
|
The name stack is really just a vector. The words \texttt{>n} and \texttt{n>} are implemented as follows:
|
|
|
|
\begin{alltt}
|
|
: >n ( namespace -{}- n:namespace ) namestack* vector-push ;
|
|
: n> ( n:namespace -{}- namespace ) namestack* vector-pop ;
|
|
\end{alltt}
|
|
|
|
\subsection{The inspector}
|
|
|
|
|
|
\section{PRACTICAL: Music player}
|
|
|
|
|
|
\section{Metaprogramming}
|
|
|
|
Recall that code quotations are in fact just linked lists. Factor code is data, and vice versa. Essentially, the interpreter iterates through code quotations, pushing literals and executing words. When a word is executed, one of two things happens: either the word has a colon definition, and the interpreter is invoked recursively on the definition, or the word is primitive, and it is executed by the underlying virtual machine. A word is itself a first-class object.
|
|
|
|
It is the job of the parser to transform source code denoting literals and words into their internal representations. This is done using a vocabulary of \emph{parsing words}. The prettyprinter does the converse, by printing out data structures in a parsable form (both to humans and Factor). Because code is data, text representation of source code doubles as a way to serialize almost any Factor object.
|
|
|
|
\subsection{Looking at words}
|
|
|
|
Try pushing a list of words on the stack, and take its first element:
|
|
|
|
\begin{alltt}
|
|
{[} * + {]} car .s
|
|
\emph{\{ * \}}
|
|
\end{alltt}
|
|
|
|
What happened here? Instead of being executed, a ``naked'', unquoted word was pushed on the stack. The predicate \texttt{word? ( obj -{}- ? )} from the \texttt{words} vocabulary tests if the top of the stack is a word. Another way to get a word on the stack is to do a vocabulary search using a word name and a list of vocabularies to search in:
|
|
|
|
\begin{alltt}
|
|
"car" {[} "lists" {]} search .s
|
|
\emph{\{ car \}}
|
|
\end{alltt}
|
|
|
|
The \texttt{search} word will push \texttt{f} if the word is not defined. A new word can be created in a specified vocabulary explicitly:
|
|
|
|
\begin{alltt}
|
|
"start-server" "user" create .s
|
|
\emph{\{ start-server \}}
|
|
\end{alltt}
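
The \texttt{word?} predicate and the failure case of \texttt{search} can be checked in the same way. This is a sketch that follows the descriptions above; \texttt{no-such-word} is a made-up name, assumed not to exist in the \texttt{lists} vocabulary:

\begin{alltt}
"car" {[} "lists" {]} search word? .
\emph{t}
"no-such-word" {[} "lists" {]} search .
\emph{f}
"car" word? .
\emph{f}
\end{alltt}

The last line shows that a string naming a word is not itself a word.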
|
|
|
|
Two words are only ever equal under the \texttt{=} operator if they identify the same underlying object. Word objects are composed of three slots, named as follows.
|
|
|
|
\begin{tabular}{|r|l|}
|
|
\hline
|
|
Slot&
|
|
Description\tabularnewline
|
|
\hline
|
|
\hline
|
|
Primitive&
|
|
A number identifying a virtual machine operation.\tabularnewline
|
|
\hline
|
|
Parameter&
|
|
An object parameter for the virtual machine operation.\tabularnewline
|
|
\hline
|
|
Property list&
|
|
An association list of name/value pairs.\tabularnewline
|
|
\hline
|
|
\end{tabular}
|
|
|
|
If the primitive number is set to 1, the word is a colon definition and the parameter must be a quotation. Any other primitive number denotes a function of the virtual machine, and the parameter is ignored. Do not rely on primitive numbers in your code; instead, use the \texttt{compound? ( obj -{}- ? )} and \texttt{primitive? ( obj -{}- ? )} predicates.
|
|
|
|
The word \texttt{define ( word quot -{}- )} defines a word to have the specified colon definition. Note that \texttt{create} and \texttt{define} perform an action somewhat analogous to the \texttt{: ... ;} notation for colon definitions, except at run time rather than parse time.
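
As a sketch of how the two words fit together, the following session defines a word at run time. The name \texttt{double} is hypothetical, and the example assumes that the \texttt{user} vocabulary is in the interpreter's search path, so that the second line can find the new word. Enter the two lines separately, since the first must run before the second is parsed:

\begin{alltt}
"double" "user" create {[} 2 * {]} define
5 double .
\emph{10}
\end{alltt}

The first line has much the same effect as entering \texttt{: double 2 * ;}.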
|
|
|
|
\subsection{The prettyprinter}
|
|
|
|
We've already seen the word \texttt{.} which prints the top of the stack in a form that may be read back in. The word \texttt{prettyprint} is similar, except the output is in an indented, multiple-line format. Both words are in the \texttt{prettyprint} vocabulary. Here is an example:
|
|
|
|
\begin{alltt}
|
|
{[} 1 {[} 2 3 4 {]} 5 {]} .
|
|
\emph{{[} 1 {[} 2 3 4 {]} 5 {]}}
|
|
{[} 1 {[} 2 3 4 {]} 5 {]} prettyprint
|
|
\emph{{[}
|
|
1 {[}
|
|
2 3 4
|
|
{]} 5
|
|
{]}}
|
|
\end{alltt}
|
|
|
|
|
|
\subsection{The parser}
|
|
|
|
\subsection{Parsing words}
|
|
|
|
Let's take a closer look at Factor syntax. Consider a simple expression,
|
|
and the result of evaluating it in the interactive interpreter:
|
|
|
|
\begin{alltt}
|
|
2 3 + .
|
|
\emph{5}
|
|
\end{alltt}
|
|
The interactive interpreter is basically an infinite loop. It reads
|
|
a line of input from the terminal, parses this line to produce a \emph{quotation},
|
|
and executes the quotation.
|
|
|
|
In the parse step, the input text is tokenized into a sequence of
|
|
whitespace-separated tokens. First, the interpreter checks if there
|
|
is an existing word named by the token. If there is no such word,
|
|
the interpreter instead treats the token as a number.%
|
|
\footnote{Of course, Factor supports a full range of data types, including strings,
|
|
lists and vectors. Their source representations are still built from
|
|
numbers and words, however.%
|
|
}
|
|
|
|
Once the expression has been entirely parsed, the interactive interpreter
|
|
executes it.
|
|
|
|
This parse time/run time distinction is important, because words fall
|
|
into two categories: {}``parsing words'' and {}``running words''.
|
|
|
|
The parser constructs a parse tree from the input text. When the parser
|
|
encounters a token representing a number or an ordinary word, the
|
|
token is simply appended to the current parse tree node. A parsing
|
|
word on the other hand is executed \emph{immediately} after being
|
|
tokenized. Since it executes in the context of the parser, it has
|
|
access to the raw input text, the entire parse tree, and other parser
|
|
structures.
|
|
|
|
Parsing words are also defined using colon definitions, except we
|
|
add \texttt{parsing} after the terminating \texttt{;}. Here are two
|
|
examples of definitions for words \texttt{foo} and \texttt{bar}. Both
are identical, except that in the second example \texttt{foo} is defined
as a parsing word:
|
|
|
|
\begin{alltt}
|
|
! Let's define 'foo' as a running word.
|
|
: foo "1) foo executed." print ;
|
|
: bar foo "2) bar executed." print ;
|
|
bar
|
|
\emph{1) foo executed.}
|
|
\emph{2) bar executed.}
|
|
bar
|
|
\emph{1) foo executed.}
|
|
\emph{2) bar executed.}
|
|
|
|
! Now let's define 'foo' as a parsing word.
|
|
: foo "1) foo executed." print ; parsing
|
|
: bar foo "2) bar executed." print ;
|
|
\emph{1) foo executed.}
|
|
bar
|
|
\emph{2) bar executed.}
|
|
bar
|
|
\emph{2) bar executed.}
|
|
\end{alltt}
|
|
In fact, the word \texttt{"} that denotes a string literal is
|
|
a parsing word -- it reads characters from the input text until the
|
|
next occurrence of \texttt{"}, and appends this string to the
|
|
current node of the parse tree. Note that strings and words are different
|
|
types of objects. Strings are covered in great detail later.
|
|
|
|
|
|
\section{PRACTICAL: Infix syntax}
|
|
|
|
|
|
\section{Continuations}
|
|
|
|
Call stack how it works and >r/r>
|
|
|
|
Generators, co-routines, multitasking, exception handling
|
|
|
|
|
|
\section{HTTP Server}
|
|
|
|
|
|
\section{PRACTICAL: Some web app}
|
|
\end{document}
|