%% LyX 1.3 created this file. For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
\documentclass[english]{article}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\usepackage{alltt}
\pagestyle{headings}
\setcounter{tocdepth}{2}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}

\makeatletter

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
%% Because html converters don't know tabularnewline
\providecommand{\tabularnewline}{\\}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands.
\newenvironment{lyxcode}
{\begin{list}{}{
\setlength{\rightmargin}{\leftmargin}
\setlength{\listparindent}{0pt}% needed for AMS classes
\raggedright
\setlength{\itemsep}{0pt}
\setlength{\parsep}{0pt}
\normalfont\ttfamily}%
\item[]}
{\end{list}}

\usepackage{babel}
\makeatother
\begin{document}

\title{Factor Developer's Guide}

\author{Slava Pestov}

\maketitle
\tableofcontents{}

\newpage
\section*{Introduction}
Factor is an imperative programming language with functional and object-oriented
influences. Its primary goal is to be used for web-based server-side
applications. Factor is interpreted by a virtual machine that provides
garbage collection and prohibits pointer arithmetic.%
\footnote{Two releases of Factor are available -- a virtual machine written
in C, and an interpreter written in Java that runs on the Java virtual
machine. This guide targets the C version of Factor.%
}

Factor borrows heavily from Forth, Joy and Lisp. From Forth it inherits
a flexible syntax defined in terms of {}``parsing words'' and an
execution model based on a data stack and call stack. From Joy and
Lisp it inherits a virtual machine prohibiting direct pointer arithmetic,
and the use of {}``cons cells'' to represent code and data structures.
\section{Fundamentals}

A {}``word'' is the main unit of program organization
in Factor -- it corresponds to a {}``function'', {}``procedure''
or {}``method'' in other languages.

When code examples are given, the input is in a typewriter font, and any
output from the interpreter is in italics:

\begin{lyxcode}
{}``Hello,~world!''~print

\emph{Hello,~world!}
\end{lyxcode}

\subsection{The stack}

The stack is used to exchange data between words. When a number is
executed, it is pushed on the stack. When a word is executed, it receives
input parameters by removing successive elements from the top of the
stack. Results are then pushed back to the top of the stack.

The word \texttt{.s} prints the contents of the stack, leaving the
contents of the stack unaffected. The top of the stack is the rightmost
element in the printout:

\begin{lyxcode}
2~3~.s

\emph{\{~2~3~\}}
\end{lyxcode}
The word \texttt{.} removes the object at the top of the stack, and
prints it:

\begin{lyxcode}
1~2~3~.~.~.

\emph{3}

\emph{2}

\emph{1}
\end{lyxcode}
The usual arithmetic operators \texttt{+ - {*} /} all take two parameters
from the stack, and push one result back. Where the order of operands
matters (\texttt{-} and \texttt{/}), the operands are taken from the
stack in the natural order: the operand pushed first is the left-hand
one, so \texttt{111 234 -} computes 111 minus 234. For example:

\begin{lyxcode}
10~17~+~.

\emph{27}

111~234~-~.

\emph{-123}

333~3~/~.

\emph{111}
\end{lyxcode}
This type of arithmetic is called \emph{postfix}, because the operator
follows the operands. Contrast this with \emph{infix} notation used
in many other languages, so called because the operator is in-between
the two operands.

More complicated infix expressions can be translated into postfix
by translating the inner-most parts first. Grouping parentheses are
never necessary:

\begin{lyxcode}
!~Postfix~equivalent~of~(2~+~3)~{*}~6

2~3~+~6~{*}~.

\emph{30}

!~Postfix~equivalent~of~2~+~(3~{*}~6)

2~3~6~{*}~+~.

\emph{20}
\end{lyxcode}
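The same inside-out translation works for larger expressions: each parenthesized group is translated first, and its result simply waits on the stack until the outer operator needs it. Two further illustrations:

\begin{lyxcode}
!~Postfix~equivalent~of~(1~+~2)~{*}~(3~+~4)

1~2~+~3~4~+~{*}~.

\emph{21}

!~Postfix~equivalent~of~10~-~(2~{*}~3)

10~2~3~{*}~-~.

\emph{4}
\end{lyxcode}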
\subsection{Factoring}

New words can be defined in terms of existing words using the \emph{colon
definition} syntax:

\begin{lyxcode}
:~\emph{name}~(~\emph{inputs}~-{}-~\emph{outputs}~)

~~~~\#!~\emph{Description}

~~~~\emph{factors~...}~;
\end{lyxcode}
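Filled in, the template might look like this in a source file. The word \texttt{double} and its stack effect are a hypothetical illustration, not a word taken from the library:

\begin{lyxcode}
:~double~(~x~-{}-~y~)

~~~~\#!~Multiply~the~number~on~top~of~the~stack~by~two.

~~~~2~{*}~;
\end{lyxcode}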
When the new word is executed, each one of its factors gets executed,
in turn. The comment delimited by \texttt{(} and \texttt{)} is called
a stack effect comment and is described later. Both the stack effect
comment and the documentation comment starting with \texttt{\#!} are
optional, and can be placed anywhere in the source code, not
just in colon definitions.

Note that in a source file, a word definition can span multiple lines.
However, the interactive interpreter expects each line of input to
be {}``complete'', so interactively, colon definitions must be entered
all on one line.

For example, let's assume we are designing some software for an aircraft
navigation system, and that internally, all lengths are stored
in meters, and all times are stored in seconds. We can define words
for converting from kilometers to meters, and hours and minutes to
seconds:

\begin{lyxcode}
:~kilometers~1000~{*}~;

:~minutes~60~{*}~;

:~hours~60~{*}~60~{*}~;

2~kilometers~.

\emph{2000}

10~minutes~.

\emph{600}

2~hours~.

\emph{7200}
\end{lyxcode}
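Since each unit word only scales the number on top of the stack, converted values can be combined with ordinary arithmetic. For instance, one and a half hours expressed in seconds:

\begin{lyxcode}
1~hours~30~minutes~+~.

\emph{5400}
\end{lyxcode}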
Now, suppose we need a word that takes the flight time, the aircraft
velocity, and the tailwind velocity, and returns the distance travelled.
If the parameters are given on the stack in that order, all we do
is add the top two elements (aircraft velocity, tailwind velocity)
and multiply the sum by the element underneath (flight time). So the definition
looks like this, this time with a stack effect comment since it's slightly
less obvious what the operands are:

\begin{lyxcode}
:~distance~(~time~aircraft~tailwind~-{}-~distance~)~+~{*}~;

2~900~36~distance~.

\emph{1872}
\end{lyxcode}
Note that we are not using any units here. We could, if we defined
some words for velocity units first. The only non-trivial one is
\texttt{km/hour}: \texttt{kilometers} converts the distance covered
in one hour to meters, and we then divide by the number of seconds
in one hour to get a velocity in meters per second:

\begin{lyxcode}
:~km/hour~kilometers~1~hours~/~;

2~hours~900~km/hour~36~km/hour~distance~.

\emph{1872000}
\end{lyxcode}
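Printing the converted velocities on their own shows that both end up in meters per second, which is why \texttt{distance} can add them directly:

\begin{lyxcode}
900~km/hour~.

\emph{250}

36~km/hour~.

\emph{10}
\end{lyxcode}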
\subsection{Stack effects}

A stack effect comment contains a description of inputs to the left
of \texttt{-{}-}, and a description of outputs to the right. As always,
the top of the stack is on the right side. Let's try writing a word
to compute the cube of a number.%
\footnote{I'd use the somewhat simpler example of a word that squares a number,
but such a word already exists in the standard library. It's in the
\texttt{arithmetic} vocabulary, named \texttt{sq}.%
}

Three numbers on the stack can be multiplied together using \texttt{{*}
{*}}:

\begin{lyxcode}
2~4~8~{*}~{*}~.

\emph{64}
\end{lyxcode}
However, the stack effect of \texttt{{*} {*}} is \texttt{( a b c -{}-
a{*}b{*}c )}. We would like to write a word that takes \emph{one} input
only. To achieve this, we need to be able to duplicate the top stack
element twice. As it happens, there is a word \texttt{dup ( x -{}-
x x )} for precisely this purpose. Now, we are able to define the
\texttt{cube} word:

\begin{lyxcode}
:~cube~dup~dup~{*}~{*}~;

10~cube~.

\emph{1000}

-2~cube~.

\emph{-8}
\end{lyxcode}
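To see how \texttt{cube} works, the stack can be inspected after each step. In this sketch, \texttt{clear} (a word this guide uses again later) empties the stack first:

\begin{lyxcode}
clear~3~dup~dup~.s

\emph{\{~3~3~3~\}}

{*}~.s

\emph{\{~3~9~\}}

{*}~.s

\emph{\{~27~\}}
\end{lyxcode}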
It is quite often the case that we want to compose two factors in
a colon definition, but their stack effects don't {}``match up''.

There is a set of \emph{shuffle words} for solving precisely this
problem. These words are so called because they simply rearrange stack
elements in some fashion, without modifying them in any way. Let's
take a look at the most frequently-used shuffle words:

\texttt{drop ( x -{}- )} Discard the top stack element. Used when
a return value is not needed.

\texttt{dup ( x -{}- x x )} Duplicate the top stack element. Used
when a value is needed more than once.

\texttt{swap ( x y -{}- y x )} Swap top two stack elements. Used when
a word expects parameters in a different order.

\texttt{rot ( x y z -{}- y z x )} Rotate top three stack elements
to the left.

\texttt{-rot ( x y z -{}- z x y )} Rotate top three stack elements
to the right.

\texttt{over ( x y -{}- x y x )} Bring the second stack element {}``over''
the top element.

\texttt{nip ( x y -{}- y )} Remove the second stack element.

\texttt{tuck ( x y -{}- y x y )} Tuck the top stack element under
the second stack element.

You can try all these words out -- push some numbers on the stack,
execute a word, and look at how the stack contents were changed using
\texttt{.s}. Compare the stack contents with the stack effects above.
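For instance, such a session might look like the following, using \texttt{clear} to empty the stack before each experiment:

\begin{lyxcode}
clear~1~2~swap~.s

\emph{\{~2~1~\}}

clear~1~2~3~rot~.s

\emph{\{~2~3~1~\}}

clear~1~2~3~-rot~.s

\emph{\{~3~1~2~\}}

clear~1~2~over~.s

\emph{\{~1~2~1~\}}

clear~1~2~tuck~.s

\emph{\{~2~1~2~\}}
\end{lyxcode}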

Note the order of the shuffle word descriptions above. The ones at
the top are used most often because they are easy to understand. The
more complex ones such as \texttt{rot} should be avoided where possible, because
they make the flow of data in a word definition harder to understand.

If you find yourself using too many shuffle words, or you're writing
a stack effect comment in the middle of a colon definition, it is
a good sign that the word should probably be factored into two or
more words. Effective factoring is like riding a bicycle -- it is
hard at first, but then you {}``get it'', and writing small, clear
and reusable word definitions becomes second nature.
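As a small sketch of this idea, a word computing $a^{2}+b^{2}$ could be written with shuffle words alone as \texttt{swap dup {*} swap dup {*} +}; factoring out a helper word makes the data flow easier to follow. The names \texttt{square} and \texttt{sum-of-squares} are hypothetical, chosen for this example:

\begin{lyxcode}
:~square~(~x~-{}-~y~)~dup~{*}~;

:~sum-of-squares~(~a~b~-{}-~c~)~square~swap~square~+~;

3~4~sum-of-squares~.

\emph{25}
\end{lyxcode}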
\subsection{Combinators}

A quotation is a list of objects that can be executed. Words that operate
on quotations are called \emph{combinators}. Quotations are input
using the following syntax:

\begin{lyxcode}
{[}~2~3~+~.~{]}
\end{lyxcode}
When input, a quotation is not executed immediately -- rather, it
becomes one object on the stack. Try evaluating the following:

\begin{lyxcode}
{[}~1~2~3~+~{*}~{]}~.s

\emph{\{~{[}~1~2~3~+~{*}~{]}~\}}

call~.s

\emph{\{~5~\}}
\end{lyxcode}
\texttt{call} \texttt{( quot -{}- )} executes the quotation at the
top of the stack. Using \texttt{call} with a literal quotation is
pointless; writing out the elements of the quotation has the same effect.
However, the \texttt{call} combinator is a building block of more
powerful combinators, since quotations can be passed around arbitrarily
and even modified before being called.

\texttt{ifte} \texttt{( cond true false -{}- )} executes either the
\texttt{true} or the \texttt{false} quotation, depending on the boolean
value of \texttt{cond}. In Factor, there is no real boolean data type
-- instead, a special object \texttt{f} is the only object with a
{}``false'' boolean value. Every other object is a boolean {}``true''.
The special object \texttt{t} is the {}``canonical'' truth value.

Here is an example of \texttt{ifte} usage:

\begin{lyxcode}
1~2~<~{[}~{}``1~is~less~than~2.''~print~{]}~{[}~{}``bug!''~print~{]}~ifte
\end{lyxcode}
Compare the order of operands here, and the order of arguments in
the stack effect of \texttt{ifte}.

The two \texttt{ifte} branches should have the same stack effect. If they
differ, the enclosing word becomes harder to document and debug.
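As a sketch, in the following hypothetical word \texttt{absolute} both branches have the stack effect \texttt{( n -{}- n )}: the first negates the number by subtracting it from zero, and the empty quotation \texttt{{[} {]}} leaves it unchanged. Whatever the input, the word leaves exactly one number on the stack:

\begin{lyxcode}
:~absolute~(~n~-{}-~n~)~dup~0~<~{[}~0~swap~-~{]}~{[}~{]}~ifte~;

-5~absolute~.

\emph{5}

7~absolute~.

\emph{7}
\end{lyxcode}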

\texttt{times ( num quot -{}- )} executes a quotation \texttt{num}
times. It is good style to have the quotation always consume as many
values from the stack as it produces. This ensures the stack effect
of the entire \texttt{times} expression stays constant regardless
of the number of iterations.
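For example, the quotation \texttt{{[} 2 {*} {]}} below consumes one number and produces one, so running it ten times still leaves a single number on the stack. This sketch assumes, as the advice above implies, that the quotation operates on the values beneath the iteration count:

\begin{lyxcode}
1~10~{[}~2~{*}~{]}~times~.

\emph{1024}
\end{lyxcode}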

More combinators will be introduced later.

\subsection{Vocabularies}

The dictionary of words is not a flat list -- rather, it is separated
into a number of \emph{vocabularies}. Each vocabulary is a named list
of words that have something in common -- for example, the {}``lists''
vocabulary contains words for working with linked lists.

When a word is read by the parser, the \emph{vocabulary search path}
determines which vocabularies to search. In the interactive interpreter,
the default search path contains a large number of vocabularies. Contrast
this with the situation when a file is being parsed -- the search path
has a minimal set of vocabularies containing basic parsing words.%
\footnote{The rationale here is that the interactive interpreter should have
a large number of words available by default, for convenience, whereas
source files should specify their external dependencies explicitly.%
}

New vocabularies are added to the search path using the \texttt{USE:}
parsing word. For example:

\begin{lyxcode}
{}``/home/slava/.factor-rc''~exists?~.

\emph{ERROR:~<interactive>:1:~Undefined:~exists?}

USE:~streams

{}``/home/slava/.factor-rc''~exists?~.

\emph{t}
\end{lyxcode}
How do you know which vocabulary contains a word? Vocabularies can
either be listed, or an {}``apropos'' search can be performed:

\begin{lyxcode}
\char`\"{}init\char`\"{}~words.

\emph{{[}~?run-file~boot~cli-arg~cli-param~init-environment}

\emph{init-gc~init-interpreter~init-scratchpad~init-search-path}

\emph{init-stdio~init-toplevel~parse-command-line~parse-switches}

\emph{run-files~run-user-init~stdin~stdout~{]}~}

\char`\"{}map\char`\"{}~apropos.

\emph{IN:~lists}

\emph{map}

\emph{IN:~strings}

\emph{str-map}

\emph{IN:~vectors}

\emph{(vector-map)}

\emph{(vector-map-step)}

\emph{vector-map~}
\end{lyxcode}
New words are defined in the \emph{input vocabulary}. The input vocabulary
can be changed at the interactive prompt, or in a source file, using
the \texttt{IN:} parsing word. For example:

\begin{lyxcode}
IN:~music-database

:~random-playlist~...~;
\end{lyxcode}
It is a convention (although it is not enforced by the parser) that
the \texttt{IN:} directive is the first statement in a source file,
and all \texttt{USE:} statements follow, before any other definitions.
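Following this convention, a source file might begin as sketched below. The word \texttt{now-playing} is hypothetical, and the sketch assumes, as the numbers game in the next section does, that \texttt{USE: stdio} makes \texttt{print} and \texttt{write} available:

\begin{lyxcode}
!~Music~database~example

IN:~music-database

USE:~stdio

:~now-playing~(~title~-{}-~)~{}``Now~playing:~''~write~print~;
\end{lyxcode}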
\section{PRACTICAL: Numbers game}

In this section, basic input/output and flow control are introduced.
We construct a program that repeatedly prompts the user to guess a
number -- they are informed if their guess is correct, too low, or
too high. The game ends on a correct guess.

\begin{lyxcode}
numbers-game

\emph{I'm~thinking~of~a~number~between~0~and~100.}

\emph{Enter~your~guess:}~25

\emph{Too~low}

\emph{Enter~your~guess:}~38

\emph{Too~high}

\emph{Enter~your~guess:}~31

\emph{Correct~-~you~win!}
\end{lyxcode}

\subsection{Development methodology}

A typical Factor development session involves a text editor and Factor
interpreter running side by side. Instead of the edit/compile/run
cycle, the development process becomes an {}``edit cycle'' -- you
make some changes to the source file and reload it in the interpreter
using a command like this:

\begin{lyxcode}
{}``numbers-game.factor''~run-file
\end{lyxcode}
Then the changes can be tested, either by hand, or using a test harness.
There is no need to compile anything, or to lose interpreter state
by restarting. Throw-away definitions that you do not intend to keep
can also be entered directly at the interpreter prompt.

Each word should do one useful task. New words can be defined in terms
of existing, already-tested words. You design a set of reusable words
that model the problem domain. Then, the problem is solved in terms
of a \emph{domain-specific vocabulary}. This is called \emph{bottom-up
design.}

The jEdit text editor makes Factor development much more pleasant.
The Factor plugin for jEdit provides an {}``integrated development
environment'' with many time-saving features. See the documentation
for the plugin itself for details.

\subsection{Getting started}

Start a text editor and create a file named \texttt{numbers-game.factor}.

At the top of the file, write a comment. Comments are a feature that
can be found in almost any programming language; in Factor, they are
implemented as parsing words. An example of commenting follows:

\begin{lyxcode}
!~The~word~!~discards~input~until~the~end~of~the~line

(~The~word~(~discards~input~until~the~next~)
\end{lyxcode}
It is always a good idea to comment your code. Try to write code
simple enough that it does not need detailed comments; similarly,
avoid redundant comments. These two principles are hard to quantify
in a concrete way, and will become clearer as your skills with
Factor increase.

We will be defining new words in the \texttt{numbers-game} vocabulary; add
an \texttt{IN:} statement at the top of the source file:

\begin{lyxcode}
IN:~numbers-game
\end{lyxcode}
Also, in order to be able to test the words, issue a \texttt{USE:}
statement in the interactive interpreter:

\begin{lyxcode}
USE:~numbers-game
\end{lyxcode}
This section will develop the numbers game in an incremental fashion.
After each addition, issue a command like the following to load the
source file into the Factor interpreter:

\begin{lyxcode}
{}``numbers-game.factor''~run-file
\end{lyxcode}

\subsection{Reading a number from the keyboard}

A fundamental operation required for the numbers game is to be able
to read a number from the keyboard. The \texttt{read} word \texttt{(
-{}- str )} reads a line of input and pushes it on the stack as a
string. The \texttt{parse-word} word \texttt{( str -{}- n )} turns a decimal
string representation of an integer into the integer itself. These
two words can be combined into a single colon definition:

\begin{lyxcode}
:~read-number~(~-{}-~n~)~read~parse-word~;
\end{lyxcode}
You should add this definition to the source file, and try loading
the file into the interpreter. As you will soon see, this raises an
error! The problem is that the two words \texttt{read} and \texttt{parse-word}
are not part of the minimal default vocabulary search path used
when reading files. The solution is to use \texttt{apropos.} to find
out which vocabularies contain those words, and add the appropriate
\texttt{USE:} statements to the source file:

\begin{lyxcode}
USE:~parser

USE:~stdio
\end{lyxcode}
After adding the above two statements, the file should now parse,
and testing should confirm that the \texttt{read-number} word works correctly.%
\footnote{There is the possibility of an invalid number being entered at the
keyboard. In this case, \texttt{read-number} pushes \texttt{f},
the boolean false value. For the sake of simplicity, we ignore this
case in the numbers game example. However, proper error handling is
an essential part of any large program and is covered later.%
}
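A quick interactive check might look like the following, where \texttt{42} is typed at the keyboard once \texttt{read-number} starts waiting for input:

\begin{lyxcode}
read-number~1~+~.

42

\emph{43}
\end{lyxcode}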
\subsection{Printing some messages}

Now we need to make some words for printing various messages. They
are given here without further ado:

\begin{lyxcode}
:~guess-banner

~~~~{}``I'm~thinking~of~a~number~between~0~and~100.''~print~;

:~guess-prompt~{}``Enter~your~guess:~''~write~;

:~too-high~{}``Too~high''~print~;

:~too-low~{}``Too~low''~print~;

:~correct~{}``Correct~-~you~win!''~print~;
\end{lyxcode}
Note that in the above, stack effect comments are omitted, since they
are obvious from context. You should ensure the words work correctly
after loading the source file into the interpreter.

\subsection{Taking action based on a guess}

The next logical step is to write a word \texttt{judge-guess} that
takes the user's guess along with the actual number to be guessed,
and prints one of the messages \texttt{too-high}, \texttt{too-low},
or \texttt{correct}. This word will also push a boolean flag, indicating
whether the game should continue or not -- in the case of a correct guess,
the game does not continue.

This description of \texttt{judge-guess} is a mouthful, which suggests that
it may be best to split it into two words. The first word we write
handles the more specific case of an \emph{inexact} guess, printing
either \texttt{too-low} or \texttt{too-high}:

\begin{lyxcode}
:~inexact-guess~(~guess~actual~-{}-~)

~~~~>~{[}~too-high~{]}~{[}~too-low~{]}~ifte~;
\end{lyxcode}
If the two parameters are equal, this word wrongly prints \texttt{too-low}.
However, it will never be called this way.

With this out of the way, the implementation of \texttt{judge-guess} is an
easy task to tackle. Using the words \texttt{inexact-guess}, \texttt{=},
\texttt{2dup} and \texttt{2drop}, we can write:

\begin{lyxcode}
:~judge-guess~(~guess~actual~-{}-~?~)

~~~~2dup~=~{[}

~~~~~~~~2drop~correct~f

~~~~{]}~{[}

~~~~~~~~inexact-guess~t

~~~~{]}~ifte~;
\end{lyxcode}
Note the use of \texttt{2dup ( x y -{}- x y x y )}. Since \texttt{=}
consumes both its parameters, we must make copies of them to pass
to \texttt{inexact-guess}. When the guess is correct, the copies are
not needed, so \texttt{2drop ( x y -{}- )} discards them before
\texttt{correct} prints its message. Try the following
at the interpreter to see what's going on:

\begin{lyxcode}
clear~1~2~2dup~=~.s

\emph{\{~1~2~f~\}}

clear~4~4~2dup~=~.s

\emph{\{~4~4~t~\}}
\end{lyxcode}
Test \texttt{judge-guess} with a few inputs:

\begin{lyxcode}
1~10~judge-guess~.

\emph{Too~low}

\emph{t}

89~43~judge-guess~.

\emph{Too~high}

\emph{t}

64~64~judge-guess~.

\emph{Correct~-~you~win!}

\emph{f}
\end{lyxcode}

\subsection{Generating random numbers}

The \texttt{random-int} word \texttt{( min max -{}- n )} pushes a
random number in a specified range. The range is inclusive, so both
the minimum and the maximum are candidate random numbers. Use
\texttt{apropos.} to determine that this word is in the \texttt{random}
vocabulary. For the purposes of this game, random numbers will be
in the range of 0 to 100, so we can define the following word:

\begin{lyxcode}
:~number-to-guess~(~-{}-~n~)~0~100~random-int~;
\end{lyxcode}
Add the word definition to the source file, along with the appropriate
\texttt{USE:} statement. Load the source file in the interpreter,
and confirm that the word functions correctly, and that its stack
effect comment is accurate.

\subsection{The game loop}

The game loop consists of repeated calls to \texttt{guess-prompt},
\texttt{read-number} and \texttt{judge-guess}. If \texttt{judge-guess}
pushes \texttt{f}, the loop stops, otherwise it continues. This is
realized with a recursive implementation:

\begin{lyxcode}
:~numbers-game-loop~(~actual~-{}-~)

~~~~dup~guess-prompt~read-number~swap~judge-guess~{[}

~~~~~~~~numbers-game-loop

~~~~{]}~{[}

~~~~~~~~drop

~~~~{]}~ifte~;
\end{lyxcode}
Here \texttt{read-number} leaves the guess on top of a copy of the actual
number, so \texttt{swap} puts the two in the order \texttt{judge-guess}
expects. The original copy of the actual number stays underneath for
the next iteration, and is discarded with \texttt{drop} once the game ends.

In Factor, tail-recursive words consume a bounded amount of call stack
space. This means you are free to pick recursion or iteration based
on their own merits when solving a problem. In many other languages,
the usefulness of recursion is severely limited by the lack of tail-call
optimization.

\subsection{Finishing off}

The last task is to combine everything into the main \texttt{numbers-game}
word, which prints the banner, picks a number, and starts the loop.
This is easier than it seems:

\begin{lyxcode}
:~numbers-game~guess-banner~number-to-guess~numbers-game-loop~;
\end{lyxcode}
Try it out! Simply invoke the \texttt{numbers-game} word in the interpreter.
It should work flawlessly, assuming you tested each component of this
design incrementally!

\subsection{The complete program}

\begin{lyxcode}
!~Numbers~game~example~\\

IN:~numbers-game

USE:~parser

USE:~random

USE:~stdio~\\
~\\
:~read-number~(~-{}-~n~)~read~parse-word~;~\\
~\\
:~guess-banner

~~~~{}``I'm~thinking~of~a~number~between~0~and~100.''~print~;

:~guess-prompt~{}``Enter~your~guess:~''~write~;

:~too-high~{}``Too~high''~print~;

:~too-low~{}``Too~low''~print~;

:~correct~{}``Correct~-~you~win!''~print~;~\\
~\\
:~inexact-guess~(~guess~actual~-{}-~)

~~~~>~{[}~too-high~{]}~{[}~too-low~{]}~ifte~;~\\
~\\
:~judge-guess~(~guess~actual~-{}-~?~)

~~~~2dup~=~{[}

~~~~~~~~2drop~correct~f

~~~~{]}~{[}

~~~~~~~~inexact-guess~t

~~~~{]}~ifte~;~\\
~\\
:~number-to-guess~(~-{}-~n~)~0~100~random-int~;~\\
~\\
:~numbers-game-loop~(~actual~-{}-~)

~~~~dup~guess-prompt~read-number~swap~judge-guess~{[}

~~~~~~~~numbers-game-loop

~~~~{]}~{[}

~~~~~~~~drop

~~~~{]}~ifte~;~\\
~\\
:~numbers-game~guess-banner~number-to-guess~numbers-game-loop~;
\end{lyxcode}
\section{Lists}

A list is composed of a set of pairs; each pair holds a list element,
and a reference to the next pair. Lists have the following literal
syntax:

\begin{lyxcode}
{[}~{}``CEO''~5~{}``CFO''~-4~f~{]}
\end{lyxcode}
Before we continue, it is important to understand the role of data
types in Factor. Let's make a distinction between two categories of
data types:

\begin{itemize}
\item Representational type -- this refers to the form of the data in the
interpreter. Representational types include integers, strings, and
vectors. Representational types are checked at run time -- attempting
to multiply two strings, for example, will yield an error.
\item Intentional type -- this refers to the meaning of the data within
the problem domain. This could be a length measured in inches, or
a string naming a file, or a list of objects in a room in a game.
It is up to the programmer to check intentional types -- Factor won't
prevent you from adding two integers representing a distance and a
time, even though the result is meaningless.
\end{itemize}

\subsection{Cons cells}

It may surprise you that in Factor, \emph{lists are intentional types}.
This means that they are not an inherent feature of the interpreter;
rather, they are built from a simpler data type, the \emph{cons cell}.

A cons cell is an object that holds a reference to two other objects.
The order of the two objects matters -- the first is called the \emph{car},
the second is called the \emph{cdr}.

All words relating to cons cells and lists are found in the \texttt{lists}
vocabulary. The words \texttt{cons}, \texttt{car} and \texttt{cdr}%
\footnote{These infamous names originate from the Lisp language. Originally,
{}``Lisp'' stood for {}``List Processing''.%
} construct and deconstruct cons cells:

\begin{lyxcode}
1~2~cons~.

\emph{{[}~1~|~2~{]}}

3~4~cons~car~.

\emph{3}

5~6~cons~cdr~.

\emph{6}
\end{lyxcode}
The output of the first expression suggests a literal syntax for cons
cells:

\begin{lyxcode}
{[}~10~|~20~{]}~cdr~.

\emph{20}

{[}~{}``first''~|~{[}~{}``second''~|~f~{]}~{]}~car~.

\emph{{}``first''}

{[}~{}``first''~|~{[}~{}``second''~|~f~{]}~{]}~cdr~car~.

\emph{{}``second''}
\end{lyxcode}
The last two examples make it clear how nested cons cells represent
a list. Since this {}``nested cons cell'' syntax is extremely cumbersome,
the parser provides an easier way:

\begin{lyxcode}
{[}~1~2~3~4~{]}~cdr~cdr~car~.

\emph{3}
\end{lyxcode}
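Conversely, a list can be assembled one pair at a time with \texttt{cons}. Starting from \texttt{f}, the empty list, each \texttt{cons} adds one element to the front:

\begin{lyxcode}
1~2~3~f~cons~cons~cons~.

\emph{{[}~1~2~3~{]}}
\end{lyxcode}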
A \emph{generalized list} is a chain of cons cells, each linked to the
next by its cdr. A \emph{proper list}, or just list, is a generalized
list whose last cons cell has a cdr equal to \texttt{f}. Also, the object \texttt{f}
is a proper list, and in fact it is equivalent to the empty list \texttt{{[}
{]}}. An \emph{improper list} is a generalized list that is not a
proper list.

The \texttt{list?} word tests if the object at the top of the stack
is a proper list:

\begin{lyxcode}
{}``hello''~list?~.

\emph{f}

{[}~{}``first''~{}``second''~|~{}``third''~{]}~list?~.

\emph{f}

{[}~{}``first''~{}``second''~{}``third''~{]}~list?~.

\emph{t}
\end{lyxcode}
|
||
|
|
||
|
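To connect these definitions with \texttt{cons}, the following sketch
builds the same three elements twice: once terminated by \texttt{f},
which should give a proper list, and once terminated by a number,
which should give an improper list printed in the bracket notation
with a bar:

\begin{alltt}
1 2 3 f cons cons cons .
\emph{[ 1 2 3 ]}
1 2 3 f cons cons cons list? .
\emph{t}
1 2 3 cons cons .
\emph{[ 1 2 | 3 ]}
1 2 3 cons cons list? .
\emph{f}
\end{alltt}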
\subsection{Working with lists}

Unless otherwise documented, list manipulation words expect proper
lists as arguments. Given an improper list, they will either raise
an error or disregard the hanging cdr at the end of the list.

Also unless otherwise documented, list manipulation words return newly-created
lists only; the original parameters are not modified. This may seem
inefficient; however, the absence of side effects makes code much easier
to test and debug.%
\footnote{Side effect-free code is the fundamental idea underlying functional
programming languages. While Factor allows side effects and is not
a functional programming language, for many problems coding in
a functional style gives the most maintainable and readable results.%
} Where performance is important, a set of {}``destructive'' words
is provided. They are documented in a later section.

\texttt{add ( list obj -{}- list )} Create a new list consisting of
the original list with a new element added at the end:

\begin{lyxcode}
{[}~1~2~3~{]}~4~add~.

\emph{{[}~1~2~3~4~{]}}

1~{[}~2~3~4~{]}~cons~.

\emph{{[}~1~2~3~4~{]}}
\end{lyxcode}
While \texttt{cons} and \texttt{add} appear to have similar effects,
they are quite different -- \texttt{cons} is a very cheap operation,
while \texttt{add} has to copy the entire list first! If you need
to add elements to the end in constant time, use a vector.

\texttt{append ( list list -{}- list )} Append the two lists at the
top of the stack:

\begin{lyxcode}
{[}~1~2~3~{]}~{[}~4~5~6~{]}~append~.

\emph{{[}~1~2~3~4~5~6~{]}}

{[}~1~2~3~{]}~dup~{[}~4~5~6~{]}~append~.s

\emph{\{~{[}~1~2~3~{]}~{[}~1~2~3~4~5~6~{]}~\}}
\end{lyxcode}
The first list is copied, and the cdr of the last cons cell of the
copy is set to the second list. The second example above shows that
the original parameter was not modified. Interestingly, if the second
parameter is not a proper list, \texttt{append} returns an improper list:

\begin{lyxcode}
{[}~1~2~3~{]}~4~append~.

\emph{{[}~1~2~3~|~4~{]}}
\end{lyxcode}
\texttt{length ( list -{}- n )} Iterate down the cdr of the list until
it reaches \texttt{f}, counting the number of elements in the list:

\begin{lyxcode}
{[}~{[}~1~2~{]}~{[}~3~4~{]}~5~{]}~length~.

\emph{3}

{[}~{[}~{[}~{}``Hey''~{]}~{]}~5~{]}~length~.

\emph{2}
\end{lyxcode}
\texttt{nth ( index list -{}- obj )} Look up an element specified
by a zero-based index, by successively iterating down the cdr of the
list:

\begin{lyxcode}
1~{[}~{}``Hamster''~{}``Bagpipe''~{}``Beam''~{]}~nth~.

\emph{{}``Bagpipe''}
\end{lyxcode}
This word takes time proportional to the index. If you need
constant time lookups, use a vector instead.

\texttt{set-nth ( value index list -{}- list )} Create a new list,
identical to the original list except the element at the specified
index is replaced:

\begin{lyxcode}
{}``Done''~1~{[}~{}``Not~started''~{}``Incomplete''~{]}~set-nth~.

\emph{{[}~{}``Not~started''~{}``Done''~{]}}
\end{lyxcode}
\texttt{remove ( obj list -{}- list )} Push a new list, with all occurrences
of the object removed. The remaining elements keep their original order:

\begin{lyxcode}
:~australia-~{}``Australia''~swap~remove~;

{[}~\char`\"{}Canada\char`\"{}~\char`\"{}New~Zealand\char`\"{}~\char`\"{}Australia\char`\"{}~\char`\"{}Russia\char`\"{}~{]}~australia-~.

\emph{{[}~\char`\"{}Canada\char`\"{}~\char`\"{}New~Zealand\char`\"{}~\char`\"{}Russia\char`\"{}~{]}}
\end{lyxcode}
\texttt{remove-nth ( index list -{}- list )} Push a new list, with
the element at the given zero-based index removed:

\begin{lyxcode}
2~{[}~\char`\"{}Canada\char`\"{}~\char`\"{}New~Zealand\char`\"{}~\char`\"{}Australia\char`\"{}~\char`\"{}Russia\char`\"{}~{]}~remove-nth~.

\emph{{[}~\char`\"{}Canada\char`\"{}~\char`\"{}New~Zealand\char`\"{}~\char`\"{}Russia\char`\"{}~{]}}
\end{lyxcode}
\texttt{reverse ( list -{}- list )} Push a new list which has the
same elements as the original one, but in reverse order:

\begin{lyxcode}
{[}~4~3~2~1~{]}~reverse~.

\emph{{[}~1~2~3~4~{]}}
\end{lyxcode}
\texttt{contains ( obj list -{}- list )} Look for an occurrence of
an object in a list. The remainder of the list starting from the first
occurrence is returned. If the object does not occur in the list,
\texttt{f} is returned:

\begin{lyxcode}
:~lived-in?~(~country~-{}-~?~)

~~~~{[}~{}``Canada''~{}``New~Zealand''~{}``Australia''~{}``Russia''~{]}~contains~;

{}``Australia''~lived-in?~.

\emph{{[}~{}``Australia''~{}``Russia''~{]}}

{}``Pakistan''~lived-in?~.

\emph{f}
\end{lyxcode}
For now, assume that an object {}``occurs'' in a list if the list
contains an object that looks like it. The issue of object equality
is covered in the next chapter.

\texttt{unique ( list -{}- list )} Return a new list with all duplicate
elements removed. This word executes in quadratic time, so it should
not be used with large lists. For example:

\begin{lyxcode}
{[}~1~2~1~4~1~8~{]}~unique~.

\emph{{[}~1~2~4~8~{]}}
\end{lyxcode}
\texttt{unit ( obj -{}- list )} Make a list of one element:

\begin{lyxcode}
{}``Unit~18''~unit~.

\emph{{[}~{}``Unit~18''~{]}}
\end{lyxcode}

\subsection{Association lists}

An \emph{association list} is one where every element is a cons. The
car of each cons is a name, and the cdr is a value. The literal notation
is suggestive:

\begin{lyxcode}
{[}

~~~~{[}~{}``Jill''~|~{}``CEO''~{]}

~~~~{[}~{}``Jeff''~|~{}``manager''~{]}

~~~~{[}~{}``James''~|~{}``lowly~web~designer''~{]}

{]}
\end{lyxcode}
\texttt{assoc? ( obj -{}- ? )} returns \texttt{t} if the object is
a list whose every element is a cons; otherwise it returns \texttt{f}.

\texttt{assoc ( name alist -{}- value )} looks for a pair with this
name in the list, and pushes the cdr of the pair. It pushes \texttt{f}
if no pair with this name is present. Note that \texttt{assoc} cannot
differentiate between a name that is not present at all and a name
with a value of \texttt{f}.

\texttt{assoc{*} ( name alist -{}- {[} name | value {]} )} looks for
a pair with this name, and pushes the pair itself. Unlike \texttt{assoc},
\texttt{assoc{*}} returns different values for a name whose value
is \texttt{f} and for a name that is not present at all.

\texttt{set-assoc ( value name alist -{}- alist )} removes any existing
occurrence of a name from the list, and adds a new pair. This creates
a new list; the original is unaffected.

\texttt{acons ( value name alist -{}- alist )} is slightly faster
than \texttt{set-assoc} since it simply conses a new pair onto the
list. However, if used repeatedly, the list will grow to contain a
lot of {}``shadowed'' pairs.

Searching an association list takes time linear in its length, so
association lists should only be used for small mappings -- a typical
use is a mapping of half a dozen entries or so, specified literally
in source. Hashtables can achieve better performance with larger mappings.

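
The following listener sketch exercises these words on a small association
list. The word \texttt{staff} is a hypothetical helper that simply pushes
a literal association list; since none of the words used here modify
their input, returning a literal is safe in this case:

\begin{alltt}
! hypothetical helper: pushes a literal association list
: staff ( -- alist )
    [ [ "Jill" | "CEO" ] [ "Jeff" | "manager" ] ] ;
"Jeff" staff assoc .
\emph{"manager"}
"Bob" staff assoc .
\emph{f}
"Jill" staff assoc* .
\emph{[ "Jill" | "CEO" ]}
"intern" "Bob" staff set-assoc "Bob" swap assoc .
\emph{"intern"}
\end{alltt}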
\subsection{List combinators}

In a traditional language such as C, every iteration or collection
must be written out as a loop, with setting up and updating of indexes,
etc. Factor, on the other hand, relies on combinators and quotations
to avoid duplicating these loop {}``design patterns'' throughout
the code.

The simplest case is iterating through each element of a list, and
printing it or otherwise consuming it from the stack.

\texttt{each ( list quot -{}- )} pushes each element of the list in
turn, and executes the quotation. The list and quotation are not on
the stack when the quotation is executed. This allows a powerful idiom
where the quotation makes a copy of a value on the stack, and consumes
it along with the list element. In fact, this idiom works with all
well-designed combinators.%
\footnote{Later, you will learn how to apply it when designing your own combinators.%
}

The previously-mentioned \texttt{reverse} word is implemented using
\texttt{each}:

\begin{lyxcode}
:~reverse~{[}~{]}~swap~{[}~swons~{]}~each~;
\end{lyxcode}
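The idiom described for \texttt{each}, where the quotation consumes
a value left below the list element, can also accumulate a result.
As a minimal sketch (the word name \texttt{list-sum} is hypothetical),
the following word adds up a list of numbers by keeping a running total
underneath each element:

\begin{alltt}
! hypothetical word: the running total stays below each element
: list-sum ( list -- n ) 0 swap [ + ] each ;
[ 1 2 3 4 ] list-sum .
\emph{10}
\end{alltt}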
To understand how \texttt{reverse} works, consider that each element
of the original list is consed onto the beginning of a new list, in turn.
So the last element of the original list ends up at the beginning of
the new list.

\texttt{inject ( list quot -{}- list )} is similar to \texttt{each},
except the return values of the quotation are collected into the new
list. The quotation must consume the list element and leave exactly
one value in its place, otherwise the combinator will not function
properly; in other words, the quotation must have stack effect
\texttt{( obj -{}- obj )}.

For example, suppose we have a list where each element stores the
quantity of some nutrient in 100 grams of food; we would like to
find out the total nutrients contained in 300 grams:

\begin{lyxcode}
:~multiply-each~(~n~list~-{}-~list~)

~~~~{[}~dupd~{*}~{]}~inject~nip~;

3~{[}~50~450~101~{]}~multiply-each~.

\emph{{[}~150~1350~303~{]}}
\end{lyxcode}
Note the use of \texttt{nip} to discard the original parameter \texttt{n}.

When there is no appropriate combinator, recursion can be used instead.
Factor performs tail call optimization, so a word whose recursive
call is the last thing it does will not use an unbounded amount of
stack space.

\texttt{subset ( list quot -{}- list )} produces a new list containing
some of the elements of the original list. Which elements to collect
is determined by the quotation -- the quotation is called with each
list element on the stack in turn, and those elements for which the
quotation does not return \texttt{f} are added to the new list. The
quotation must have stack effect \texttt{( obj -{}- ? )}.

For example, let's construct a list of all numbers between 0 and 99
such that the sum of their digits is less than 10:

\begin{lyxcode}
:~sum-of-digits~(~n~-{}-~n~)~10~/mod~+~;

100~count~{[}~sum-of-digits~10~<~{]}~subset~.

\emph{{[}~0~1~2~3~4~5~6~7~8~9~10~11~12~13~14~15~16~17~18~20~21}

\emph{22~23~24~25~26~27~30~31~32~33~34~35~36~40~41~42~43~44}

\emph{45~50~51~52~53~54~60~61~62~63~70~71~72~80~81~90~{]}}
\end{lyxcode}
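As a sketch of the recursion technique mentioned above, the following
hypothetical words count the elements of a list without a combinator
such as \texttt{each} (the built-in \texttt{length} already does this).
The recursive call is the last thing done in its branch, so, assuming
\texttt{ifte} calls that branch in tail position, the loop should not
need stack space proportional to the length of the list:

\begin{alltt}
! hypothetical words: n is the count so far, list is what remains
: count-elements-acc ( n list -- n )
    dup [ cdr swap succ swap count-elements-acc ] [ drop ] ifte ;
: count-elements ( list -- n ) 0 swap count-elements-acc ;
[ "a" "b" "c" ] count-elements .
\emph{3}
\end{alltt}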
\texttt{all? ( list quot -{}- ? )} returns \texttt{t} if the quotation
returns \texttt{t} for all elements of the list; otherwise it returns
\texttt{f}. In other words, if \texttt{all?} returns \texttt{t}, then
\texttt{subset} applied to the same list and quotation would return
the entire list.%
\footnote{Barring any side effects which modify the execution of the quotation.
It is best to avoid side effects when using list combinators.%
}

For example, the implementation of \texttt{assoc?} uses \texttt{all?}:

\begin{lyxcode}
:~assoc?~(~list~-{}-~?~)

~~~~dup~list?~{[}~{[}~cons?~{]}~all?~{]}~{[}~drop~f~{]}~ifte~;
\end{lyxcode}

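As a further sketch, \texttt{all?} can check whether every number in
a list is less than 10, using the same comparison as the \texttt{subset}
example above:

\begin{alltt}
[ 1 5 9 ] [ 10 < ] all? .
\emph{t}
[ 1 50 9 ] [ 10 < ] all? .
\emph{f}
\end{alltt}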
\subsection{\label{sub:List-constructors}List constructors}

The list construction words minimize stack noise with a clever trick:
they store a partial list in a variable, thus reducing the number
of stack elements that have to be juggled.

The word \texttt{{[}, ( -{}- )} begins list construction.

The word \texttt{, ( obj -{}- )} appends an object to the partial
list.

The word \texttt{,{]} ( -{}- list )} pushes the complete list.

While variables haven't been described yet, keep in mind that a new
scope is created between \texttt{{[},} and \texttt{,{]}}. This means
that list constructions can be nested, as long as in the end, the
number of \texttt{{[},} and \texttt{,{]}} words balances out. There is no
requirement that \texttt{{[},} and \texttt{,{]}} appear in the same
word; however, debugging becomes prohibitively difficult when a list
construction begins in one word and ends in another.

Here is an example of list construction using this technique:

\begin{lyxcode}
{[},~1~10~{[}~2~{*}~dup~,~{]}~times~drop~,{]}~.

\emph{{[}~2~4~8~16~32~64~128~256~512~1024~{]}}
\end{lyxcode}

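The same list can often be built either with the list constructor
words or with a combinator. In this sketch, both lines should produce
the squares of the numbers 0 to 4 (\texttt{count} is used as in the
\texttt{subset} example above):

\begin{alltt}
[, 5 count [ dup * , ] each ,] .
\emph{[ 0 1 4 9 16 ]}
5 count [ dup * ] inject .
\emph{[ 0 1 4 9 16 ]}
\end{alltt}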
\subsection{Destructively modifying lists}

All previously discussed list modification words return newly-allocated
lists. Destructive list manipulation words, on the other hand, reuse
the cons cells of their input lists, and hence avoid memory allocation.

Only destructively modify lists you do not intend to use again.
You should not rely on the side effects -- they are unpredictable.
It is wrong to think that destructive words {}``modify'' the original
list -- rather, think of them as returning a new list, just like the
normal versions of the words, with the added caveat that the original
list must not be used again.

\texttt{nreverse ( list -{}- list )} reverses a list without consing.
In the following example, the return value reuses the cons cells of
the original list, and the original list has been ruined by unpredictable
side effects:

\begin{lyxcode}
{[}~1~2~3~4~{]}~dup~nreverse~.s

\emph{\{~{[}~4~{]}~{[}~4~3~2~1~{]}~\}}
\end{lyxcode}
Compare the second stack element (which is what remains of the original
list) and the top stack element (the list returned by \texttt{nreverse}).

The \texttt{nreverse} word is the most frequently used destructive
list manipulator. The usual idiom is a loop where values are consed
onto the beginning of a list in each iteration, and then the list is
reversed at the end. Since the original list is never used again,
\texttt{nreverse} can safely be used here.

\texttt{nappend ( list list -{}- list )} sets the cdr of the last
cons cell in the first list to the second list, unless the first list
is \texttt{f}, in which case it simply returns the second list. Again,
the first list should not be used afterwards: if it is \texttt{f},
it is unchanged; otherwise, it is now equal to the return value:

\begin{lyxcode}
{[}~1~2~{]}~{[}~3~4~{]}~nappend~.

\emph{{[}~1~2~3~4~{]}}
\end{lyxcode}
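The cons-then-\texttt{nreverse} idiom described above might look like
the following sketch (the word name \texttt{squares} is hypothetical).
The cons cells handed to \texttt{nreverse} are freshly created by
\texttt{swons}, and the initial \texttt{{[} {]}} is the empty list
\texttt{f} rather than a cons cell, so no literal is modified:

\begin{alltt}
! hypothetical word: cons onto the front, then reverse once at the end
: squares ( n -- list )
    [ ] swap count [ dup * swons ] each nreverse ;
5 squares .
\emph{[ 0 1 4 9 16 ]}
\end{alltt}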
Note that the \texttt{nreverse} and \texttt{nappend} examples above pass
literal lists as parameters. This is actually a very bad idea, since the
same literal list may be used more than once! For example, let's make
a colon definition:

\begin{lyxcode}
:~very-bad-idea~{[}~1~2~3~4~{]}~nreverse~;

very-bad-idea~.

\emph{{[}~4~3~2~1~{]}}

very-bad-idea~.

\emph{{[}~4~{]}}

{}``very-bad-idea''~see

\emph{:~very-bad-idea}

~\emph{~~~{[}~4~{]}~nreverse~;}
\end{lyxcode}
As you can see, the word definition itself was ruined!

Sometimes it is desirable to make a copy of a list, so that the copy
may be safely side-effected later.

\texttt{clone-list ( list -{}- list )} pushes a new list containing
the exact same elements as the original. The elements themselves are
not copied.

If you want to write your own destructive list manipulation words,
you can use \texttt{set-car ( value cons -{}- )} and \texttt{set-cdr
( value cons -{}- )} to modify individual cons cells. Some words that
are not destructive on their inputs nonetheless create intermediate
lists which are operated on using these words. One example is
\texttt{clone-list} itself.

\section{Vectors}

A vector is a contiguous chunk of cells which hold references to arbitrary
objects. Vectors have the following literal syntax:

\begin{lyxcode}
\{~f~f~f~t~t~f~t~t~-6~{}``Hey''~\}
\end{lyxcode}
Use of vector literals in source code is discouraged, since vector
manipulation relies on side effects rather than return values, and
hence it is very easy to mess up a literal embedded in a word definition.


\subsection{Vectors versus lists}

Vectors are applicable to a different class of problems than lists.
Compare the relative performance of common operations on vectors and
lists:

\begin{tabular}{|c|c|c|}
\hline
&
Lists&
Vectors\tabularnewline
\hline
\hline
Random access of an index&
linear time&
constant time\tabularnewline
\hline
Add new element at start&
constant time&
linear time\tabularnewline
\hline
Add new element at end&
linear time&
constant time\tabularnewline
\hline
\end{tabular}

When using vectors, you need to pass around a vector and an index
-- when working with lists, often only a list head is passed around.
For this reason, if you need a sequence for iteration only, a list
is a better choice because the list vocabulary contains a rich collection
of recursive words.

On the other hand, when you need to maintain your own {}``stack''-like
collection, a vector is the obvious choice, since most pushes and
pops can then avoid allocating memory.

Vectors and lists can be converted back and forth using the \texttt{vector>list}
word \texttt{( vector -{}- list )} and the \texttt{list>vector} word
\texttt{( list -{}- vector )}.

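
A short listener sketch of the two conversion words; the printed forms
follow the vector and list literal syntax shown earlier:

\begin{alltt}
[ 1 2 3 ] list>vector .
\emph{\{ 1 2 3 \}}
\{ 4 5 \} vector>list .
\emph{[ 4 5 ]}
\end{alltt}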
\subsection{Working with vectors}

\texttt{<vector> ( capacity -{}- vector )} pushes a zero-length vector.
Storing more elements than the initial capacity grows the vector.

\texttt{vector-nth ( index vector -{}- obj )} pushes the object stored
at a zero-based index of a vector:

\begin{lyxcode}
0~\{~{}``zero''~{}``one''~\}~vector-nth~.

\emph{{}``zero''}

2~\{~1~2~\}~vector-nth~.

\emph{ERROR:~Out~of~bounds}
\end{lyxcode}
\texttt{set-vector-nth ( obj index vector -{}- )} stores a value into
a vector:%
\footnote{The words \texttt{get} and \texttt{set} used in this example will
be formally introduced later.%
}

\begin{lyxcode}
\{~{}``math''~{}``CS''~\}~{}``v''~set

{}``philosophy''~1~{}``v''~get~set-vector-nth

{}``v''~get~.

\emph{\{~{}``math''~{}``philosophy''~\}}

{}``CS''~4~{}``v''~get~set-vector-nth

{}``v''~get~.

\emph{\{~{}``math''~{}``philosophy''~f~f~{}``CS''~\}}
\end{lyxcode}
\texttt{vector-length ( vector -{}- length )} pushes the number of
elements in a vector. As the previous two examples demonstrate, attempting
to fetch beyond the end of the vector will raise an error, while storing
beyond the end will grow the vector as necessary.

\texttt{set-vector-length ( length vector -{}- )} resizes a vector.
If the new length is larger than the current length, the vector grows
if necessary, and the new cells are filled with \texttt{f}.

\texttt{vector-push ( obj vector -{}- )} adds an object at the end
of the vector. This increments the vector's length by one.

\texttt{vector-pop ( vector -{}- obj )} removes the object at the
end of the vector and pushes it. This decrements the vector's length
by one.

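
Together, \texttt{vector-push} and \texttt{vector-pop} let a vector serve
as the {}``stack''-like collection mentioned earlier. The following
sketch uses \texttt{get} and \texttt{set} as in the \texttt{set-vector-nth}
example; the variable name {}``stack'' is arbitrary:

\begin{alltt}
! the variable name "stack" is arbitrary
10 <vector> "stack" set
"first" "stack" get vector-push
"second" "stack" get vector-push
"stack" get vector-length .
\emph{2}
"stack" get vector-pop .
\emph{"second"}
"stack" get vector-length .
\emph{1}
\end{alltt}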
\subsection{Vector combinators}

\texttt{vector-each}, \texttt{vector-map}

\section{Strings}

A \emph{string} is a sequence of 16-bit Unicode characters (conventionally,
in the UTF-16 encoding). Strings are input by enclosing them in quotes:

\begin{lyxcode}
{}``GET~/index.html~HTTP/1.0''
\end{lyxcode}
String literals must not span more than one line. The following is
not valid:

\begin{lyxcode}
{}``Content-Type:~text/html

Content-Length:~1280''
\end{lyxcode}
Instead, the newline must be represented using an escape, rather than
literally. The newline escape is \texttt{\textbackslash{}n}, so we
can write:

\begin{lyxcode}
{}``Content-Type:~text/html\textbackslash{}nContent-Length:~1280''
\end{lyxcode}
Other special characters, such as quotes and tabs, can be input in
a similar manner. Here is the full list of supported character escapes:

\begin{tabular}{|c|c|}
\hline
Character&
Escape\tabularnewline
\hline
\hline
Quote&
\texttt{\textbackslash{}\char`\"{}}\tabularnewline
\hline
Newline&
\texttt{\textbackslash{}n}\tabularnewline
\hline
Carriage return&
\texttt{\textbackslash{}r}\tabularnewline
\hline
Horizontal tab&
\texttt{\textbackslash{}t}\tabularnewline
\hline
Terminal escape&
\texttt{\textbackslash{}e}\tabularnewline
\hline
Null character&
\texttt{\textbackslash{}0}\tabularnewline
\hline
Arbitrary Unicode character&
\texttt{\textbackslash{}u}\texttt{\emph{nnnn}}\tabularnewline
\hline
\end{tabular}

The last row shows a notation for inputting any possible character
using its hexadecimal value. For example, a space character can also
be input as \texttt{\textbackslash{}u0020}.

There is no specific character data type in Factor. When characters
are extracted from a string, they are pushed on the stack as integers.
It is possible to input an integer with a value equal to that of a
Unicode character using the following special notation:

\begin{lyxcode}
CHAR:~A~.

\emph{65}

CHAR:~A~1~+~CHAR:~B~=~.

\emph{t}
\end{lyxcode}

\subsection{Working with strings}

String words are found in the \texttt{strings} vocabulary. String
manipulation words always return a new copy of a string rather than
modifying the string in-place. Notice the absence of words such as
\texttt{set-str-nth} and \texttt{set-str-length}. Unlike lists, for
which both constructive and destructive manipulation words are provided,
destructive string operations are only done with a distinct string
buffer type, which is described in the next section.

\texttt{str-length ( str -{}- n )} pushes the length of a string:

\begin{lyxcode}
{}``Factor''~str-length~.

\emph{6}
\end{lyxcode}
\texttt{str-nth ( n str -{}- ch )} pushes the character located by
a zero-based index. A string is essentially a vector specialized for
storing one data type, the 16-bit unsigned character. Characters are
returned as fixnums, so printing one will not yield the actual character:

\begin{lyxcode}
0~{}``~''~str-nth~.

\emph{32}
\end{lyxcode}
\texttt{index-of ( str substr -{}- n )} searches a string for the
first occurrence of a substring or character. If an occurrence was
found, its index is pushed. Otherwise, -1 is pushed:

\begin{lyxcode}
{}``www.sun.com''~CHAR:~.~index-of~.

\emph{3}

{}``mailto:billg@microsoft.com''~CHAR:~/~index-of~.

\emph{-1}

{}``www.lispworks.com''~{}``.com''~index-of~.

\emph{13}
\end{lyxcode}
\texttt{index-of{*} ( n str substr -{}- n )} works like \texttt{index-of},
except it takes a start index as an argument.

\texttt{substring ( start end str -{}- substr )} extracts a range
of characters from a string into a new string.

\texttt{split ( str split -{}- list )} pushes a new list of strings
which are substrings of the original string, taken in between occurrences
of the split string or character:

\begin{lyxcode}
{}``fixnum~bignum~ratio''~{}``~''~split~.

\emph{{[}~{}``fixnum''~{}``bignum''~{}``ratio''~{]}}

{}``/usr/bin/X''~CHAR:~/~split~.

\emph{{[}~{}``''~{}``usr''~{}``bin''~{}``X''~{]}}
\end{lyxcode}
If you wish to concatenate a fixed number of strings at the top of
the stack, you can use a member of the \texttt{cat} family of words
from the \texttt{strings} vocabulary. They concatenate strings in
the order that they appear in the stack effect.

\begin{tabular}{|c|c|}
\hline
Word&
Stack effect\tabularnewline
\hline
\hline
\texttt{cat2}&
\texttt{( s1 s2 -{}- str )}\tabularnewline
\hline
\texttt{cat3}&
\texttt{( s1 s2 s3 -{}- str )}\tabularnewline
\hline
\texttt{cat4}&
\texttt{( s1 s2 s3 s4 -{}- str )}\tabularnewline
\hline
\texttt{cat5}&
\texttt{( s1 s2 s3 s4 s5 -{}- str )}\tabularnewline
\hline
\end{tabular}

\texttt{cat ( list -{}- str )} is a generalization of the above words;
it concatenates each element of a list into a new string.

Some straightforward examples:

\begin{lyxcode}
{}``How~are~you,~''~{}``Chuck''~{}``?''~cat3~.

\emph{{}``How~are~you,~Chuck?''}

{}``/usr/bin/X''~CHAR:~/~split~cat~.

\emph{{}``usrbinX''}
\end{lyxcode}
String buffers, described in the next section, provide a more flexible
means of concatenating strings.


\subsection{String buffers}

A \emph{string buffer} is a mutable string. The canonical use for
a string buffer is to combine several strings into one. This is done
by creating a new string buffer, appending strings and characters,
and finally turning the string buffer into a string.

\texttt{<sbuf> ( capacity -{}- sbuf )} pushes a new string buffer
that is capable of holding up to the specified capacity before growing.

\texttt{sbuf-append ( str/ch sbuf -{}- )} appends a string or a character
to the end of the string buffer. If a number is given, its least significant
16 bits are interpreted as a character value:

\begin{lyxcode}
100~<sbuf>~{}``my-sbuf''~set

{}``Testing''~{}``my-sbuf''~get~sbuf-append

32~{}``my-sbuf''~get~sbuf-append
\end{lyxcode}
\texttt{sbuf>str ( sbuf -{}- str )} pushes a string with the same
contents as the string buffer:

\begin{lyxcode}
{}``my-sbuf''~get~sbuf>str~.

\emph{{}``Testing~''}
\end{lyxcode}
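As a sketch of this canonical use, the following hypothetical word
concatenates a list of strings through a string buffer, much like
\texttt{cat}. Inside the quotation, \texttt{dupd swap} keeps a copy
of the string buffer below each element for the next iteration:

\begin{alltt}
! hypothetical word: the sbuf stays on the stack below each element
: concatenate ( list -- str )
    100 <sbuf> swap [ dupd swap sbuf-append ] each sbuf>str ;
[ "fixnum" " " "bignum" ] concatenate .
\emph{"fixnum bignum"}
\end{alltt}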
While string buffers are usually only used to concatenate a series
of strings, they also support the same operations as vectors.

\texttt{sbuf-nth ( n sbuf -{}- ch )} pushes the character stored at
a zero-based index of a string buffer:

\begin{lyxcode}
2~{}``my-sbuf''~get~sbuf-nth~.

\emph{115}
\end{lyxcode}
\texttt{set-sbuf-nth ( ch n sbuf -{}- )} sets the character stored
at a zero-based index of a string buffer. Only the least significant
16 bits of the character are stored into the string buffer.

\texttt{sbuf-length ( sbuf -{}- n )} pushes the number of characters
in a string buffer. This is not the same as the capacity of the string
buffer -- the capacity is the internal storage size of the string
buffer, while the length is a possibly smaller number indicating how
much of that storage is in use.

\texttt{set-sbuf-length ( n sbuf -{}- )} changes the length of the
string buffer. The string buffer's storage grows if necessary, and
new character positions are automatically filled with zeroes.

\subsection{String constructors}

Passing a string buffer on the stack can lead to unnecessary stack
noise and overly-complicated stack effects. Often it is better to
use the string construction words, which operate on a similar principle
to the list construction words.

As seen in Section~\ref{sub:List-constructors}, the \texttt{{[},} word begins
list construction; the \texttt{,} word appends elements to the list
that will be returned by the \texttt{,{]}} word. Similarly, the \texttt{<\%}
word begins string construction; the \texttt{\%} word appends the
top of the stack to the string that will be returned by the \texttt{\%>}
word.

The word \texttt{<\% ( -{}- )} begins string construction. The word
definition creates a string buffer. Instead of leaving the string
buffer on the stack, the word creates a new scope holding it and
pushes that scope on the name stack.

The word \texttt{\% ( str/ch -{}- )} appends a string or a character
to the partial string. The word definition calls \texttt{sbuf-append}
on a string buffer located by searching the name stack.

The word \texttt{\%> ( -{}- str )} pushes the complete string. The word
definition pops the name stack and calls \texttt{sbuf>str} on the
appropriate string buffer.

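
As a minimal sketch of these words (the word name \texttt{greeting}
is hypothetical), the following builds a string from two literal pieces
and a string taken from the stack, without ever handling the string
buffer directly:

\begin{alltt}
! hypothetical word: appends "Hello, ", the name, then "!"
: greeting ( name -- str )
    <% "Hello, " % % "!" % %> ;
"Chuck" greeting .
\emph{"Hello, Chuck!"}
\end{alltt}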
\subsection{String combinators}
|
||
|
|
||
|
A pair of combinators in the \texttt{strings} vocabulary iterate over a string, applying a quotation to each character. The \texttt{str-each} word does nothing other than calling the quotation, while \texttt{str-map} collects the return values of the quotation into a new string.
|
||
|
|
||
|
\texttt{str-each ( str quot -{}- )} pushes each character of the string in turn, and executes the quotation. The quotation should have stack effect \texttt{( ch -- )}. The string and the quotation are not on the stack when the quotation is executed. This allows the quotation to use values below the string for accumilation and so on. The following example counts the number of occurrences of the letter ``a'' in a string:
|
||
|
|
||
|
\begin{alltt}
|
||
|
: count-a ( str -- n )
|
||
|
0 swap [ CHAR: a = [ succ ] when ] str-each ;
|
||
|
|
||
|
"Lets just say that you may stay" count-a .
|
||
|
\emph{4}
|
||
|
\end{alltt}
\texttt{str-map ( str quot -{}- str )} pushes each character of the
string in turn and executes the quotation, collecting the values it
returns into a new string.

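
For instance, the following sketch upper-cases every letter ``a'' in a
string. The word name \texttt{capitalize-a} is hypothetical. The example
also assumes that the quotation may return a single character for
\texttt{str-map} to collect.

\begin{alltt}
! Hypothetical example word, not part of the library.
: capitalize-a ( str -- str )
    ! Each character is on the stack when the quotation runs;
    ! an "a" is replaced by "A", anything else is left alone.
    [ dup CHAR: a = [ drop CHAR: A ] when ] str-map ;
\end{alltt}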
\subsection{Printing and reading strings}

These words, found in the \texttt{stdio} vocabulary, differ from \texttt{.}
in that they print strings only, without surrounding quotes, and raise
an error for any other data type. The word \texttt{.} prints any Factor
object in a form suited for parsing, which is why it quotes strings.

\texttt{write ( str -{}- )} writes a string to the standard output
device, without a terminating newline.

\texttt{read ( -{}- str )} reads a line of input from the standard
input device, terminated by a newline.

\texttt{print ( str -{}- )} writes a string followed by a newline
character. To print a newline on its own, use \texttt{terpri ( -{}- )}
rather than passing an empty string to \texttt{print}.

\begin{lyxcode}
{}``a''~write~{}``b''~write

\emph{ab}

{[}~{}``hello''~{}``world''~{]}~{[}~print~{]}~each

\emph{hello}

\emph{world}
\end{lyxcode}
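Reading works in a similar way. The following sketch prompts for a line
of input and echoes it back after a greeting. The name \texttt{ask-name}
is hypothetical and is used only for this example.

\begin{alltt}
! Hypothetical example word, not part of the library.
: ask-name ( -- )
    "What is your name?" print
    read                  ! the line typed by the user
    "Hello, " write       ! no newline yet
    print ;               ! the name, followed by a newline
\end{alltt}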
Often a string representation of a number, usually one read from an
input source, needs to be turned into a number. Unlike in some languages,
in Factor the conversion from a string such as {}``123'' into the
number 123 is not automatic. To turn a string into a number, use one
of two words in the \texttt{parser} vocabulary.

\texttt{str>number ( str -{}- n )} creates an integer, ratio or floating
point number from its string representation. If the string does not
represent a valid number, an exception is thrown.

\texttt{parse-number ( str -{}- n/f )} pushes \texttt{f} on failure, rather
than raising an exception.

XXX bad; talk about parse-word

The reverse conversion is performed by \texttt{unparse ( n -{}- str )},
found in the \texttt{unparser} vocabulary, which pushes the string
representation of a number.

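
A short interactive session illustrating these conversions might look
like the following, with output shown in italics. Note that \texttt{.} prints
\texttt{f} for a failed parse and quotes the string pushed by \texttt{unparse}.

\begin{alltt}
"123" str>number 1 + .
\emph{124}
"abc" parse-number .
\emph{f}
123 unparse .
\emph{"123"}
\end{alltt}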
\section{PRACTICAL: Contractor timesheet}

\subsection{Adding a timesheet entry}

When you begin working on a new task, you tell the timesheet you want
to add a new entry. It then measures the elapsed time until you indicate
that the task is done, and prompts for a task description.

The first word we will write is \texttt{measure-duration}. We measure
the time duration by using the \texttt{millis} word \texttt{( -{}- m )}
to take the time before and after a call to \texttt{read}. The
\texttt{millis} word pushes the number of milliseconds since a certain
epoch. The epoch itself does not matter here, since we are only interested
in the difference between two times.

A first attempt at \texttt{measure-duration} might look like this:

\begin{lyxcode}
:~measure-duration~millis~read~drop~millis~-~;

measure-duration~.
\end{lyxcode}
This word definition has the right general idea. However, the result
is negative, because the later time is subtracted from the earlier one.
Also, we would like to measure durations in minutes, not milliseconds:

\begin{lyxcode}
:~measure-duration~(~-{}-~duration~)

~~~~millis~!~time~before~the~user~presses~ENTER

~~~~read~drop

~~~~millis~swap~-~1000~/i~60~/i~;~!~elapsed~minutes
\end{lyxcode}
Note that the \texttt{/i} word \texttt{( x y -{}- x/y )}, from the
\texttt{arithmetic} vocabulary, performs truncating division. This
makes sense, since we are not interested in fractional parts of a
minute here.

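
To see the truncation at work, consider a task that took 150000
milliseconds, or two and a half minutes. Dividing down to minutes with
\texttt{/i} discards the half minute:

\begin{alltt}
150000 1000 /i .
\emph{150}
150 60 /i .
\emph{2}
\end{alltt}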

Now that we can measure a time duration at the keyboard, let's write
the \texttt{add-entry-prompt} word. This word does exactly what one
would expect: it prompts for the time duration and description,
and leaves those two values on the stack:

\begin{lyxcode}
:~add-entry-prompt~(~-{}-~duration~description~)

~~~~\char`\"{}Start~work~on~the~task~now.~Press~ENTER~when~done.\char`\"{}~print

~~~~measure-duration

~~~~\char`\"{}Please~enter~a~description:\char`\"{}~print

~~~~read~;
\end{lyxcode}
You should test this word interactively. Measure off a minute or two,
press ENTER, enter a description, and press ENTER again. The stack
should now contain two values, in the same order as the stack effect
comment.

Now almost all the ingredients are in place. The final \texttt{add-entry}
word calls \texttt{add-entry-prompt}, then pushes the new entry onto the end
of the timesheet vector:

\begin{lyxcode}
:~add-entry~(~timesheet~-{}-~)

~~~~add-entry-prompt~cons~swap~vector-push~;
\end{lyxcode}
Recall that timesheet entries are cons cells where the car is the
duration and the cdr is the description, hence the call to \texttt{cons}.
Note that this word modifies the timesheet vector in place. You can test
it interactively as shown below. The \texttt{dup} keeps a second reference
to the vector on the stack, so that it can be printed with \texttt{.} afterwards:

\begin{lyxcode}
10~<vector>~dup~add-entry

\emph{Start~work~on~the~task~now.~Press~ENTER~when~done.}

\emph{Please~enter~a~description:}

\emph{Studying~Factor}

.

\emph{\{~{[}~2~|~{}``Studying~Factor''~{]}~\}}
\end{lyxcode}
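The layout of an entry can also be checked directly. In the sketch
below, \texttt{cons} builds an entry from a duration and a description.
\texttt{uncons}, which is used when printing the timesheet, splits the
entry again and leaves the description on top of the stack:

\begin{alltt}
2 "Studying Factor" cons .
\emph{[ 2 | "Studying Factor" ]}
2 "Studying Factor" cons uncons . .
\emph{"Studying Factor"}
\emph{2}
\end{alltt}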
\subsection{Printing the timesheet}

The hard part of printing the timesheet is turning the duration in
minutes into a nice hours/minutes string, like {}``1:15''. We would
like to make a word like the following:

\begin{lyxcode}
75~hh:mm~.

\emph{{}``1:15''}
\end{lyxcode}
First, we can make a pair of words, \texttt{hh} and \texttt{mm}, to extract
the hours and minutes, respectively. This can be achieved using truncating
division and the modulo operator. Also, since we would like strings to be
returned, the \texttt{unparse} word \texttt{( obj -{}- str )} from
the \texttt{unparser} vocabulary is called to turn the integers into
strings:

\begin{lyxcode}
:~hh~(~duration~-{}-~str~)~60~/i~unparse~;

:~mm~(~duration~-{}-~str~)~60~mod~unparse~;
\end{lyxcode}
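As a quick check of the arithmetic, 75 minutes is one hour and fifteen
minutes. Truncating division by 60 yields the hours, and \texttt{mod}
yields the leftover minutes:

\begin{alltt}
75 60 /i .
\emph{1}
75 60 mod .
\emph{15}
75 hh .
\emph{"1"}
\end{alltt}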
The \texttt{hh:mm} word can then be written, concatenating the return
values of \texttt{hh} and \texttt{mm} into a single string using string
construction:

\begin{lyxcode}
:~hh:mm~(~duration~-{}-~str~)~<\%~dup~hh~\%~\char`\"{}:\char`\"{}~\%~mm~\%~\%>~;
\end{lyxcode}
However, these three definitions do not yet produce ideal output.
Try a few examples:

\begin{lyxcode}
120~hh:mm~.

\emph{{}``2:0''}

130~hh:mm~.

\emph{{}``2:10''}
\end{lyxcode}
Obviously, we would like the minutes to always have two digits. Luckily,
there is a \texttt{digits} word \texttt{( str n -{}- str )} in the
\texttt{format} vocabulary that adds enough zeros on the left of the
string to give it the specified length. Try it out:

\begin{lyxcode}
{}``23''~2~digits~.

\emph{{}``23''}

{}``7''~2~digits~.

\emph{{}``07''}
\end{lyxcode}
We can now change the definition of \texttt{mm} accordingly:

\begin{lyxcode}
:~mm~(~duration~-{}-~str~)~60~mod~unparse~2~digits~;
\end{lyxcode}
Now that time duration output is done, a first attempt at a definition
of \texttt{print-timesheet} looks like this:

\begin{lyxcode}
:~print-timesheet~(~timesheet~-{}-~)

~~~~{[}~uncons~write~{}``:~''~write~hh:mm~print~{]}~vector-each~;
\end{lyxcode}
This works, but produces ugly output:

\begin{lyxcode}
\{~{[}~30~|~{}``Studying~Factor''~{]}~{[}~65~|~{}``Paperwork''~{]}~\}

print-timesheet

\emph{Studying~Factor:~0:30}

\emph{Paperwork:~1:05}
\end{lyxcode}
It would be much nicer if the time durations lined up in the same
column. First, let's factor out the body of the \texttt{vector-each}
loop into a new \texttt{print-entry} word before it gets too long:

\begin{lyxcode}
:~print-entry~(~duration~description~-{}-~)

~~~~write~{}``:~''~write~hh:mm~print~;~\\
~\\
:~print-timesheet~(~timesheet~-{}-~)

~~~~{[}~uncons~print-entry~{]}~vector-each~;
\end{lyxcode}
We can now make \texttt{print-entry} line up columns using the \texttt{pad-string}
word \texttt{( n str -{}- str )}. This word pushes a string of blanks whose
length is \texttt{n} minus the length of \texttt{str}:

\begin{lyxcode}
:~print-entry~(~duration~description~-{}-~)

~~~~dup

~~~~write

~~~~50~swap~pad-string~write

~~~~hh:mm~print~;
\end{lyxcode}
In the above definition, we first print the description, then enough
blanks to move the cursor to column 50, so the description text is
left-justified. If we had interchanged the order of the second and
third lines in the definition, the description text would be right-justified.

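
Concretely, the right-justified variant would read as follows. It is
given a separate, hypothetical name, \texttt{print-entry-right}, so that it
does not replace \texttt{print-entry}. Like the definition above, it assumes
that \texttt{pad-string} pushes only the padding.

\begin{alltt}
! Hypothetical variant: the padding is written before the
! description, so descriptions end at column 50.
: print-entry-right ( duration description -- )
    dup
    50 swap pad-string write    ! blanks first
    write                       ! then the description
    hh:mm print ;
\end{alltt}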

Try out \texttt{print-timesheet} again, and marvel at the aligned
columns:

\begin{lyxcode}
\{~{[}~30~|~{}``Studying~Factor''~{]}~{[}~65~|~{}``Paperwork''~{]}~\}

print-timesheet

\emph{Studying~Factor~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~0:30}

\emph{Paperwork~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~1:05}
\end{lyxcode}

\subsection{The main menu}

Reading a number, showing a menu

\section{Variables and namespaces}

\subsection{Hashtables}

\subsection{Namespaces}

\subsection{The name stack}

\subsection{The inspector}

\section{PRACTICAL: Music player}

\section{Deeper in the beast}

Text -> objects - parser, objects -> text - unparser for atoms, prettyprinter
for collections.

What really is a word -- primitive, parameter, property list.

Call stack how it works and >r/r>

\subsection{Parsing words}

Let's take a closer look at Factor syntax. Consider a simple expression,
and the result of evaluating it in the interactive interpreter:

\begin{lyxcode}
2~3~+~.

\emph{5}
\end{lyxcode}
The interactive interpreter is basically an infinite loop. It reads
a line of input from the terminal, parses this line to produce a \emph{quotation},
and executes the quotation.

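
The parse and execute steps can be separated by hand. Wrapping the
same expression in square brackets makes the parser build a quotation
without running it, and the \texttt{call} word \texttt{( quot -{}- )} then
executes it. This is only a sketch of the idea, not a description of how
the interpreter loop is actually written:

\begin{alltt}
[ 2 3 + ] call .
\emph{5}
\end{alltt}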

In the parse step, the input text is tokenized into a sequence of
whitespace-separated tokens. For each token, the interpreter first checks
whether there is an existing word named by the token. If there is no such word,
the interpreter instead treats the token as a number.%
\footnote{Of course, Factor supports a full range of data types, including strings,
lists and vectors. Their source representations are still built from
numbers and words, however.%
}

Once the expression has been entirely parsed, the interactive interpreter
executes it.

This parse time/run time distinction is important, because words fall
into two categories: {}``parsing words'' and {}``running words''.

The parser constructs a parse tree from the input text. When the parser
encounters a token representing a number or an ordinary word, the
token is simply appended to the current parse tree node. A parsing
word, on the other hand, is executed \emph{immediately} after being
tokenized. Since it executes in the context of the parser, it has
access to the raw input text, the entire parse tree, and other parser
structures.

Parsing words are also defined using colon definitions, except that we
add \texttt{parsing} after the terminating \texttt{;}. Here are two
pairs of definitions of the words \texttt{foo} and \texttt{bar}. The pairs
are identical, except that in the second pair, \texttt{foo} is defined
as a parsing word:

\begin{lyxcode}
!~Let's~define~'foo'~as~a~running~word.

:~foo~{}``1)~foo~executed.''~print~;

:~bar~foo~{}``2)~bar~executed.''~print~;

bar

\emph{1)~foo~executed.}

\emph{2)~bar~executed.}

bar

\emph{1)~foo~executed.}

\emph{2)~bar~executed.}

!~Now~let's~define~'foo'~as~a~parsing~word.

:~foo~{}``1)~foo~executed.''~print~;~parsing

:~bar~foo~{}``2)~bar~executed.''~print~;

\emph{1)~foo~executed.}

bar

\emph{2)~bar~executed.}

bar

\emph{2)~bar~executed.}
\end{lyxcode}
In fact, the word \texttt{\char`\"{}} that denotes a string literal is
a parsing word. It reads characters from the input text until the
next occurrence of \texttt{\char`\"{}}, and appends this string to the
current node of the parse tree. Note that strings and words are different
types of objects. Strings are covered in detail in their own section.

\section{PRACTICAL: Infix syntax}

\section{Continuations}

Generators, co-routines, multitasking, exception handling

\section{HTTP Server}

\section{PRACTICAL: Some web app}
\end{document}