#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass article
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\spacing single
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 2
\paragraph_separation skip
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle headings
\layout Title
Factor Developer's Guide
\layout Author
Slava Pestov
\layout Standard
\begin_inset LatexCommand \tableofcontents{}
\end_inset
\layout Section*
\pagebreak_top
Introduction
\layout Standard
Factor is an imperative programming language with functional and object-oriented
influences.
Its primary goal is to be used for web-based server-side applications.
Factor is interpreted by a virtual machine that provides garbage collection
and prohibits pointer arithmetic.
\begin_inset Foot
collapsed false
\layout Standard
Two releases of Factor are available -- a virtual machine written in C,
and an interpreter written in Java that runs on the Java virtual machine.
This guide targets the C version of Factor.
\end_inset
\layout Standard
Factor borrows heavily from Forth, Joy and Lisp.
From Forth it inherits a flexible syntax defined in terms of
\begin_inset Quotes eld
\end_inset
parsing words
\begin_inset Quotes erd
\end_inset
and an execution model based on a data stack and call stack.
From Joy and Lisp it inherits a virtual machine prohibiting direct pointer
arithmetic, and the use of
\begin_inset Quotes eld
\end_inset
cons cells
\begin_inset Quotes erd
\end_inset
to represent code and data structure.
\layout Section
Fundamentals
\layout Standard
A "word" is the main unit of program organization in Factor -- it corresponds
to a "function", "procedure" or "method" in other languages.
\layout Standard
When code examples are given, the input is in a roman font, and any output
from the interpreter is in italics:
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
Hello, world!
\begin_inset Quotes erd
\end_inset
print
\layout LyX-Code
\emph on
Hello, world!
\layout Subsection
The stack
\layout Standard
The stack is used to exchange data between words.
When a number is executed, it is pushed on the stack.
When a word is executed, it receives input parameters by removing successive
elements from the top of the stack.
Results are then pushed back to the top of the stack.
\layout Standard
The word
\family typewriter
.s
\family default
prints the contents of the stack, leaving the contents of the stack unaffected.
The top of the stack is the rightmost element in the printout:
\layout LyX-Code
2 3 .s
\layout LyX-Code
\emph on
{ 2 3 }
\layout Standard
The word
\family typewriter
.
\family default
removes the object at the top of the stack, and prints it:
\layout LyX-Code
1 2 3 .
.
.
\layout LyX-Code
\emph on
3
\layout LyX-Code
\emph on
2
\layout LyX-Code
\emph on
1
\layout Standard
The usual arithmetic operators
\family typewriter
+ - * /
\family default
all take two parameters from the stack, and push one result back.
Where the order of operands matters (
\family typewriter
-
\family default
and
\family typewriter
/
\family default
), the operands are taken from the stack in the natural order.
For example:
\layout LyX-Code
10 17 + .
\layout LyX-Code
\emph on
27
\layout LyX-Code
111 234 - .
\layout LyX-Code
\emph on
-123
\layout LyX-Code
333 3 / .
\layout LyX-Code
\emph on
111
\layout Standard
This type of arithmetic is called
\emph on
postfix
\emph default
, because the operator follows the operands.
Contrast this with
\emph on
infix
\emph default
notation used in many other languages, so-called because the operator is
in-between the two operands.
\layout Standard
More complicated infix expressions can be translated into postfix by translating
the inner-most parts first.
Grouping parentheses are never necessary:
\layout LyX-Code
! Postfix equivalent of (2 + 3) * 6
\layout LyX-Code
2 3 + 6 * .
\layout LyX-Code
\emph on
30
\layout LyX-Code
! Postfix equivalent of 2 + (3 * 6)
\layout LyX-Code
2 3 6 * + .
\layout LyX-Code
\emph on
20
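\layout Standard
As a further illustration, here is a sketch of an expression with two parenthesized groups.
It is not one of the original examples; it assumes only the arithmetic words shown above.
Each inner group is translated first, and the outer operator comes last:
\layout LyX-Code
! Postfix equivalent of (1 + 2) * (3 + 4)
\layout LyX-Code
1 2 + 3 4 + * .
\layout LyX-Code
\emph on
21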
\layout Subsection
Factoring
\layout Standard
New words can be defined in terms of existing words using the
\emph on
colon definition
\emph default
syntax:
\layout LyX-Code
:
\emph on
name
\emph default
(
\emph on
inputs
\emph default
--
\emph on
outputs
\emph default
)
\layout LyX-Code
#!
\emph on
Description
\layout LyX-Code
\emph on
factors ...
\emph default
;
\layout Standard
When the new word is executed, each one of its factors gets executed, in
turn.
The comment delimited by
\family typewriter
(
\family default
and
\family typewriter
)
\family default
is called a stack effect comment and is described later.
The stack effect comment, as well as the documentation comment starting
with
\family typewriter
#!
\family default
are both optional, and can be placed anywhere in the source code, not just
in colon definitions.
\layout Standard
Note that in a source file, a word definition can span multiple lines.
However, the interactive interpreter expects each line of input to be
\begin_inset Quotes eld
\end_inset
complete
\begin_inset Quotes erd
\end_inset
, so interactively, colon definitions must be entered all on one line.
\layout Standard
For example, let's assume we are designing some software for an aircraft
navigation system.
Let's assume that internally, all lengths are stored in meters, and all
times are stored in seconds.
We can define words for converting from kilometers to meters, and hours
and minutes to seconds:
\layout LyX-Code
: kilometers 1000 * ;
\layout LyX-Code
: minutes 60 * ;
\layout LyX-Code
: hours 60 * 60 * ;
\layout LyX-Code
2 kilometers .
\layout LyX-Code
\emph on
2000
\layout LyX-Code
10 minutes .
\layout LyX-Code
\emph on
600
\layout LyX-Code
2 hours .
\layout LyX-Code
\emph on
7200
\layout Standard
Now, suppose we need a word that takes the flight time, the aircraft velocity,
and the tailwind velocity, and returns the distance travelled.
If the parameters are given on the stack in that order, all we do is add
the top two elements (aircraft velocity, tailwind velocity) and multiply
it by the element underneath (flight time).
So the definition looks like this, this time with a stack effect comment
since it's slightly less obvious what the operands are:
\layout LyX-Code
: distance ( time aircraft tailwind -- distance ) + * ;
\layout LyX-Code
2 900 36 distance .
\layout LyX-Code
\emph on
1872
\layout Standard
Note that we are not using any units here.
We could, if we defined some words for velocity units first.
The only non-trivial thing here is the implementation of
\family typewriter
km/hour
\family default
-- we first convert the velocity to meters per hour, then divide by the number
of seconds in one hour to get meters per second:
\layout LyX-Code
: km/hour kilometers 1 hours / ;
\layout LyX-Code
2 hours 900 km/hour 36 km/hour distance .
\layout LyX-Code
\emph on
1872000
\layout Subsection
Stack effects
\layout Standard
A stack effect comment contains a description of inputs to the left of
\family typewriter
--
\family default
, and a description of outputs to the right.
As always, the top of the stack is on the right side.
Let's try writing a word to compute the cube of a number.
\begin_inset Foot
collapsed false
\layout Standard
I'd use the somewhat simpler example of a word that squares a number, but
such a word already exists in the standard library.
It's in the
\family typewriter
arithmetic
\family default
vocabulary, named
\family typewriter
sq
\family default
.
\end_inset
\layout Standard
Three numbers on the stack can be multiplied together using
\family typewriter
* *
\family default
:
\layout LyX-Code
2 4 8 * * .
\layout LyX-Code
\emph on
64
\layout Standard
However, the stack effect of
\family typewriter
* *
\family default
is
\family typewriter
( a b c -- a*b*c )
\family default
.
We would like to write a word that takes
\emph on
one
\emph default
input only.
To achieve this, we need to be able to duplicate the top stack element twice.
As it happens, there is a word
\family typewriter
dup ( x -- x x )
\family default
for precisely this purpose.
Now, we are able to define the
\family typewriter
cube
\family default
word:
\layout LyX-Code
: cube dup dup * * ;
\layout LyX-Code
10 cube .
\layout LyX-Code
\emph on
1000
\layout LyX-Code
-2 cube .
\layout LyX-Code
\emph on
-8
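\layout Standard
To see why two duplications are needed, the stack can be inspected just before the multiplications.
The following is a sketch of an interactive session; the value 5 is arbitrary:
\layout LyX-Code
! Illustrative session: what cube does to the stack, step by step
\layout LyX-Code
clear 5 dup dup .s
\layout LyX-Code
\emph on
{ 5 5 5 }
\layout LyX-Code
* * .
\layout LyX-Code
\emph on
125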
\layout Standard
It is quite often the case that we want to compose two factors in a colon
definition, but their stack effects don't
\begin_inset Quotes eld
\end_inset
match up
\begin_inset Quotes erd
\end_inset
.
\layout Standard
There is a set of
\emph on
shuffle words
\emph default
for solving precisely this problem.
These words are so-called because they simply rearrange stack elements
in some fashion, without modifying them in any way.
Let's take a look at the most frequently-used shuffle words:
\layout Standard
\family typewriter
drop ( x -- )
\family default
Discard the top stack element.
Used when a return value is not needed.
\layout Standard
\family typewriter
dup ( x -- x x )
\family default
Duplicate the top stack element.
Used when a value is needed more than once.
\layout Standard
\family typewriter
swap ( x y -- y x )
\family default
Swap top two stack elements.
Used when a word expects parameters in a different order.
\layout Standard
\family typewriter
rot ( x y z -- y z x )
\family default
Rotate top three stack elements to the left.
\layout Standard
\family typewriter
-rot ( x y z -- z x y )
\family default
Rotate top three stack elements to the right.
\layout Standard
\family typewriter
over ( x y -- x y x )
\family default
Bring the second stack element
\begin_inset Quotes eld
\end_inset
over
\begin_inset Quotes erd
\end_inset
the top element.
\layout Standard
\family typewriter
nip ( x y -- y )
\family default
Remove the second stack element.
\layout Standard
\family typewriter
tuck ( x y -- y x y )
\family default
Tuck the top stack element under the second stack element.
\layout Standard
You can try all these words out -- push some numbers on the stack, execute
a word, and look at how the stack contents were changed using
\family typewriter
.s
\family default
.
Compare the stack contents with the stack effects above.
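\layout Standard
For instance, a session along the following lines could be used to check a few of the stack effects listed above.
This is only a sketch; the numbers are arbitrary, and each printout follows directly from the corresponding stack effect:
\layout LyX-Code
! Illustrative session: trying out some shuffle words
\layout LyX-Code
clear 1 2 swap .s
\layout LyX-Code
\emph on
{ 2 1 }
\layout LyX-Code
clear 1 2 over .s
\layout LyX-Code
\emph on
{ 1 2 1 }
\layout LyX-Code
clear 1 2 3 rot .s
\layout LyX-Code
\emph on
{ 2 3 1 }
\layout LyX-Code
clear 1 2 tuck .s
\layout LyX-Code
\emph on
{ 2 1 2 }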
\layout Standard
Note the order of the shuffle word descriptions above.
The ones at the top are used most often because they are easy to understand.
The more complex ones such as rot should be avoided where possible, because
they make the flow of data in a word definition harder to understand.
\layout Standard
If you find yourself using too many shuffle words, or you're writing a stack
effect comment in the middle of a colon definition, it is a good sign that
the word should probably be factored into two or more words.
Effective factoring is like riding a bicycle -- it is hard at first, but
then you
\begin_inset Quotes eld
\end_inset
get it
\begin_inset Quotes erd
\end_inset
, and writing small, clear and reusable word definitions becomes second-nature.
\layout Subsection
Combinators
\layout Standard
A quotation is a list of objects that can be executed.
Words that operate on quotations are called
\emph on
combinators
\emph default
.
Quotations are input using the following syntax:
\layout LyX-Code
[ 2 3 + .
]
\layout Standard
When input, a quotation is not executed immediately -- rather, it becomes
one object on the stack.
Try evaluating the following:
\layout LyX-Code
[ 1 2 3 + * ] .s
\layout LyX-Code
\emph on
{ [ 1 2 3 + * ] }
\layout LyX-Code
call .s
\layout LyX-Code
\emph on
{ 5 }
\layout Standard
\family typewriter
call
\family default
\family typewriter
( quot -- )
\family default
executes the quotation at the top of the stack.
Using
\family typewriter
call
\family default
with a literal quotation is useless; writing out the elements of the quotation
has the same effect.
However, the
\family typewriter
call
\family default
combinator is a building block of more powerful combinators, since quotations
can be passed around arbitrarily and even modified before being called.
\layout Standard
\family typewriter
ifte
\family default
\family typewriter
( cond true false -- )
\family default
executes either the
\family typewriter
true
\family default
or
\family typewriter
false
\family default
quotations, depending on the boolean value of
\family typewriter
cond
\family default
.
In Factor, there is no real boolean data type -- instead, a special object
\family typewriter
f
\family default
is the only object with a
\begin_inset Quotes eld
\end_inset
false
\begin_inset Quotes erd
\end_inset
boolean value.
Every other object is a boolean
\begin_inset Quotes eld
\end_inset
true
\begin_inset Quotes erd
\end_inset
.
The special object
\family typewriter
t
\family default
is the
\begin_inset Quotes eld
\end_inset
canonical
\begin_inset Quotes erd
\end_inset
truth value.
\layout Standard
Here is an example of
\family typewriter
ifte
\family default
usage:
\layout LyX-Code
1 2 < [
\begin_inset Quotes eld
\end_inset
1 is less than 2.
\begin_inset Quotes erd
\end_inset
print ] [
\begin_inset Quotes eld
\end_inset
bug!
\begin_inset Quotes erd
\end_inset
print ] ifte
\layout Standard
Compare the order of operands here, and the order of arguments in the stack
effect of
\family typewriter
ifte
\family default
.
\layout Standard
Note that the stack effects of the two
\family typewriter
ifte
\family default
branches should be the same.
If they differ, the word becomes harder to document and debug.
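\layout Standard
The rule that every object other than f counts as true can be tried directly.
In this sketch the value 5 is an arbitrary stand-in for any non-f object, and both branches have the same (empty) stack effect:
\layout LyX-Code
! Illustrative session: f is false, any other object is true
\layout LyX-Code
f [ "true" print ] [ "false" print ] ifte
\layout LyX-Code
\emph on
false
\layout LyX-Code
5 [ "true" print ] [ "false" print ] ifte
\layout LyX-Code
\emph on
true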
\layout Standard
\family typewriter
times ( num quot -- )
\family default
executes a quotation a number of times.
It is good style to have the quotation always consume as many values from
the stack as it produces.
This ensures the stack effect of the entire
\family typewriter
times
\family default
expression stays constant regardless of the number of iterations.
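\layout Standard
A small sketch of this style follows.
The quotation neither consumes nor produces stack values, so the whole expression leaves the stack unchanged however many times it runs:
\layout LyX-Code
! Illustrative session: the quotation has stack effect ( -- )
\layout LyX-Code
3 [ "Hello" print ] times
\layout LyX-Code
\emph on
Hello
\layout LyX-Code
\emph on
Hello
\layout LyX-Code
\emph on
Hello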
\layout Standard
More combinators will be introduced later.
\layout Subsection
Vocabularies
\layout Standard
The dictionary of words is not a flat list -- rather, it is separated into
a number of
\emph on
vocabularies
\emph default
.
Each vocabulary is a named list of words that have something in common
-- for example, the
\begin_inset Quotes eld
\end_inset
lists
\begin_inset Quotes erd
\end_inset
vocabulary contains words for working with linked lists.
\layout Standard
When a word is read by the parser, the
\emph on
vocabulary search path
\emph default
determines which vocabularies to search.
In the interactive interpreter, the default search path contains a large
number of vocabularies.
Contrast this to the situation when a file is being parsed -- the search
path has a minimal set of vocabularies containing basic parsing words.
\begin_inset Foot
collapsed false
\layout Standard
The rationale here is that the interactive interpreter should have a large
number of words available by default, for convenience, whereas source files
should specify their external dependencies explicitly.
\end_inset
\layout Standard
New vocabularies are added to the search path using the
\family typewriter
USE:
\family default
parsing word.
For example:
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
/home/slava/.factor-rc
\begin_inset Quotes erd
\end_inset
exists? .
\layout LyX-Code
\emph on
ERROR: <interactive>:1: Undefined: exists?
\layout LyX-Code
USE: streams
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
/home/slava/.factor-rc
\begin_inset Quotes erd
\end_inset
exists? .
\layout LyX-Code
\emph on
t
\layout Standard
How do you know which vocabulary contains a word? Vocabularies can either
be listed, or an
\begin_inset Quotes eld
\end_inset
apropos
\begin_inset Quotes erd
\end_inset
search can be performed:
\layout LyX-Code
"init" words.
\layout LyX-Code
\emph on
[ ?run-file boot cli-arg cli-param init-environment
\layout LyX-Code
\emph on
init-gc init-interpreter init-scratchpad init-search-path
\layout LyX-Code
\emph on
init-stdio init-toplevel parse-command-line parse-switches
\layout LyX-Code
\emph on
run-files run-user-init stdin stdout ]
\layout LyX-Code
\layout LyX-Code
"map" apropos.
\layout LyX-Code
\emph on
IN: lists
\layout LyX-Code
\emph on
map
\layout LyX-Code
\emph on
IN: strings
\layout LyX-Code
\emph on
str-map
\layout LyX-Code
\emph on
IN: vectors
\layout LyX-Code
\emph on
(vector-map)
\layout LyX-Code
\emph on
(vector-map-step)
\layout LyX-Code
\emph on
vector-map
\layout Standard
New words are defined in the
\emph on
input vocabulary
\emph default
.
The input vocabulary can be changed at the interactive prompt, or in a
source file, using the
\family typewriter
IN:
\family default
parsing word.
For example:
\layout LyX-Code
IN: music-database
\layout LyX-Code
: random-playlist ...
;
\layout Standard
It is a convention (although it is not enforced by the parser) that the
\family typewriter
IN:
\family default
directive is the first statement in a source file, and all
\family typewriter
USE:
\family default
follow, before any other definitions.
\layout Section
PRACTICAL: Numbers game
\layout Standard
In this section, basic input/output and flow control is introduced.
We construct a program that repeatedly prompts the user to guess a number
-- they are informed if their guess is correct, too low, or too high.
The game ends on a correct guess.
\layout LyX-Code
numbers-game
\layout LyX-Code
\emph on
I'm thinking of a number between 0 and 100.
\layout LyX-Code
\emph on
Enter your guess:
\emph default
25
\layout LyX-Code
\emph on
Too low
\layout LyX-Code
\emph on
Enter your guess:
\emph default
38
\layout LyX-Code
\emph on
Too high
\layout LyX-Code
\emph on
Enter your guess:
\emph default
31
\layout LyX-Code
\emph on
Correct - you win!
\layout Subsection
Development methodology
\layout Standard
A typical Factor development session involves a text editor
\begin_inset Foot
collapsed false
\layout Standard
Try jEdit, which has Factor syntax highlighting out of the box.
\end_inset
and Factor interpreter running side by side.
Instead of the edit/compile/run cycle, the development process becomes
an
\begin_inset Quotes eld
\end_inset
edit cycle
\begin_inset Quotes erd
\end_inset
-- you make some changes to the source file and reload it in the interpreter
using a command like this:
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
numbers-game.factor
\begin_inset Quotes erd
\end_inset
run-file
\layout Standard
Then the changes can be tested, either by hand, or using a test harness.
There is no need to compile anything, or to lose interpreter state by
restarting.
Additionally, words with
\begin_inset Quotes eld
\end_inset
throw-away
\begin_inset Quotes erd
\end_inset
definitions that you do not intend to keep can also be entered directly
at this interpreter prompt.
\layout Standard
Each word should do one useful task.
New words can be defined in terms of existing, already-tested words.
You design a set of reusable words that model the problem domain.
Then, the problem is solved in terms of a
\emph on
domain-specific vocabulary
\emph default
.
This is called
\emph on
bottom-up design.
\layout Subsection
Getting started
\layout Standard
Start a text editor and create a file named
\family typewriter
numbers-game.factor
\family default
.
\layout Standard
At the top of the file, write a comment.
Comments are a feature that can be found in almost any programming language;
in Factor, they are implemented as parsing words.
An example of commenting follows:
\layout LyX-Code
! The word ! discards input until the end of the line
\layout LyX-Code
( The word ( discards input until the next )
\layout Standard
It is always a good idea to comment your code.
Try to write simple code that does not need detailed comments to describe;
similarly, avoid redundant comments.
These two principles are hard to quantify in a concrete way, and will become
more clear as your skills with Factor increase.
\layout Standard
We will be defining new words in the numbers-game vocabulary; add an
\family typewriter
IN:
\family default
statement at the top of the source file:
\layout LyX-Code
IN: numbers-game
\layout Standard
Also, in order to be able to test the words, issue a
\family typewriter
USE:
\family default
statement in the interactive interpreter:
\layout LyX-Code
USE: numbers-game
\layout Standard
This section will develop the numbers game in an incremental fashion.
After each addition, issue a command like the following to load the source
file into the Factor interpreter:
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
numbers-game.factor
\begin_inset Quotes erd
\end_inset
run-file
\layout Subsection
Reading a number from the keyboard
\layout Standard
A fundamental operation required for the numbers game is to be able to read
a number from the keyboard.
The
\family typewriter
read
\family default
word
\family typewriter
( -- str )
\family default
reads a line of input and pushes it on the stack as a string.
The
\family typewriter
parse-number
\family default
word
\family typewriter
( str -- n )
\family default
takes a string from the stack, and parses it, pushing an integer.
These two words can be combined into a single colon definition:
\layout LyX-Code
: read-number ( -- n ) read parse-number ;
\layout Standard
You should add this definition to the source file, and try loading the file
into the interpreter.
As you will soon see, this raises an error! The problem is that the two
words
\family typewriter
read
\family default
and
\family typewriter
parse-number
\family default
are not part of the default, minimal, vocabulary search path used when
reading files.
The solution is to use
\family typewriter
apropos.
\family default
to find out which vocabularies contain those words, and add the appropriate
USE: statements to the source file:
\layout LyX-Code
USE: parser
\layout LyX-Code
USE: stdio
\layout Standard
After adding the above two statements, the file should now parse, and testing
should confirm that the read-number word works correctly.
\begin_inset Foot
collapsed false
\layout Standard
There is the possibility of an invalid number being entered at the keyboard.
In this case,
\family typewriter
parse-number
\family default
returns
\family typewriter
f
\family default
, the boolean false value.
For the sake of simplicity, we ignore this case in the numbers game example.
However, proper error handling is an essential part of any large program
and is covered later.
\end_inset
\layout Subsection
Printing some messages
\layout Standard
Now we need to make some words for printing various messages.
They are given here without further ado:
\layout LyX-Code
: guess-banner
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
I'm thinking of a number between 0 and 100.
\begin_inset Quotes erd
\end_inset
print ;
\layout LyX-Code
: guess-prompt
\begin_inset Quotes eld
\end_inset
Enter your guess:
\begin_inset Quotes erd
\end_inset
write ;
\layout LyX-Code
: too-high
\begin_inset Quotes eld
\end_inset
Too high
\begin_inset Quotes erd
\end_inset
print ;
\layout LyX-Code
: too-low
\begin_inset Quotes eld
\end_inset
Too low
\begin_inset Quotes erd
\end_inset
print ;
\layout LyX-Code
: correct
\begin_inset Quotes eld
\end_inset
Correct - you win!
\begin_inset Quotes erd
\end_inset
print ;
\layout Standard
Note that in the above, stack effect comments are omitted, since they are
obvious from context.
You should ensure the words work correctly after loading the source file
into the interpreter.
\layout Subsection
Taking action based on a guess
\layout Standard
The next logical step is to write a word
\family typewriter
judge-guess
\family default
that takes the user's guess along with the actual number to be guessed,
and prints one of the messages
\family typewriter
too-high
\family default
,
\family typewriter
too-low
\family default
, or
\family typewriter
correct
\family default
.
This word will also push a boolean flag, indicating if the game should
continue or not -- in the case of a correct guess, the game does not continue.
\layout Standard
This description of judge-guess is a mouthful -- and it suggests that it
may be best to split it into two words.
So the first word we write handles the more specific case of an
\emph on
inexact
\emph default
guess -- so it prints either
\family typewriter
too-low
\family default
or
\family typewriter
too-high
\family default
.
\layout LyX-Code
: inexact-guess ( guess actual -- )
\layout LyX-Code
> [ too-high ] [ too-low ] ifte ;
\layout Standard
Note that the word gives incorrect output if the two parameters are equal.
However, it will never be called this way.
\layout Standard
With this out of the way, the implementation of judge-guess is an easy task
to tackle.
Using the words
\family typewriter
inexact-guess
\family default
,
\family typewriter
=
\family default
, and
\family typewriter
2dup
\family default
, we can write:
\layout LyX-Code
: judge-guess ( guess actual -- ? )
\layout LyX-Code
2dup = [
\layout LyX-Code
2drop correct f
\layout LyX-Code
] [
\layout LyX-Code
inexact-guess t
\layout LyX-Code
] ifte ;
\layout Standard
Note the use of
\family typewriter
2dup ( x y -- x y x y )
\family default
.
Since
\family typewriter
=
\family default
consumes both its parameters, we must make copies of them so that they remain
available to
\family typewriter
inexact-guess
\family default
.
Try the following at the interpreter to see what's going on:
\layout LyX-Code
clear 1 2 2dup = .s
\layout LyX-Code
\emph on
{ 1 2 f }
\layout LyX-Code
clear 4 4 2dup = .s
\layout LyX-Code
\emph on
{ 4 4 t }
\layout Standard
Test
\family typewriter
judge-guess
\family default
with a few inputs:
\layout LyX-Code
1 10 judge-guess .
\layout LyX-Code
\emph on
Too low
\layout LyX-Code
\emph on
t
\layout LyX-Code
89 43 judge-guess .
\layout LyX-Code
\emph on
Too high
\layout LyX-Code
\emph on
t
\layout LyX-Code
64 64 judge-guess .
\layout LyX-Code
\emph on
Correct - you win!
\layout LyX-Code
\emph on
f
\layout Subsection
Generating random numbers
\layout Standard
The
\family typewriter
random-int
\family default
word
\family typewriter
( min max -- n )
\family default
pushes a random number in a specified range.
The range is inclusive, so both the minimum and maximum values are candidate
random numbers.
Use
\family typewriter
apropos.
\family default
to determine that this word is in the
\family typewriter
random
\family default
vocabulary.
For the purposes of this game, the number to guess lies in the range of 0
to 100, so we can define a word that generates it:
\layout LyX-Code
: number-to-guess ( -- n ) 0 100 random-int ;
\layout Standard
Add the word definition to the source file, along with the appropriate
\family typewriter
USE:
\family default
statement.
Load the source file in the interpreter, and confirm that the word functions
correctly, and that its stack effect comment is accurate.
\layout Subsection
The game loop
\layout Standard
The game loop consists of repeated calls to
\family typewriter
guess-prompt
\family default
,
\family typewriter
read-number
\family default
and
\family typewriter
judge-guess
\family default
.
If
\family typewriter
judge-guess
\family default
pushes
\family typewriter
f
\family default
, the loop stops, otherwise it continues.
This is realized with a recursive implementation:
\layout LyX-Code
: numbers-game-loop ( actual -- )
\layout LyX-Code
dup guess-prompt read-number swap judge-guess [
\layout LyX-Code
numbers-game-loop
\layout LyX-Code
] [
\layout LyX-Code
drop
\layout LyX-Code
] ifte ;
\layout Standard
In Factor, tail-recursive words consume a bounded amount of call stack space.
This means you are free to pick recursion or iteration based on their own
merits when solving a problem.
In many other languages, the usefulness of recursion is severely limited
by the lack of tail-recursive call optimization.
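\layout Standard
The same recursive pattern can be tried on a smaller scale.
The word below is hypothetical and not part of the numbers game; it is a sketch using only words introduced earlier, entered on one line at the interactive prompt.
The recursive call is the last thing the word does, so it is tail-recursive:
\layout LyX-Code
! Hypothetical word: print n, n-1, ..., 1
\layout LyX-Code
: countdown ( n -- ) dup 0 = [ drop ] [ dup . 1 - countdown ] ifte ;
\layout LyX-Code
3 countdown
\layout LyX-Code
\emph on
3
\layout LyX-Code
\emph on
2
\layout LyX-Code
\emph on
1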
\layout Subsection
Finishing off
\layout Standard
The last task is to combine everything into the main
\family typewriter
numbers-game
\family default
word.
This is easier than it seems:
\layout LyX-Code
: numbers-game guess-banner number-to-guess numbers-game-loop ;
\layout Standard
Try it out! Simply invoke the numbers-game word in the interpreter.
It should work flawlessly, assuming you tested each component of this design
incrementally!
\layout Subsection
The complete program
\layout LyX-Code
! Numbers game example
\newline
\layout LyX-Code
IN: numbers-game
\layout LyX-Code
USE: parser
\layout LyX-Code
USE: random
\layout LyX-Code
USE: stdio
\newline
\newline
: read-number ( -- n ) read parse-number ;
\newline
\newline
: guess-banner
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
I'm thinking of a number between 0 and 100.
\begin_inset Quotes erd
\end_inset
print ;
\layout LyX-Code
: guess-prompt
\begin_inset Quotes eld
\end_inset
Enter your guess:
\begin_inset Quotes erd
\end_inset
write ;
\layout LyX-Code
: too-high
\begin_inset Quotes eld
\end_inset
Too high
\begin_inset Quotes erd
\end_inset
print ;
\layout LyX-Code
: too-low
\begin_inset Quotes eld
\end_inset
Too low
\begin_inset Quotes erd
\end_inset
print ;
\layout LyX-Code
: correct
\begin_inset Quotes eld
\end_inset
Correct - you win!
\begin_inset Quotes erd
\end_inset
print ;
\newline
\newline
: inexact-guess ( guess actual -- )
\layout LyX-Code
> [ too-high ] [ too-low ] ifte ;
\newline
\newline
: judge-guess ( guess actual -- ? )
\layout LyX-Code
2dup = [
\layout LyX-Code
2drop correct f
\layout LyX-Code
] [
\layout LyX-Code
inexact-guess t
\layout LyX-Code
] ifte ;
\newline
\newline
: number-to-guess ( -- n ) 0 100 random-int ;
\newline
\newline
: numbers-game-loop ( actual -- )
\layout LyX-Code
dup guess-prompt read-number swap judge-guess [
\layout LyX-Code
numbers-game-loop
\layout LyX-Code
] [
\layout LyX-Code
drop
\layout LyX-Code
] ifte ;
\newline
\newline
: numbers-game guess-banner number-to-guess numbers-game-loop ;
\layout LyX-Code
\layout Section
Lists
\layout Standard
A list is composed of a set of pairs; each pair holds a list element, and
a reference to the next pair.
Lists have the following literal syntax:
\layout LyX-Code
[
\begin_inset Quotes eld
\end_inset
CEO
\begin_inset Quotes erd
\end_inset
5
\begin_inset Quotes eld
\end_inset
CFO
\begin_inset Quotes erd
\end_inset
-4 f ]
\layout Standard
Before we continue, it is important to understand the role of data types
in Factor.
Let's make a distinction between two categories of data types:
\layout Itemize
Representational type -- this refers to the form of the data in the interpreter.
Representational types include integers, strings, and vectors.
Representational types are checked at run time -- attempting to multiply
two strings, for example, will yield an error.
\layout Itemize
Intentional type -- this refers to the meaning of the data within the problem
domain.
This could be a length measured in inches, or a string naming a file, or
a list of objects in a room in a game.
It is up to the programmer to check intentional types -- Factor won't prevent
you from adding two integers representing a distance and a time, even though
the result is meaningless.
\layout Subsection
Cons cells
\layout Standard
It may surprise you that in Factor,
\emph on
lists are intentional types
\emph default
.
This means that they are not an inherent feature of the interpreter; rather,
they are built from a simpler data type, the
\emph on
cons cell
\emph default
.
\layout Standard
A cons cell is an object that holds a reference to two other objects.
The order of the two objects matters -- the first is called the
\emph on
car
\emph default
, the second is called the
\emph on
cdr
\emph default
.
\layout Standard
All words relating to cons cells and lists are found in the
\family typewriter
lists
\family default
vocabulary.
The words
\family typewriter
cons
\family default
,
\family typewriter
car
\family default
and
\family typewriter
cdr
\family default
\begin_inset Foot
collapsed false
\layout Standard
These infamous names originate from the Lisp language.
Originally,
\begin_inset Quotes eld
\end_inset
Lisp
\begin_inset Quotes erd
\end_inset
stood for
\begin_inset Quotes eld
\end_inset
List Processing
\begin_inset Quotes erd
\end_inset
.
\end_inset
construct and deconstruct cons cells:
\layout LyX-Code
1 2 cons .
\layout LyX-Code
\emph on
[ 1 | 2 ]
\layout LyX-Code
3 4 cons car .
\layout LyX-Code
\emph on
3
\layout LyX-Code
5 6 cons cdr .
\layout LyX-Code
\emph on
6
\layout Standard
The output of the first expression suggests a literal syntax for cons cells:
\layout LyX-Code
[ 10 | 20 ] cdr .
\layout LyX-Code
\emph on
20
\layout LyX-Code
[
\begin_inset Quotes eld
\end_inset
first
\begin_inset Quotes erd
\end_inset
| [
\begin_inset Quotes eld
\end_inset
second
\begin_inset Quotes erd
\end_inset
| f ] ] car .
\layout LyX-Code
\emph on
\begin_inset Quotes eld
\end_inset
first
\begin_inset Quotes erd
\end_inset
\layout LyX-Code
[
\begin_inset Quotes eld
\end_inset
first
\begin_inset Quotes erd
\end_inset
| [
\begin_inset Quotes eld
\end_inset
second
\begin_inset Quotes erd
\end_inset
| f ] ] cdr car .
\layout LyX-Code
\emph on
\begin_inset Quotes eld
\end_inset
second
\begin_inset Quotes erd
\end_inset
\layout Standard
The last two examples make it clear how nested cons cells represent a list.
Since this
\begin_inset Quotes eld
\end_inset
nested cons cell
\begin_inset Quotes erd
\end_inset
syntax is extremely cumbersome, the parser provides an easier way:
\layout LyX-Code
[ 1 2 3 4 ] cdr cdr car .
\layout LyX-Code
\emph on
3
\layout Standard
A
\emph on
generalized list
\emph default
is a set of cons cells linked by their cdr.
A
\emph on
proper list
\emph default
, or just list, is a generalized list in which the cdr of the last cons cell
is f.
Also, the object
\family typewriter
f
\family default
is a proper list, and in fact it is equivalent to the empty list
\family typewriter
[ ]
\family default
.
An
\emph on
improper list
\emph default
is a generalized list that is not a proper list.
\layout Standard
The
\family typewriter
list?
\family default
word tests if the object at the top of the stack is a proper list:
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
hello
\begin_inset Quotes erd
\end_inset
list? .
\layout LyX-Code
\emph on
f
\layout LyX-Code
[
\begin_inset Quotes eld
\end_inset
first
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
second
\begin_inset Quotes erd
\end_inset
|
\begin_inset Quotes eld
\end_inset
third
\begin_inset Quotes erd
\end_inset
] list? .
\layout LyX-Code
\emph on
f
\layout LyX-Code
[
\begin_inset Quotes eld
\end_inset
first
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
second
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
third
\begin_inset Quotes erd
\end_inset
] list? .
\layout LyX-Code
\emph on
t
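\layout Standard
As a brief sketch of the definitions above, the following interactions follow from the rules just stated: the object f counts as a proper list, and walking the cdr of an improper list eventually reaches a trailing object other than f.
The particular list contents are chosen purely for illustration:
\layout LyX-Code
f list? .
\layout LyX-Code
\emph on
t
\layout LyX-Code
[ 1 2 | 3 ] list? .
\layout LyX-Code
\emph on
f
\layout LyX-Code
[ 1 2 | 3 ] cdr cdr .
\layout LyX-Code
\emph on
3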
\layout Subsection
Working with lists
\layout Standard
Unless otherwise documented, list manipulation words expect proper lists
as arguments.
Given an improper list, they will either raise an error, or disregard the
hanging cdr at the end of the list.
\layout Standard
Also unless otherwise documented, list manipulation words return newly-created
lists only.
The original parameters are not modified.
This may seem inefficient; however, the absence of side effects makes code
much easier to test and debug.
\begin_inset Foot
collapsed false
\layout Standard
Side effect-free code is the fundamental idea underlying functional programming
languages.
While Factor allows side effects and is not a functional programming language,
for a lot of problems, coding in a functional style gives the most maintainable
and readable results.
\end_inset
Where performance is important, a set of
\begin_inset Quotes eld
\end_inset
destructive
\begin_inset Quotes erd
\end_inset
words is provided.
They are documented in the next section.
\layout Standard
\family typewriter
add ( list obj -- list )
\family default
Create a new list consisting of the original list, and a new element added
at the end:
\layout LyX-Code
[ 1 2 3 ] 4 add .
\layout LyX-Code
\emph on
[ 1 2 3 4 ]
\layout LyX-Code
1 [ 2 3 4 ] cons .
\layout LyX-Code
\emph on
[ 1 2 3 4 ]
\layout Standard
While
\family typewriter
cons
\family default
and
\family typewriter
add
\family default
appear to have similar effects, they are quite different --
\family typewriter
cons
\family default
is a very cheap operation, while
\family typewriter
add
\family default
has to copy the entire list first! If you need to add to the end in constant
time, use a vector.
\layout Standard
\family typewriter
append ( list list -- list )
\family default
Append the two lists at the top of the stack:
\layout LyX-Code
[ 1 2 3 ] [ 4 5 6 ] append .
\layout LyX-Code
\emph on
[ 1 2 3 4 5 6 ]
\layout LyX-Code
[ 1 2 3 ] dup [ 4 5 6 ] append .s
\layout LyX-Code
\emph on
{ [ 1 2 3 ] [ 1 2 3 4 5 6 ] }
\layout Standard
The first list is copied, and the cdr of its last cons cell is set to the
second list.
The second example above shows that the original parameter was not modified.
Interestingly, if the second parameter is not a proper list,
\family typewriter
append
\family default
returns an improper list:
\layout LyX-Code
[ 1 2 3 ] 4 append .
\layout LyX-Code
\emph on
[ 1 2 3 | 4 ]
\layout Standard
\family typewriter
length ( list -- n )
\family default
Iterate down the cdr of the list until it reaches
\family typewriter
f
\family default
, counting the number of elements in the list:
\layout LyX-Code
[ [ 1 2 ] [ 3 4 ] 5 ] length .
\layout LyX-Code
\emph on
3
\layout LyX-Code
[ [ [
\begin_inset Quotes eld
\end_inset
Hey
\begin_inset Quotes erd
\end_inset
] ] 5 ] length .
\layout LyX-Code
\emph on
2
\layout Standard
\family typewriter
nth ( index list -- obj )
\family default
Look up an element specified by a zero-based index, by successively iterating
down the cdr of the list:
\layout LyX-Code
1 [
\begin_inset Quotes eld
\end_inset
Hamster
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
Bagpipe
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
Beam
\begin_inset Quotes erd
\end_inset
] nth .
\layout LyX-Code
\emph on
\begin_inset Quotes eld
\end_inset
Bagpipe
\begin_inset Quotes erd
\end_inset
\layout Standard
This word takes linear time proportional to the list index.
If you need constant time lookups, use a vector instead.
\layout Standard
\family typewriter
set-nth ( value index list -- list )
\family default
Create a new list, identical to the original list except the element at
the specified index is replaced:
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
Done
\begin_inset Quotes erd
\end_inset
0 [
\begin_inset Quotes eld
\end_inset
Not started
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
Incomplete
\begin_inset Quotes erd
\end_inset
] set-nth .
\layout LyX-Code
\emph on
[
\begin_inset Quotes eld
\end_inset
Done
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
Incomplete
\begin_inset Quotes erd
\end_inset
]
\layout Standard
\family typewriter
remove ( obj list -- list )
\family default
Push a new list, with all occurrences of the object removed.
All other elements are in the same order:
\layout LyX-Code
: australia-
\begin_inset Quotes eld
\end_inset
Australia
\begin_inset Quotes erd
\end_inset
swap remove ;
\layout LyX-Code
[ "Canada" "New Zealand" "Australia" "Russia" ] australia- .
\layout LyX-Code
\emph on
[ "Canada" "New Zealand" "Russia" ]
\layout Standard
\family typewriter
remove-nth ( index list -- list )
\family default
Push a new list, with the element at the specified index removed:
\layout LyX-Code
2 [ "Canada" "New Zealand" "Australia" "Russia" ] remove-nth .
\layout LyX-Code
\emph on
[ "Canada" "New Zealand" "Russia" ]
\layout Standard
\family typewriter
reverse ( list -- list )
\family default
Push a new list which has the same elements as the original one, but in
reverse order:
\layout LyX-Code
[ 4 3 2 1 ] reverse .
\layout LyX-Code
\emph on
[ 1 2 3 4 ]
\layout Standard
\family typewriter
contains ( obj list -- list )
\family roman
\family default
Look
\family roman
for an occurrence of an object in a list.
The remainder of the list starting from the first occurrence
\family default
is returned.
If the object does not occur in the list, f is returned:
\layout LyX-Code
: lived-in? ( country -- ? )
\layout LyX-Code
[
\begin_inset Quotes eld
\end_inset
Canada
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
New Zealand
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
Australia
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
Russia
\begin_inset Quotes erd
\end_inset
] contains ;
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
Australia
\begin_inset Quotes erd
\end_inset
lived-in? .
\layout LyX-Code
\emph on
[
\begin_inset Quotes eld
\end_inset
Australia
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
Russia
\begin_inset Quotes erd
\end_inset
]
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
Pakistan
\begin_inset Quotes erd
\end_inset
lived-in? .
\layout LyX-Code
\emph on
f
\layout Standard
For now, assume
\begin_inset Quotes eld
\end_inset
occurs
\begin_inset Quotes erd
\end_inset
means
\begin_inset Quotes eld
\end_inset
contains an object that looks like
\begin_inset Quotes erd
\end_inset
.
The issue of object equality is covered in the next chapter.
\layout Standard
\family typewriter
unique ( list -- list )
\family default
Return a new list with all duplicate elements removed.
This word executes in quadratic time, so it should not be used with large
lists.
For example:
\layout LyX-Code
[ 1 2 1 4 1 8 ] unique .
\layout LyX-Code
\emph on
[ 1 2 4 8 ]
\layout Standard
\family typewriter
unit ( obj -- list )
\family default
Make a list of one element:
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
Unit 18
\begin_inset Quotes erd
\end_inset
unit .
\layout LyX-Code
\emph on
[
\begin_inset Quotes eld
\end_inset
Unit 18
\begin_inset Quotes erd
\end_inset
]
\layout Subsection
Association lists
\layout Standard
An
\emph on
association list
\emph default
is one where every element is a cons.
The car of each cons is a name, the cdr is a value.
The literal notation is suggestive:
\layout LyX-Code
[
\layout LyX-Code
[
\begin_inset Quotes eld
\end_inset
Jill
\begin_inset Quotes erd
\end_inset
|
\begin_inset Quotes eld
\end_inset
CEO
\begin_inset Quotes erd
\end_inset
]
\layout LyX-Code
[
\begin_inset Quotes eld
\end_inset
Jeff
\begin_inset Quotes erd
\end_inset
|
\begin_inset Quotes eld
\end_inset
manager
\begin_inset Quotes erd
\end_inset
]
\layout LyX-Code
[
\begin_inset Quotes eld
\end_inset
James
\begin_inset Quotes erd
\end_inset
 |
\begin_inset Quotes eld
\end_inset
lowly web designer
\begin_inset Quotes erd
\end_inset
]
\layout LyX-Code
]
\layout Standard
\family typewriter
assoc? ( obj -- ? )
\family default
returns
\family typewriter
t
\family default
if the object is a list whose every element is a cons; otherwise it returns
\family typewriter
f
\family default
.
\layout Standard
\family typewriter
assoc ( name alist -- value )
\family default
looks for a pair with this name in the list, and pushes the cdr of the
pair.
Pushes f if no pair with this name is present.
Note that assoc cannot differentiate between a name that is not present
at all and a name with a value of
\family typewriter
f
\family default
.
\layout Standard
\family typewriter
assoc* ( name alist -- [ name | value ] )
\family default
looks for a pair with this name, and pushes the pair itself.
Unlike
\family typewriter
assoc
\family default
,
\family typewriter
assoc*
\family default
returns different values in the cases of a value set to
\family typewriter
f
\family default
, or an undefined value.
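\layout Standard
A short sketch of the two lookup words, based only on the stack effects given above.
The word name
\family typewriter
staff
\family default
 and its contents are hypothetical, introduced here just to avoid repeating the literal:
\layout LyX-Code
: staff ( -- alist ) [ [ "Jill" | "CEO" ] [ "Jeff" | "manager" ] ] ;
\layout LyX-Code
"Jeff" staff assoc .
\layout LyX-Code
\emph on
"manager"
\layout LyX-Code
"Jeff" staff assoc* .
\layout LyX-Code
\emph on
[ "Jeff" | "manager" ]
\layout LyX-Code
"Bob" staff assoc .
\layout LyX-Code
\emph on
f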
\layout Standard
\family typewriter
set-assoc ( value name alist -- alist )
\family default
removes any existing occurrence of a name from the list, and adds a new
pair.
This creates a new list; the original is unaffected.
\layout Standard
\family typewriter
acons ( value name alist -- alist )
\family default
is slightly faster than
\family typewriter
set-assoc
\family default
since it simply conses a new pair onto the list.
However, if used repeatedly, the list will grow to contain a lot of
\begin_inset Quotes eld
\end_inset
shadowed
\begin_inset Quotes erd
\end_inset
pairs.
\layout Standard
Searching an association list incurs a linear time cost, so they should
only be used for small mappings -- a typical use is a mapping of half a
dozen entries or so, specified literally in source.
Hashtables can achieve better performance with larger mappings.
\layout Subsection
List combinators
\layout Standard
In a traditional language such as C, every iteration or collection must
be written out as a loop, with setting up and updating of indices, etc.
Factor on the other hand relies on combinators and quotations to avoid
duplicating these loop
\begin_inset Quotes eld
\end_inset
design patterns
\begin_inset Quotes erd
\end_inset
throughout the code.
\layout Standard
The simplest case is iterating through each element of a list, and printing
it or otherwise consuming it from the stack.
\layout Standard
\family typewriter
each ( list quot -- )
\family default
pushes each element of the list in turn, and executes the quotation.
The list and quotation are not on the stack when the quotation is executed.
This allows a powerful idiom where the quotation makes a copy of a value
on the stack, and consumes it along with the list element.
In fact, this idiom works with all well-designed combinators.
\begin_inset Foot
collapsed false
\layout Standard
Later, you will learn how to apply it when designing your own combinators.
\end_inset
\layout Standard
The previously-mentioned
\family typewriter
reverse
\family default
word is implemented using
\family typewriter
each
\family default
:
\layout LyX-Code
: reverse [ ] swap [ swons ] each ;
\layout Standard
To understand how it works, consider that each element of the original list
is consed onto the beginning of a new list, in turn.
So the last element of the original list ends up at the beginning of the
new list.
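\layout Standard
The same idiom, where the quotation consumes a value left on the stack along with each list element, can be sketched with a running total.
This example assumes only the behavior of
\family typewriter
each
\family default
 described above; the numbers are arbitrary:
\layout LyX-Code
0 [ 1 2 3 4 ] [ + ] each .
\layout LyX-Code
\emph on
10
\layout Standard
The initial 0 sits under the list and quotation; since neither is on the stack while the quotation runs, each element is added directly to the running total.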
\layout Standard
\family typewriter
inject ( list quot -- list )
\family default
is similar to
\family typewriter
each
\family default
, except the return values of the quotation are collected into the new list.
The quotation must consume the list element and leave exactly one value in
its place, otherwise the combinator will not function properly; so the quotation
must have stack effect
\family typewriter
( obj -- obj )
\family default
.
\layout Standard
For example, suppose we have a list where each element stores the quantity
of some nutrient in 100 grams of food; we would like to find out the
total nutrients contained in 300 grams:
\layout LyX-Code
: multiply-each ( n list -- list )
\layout LyX-Code
[ dupd * ] inject nip ;
\layout LyX-Code
3 [ 50 450 101 ] multiply-each .
\layout LyX-Code
\emph on
[ 150 1350 303 ]
\layout Standard
Note the use of
\family typewriter
nip
\family default
to discard the original parameter
\family typewriter
n
\family default
.
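\layout Standard
When the quotation needs no extra value from the stack, the use of
\family typewriter
inject
\family default
 is simpler.
A minimal sketch, assuming the collected values keep the order of the original list as they do in the example above:
\layout LyX-Code
[ 1 2 3 ] [ 2 * ] inject .
\layout LyX-Code
\emph on
[ 2 4 6 ]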
\layout Standard
In case there is no appropriate combinator, recursion can be used.
Factor performs tail call optimization, so a word where the recursive call
is the last thing done will not use an arbitrary amount of stack space.
\layout Standard
\family typewriter
subset ( list quot -- list )
\family default
produces a new list containing some of the elements of the original list.
Which elements to collect is determined by the quotation -- the quotation
is called with each list element on the stack in turn, and those elements
for which the quotation does not return
\family typewriter
f
\family default
are added to the new list.
The quotation must have stack effect
\family typewriter
( obj -- ? )
\family default
.
\layout Standard
For example, let's construct a list of all numbers between 0 and 99 such
that the sum of their digits is less than 10:
\layout LyX-Code
: sum-of-digits ( n -- n ) 10 /mod + ;
\layout LyX-Code
100 count [ sum-of-digits 10 < ] subset .
\layout LyX-Code
\emph on
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21
\layout LyX-Code
\emph on
22 23 24 25 26 27 30 31 32 33 34 35 36 40 41 42 43 44
\layout LyX-Code
\emph on
45 50 51 52 53 54 60 61 62 63 70 71 72 80 81 90 ]
\layout Standard
\family typewriter
all? ( list quot -- ? )
\family default
returns
\family typewriter
t
\family default
if the quotation returns
\family typewriter
t
\family default
for all elements of the list, otherwise it returns
\family typewriter
f
\family default
.
In other words, if
\family typewriter
all?
\family default
returns
\family typewriter
t
\family default
, then
\family typewriter
subset
\family default
applied to the same list and quotation would return the entire list.
\begin_inset Foot
collapsed false
\layout Standard
Barring any side effects which modify the execution of the quotation.
It is best to avoid side effects when using list combinators.
\end_inset
\layout Standard
For example, the implementation of
\family typewriter
assoc?
\family default
uses
\family typewriter
all?
\family default
:
\layout LyX-Code
: assoc? ( list -- ? )
\layout LyX-Code
dup list? [ [ cons? ] all? ] [ drop f ] ifte ;
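\layout Standard
A smaller illustration of
\family typewriter
all?
\family default
, using only words that already appear in this section; the list contents are arbitrary:
\layout LyX-Code
[ 1 5 9 ] [ 10 < ] all? .
\layout LyX-Code
\emph on
t
\layout LyX-Code
[ 1 50 9 ] [ 10 < ] all? .
\layout LyX-Code
\emph on
f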
\layout Subsection
List constructors
\layout Standard
The list construction words minimize stack noise with a clever trick.
They store a partial list in a variable, thus reducing the number of stack
elements that have to be juggled.
\layout Standard
The word
\family typewriter
[, ( -- )
\family default
begins list construction.
\layout Standard
The word
\family typewriter
, ( obj -- )
\family default
appends an object to the partial list.
\layout Standard
The word
\family typewriter
,] ( -- list )
\family default
pushes the complete list.
\layout Standard
While variables haven't been described yet, keep in mind that a new scope
is created between
\family typewriter
[,
\family default
and
\family typewriter
,]
\family default
.
This means that list constructions can be nested, as long as in the end,
the number of
\family typewriter
[,
\family default
and
\family typewriter
,]
\family default
balances out.
There is no requirement that
\family typewriter
[,
\family default
and
\family typewriter
,]
\family default
appear in the same word; however, debugging becomes prohibitively difficult
when a list construction begins in one word and ends in another.
\layout Standard
Here is an example of list construction using this technique:
\layout LyX-Code
[, 1 10 [ 2 * dup , ] times drop ,] .
\layout LyX-Code
\emph on
[ 2 4 8 16 32 64 128 256 512 1024 ]
\layout Subsection
Destructively modifying lists
\layout Standard
All of the list manipulation words discussed so far return newly-allocated
lists.
Destructive list manipulation functions on the other hand reuse the cons
cells of their input lists, and hence avoid memory allocation.
\layout Standard
Only ever destructively change lists you do not intend to reuse again.
You should not rely on the side effects -- they are unpredictable.
It is wrong to think that destructive words
\begin_inset Quotes eld
\end_inset
modify
\begin_inset Quotes erd
\end_inset
the original list -- rather, think of them as returning a new list, just
like the normal versions of the words, with the added caveat that the original
list must not be used again.
\layout Standard
\family typewriter
nreverse ( list -- list )
\family default
reverses a list without consing.
In the following example, the return value reuses the cons cells of the
original list, and the original list has been ruined by unpredictable side
effects:
\layout LyX-Code
[ 1 2 3 4 ] dup nreverse .s
\layout LyX-Code
\emph on
{ [ 1 ] [ 4 3 2 1 ] }
\layout Standard
Compare the second stack element (which is what remains of the original
list) and the top stack element (the list returned by
\family typewriter
nreverse
\family default
).
\layout Standard
The
\family typewriter
nreverse
\family default
word is the most frequently used destructive list manipulator.
The usual idiom is a loop where a value is consed onto the beginning of
a list in each iteration, then the list is reversed at the end.
Since the original list is never used again,
\family typewriter
nreverse
\family default
can safely be used here.
\layout Standard
\family typewriter
nappend ( list list -- list )
\family default
sets the cdr of the last cons cell in the first list to the second list,
unless the first list is
\family typewriter
f
\family default
, in which case it simply returns the second list.
Again, the side effects on the first list are unpredictable -- if it is
\family typewriter
f
\family default
, it is unchanged, otherwise, it is equal to the return value:
\layout LyX-Code
[ 1 2 ] [ 3 4 ] nappend .
\layout LyX-Code
\emph on
[ 1 2 3 4 ]
\layout Standard
Note in the above examples, we use literal list parameters to nreverse and
nappend.
This is actually a very bad idea, since the same literal list may be used
more than once! For example, let's make a colon definition:
\layout LyX-Code
: very-bad-idea [ 1 2 3 4 ] nreverse ;
\layout LyX-Code
very-bad-idea .
\layout LyX-Code
\emph on
[ 4 3 2 1 ]
\layout LyX-Code
very-bad-idea .
\layout LyX-Code
\emph on
[ 1 ]
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
very-bad-idea
\begin_inset Quotes erd
\end_inset
 see
\layout LyX-Code
\emph on
: very-bad-idea
\layout LyX-Code
\emph on
[ 1 ] nreverse ;
\layout Standard
As you can see, the word definition itself was ruined!
\layout Standard
Sometimes it is desirable to make a copy of a list, so that the copy may be
safely side-effected later.
\layout Standard
\family typewriter
clone-list ( list -- list )
\family default
pushes a new list containing the exact same elements as the original.
The elements themselves are not copied.
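\layout Standard
One way to use this, sketched under the assumption that
\family typewriter
clone-list
\family default
 allocates fresh cons cells on every call, is to copy a literal before handing it to a destructive word.
The word name
\family typewriter
safer-idea
\family default
 is hypothetical:
\layout LyX-Code
: safer-idea [ 1 2 3 4 ] clone-list nreverse ;
\layout LyX-Code
safer-idea .
\layout LyX-Code
\emph on
[ 4 3 2 1 ]
\layout LyX-Code
safer-idea .
\layout LyX-Code
\emph on
[ 4 3 2 1 ]
\layout Standard
Because only the copy is reversed in place, the literal embedded in the word definition is left intact between calls.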
\layout Standard
If you want to write your own destructive list manipulation words, you can
use
\family typewriter
set-car ( value cons -- )
\family default
and
\family typewriter
set-cdr ( value cons -- )
\family default
to modify individual cons cells.
Some words that are not destructive on their inputs nonetheless create
intermediate lists which are operated on using these words.
One example is
\family typewriter
clone-list
\family default
itself.
\layout Section
Vectors
\layout Standard
A vector is a contiguous chunk of cells which hold references to arbitrary
objects.
Vectors have the following literal syntax:
\layout LyX-Code
{ f f f t t f t t -6
\begin_inset Quotes eld
\end_inset
Hey
\begin_inset Quotes erd
\end_inset
}
\layout Standard
Use of vector literals in source code is discouraged, since vector manipulation
relies on side effects rather than return values, and it is very easy to
mess up a literal embedded in a word definition.
\layout Subsection
Vectors versus lists
\layout Standard
Vectors are applicable for a different class of problems than lists.
Compare the relative performance of common operations on vectors and lists:
\layout Standard
\begin_inset Tabular
<lyxtabular version="3" rows="4" columns="3">
<features>
<column alignment="center" valignment="top" leftline="true" width="0">
<column alignment="center" valignment="top" leftline="true" width="0">
<column alignment="center" valignment="top" leftline="true" rightline="true" width="0">
<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
Lists
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
Vectors
\end_inset
</cell>
</row>
<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
Random access of an index
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
linear time
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
constant time
\end_inset
</cell>
</row>
<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
Add new element at start
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
constant time
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
linear time
\end_inset
</cell>
</row>
<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
Add new element at end
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
linear time
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
constant time
\end_inset
</cell>
</row>
</lyxtabular>
\end_inset
\layout Standard
When using vectors, you need to pass around a vector and an index -- when
working with lists, often only a list head is passed around.
For this reason, if you need a sequence for iteration only, a list is a
better choice because the list vocabulary contains a rich collection of
recursive words.
\layout Standard
On the other hand, when you need to maintain your own
\begin_inset Quotes eld
\end_inset
stack
\begin_inset Quotes erd
\end_inset
-like collection, a vector is the obvious choice, since most pushes and
pops can then avoid allocating memory.
\layout Standard
Vectors and lists can be converted back and forth using the
\family typewriter
vector>list
\family default
word
\family typewriter
( vector -- list )
\family default
and the
\family typewriter
list>vector
\family default
word
\family typewriter
( list -- vector )
\family default
.
\layout Subsection
Vector manipulation
\layout Standard
\family typewriter
<vector> ( capacity -- vector )
\family default
pushes a zero-length vector.
Storing more elements than the initial capacity grows the vector.
\layout Standard
\family typewriter
vector-nth ( index vector -- obj )
\family default
pushes the object stored at a zero-based index of a vector:
\layout LyX-Code
0 {
\begin_inset Quotes eld
\end_inset
zero
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
one
\begin_inset Quotes erd
\end_inset
} vector-nth .
\layout LyX-Code
\emph on
\begin_inset Quotes eld
\end_inset
zero
\begin_inset Quotes erd
\end_inset
\layout LyX-Code
2 { 1 2 } vector-nth .
\layout LyX-Code
\emph on
ERROR: Out of bounds
\layout Standard
\family typewriter
set-vector-nth ( obj index vector -- )
\family default
stores a value into a vector:
\begin_inset Foot
collapsed false
\layout Standard
The words
\family typewriter
get
\family default
and
\family typewriter
set
\family default
used in this example will be formally introduced later.
\end_inset
\layout LyX-Code
{
\begin_inset Quotes eld
\end_inset
math
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
CS
\begin_inset Quotes erd
\end_inset
}
\begin_inset Quotes eld
\end_inset
v
\begin_inset Quotes erd
\end_inset
set
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
philosophy
\begin_inset Quotes erd
\end_inset
 1
\begin_inset Quotes eld
\end_inset
v
\begin_inset Quotes erd
\end_inset
get set-vector-nth
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
v
\begin_inset Quotes erd
\end_inset
get .
\layout LyX-Code
\emph on
{
\begin_inset Quotes eld
\end_inset
math
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
philosophy
\begin_inset Quotes erd
\end_inset
}
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
CS
\begin_inset Quotes erd
\end_inset
 4
\begin_inset Quotes eld
\end_inset
v
\begin_inset Quotes erd
\end_inset
get set-vector-nth
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
v
\begin_inset Quotes erd
\end_inset
get .
\layout LyX-Code
\emph on
{
\begin_inset Quotes eld
\end_inset
math
\begin_inset Quotes erd
\end_inset
\begin_inset Quotes eld
\end_inset
philosophy
\begin_inset Quotes erd
\end_inset
f f
\begin_inset Quotes eld
\end_inset
CS
\begin_inset Quotes erd
\end_inset
}
\layout Standard
\family typewriter
vector-length ( vector -- length )
\family default
pushes the number of elements in a vector.
As the previous two examples demonstrate, attempting to fetch beyond the
end of the vector will raise an error, while storing beyond the end will
grow the vector as necessary.
\layout Standard
\family typewriter
set-vector-length ( length vector -- )
\family default
resizes a vector.
If the new length is larger than the current length, the vector grows if
necessary, and the new cells are filled with
\family typewriter
f
\family default
.
\layout Standard
\family typewriter
vector-push ( obj vector -- )
\family default
adds an object at the end of the vector.
This increments the vector's length by one.
\layout Standard
\family typewriter
vector-pop ( vector -- obj )
\family default
removes the object at the end of the vector and pushes it.
This decrements the vector's length by one.
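\layout Standard
The following sketch uses a vector as a simple stack, relying only on the stack effects listed above.
The variable name
\family typewriter
"stack"
\family default
 is arbitrary, and
\family typewriter
get
\family default
 and
\family typewriter
set
\family default
 are used in the same way as in the earlier example:
\layout LyX-Code
10 <vector> "stack" set
\layout LyX-Code
"first" "stack" get vector-push
\layout LyX-Code
"second" "stack" get vector-push
\layout LyX-Code
"stack" get vector-pop .
\layout LyX-Code
\emph on
"second"
\layout LyX-Code
"stack" get vector-length .
\layout LyX-Code
\emph on
1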
\layout Subsection
Vector combinators
\layout Standard
vector-each, vector-map
\layout Section
Strings
\layout Subsection
Strings are character vectors
\layout Standard
str-nth, str-length, substring, ...
\layout Subsection
String buffers are mutable
\layout Standard
<sbuf>, sbuf-append, sbuf>str
\layout Standard
Otherwise like a vector:
\layout Standard
sbuf-nth, set-sbuf-nth, sbuf-length, set-sbuf-length
\layout Subsection
String constructors
\layout Standard
<% % %>
\layout Subsection
Printing and reading strings
\layout Standard
print, write, read, turning a string into a number
\layout Section
PRACTICAL: Contractor timesheet
\layout Subsection
Adding a timesheet entry
\layout Standard
When you begin working on a new task, you tell the timesheet you want to
add a new entry.
It then measures the elapsed time until you specify the task is done, and
prompts for a task description.
\layout Standard
The first word we will write is
\family typewriter
measure-duration
\family default
.
We measure the time duration by using the
\family typewriter
millis
\family default
word
\family typewriter
( -- m )
\family default
to take the time before and after a call to
\family typewriter
read
\family default
.
The
\family typewriter
millis
\family default
word pushes the number of milliseconds since a certain epoch -- the epoch
does not matter here since we are only interested in the difference between
two times.
\layout Standard
A first attempt at
\family typewriter
measure-duration
\family default
might look like this:
\layout LyX-Code
: measure-duration millis read drop millis - ;
\layout LyX-Code
measure-duration .
\layout Standard
This word definition has the right general idea; however, the result is
negative, because the later time is subtracted from the earlier one.
Also, we would like to measure durations in minutes, not milliseconds:
\layout LyX-Code
: measure-duration ( -- duration )
\layout LyX-Code
millis
\layout LyX-Code
read drop
\layout LyX-Code
millis swap - 1000 /i 60 /i ;
\layout Standard
Note that the
\family typewriter
/i
\family default
word
\family typewriter
( x y -- x/y )
\family default
, from the
\family typewriter
arithmetic
\family default
vocabulary, performs truncating division.
This makes sense, since we are not interested in fractional parts of a
minute here.
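\layout Standard
To see the effect of the truncating division in isolation, here is the same conversion applied to a fixed, made-up interval of 150000 milliseconds (two and a half minutes):
\layout LyX-Code
150000 1000 /i 60 /i .
\layout LyX-Code
\emph on
2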
\layout Standard
Now that we can measure a time duration at the keyboard, let's write the
\family typewriter
add-entry-prompt
\family default
word.
This word asks the user to start working, measures the time duration, then
prompts for a description, and leaves those two values on the stack:
\layout LyX-Code
: add-entry-prompt ( -- duration description )
\layout LyX-Code
"Start work on the task now.
Press ENTER when done." print
\layout LyX-Code
measure-duration
\layout LyX-Code
"Please enter a description:" print
\layout LyX-Code
read ;
\layout Standard
You should interactively test this word.
Measure off a minute or two, press ENTER, enter a description, and press
ENTER again.
The stack should now contain two values, in the same order as the stack
effect comment.
\layout Standard
Now, almost all the ingredients are in place.
The final
\family typewriter
add-entry
\family default
word calls
\family typewriter
add-entry-prompt
\family default
, then pushes the new entry on the end of the timesheet vector:
\layout LyX-Code
: add-entry ( timesheet -- )
\layout LyX-Code
add-entry-prompt cons swap vector-push ;
\layout Standard
Recall that timesheet entries are cons cells where the car is the duration
and the cdr is the description, hence the call to
\family typewriter
cons
\family default
.
Note that this word modifies the timesheet vector in place.
You can test it interactively like so:
\layout LyX-Code
10 <vector> dup add-entry
\layout LyX-Code
\emph on
Start work on the task now.
Press ENTER when done.
\layout LyX-Code
\emph on
Please enter a description:
\layout LyX-Code
\emph on
Studying Factor
\layout LyX-Code
.
\layout LyX-Code
\emph on
{ [ 2 |
\begin_inset Quotes eld
\end_inset
Studying Factor
\begin_inset Quotes erd
\end_inset
] }
\layout Subsection
Printing the timesheet
\layout Standard
The hard part of printing the timesheet is turning the duration in minutes
into a nice hours/minutes string, like
\begin_inset Quotes eld
\end_inset
1:15
\begin_inset Quotes erd
\end_inset
.
We would like to make a word like the following:
\layout LyX-Code
75 hh:mm .
\layout LyX-Code
\emph on
1:15
\layout Standard
First, we can make a pair of words
\family typewriter
hh
\family default
and
\family typewriter
mm
\family default
to extract the hours and minutes, respectively.
This can be achieved using truncating division and the modulo operator.
Since we would like strings to be returned, the
\family typewriter
unparse
\family default
word
\family typewriter
( obj -- str )
\family default
from the
\family typewriter
unparser
\family default
vocabulary is called to turn the integers into strings:
\layout LyX-Code
: hh ( duration -- str ) 60 /i unparse ;
\layout LyX-Code
: mm ( duration -- str ) 60 mod unparse ;
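\layout Standard
To see the arithmetic behind these two words, here is a small sketch that
applies the same operations to a sample duration of 75 minutes, a value
chosen only for illustration:
\layout LyX-Code
75 60 /i .
\layout LyX-Code
\emph on
1
\layout LyX-Code
75 60 mod .
\layout LyX-Code
\emph on
15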
\layout Standard
The
\family typewriter
hh:mm
\family default
word can then be written, concatenating the return values of
\family typewriter
hh
\family default
and
\family typewriter
mm
\family default
into a single string using string construction:
\layout LyX-Code
: hh:mm ( duration -- str ) <% dup hh % ":" % mm % %> ;
\layout Standard
However, so far, these three definitions do not produce ideal output.
Try a few examples:
\layout LyX-Code
120 hh:mm .
\layout LyX-Code
\emph on
2:0
\layout LyX-Code
130 hh:mm .
\layout LyX-Code
\emph on
2:10
\layout Standard
Obviously, we would like the minutes to always be two digits.
Luckily, there is a
\family typewriter
digits
\family default
word
\family typewriter
( str n -- str )
\family default
in the
\family typewriter
format
\family default
vocabulary that adds enough zeros on the left of the string to give it
the specified length.
Try it out:
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
23
\begin_inset Quotes erd
\end_inset
2 digits .
\layout LyX-Code
\emph on
\begin_inset Quotes eld
\end_inset
23
\begin_inset Quotes erd
\end_inset
\layout LyX-Code
\begin_inset Quotes eld
\end_inset
7
\begin_inset Quotes erd
\end_inset
2 digits .
\layout LyX-Code
\emph on
\begin_inset Quotes eld
\end_inset
07
\begin_inset Quotes erd
\end_inset
\layout Standard
We can now change the definition of
\family typewriter
mm
\family default
accordingly:
\layout LyX-Code
: mm ( duration -- str ) 60 mod unparse 2 digits ;
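\layout Standard
With this change, a duration such as 65 minutes should come out with a two-digit
minutes field.
As a sketch, assuming the three definitions above have been entered, printing
the resulting string gives:
\layout LyX-Code
65 hh:mm print
\layout LyX-Code
\emph on
1:05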
\layout Standard
Now that time duration output is done, a first attempt at a definition of
\family typewriter
print-timesheet
\family default
looks like this:
\layout LyX-Code
: print-timesheet ( timesheet -- )
\layout LyX-Code
[ uncons write
\begin_inset Quotes eld
\end_inset
:
\begin_inset Quotes erd
\end_inset
write hh:mm print ] vector-each ;
\layout Standard
This works, but produces ugly output:
\layout LyX-Code
{ [ 30 |
\begin_inset Quotes eld
\end_inset
Studying Factor
\begin_inset Quotes erd
\end_inset
] [ 65 |
\begin_inset Quotes eld
\end_inset
Paperwork
\begin_inset Quotes erd
\end_inset
] }
\layout LyX-Code
print-timesheet
\layout LyX-Code
\emph on
Studying Factor: 0:30
\layout LyX-Code
\emph on
Paperwork: 1:05
\layout Standard
It would be much nicer if the time durations lined up in the same column.
First, let's factor out the body of the
\family typewriter
vector-each
\family default
loop into a new
\family typewriter
print-entry
\family default
word before it gets too long:
\layout LyX-Code
: print-entry ( duration description -- )
\layout LyX-Code
write
\begin_inset Quotes eld
\end_inset
:
\begin_inset Quotes erd
\end_inset
write hh:mm print ;
\newline
\newline
: print-timesheet ( timesheet -- )
\layout LyX-Code
[ uncons print-entry ] vector-each ;
\layout Standard
We can now make
\family typewriter
print-entry
\family default
line up columns using the
\family typewriter
pad-string
\family default
word
\family typewriter
( n str -- str )
\family default
, which produces a string of enough blanks to pad the given string out to
the given length:
\layout LyX-Code
: print-entry ( duration description -- )
\layout LyX-Code
dup
\layout LyX-Code
write
\layout LyX-Code
50 swap pad-string write
\layout LyX-Code
hh:mm print ;
\layout Standard
In the above definition, we first print the description, then enough blanks
to move the cursor to column 50.
So the description text is left-justified.
If we had interchanged the two lines that call
\family typewriter
write
\family default
, the description text would be right-justified.
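\layout Standard
For comparison, a right-justified variant might look like the following
sketch.
The name
\family typewriter
print-entry-right
\family default
is hypothetical and is used only so that the definition above is not replaced:
\layout LyX-Code
: print-entry-right ( duration description -- )
\layout LyX-Code
dup
\layout LyX-Code
50 swap pad-string write
\layout LyX-Code
write
\layout LyX-Code
hh:mm print ;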
\layout Standard
Try out
\family typewriter
print-timesheet
\family default
again, and marvel at the aligned columns:
\layout LyX-Code
{ [ 30 |
\begin_inset Quotes eld
\end_inset
Studying Factor
\begin_inset Quotes erd
\end_inset
] [ 65 |
\begin_inset Quotes eld
\end_inset
Paperwork
\begin_inset Quotes erd
\end_inset
] }
\layout LyX-Code
print-timesheet
\layout LyX-Code
\emph on
Studying Factor 0:30
\layout LyX-Code
\emph on
Paperwork 1:05
\layout Subsection
The main menu
\layout Standard
Reading a number, showing a menu
\layout Section
Variables and namespaces
\layout Subsection
Hashtables
\layout Subsection
Namespaces
\layout Subsection
The name stack
\layout Subsection
The inspector
\layout Section
PRACTICAL: Music player
\layout Section
Deeper in the beast
\layout Standard
Text -> objects - parser, objects -> text - unparser for atoms, prettyprinter
for collections.
\layout Standard
What really is a word -- primitive, parameter, property list.
\layout Standard
Call stack how it works and >r/r>
\layout Subsection
Parsing words
\layout Standard
Let's take a closer look at Factor syntax.
Consider a simple expression, and the result of evaluating it in the
interactive interpreter:
\layout LyX-Code
2 3 + .
\layout LyX-Code
\emph on
5
\layout Standard
The interactive interpreter is basically an infinite loop.
It reads a line of input from the terminal, parses this line to produce
a
\emph on
quotation
\emph default
, and executes the quotation.
\layout Standard
In the parse step, the input text is tokenized into a sequence of
whitespace-separated tokens.
First, the interpreter checks if there is an existing word named by the
token.
If there is no such word, the interpreter instead treats the token as a
number.
\begin_inset Foot
collapsed false
\layout Standard
Of course, Factor supports a full range of data types, including strings,
lists and vectors.
Their source representations are still built from numbers and words, however.
\end_inset
\layout Standard
Once the expression has been entirely parsed, the interactive interpreter
executes it.
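\layout Standard
The same two steps can be observed by hand, assuming the
\family typewriter
call
\family default
word, which executes a quotation taken from the stack, is available.
In this sketch the quotation is parsed first, and its contents are only
executed once
\family typewriter
call
\family default
runs:
\layout LyX-Code
[ 2 3 + . ] call
\layout LyX-Code
\emph on
5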
\layout Standard
This parse time/run time distinction is important, because words fall into
two categories:
\begin_inset Quotes eld
\end_inset
parsing words
\begin_inset Quotes erd
\end_inset
and
\begin_inset Quotes eld
\end_inset
running words
\begin_inset Quotes erd
\end_inset
.
\layout Standard
The parser constructs a parse tree from the input text.
When the parser encounters a token representing a number or an ordinary
word, the token is simply appended to the current parse tree node.
A parsing word, on the other hand, is executed
\emph on
immediately
\emph default
after being tokenized.
Since it executes in the context of the parser, it has access to the raw
input text, the entire parse tree, and other parser structures.
\layout Standard
Parsing words are also defined using colon definitions, except we add
\family typewriter
parsing
\family default
after the terminating
\family typewriter
;
\family default
.
Here are two examples of definitions for words
\family typewriter
foo
\family default
and
\family typewriter
bar
\family default
.
The two examples are identical, except that in the second example,
\family typewriter
foo
\family default
is defined as a parsing word:
\layout LyX-Code
! Let's define 'foo' as a running word.
\layout LyX-Code
: foo
\begin_inset Quotes eld
\end_inset
1) foo executed
\begin_inset Quotes erd
\end_inset
print ;
\layout LyX-Code
: bar foo
\begin_inset Quotes eld
\end_inset
2) bar executed
\begin_inset Quotes erd
\end_inset
print ;
\layout LyX-Code
bar
\layout LyX-Code
\emph on
1) foo executed
\layout LyX-Code
\emph on
2) bar executed
\layout LyX-Code
bar
\layout LyX-Code
\emph on
1) foo executed
\layout LyX-Code
\emph on
2) bar executed
\layout LyX-Code
\layout LyX-Code
! Now let's define 'foo' as a parsing word.
\layout LyX-Code
: foo
\begin_inset Quotes eld
\end_inset
1) foo executed
\begin_inset Quotes erd
\end_inset
print ; parsing
\layout LyX-Code
: bar foo
\begin_inset Quotes eld
\end_inset
2) bar executed
\begin_inset Quotes erd
\end_inset
print ;
\layout LyX-Code
\emph on
1) foo executed
\layout LyX-Code
bar
\layout LyX-Code
\emph on
2) bar executed
\layout LyX-Code
bar
\layout LyX-Code
\emph on
2) bar executed
\layout Standard
In fact, the word
\family typewriter
\begin_inset Quotes eld
\end_inset
\family default
that denotes a string literal is a parsing word -- it reads characters from
the input text until the next occurrence of
\family typewriter
\begin_inset Quotes eld
\end_inset
\family default
, and appends this string to the current node of the parse tree.
Note that strings and words are different types of objects.
Strings are covered in an earlier section.
\layout Section
PRACTICAL: Infix syntax
\layout Section
Continuations
\layout Standard
Generators, co-routines, multitasking, exception handling
\layout Section
HTTP Server
\layout Section
PRACTICAL: Some web app
\the_end