compiler work

cvs
Slava Pestov 2004-09-11 19:26:24 +00:00
parent c02755227e
commit 34041bedbf
16 changed files with 117 additions and 65 deletions

View File

@ -3,7 +3,7 @@ CC = gcc
# On PowerPC G5:
# CFLAGS = -mcpu=970 -mtune=970 -mpowerpc64 -ffast-math -O3
# On Pentium 4:
# CFLAGS = -march=pentium4 -ffast-math -O3
# CFLAGS = -march=pentium4 -ffast-math -O3 -fomit-frame-pointer
# Add -fomit-frame-pointer if you don't care about debugging
CFLAGS = -Os -g -Wall

View File

@ -409,6 +409,7 @@ is pushed on the stack. Try evaluating the following:
call .s
\emph{\{ 5 \}}
\end{alltt}
\texttt{call} \texttt{( quot -{}- )} executes the quotation at the
top of the stack. Using \texttt{call} with a literal quotation is
useless; writing out the elements of the quotation has the same effect.
@ -442,10 +443,6 @@ More combinators will be introduced in later sections.
\subsection{Recursion}
The idea of \emph{recursion} is key to understanding Factor. A \emph{recursive} word definition is one that refers to itself, usually in one branch of a conditional.
FIXME
\section{Numbers}
Factor provides a rich set of math words. Factor numbers more closely model the mathematical concept of a number than other languages. Where possible, exact answers are given -- for example, adding or multiplying two integers never results in overflow, and dividing two integers yields a fraction rather than a truncated result. Complex numbers are supported, allowing many functions to be computed with parameters that would raise errors or return ``not a number'' in other languages.
@ -2400,11 +2397,28 @@ The name stack is really just a vector. The words \texttt{>n} and \texttt{n>} ar
: n> ( n:namespace -- namespace ) namestack* vector-pop ;
\end{alltt}
\section{Metaprogramming}
\section{The execution model in depth}
Recall that code quotations are in fact just linked lists. Factor code is data, and vice versa. Essentially, the interpreter iterates through code quotations, pushing literals and executing words. When a word is executed, one of two things happens -- either the word has a colon definition, and the interpreter is invoked recursively on the definition, or the word is primitive, and it is executed by the underlying virtual machine. A word is itself a first-class object.
\subsection{Recursion}
It is the job of the parser to transform source code denoting literals and words into their internal representations. This is done using a vocabulary of \emph{parsing words}. The prettyprinter does the converse, by printing out data structures in a parsable form (both to humans and Factor). Because code is data, text representation of source code doubles as a way to serialize almost any Factor object.
The idea of \emph{recursion} is key to understanding Factor. A \emph{recursive} word definition is one that refers to itself, usually in one branch of a conditional.
tail recursion
preserving values between iterations
ensuring a consistent stack effect
works well with lists, since only the head is passed
not so well with vectors and strings -- need an obj+index
\subsection{Combinators}
a combinator is a recursive word that takes quotations
how to ensure a consistent stack view for the quotations
\subsection{Looking at words}
@ -2452,24 +2466,6 @@ If the primitive number is set to 1, the word is a colon definition and the para
The word \texttt{define ( word quot -{}- )} defines a word to have the specified colon definition. Note that \texttt{create} and \texttt{define} perform an action somewhat analogous to the \texttt{: ... ;} notation for colon definitions, except at parse time rather than run time.
\subsection{The prettyprinter}
We've already seen the word \texttt{.} which prints the top of the stack in a form that may be read back in. The word \texttt{prettyprint} is similar, except the output is in an indented, multiple-line format. Both words are in the \texttt{prettyprint} vocabulary. Here is an example:
\begin{alltt}
{[} 1 {[} 2 3 4 {]} 5 {]} .
\emph{{[} 1 {[} 2 3 4 {]} 5 {]}}
{[} 1 {[} 2 3 4 {]} 5 {]} prettyprint
\emph{{[}
1 {[}
2 3 4
{]} 5
{]}}
\end{alltt}
\subsection{The parser}
\subsection{Parsing words}
Let's take a closer look at Factor syntax. Consider a simple expression,
@ -2538,6 +2534,29 @@ next occurrence of \texttt{{}''}, and appends this string to the
current node of the parse tree. Note that strings and words are different
types of objects. Strings are covered in great detail later.
\section{NOT DONE}
Recall that code quotations are in fact just linked lists. Factor code is data, and vice versa. Essentially, the interpreter iterates through code quotations, pushing literals and executing words. When a word is executed, one of two things happens -- either the word has a colon definition, and the interpreter is invoked recursively on the definition, or the word is primitive, and it is executed by the underlying virtual machine. A word is itself a first-class object.
It is the job of the parser to transform source code denoting literals and words into their internal representations. This is done using a vocabulary of \emph{parsing words}. The prettyprinter does the converse, by printing out data structures in a parsable form (both to humans and Factor). Because code is data, text representation of source code doubles as a way to serialize almost any Factor object.
\subsection{The prettyprinter}
We've already seen the word \texttt{.} which prints the top of the stack in a form that may be read back in. The word \texttt{prettyprint} is similar, except the output is in an indented, multiple-line format. Both words are in the \texttt{prettyprint} vocabulary. Here is an example:
\begin{alltt}
{[} 1 {[} 2 3 4 {]} 5 {]} .
\emph{{[} 1 {[} 2 3 4 {]} 5 {]}}
{[} 1 {[} 2 3 4 {]} 5 {]} prettyprint
\emph{{[}
1 {[}
2 3 4
{]} 5
{]}}
\end{alltt}
\subsection{Profiling}
\section{PRACTICAL: Infix syntax}

View File

@ -35,7 +35,7 @@ import java.io.*;
public class FactorInterpreter implements FactorObject, Runnable
{
public static final String VERSION = "0.65";
public static final String VERSION = "0.66";
public static final Cons DEFAULT_USE = new Cons("builtins",
new Cons("syntax",new Cons("scratchpad",null)));

View File

@ -26,6 +26,7 @@
! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
IN: compiler
USE: combinators
USE: math
USE: kernel
USE: stack
@ -36,6 +37,13 @@ USE: stack
: init-assembler ( -- )
compiled-offset literal-table + set-compiled-offset ;
: compile-aligned ( n -- )
dup compiled-offset mod dup 0 = [
2drop
] [
- compiled-offset + set-compiled-offset
] ifte ;
: intern-literal ( obj -- lit# )
address-of
literal-top set-compiled-cell
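
Note on compile-aligned: the word appears intended to advance compiled-offset to the next multiple of n, leaving it unchanged when it is already aligned. That reading is an assumption drawn from the name and from its use with cell in begin-compiling. A minimal C sketch of that rounding, with the hypothetical name align_up:

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of n (illustrative helper,
   not part of the commit). Offsets already on a boundary are
   returned unchanged. */
size_t align_up(size_t offset, size_t n)
{
    assert(n != 0);                 /* alignment of zero is meaningless */
    size_t rem = offset % n;        /* distance past the last boundary */
    return rem == 0 ? offset : offset + (n - rem);
}
```

For a 4-byte cell, align_up(13, 4) yields 16 and align_up(16, 4) stays at 16.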

View File

@ -41,6 +41,9 @@ USE: combinators
: ESI 6 ;
: EDI 7 ;
: MOD-R/M ( r/m reg/opcode mod -- )
6 shift swap 3 shift bitor bitor compile-byte ;
: PUSH ( reg -- )
HEX: 50 + compile-byte ;
@ -57,7 +60,7 @@ USE: combinators
drop HEX: a1 compile-byte
] [
HEX: 8b compile-byte
3 shift BIN: 101 bitor compile-byte
BIN: 101 swap 0 MOD-R/M
] ifte compile-cell ;
: I>[R] ( imm reg -- )
@ -71,21 +74,21 @@ USE: combinators
nip HEX: a3 compile-byte
] [
HEX: 89 compile-byte
swap 3 shift BIN: 101 bitor compile-byte
swap BIN: 101 swap 0 MOD-R/M
] ifte compile-cell ;
: [R]>R ( reg reg -- )
#! MOV INDIRECT <reg> TO <reg>.
HEX: 8b compile-byte swap 3 shift bitor compile-byte ;
HEX: 8b compile-byte swap 0 MOD-R/M ;
: R>[R] ( reg reg -- )
#! MOV <reg> TO INDIRECT <reg>.
HEX: 89 compile-byte swap 3 shift bitor compile-byte ;
HEX: 89 compile-byte swap 0 MOD-R/M ;
: I+[I] ( imm addr -- )
#! ADD <imm> TO ADDRESS <addr>
HEX: 81 compile-byte
HEX: 05 compile-byte
BIN: 101 0 0 MOD-R/M
compile-cell
compile-cell ;
@ -93,14 +96,14 @@ USE: combinators
#! SUBTRACT <imm> FROM <reg>, STORE RESULT IN <reg>
over -128 127 between? [
HEX: 83 compile-byte
HEX: e8 + compile-byte
BIN: 101 BIN: 11 MOD-R/M
compile-byte
] [
dup EAX = [
drop HEX: 2d compile-byte
] [
HEX: 81 compile-byte
BIN: 11101000 bitor
BIN: 101 BIN: 11 MOD-R/M
] ifte
compile-cell
] ifte ;
@ -111,11 +114,11 @@ USE: combinators
#! 81 38 33 33 33 00 cmpl $0x333333,(%eax)
over -128 127 between? [
HEX: 83 compile-byte
HEX: 38 + compile-byte
BIN: 111 0 MOD-R/M
compile-byte
] [
HEX: 81 compile-byte
HEX: 38 + compile-byte
BIN: 111 0 MOD-R/M
compile-cell
] ifte ;
@ -127,8 +130,8 @@ USE: combinators
4 DATASTACK I+[I] ;
: [LITERAL] ( cell -- )
#! Push literal on data stack by following an indirect
#! pointer.
#! Push complex literal on data stack by following an
#! indirect pointer.
ECX PUSH
( cell -- ) ECX [I]>R
DATASTACK EAX [I]>R
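
Note on MOD-R/M: the word packs three fields into one x86 ModR/M byte, with mod in bits 7-6, reg/opcode in bits 5-3 and r/m in bits 2-0. The same packing can be sketched in C as a cross-check. The function name mod_rm and the range assertions are illustrative and not part of the commit; the argument order follows the stack effect ( r/m reg/opcode mod -- ).

```c
#include <assert.h>
#include <stdint.h>

/* Build a ModR/M byte: mod in bits 7-6, reg/opcode in bits 5-3,
   r/m in bits 2-0. Illustrative helper, not part of the commit. */
uint8_t mod_rm(unsigned rm, unsigned reg_opcode, unsigned mod)
{
    assert(rm <= 7 && reg_opcode <= 7 && mod <= 3);
    return (uint8_t)((mod << 6) | (reg_opcode << 3) | rm);
}
```

With this definition, mod_rm(0, 5, 3) is 0xe8 and mod_rm(0, 7, 0) is 0x38, the constants the older lines added to the register number, and mod_rm(5, 0, 0) is 0x05, the byte the older I+[I] emitted directly.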

View File

@ -132,6 +132,7 @@ USE: words
] with-scope ;
: begin-compiling ( word -- )
cell compile-aligned
compiled-offset "compiled-xt" rot set-word-property ;
: end-compiling ( word -- xt )

View File

@ -26,24 +26,42 @@
! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
IN: compiler
USE: combinators
USE: words
USE: stack
USE: kernel
USE: math
USE: lists
: compile-ifte ( -- )
pop-literal pop-literal commit-literals
: compile-f-test ( -- fixup )
#! Push addr where we write the branch target address.
POP-DS
! ptr to condition is now in EAX
f address-of EAX CMP-I-[R]
compiled-offset JE ( -- fixup ) >r
( t -- ) compile-quot
RET
compiled-offset r> ( fixup -- ) fixup
( f -- ) compile-quot
RET ;
compiled-offset JE ;
[ compile-ifte ]
"compiling"
"ifte" [ "combinators" ] search
set-word-property
: branch-target ( fixup -- )
cell compile-aligned compiled-offset swap fixup ;
: compile-else ( fixup -- fixup )
#! Push addr where we write the branch target address,
#! and fixup branch target address from compile-f-test.
#! Push f for the fixup if we're in tail position.
tail? [ RET f ] [ 0 JUMP ] ifte swap branch-target ;
: compile-end-if ( fixup -- )
tail? [ drop RET ] [ branch-target ] ifte ;
: compile-ifte ( -- )
pop-literal pop-literal commit-literals
compile-f-test >r
( t -- ) compile-quot
r> compile-else >r
( f -- ) compile-quot
r> compile-end-if ;
[
[ ifte compile-ifte ]
] [
unswons "compiling" swap set-word-property
] each
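
Note on the fixups: compile-f-test, compile-else and branch-target rely on back-patching, where a jump is emitted before its target is known and the target is filled in once the code in between has been compiled. A minimal C sketch of that technique follows, assuming a 32-bit relative displacement and a small fixed buffer. How the commit's own fixup word encodes the target is not shown in this diff, and every name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { CODE_MAX = 64 };

static uint8_t code[CODE_MAX];   /* code buffer being assembled */
static size_t code_offset = 0;   /* next free byte in the buffer */

static void emit_byte(uint8_t b)
{
    assert(code_offset < CODE_MAX);
    code[code_offset++] = b;
}

static void emit_cell(int32_t v)
{
    assert(code_offset + 4 <= CODE_MAX);
    memcpy(&code[code_offset], &v, 4);
    code_offset += 4;
}

/* Emit JE rel32 (opcode 0F 84) with a zero placeholder and return
   the offset of the displacement so it can be patched later. */
size_t emit_je(void)
{
    emit_byte(0x0f);
    emit_byte(0x84);
    size_t fixup = code_offset;
    emit_cell(0);
    return fixup;
}

/* Fill in the placeholder at fixup so the jump lands on target.
   The displacement is relative to the end of the jump instruction. */
void patch(size_t fixup, size_t target)
{
    assert(fixup + 4 <= CODE_MAX);
    int32_t rel = (int32_t)target - (int32_t)(fixup + 4);
    memcpy(&code[fixup], &rel, 4);
}
```

The caller keeps the value returned by emit_je while the true branch is emitted, then calls patch with the current offset, which mirrors the >r ... r> pattern in compile-ifte.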

View File

@ -25,7 +25,6 @@
! OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
IN: cross-compiler
USE: combinators
USE: kernel
USE: lists
@ -127,7 +126,7 @@ DEFER: set-word-plist
IN: unparser
DEFER: unparse-float
IN: cross-compiler
IN: image
: primitives, ( -- )
1 [

View File

@ -25,7 +25,12 @@
! OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
IN: cross-compiler
IN: namespaces
( Java Factor doesn't have this )
: namespace-buckets 23 ;
IN: image
USE: combinators
USE: errors
USE: hashtables
@ -254,12 +259,6 @@ DEFER: '
( Word definitions )
IN: namespaces
: namespace-buckets 23 ;
IN: cross-compiler
: (vocabulary) ( name -- vocab )
#! Vocabulary for target image.
dup "vocabularies" get hash dup [

View File

@ -25,7 +25,7 @@
! OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
IN: cross-compiler
IN: image
USE: combinators
USE: kernel
USE: lists

View File

@ -26,7 +26,7 @@
! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
USE: lists
USE: cross-compiler
USE: image
primitives,
[

View File

@ -25,7 +25,7 @@
! OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
IN: cross-compiler
IN: image
USE: namespaces
USE: parser

View File

@ -28,7 +28,6 @@
IN: syntax
USE: combinators
USE: cross-compiler
USE: errors
USE: kernel
USE: lists

View File

@ -1,5 +1,5 @@
USE: test
USE: cross-compiler
USE: image
USE: namespaces
USE: stdio

View File

@ -75,3 +75,8 @@ garbage-collection
: one-rec [ f one-rec ] [ "hi" ] ifte ; compiled
[ "hi" ] [ t one-rec ] unit-test
: after-ifte-test
t [ ] [ ] ifte 5 ; compiled
[ 5 ] [ after-ifte-test ] unit-test

View File

@ -7,7 +7,8 @@ void* alloc_guarded(CELL size)
int pagesize = getpagesize();
char* array = mmap((void*)0,pagesize + size + pagesize,
PROT_READ | PROT_WRITE,MAP_ANON | MAP_PRIVATE,-1,0);
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_ANON | MAP_PRIVATE,-1,0);
if(mprotect(array,pagesize,PROT_NONE) == -1)
fatal_error("Cannot allocate low guard page",(CELL)array);
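
Note on PROT_EXEC: compiled code is written into this heap, so the pages have to be executable as well as readable and writable. The sketch below shows the general shape of a guarded, executable allocation. It is not the file's actual code: the name alloc_guarded_sketch, the rounding of the body to whole pages, and the NULL returns in place of fatal_error are all assumptions made for illustration. Some hardened systems refuse mappings that are both writable and executable, in which case the sketch returns NULL.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Map a read/write/execute region of at least size bytes with an
   inaccessible guard page on each side. Returns a pointer to the
   usable region, or NULL on failure. Illustrative only. */
void *alloc_guarded_sketch(size_t size)
{
    assert(size > 0);

    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;
    size_t pagesize = (size_t)ps;

    /* Round the body up to whole pages so the high guard page
       starts on a page boundary, as mprotect requires. */
    size_t body = (size + pagesize - 1) / pagesize * pagesize;
    size_t total = pagesize + body + pagesize;

    char *base = mmap(NULL, total,
                      PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Low and high guard pages: any access faults immediately. */
    if (mprotect(base, pagesize, PROT_NONE) == -1 ||
        mprotect(base + pagesize + body, pagesize, PROT_NONE) == -1) {
        munmap(base, total);
        return NULL;
    }
    return base + pagesize;
}
```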