compiler work

cvs
Slava Pestov 2004-09-11 19:26:24 +00:00
parent c02755227e
commit 34041bedbf
16 changed files with 117 additions and 65 deletions

@@ -3,7 +3,7 @@ CC = gcc
 # On PowerPC G5:
 # CFLAGS = -mcpu=970 -mtune=970 -mpowerpc64 -ffast-math -O3
 # On Pentium 4:
-# CFLAGS = -march=pentium4 -ffast-math -O3
+# CFLAGS = -march=pentium4 -ffast-math -O3 -fomit-frame-pointer
 # Add -fomit-frame-pointer if you don't care about debugging
 CFLAGS = -Os -g -Wall

@@ -409,6 +409,7 @@ is pushed on the stack. Try evaluating the following:
 call .s
 \emph{\{ 5 \}}
 \end{alltt}
 \texttt{call} \texttt{( quot -{}- )} executes the quotation at the
 top of the stack. Using \texttt{call} with a literal quotation is
 useless; writing out the elements of the quotation has the same effect.
@@ -442,10 +443,6 @@ More combinators will be introduced in later sections.
 \subsection{Recursion}
-The idea of \emph{recursion} is key to understanding Factor. A \emph{recursive} word definition is one that refers to itself, usually in one branch of a conditional.
-FIXME
 \section{Numbers}
 Factor provides a rich set of math words. Factor numbers more closely model the mathematical concept of a number than other languages. Where possible, exact answers are given -- for example, adding or multiplying two integers never results in overflow, and dividing two integers yields a fraction rather than a truncated result. Complex numbers are supported, allowing many functions to be computed with parameters that would raise errors or return ``not a number'' in other languages.
@@ -2400,11 +2397,28 @@ The name stack is really just a vector. The words \texttt{>n} and \texttt{n>} ar
 : n> ( n:namespace -- namespace ) namestack* vector-pop ;
 \end{alltt}
-\section{Metaprogramming}
-Recall that code quotations are in fact just linked lists. Factor code is data, and vice versa. Essentially, the interpreter iterates through code quotations, pushing literals and executing words. When a word is executed, one of two things happen -- either the word has a colon definition, and the interpreter is invoked recursively on the definition, or the word is primitive, and it is executed by the underlying virtual machine. A word is itself a first-class object.
-It is the job of the parser to transform source code denoting literals and words into their internal representations. This is done using a vocabulary of \emph{parsing words}. The prettyprinter does the converse, by printing out data structures in a parsable form (both to humans and Factor). Because code is data, text representation of source code doubles as a way to serialize almost any Factor object.
+\section{The execution model in depth}
+\subsection{Recursion}
+The idea of \emph{recursion} is key to understanding Factor. A \emph{recursive} word definition is one that refers to itself, usually in one branch of a conditional.
+tail recursion
+preserving values between iterations
+ensuring a consistent stack effect
+works well with lists, since only the head is passed
+not so well with vectors and strings -- need an obj+index
+\subsection{Combinators}
+a combinator is a recursive word that takes quotations
+how to ensure a consistent stack view for the quotations
 \subsection{Looking at words}
@@ -2452,24 +2466,6 @@ If the primitive number is set to 1, the word is a colon definition and the para
 The word \texttt{define ( word quot -{}- )} defines a word to have the specified colon definition. Note that \texttt{create} and \texttt{define} perform an action somewhat analagous to the \texttt{: ... ;} notation for colon definitions, except at parse time rather than run time.
-\subsection{The prettyprinter}
-We've already seen the word \texttt{.} which prints the top of the stack in a form that may be read back in. The word \texttt{prettyprint} is similar, except the output is in an indented, multiple-line format. Both words are in the \texttt{prettyprint} vocabulary. Here is an example:
-\begin{alltt}
-{[} 1 {[} 2 3 4 {]} 5 {]} .
-\emph{{[} 1 {[} 2 3 4 {]} 5 {]}}
-{[} 1 {[} 2 3 4 {]} 5 {]} prettyprint
-\emph{{[}
-1 {[}
-2 3 4
-{]} 5
-{]}}
-\end{alltt}
-\subsection{The parser}
 \subsection{Parsing words}
 Lets take a closer look at Factor syntax. Consider a simple expression,
@@ -2538,6 +2534,29 @@ next occurrence of \texttt{{}''}, and appends this string to the
 current node of the parse tree. Note that strings and words are different
 types of objects. Strings are covered in great detail later.
+\section{NOT DONE}
+Recall that code quotations are in fact just linked lists. Factor code is data, and vice versa. Essentially, the interpreter iterates through code quotations, pushing literals and executing words. When a word is executed, one of two things happen -- either the word has a colon definition, and the interpreter is invoked recursively on the definition, or the word is primitive, and it is executed by the underlying virtual machine. A word is itself a first-class object.
+It is the job of the parser to transform source code denoting literals and words into their internal representations. This is done using a vocabulary of \emph{parsing words}. The prettyprinter does the converse, by printing out data structures in a parsable form (both to humans and Factor). Because code is data, text representation of source code doubles as a way to serialize almost any Factor object.
+\subsection{The prettyprinter}
+We've already seen the word \texttt{.} which prints the top of the stack in a form that may be read back in. The word \texttt{prettyprint} is similar, except the output is in an indented, multiple-line format. Both words are in the \texttt{prettyprint} vocabulary. Here is an example:
+\begin{alltt}
+{[} 1 {[} 2 3 4 {]} 5 {]} .
+\emph{{[} 1 {[} 2 3 4 {]} 5 {]}}
+{[} 1 {[} 2 3 4 {]} 5 {]} prettyprint
+\emph{{[}
+1 {[}
+2 3 4
+{]} 5
+{]}}
+\end{alltt}
+\subsection{Profiling}
 \section{PRACTICAL: Infix syntax}

@@ -35,7 +35,7 @@ import java.io.*;
 public class FactorInterpreter implements FactorObject, Runnable
 {
-public static final String VERSION = "0.65";
+public static final String VERSION = "0.66";
 public static final Cons DEFAULT_USE = new Cons("builtins",
 new Cons("syntax",new Cons("scratchpad",null)));

@@ -26,6 +26,7 @@
 ! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 IN: compiler
+USE: combinators
 USE: math
 USE: kernel
 USE: stack
@@ -36,6 +37,13 @@ USE: stack
 : init-assembler ( -- )
 compiled-offset literal-table + set-compiled-offset ;
+: compile-aligned ( n -- )
+dup compiled-offset mod dup 0 = [
+2drop
+] [
+- compiled-offset + set-compiled-offset
+] ifte ;
 : intern-literal ( obj -- lit# )
 address-of
 literal-top set-compiled-cell
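As its name and stack effect ( n -- ) suggest, compile-aligned moves compiled-offset forward to an n-byte boundary; in this commit begin-compiling and branch-target both call it with cell before recording an address. Its false branch computes the padding as n minus the remainder and adds that to compiled-offset, which is the usual round-up-to-a-multiple operation. A minimal C sketch of that operation, assuming the remainder is taken as offset mod n; align_up is a hypothetical name, not part of Factor:

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of n (n > 0).
   Assumption: the remainder is offset % n. */
static size_t align_up(size_t offset, size_t n)
{
    assert(n > 0);
    size_t rem = offset % n;
    if (rem == 0)
        return offset;          /* already aligned: the 2drop branch */
    return offset + (n - rem);  /* pad by n - rem bytes */
}
```

For example, with a 4-byte cell an offset of 5 becomes 8, while an offset of 8 is left unchanged.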

@@ -41,6 +41,9 @@ USE: combinators
 : ESI 6 ;
 : EDI 7 ;
+: MOD-R/M ( r/m reg/opcode mod -- )
+6 shift swap 3 shift bitor bitor compile-byte ;
 : PUSH ( reg -- )
 HEX: 50 + compile-byte ;
@@ -57,7 +60,7 @@ USE: combinators
 drop HEX: a1 compile-byte
 ] [
 HEX: 8b compile-byte
-3 shift BIN: 101 bitor compile-byte
+BIN: 101 swap 0 MOD-R/M
 ] ifte compile-cell ;
 : I>[R] ( imm reg -- )
@@ -71,21 +74,21 @@ USE: combinators
 nip HEX: a3 compile-byte
 ] [
 HEX: 89 compile-byte
-swap 3 shift BIN: 101 bitor compile-byte
+swap BIN: 101 swap 0 MOD-R/M
 ] ifte compile-cell ;
 : [R]>R ( reg reg -- )
 #! MOV INDIRECT <reg> TO <reg>.
-HEX: 8b compile-byte swap 3 shift bitor compile-byte ;
+HEX: 8b compile-byte swap 0 MOD-R/M ;
 : R>[R] ( reg reg -- )
 #! MOV <reg> TO INDIRECT <reg>.
-HEX: 89 compile-byte swap 3 shift bitor compile-byte ;
+HEX: 89 compile-byte swap 0 MOD-R/M ;
 : I+[I] ( imm addr -- )
 #! ADD <imm> TO ADDRESS <addr>
 HEX: 81 compile-byte
-HEX: 05 compile-byte
+BIN: 101 0 0 MOD-R/M
 compile-cell
 compile-cell ;
@@ -93,14 +96,14 @@ USE: combinators
 #! SUBTRACT <imm> FROM <reg>, STORE RESULT IN <reg>
 over -128 127 between? [
 HEX: 83 compile-byte
-HEX: e8 + compile-byte
+BIN: 101 BIN: 11 MOD-R/M
 compile-byte
 ] [
 dup EAX = [
 drop HEX: 2d compile-byte
 ] [
 HEX: 81 compile-byte
-BIN: 11101000 bitor
+BIN: 101 BIN: 11 MOD-R/M
 ] ifte
 compile-cell
 ] ifte ;
@@ -111,11 +114,11 @@ USE: combinators
 #! 81 38 33 33 33 00 cmpl $0x333333,(%eax)
 over -128 127 between? [
 HEX: 83 compile-byte
-HEX: 38 + compile-byte
+BIN: 111 0 MOD-R/M
 compile-byte
 ] [
 HEX: 81 compile-byte
-HEX: 38 + compile-byte
+BIN: 111 0 MOD-R/M
 compile-cell
 ] ifte ;
@@ -127,8 +130,8 @@ USE: combinators
 4 DATASTACK I+[I] ;
 : [LITERAL] ( cell -- )
-#! Push literal on data stack by following an indirect
-#! pointer.
+#! Push complex literal on data stack by following an
+#! indirect pointer.
 ECX PUSH
 ( cell -- ) ECX [I]>R
 DATASTACK EAX [I]>R
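The MOD-R/M word added in this file packs an x86 ModR/M byte from its three fields, replacing the hand-computed constants (HEX: 05, HEX: e8 +, BIN: 11101000 bitor, HEX: 38 +, and the 3 shift ... bitor forms). A minimal C sketch of the same packing; mod_rm is a hypothetical name, and the register enum assumes the standard x86 numbering (the diff itself only shows ESI = 6 and EDI = 7):

```c
#include <assert.h>
#include <stdint.h>

/* Standard x86 32-bit register numbers (EAX = 0 ... EDI = 7). */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Pack a ModR/M byte: mod in bits 7-6, reg/opcode in bits 5-3,
   r/m in bits 2-0. Arguments follow MOD-R/M's
   ( r/m reg/opcode mod -- ) order. */
static uint8_t mod_rm(unsigned rm, unsigned reg, unsigned mod)
{
    assert(rm < 8 && reg < 8 && mod < 4);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}
```

The replaced constants fall out directly: mod_rm(5, 0, 0) is 0x05 (the displacement-only form used by I+[I]), mod_rm(EAX, 5, 3) is 0xe8 (the /5 SUB form on a register, so 0xe8 plus the register number in general), and mod_rm(EAX, 7, 0) is 0x38, matching the `81 38` bytes in the CMP comment.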

@@ -132,6 +132,7 @@ USE: words
 ] with-scope ;
 : begin-compiling ( word -- )
+cell compile-aligned
 compiled-offset "compiled-xt" rot set-word-property ;
 : end-compiling ( word -- xt )

@@ -26,24 +26,42 @@
 ! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 IN: compiler
+USE: combinators
 USE: words
 USE: stack
 USE: kernel
 USE: math
+USE: lists
-: compile-ifte ( -- )
-pop-literal pop-literal commit-literals
+: compile-f-test ( -- fixup )
+#! Push addr where we write the branch target address.
 POP-DS
 ! ptr to condition is now in EAX
 f address-of EAX CMP-I-[R]
-compiled-offset JE ( -- fixup ) >r
-( t -- ) compile-quot
-RET
-compiled-offset r> ( fixup -- ) fixup
-( f -- ) compile-quot
-RET ;
-[ compile-ifte ]
-"compiling"
-"ifte" [ "combinators" ] search
-set-word-property
+compiled-offset JE ;
+: branch-target ( fixup -- )
+cell compile-aligned compiled-offset swap fixup ;
+: compile-else ( fixup -- fixup )
+#! Push addr where we write the branch target address,
+#! and fixup branch target address from compile-f-test.
+#! Push f for the fixup if we're tail position.
+tail? [ RET f ] [ 0 JUMP ] ifte swap branch-target ;
+: compile-end-if ( fixup -- )
+tail? [ drop RET ] [ branch-target ] ifte ;
+: compile-ifte ( -- )
+pop-literal pop-literal commit-literals
+compile-f-test >r
+( t -- ) compile-quot
+r> compile-else >r
+( f -- ) compile-quot
+r> compile-end-if ;
+[
+[ ifte compile-ifte ]
+] [
+unswons "compiling" swap set-word-property
+] each
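The new compile-ifte compiles a conditional with forward-branch backpatching: compile-f-test emits a conditional jump whose target is not yet known and leaves its fixup on the stack; compile-else either emits RET (tail position) or an unconditional jump, then patches the first fixup to point at the false branch; compile-end-if either emits RET or patches the remaining jump. A minimal C sketch of that scheme on a toy byte buffer follows. All names are hypothetical; the condition test (POP-DS and CMP-I-[R]) is left out, branch bodies are passed in as raw bytes, and unlike branch-target the sketch does not align branch targets. It assumes the standard x86 encodings JE rel32 = 0F 84, JMP rel32 = E9 and RET = C3, with displacements relative to the end of the 4-byte field:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy code buffer; offset plays the role of compiled-offset. */
typedef struct {
    uint8_t bytes[256];
    size_t offset;
} code_buf;

static void emit_byte(code_buf *b, uint8_t x)
{
    assert(b->offset < sizeof b->bytes);
    b->bytes[b->offset++] = x;
}

static void emit_bytes(code_buf *b, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        emit_byte(b, p[i]);
}

/* Store a 32-bit value little-endian, as x86 expects. */
static void put_le32(uint8_t *p, int32_t v)
{
    uint32_t u = (uint32_t)v;
    p[0] = (uint8_t)u;
    p[1] = (uint8_t)(u >> 8);
    p[2] = (uint8_t)(u >> 16);
    p[3] = (uint8_t)(u >> 24);
}

/* Emit an opcode plus a zero rel32 placeholder; return the fixup,
   i.e. the offset of the displacement to patch later. */
static size_t emit_branch(code_buf *b, const uint8_t *op, size_t oplen)
{
    emit_bytes(b, op, oplen);
    size_t fixup = b->offset;
    for (int i = 0; i < 4; i++)
        emit_byte(b, 0);
    return fixup;
}

/* rel32 is relative to the end of the 4-byte displacement. */
static void patch_rel32(code_buf *b, size_t fixup, size_t target)
{
    put_le32(&b->bytes[fixup],
             (int32_t)((int64_t)target - (int64_t)(fixup + 4)));
}

static const uint8_t JE_REL32[] = { 0x0f, 0x84 };
static const uint8_t JMP_REL32[] = { 0xe9 };
enum { OP_RET = 0xc3 };

/* Shape of compile-ifte; the condition test itself is omitted. */
static void compile_ifte(code_buf *b,
                         const uint8_t *t, size_t tlen,
                         const uint8_t *f, size_t flen, int tail)
{
    size_t fixup = emit_branch(b, JE_REL32, sizeof JE_REL32); /* f-test */
    emit_bytes(b, t, tlen);                     /* true branch */
    size_t end_fixup = 0;
    if (tail)
        emit_byte(b, OP_RET);                   /* nothing to jump over */
    else
        end_fixup = emit_branch(b, JMP_REL32, sizeof JMP_REL32);
    patch_rel32(b, fixup, b->offset);           /* false branch starts here */
    emit_bytes(b, f, flen);                     /* false branch */
    if (tail)
        emit_byte(b, OP_RET);
    else
        patch_rel32(b, end_fixup, b->offset);   /* join point */
}
```

In tail position no jump over the false branch is needed, which is why compile-else pushes f instead of a fixup and compile-end-if simply emits RET. The after-ifte-test unit test added in this commit exercises the non-tail case, where code follows the conditional.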

@@ -25,7 +25,6 @@
 ! OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 ! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-IN: cross-compiler
 USE: combinators
 USE: kernel
 USE: lists
@@ -127,7 +126,7 @@ DEFER: set-word-plist
 IN: unparser
 DEFER: unparse-float
-IN: cross-compiler
+IN: image
 : primitives, ( -- )
 1 [

@@ -25,7 +25,12 @@
 ! OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 ! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-IN: cross-compiler
+IN: namespaces
+( Java Factor doesn't have this )
+: namespace-buckets 23 ;
+IN: image
 USE: combinators
 USE: errors
 USE: hashtables
@@ -254,12 +259,6 @@ DEFER: '
 ( Word definitions )
-IN: namespaces
-: namespace-buckets 23 ;
-IN: cross-compiler
 : (vocabulary) ( name -- vocab )
 #! Vocabulary for target image.
 dup "vocabularies" get hash dup [

@@ -25,7 +25,7 @@
 ! OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 ! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-IN: cross-compiler
+IN: image
 USE: combinators
 USE: kernel
 USE: lists

@@ -26,7 +26,7 @@
 ! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 USE: lists
-USE: cross-compiler
+USE: image
 primitives,
 [

@@ -25,7 +25,7 @@
 ! OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 ! ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-IN: cross-compiler
+IN: image
 USE: namespaces
 USE: parser

@@ -28,7 +28,6 @@
 IN: syntax
 USE: combinators
-USE: cross-compiler
 USE: errors
 USE: kernel
 USE: lists

@@ -1,5 +1,5 @@
 USE: test
-USE: cross-compiler
+USE: image
 USE: namespaces
 USE: stdio

@@ -75,3 +75,8 @@ garbage-collection
 : one-rec [ f one-rec ] [ "hi" ] ifte ; compiled
 [ "hi" ] [ t one-rec ] unit-test
+: after-ifte-test
+t [ ] [ ] ifte 5 ; compiled
+[ 5 ] [ after-ifte-test ] unit-test

@@ -7,7 +7,8 @@ void* alloc_guarded(CELL size)
 int pagesize = getpagesize();
 char* array = mmap((void*)0,pagesize + size + pagesize,
-PROT_READ | PROT_WRITE,MAP_ANON | MAP_PRIVATE,-1,0);
+PROT_READ | PROT_WRITE | PROT_EXEC,
+MAP_ANON | MAP_PRIVATE,-1,0);
 if(mprotect(array,pagesize,PROT_NONE) == -1)
 fatal_error("Cannot allocate low guard page",(CELL)array);
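This hunk adds PROT_EXEC to the mapping that alloc_guarded creates, presumably so that code emitted by the compiler into such a region can be executed. A minimal POSIX C sketch of the guarded-allocation pattern, under a hypothetical name guarded_alloc: it maps one page below and one page above the usable region and makes both inaccessible, so an overrun faults instead of silently corrupting memory. The hunk only shows the low guard page; the high guard page, the rounding of size to whole pages (needed so the high guard is page-aligned), and the use of sysconf(_SC_PAGESIZE) in place of getpagesize() are assumptions of this sketch:

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANON on glibc and musl */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map pagesize + size + pagesize bytes and make the first and last pages
   PROT_NONE guard pages. Returns the start of the usable region, or NULL
   on failure. `prot` applies to the usable region; memory that will hold
   compiled code would also need PROT_EXEC. */
static void *guarded_alloc(size_t size, int prot)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);                     /* page size query should not fail */
    size_t pagesize = (size_t)ps;

    /* Round up to whole pages so the high guard page is page-aligned. */
    size = (size + pagesize - 1) / pagesize * pagesize;

    size_t total = pagesize + size + pagesize;
    char *array = mmap(NULL, total, prot, MAP_ANON | MAP_PRIVATE, -1, 0);
    if (array == MAP_FAILED)
        return NULL;

    if (mprotect(array, pagesize, PROT_NONE) == -1 ||            /* low guard */
        mprotect(array + pagesize + size, pagesize, PROT_NONE) == -1) { /* high guard */
        munmap(array, total);
        return NULL;
    }
    return array + pagesize;
}
```

Some hardened systems refuse mappings that are writable and executable at the same time, so a caller requesting PROT_READ | PROT_WRITE | PROT_EXEC should be prepared for the NULL return.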