\documentclass{article}
\usepackage[plainpages=false,colorlinks]{hyperref}
\usepackage[style=list,toc]{glossary}
\usepackage{alltt}
\usepackage{times}
\usepackage{tabularx}
\usepackage{epsfig}
\usepackage{epsf}
\usepackage{amssymb}
\usepackage{epstopdf}
\pagestyle{headings}
\setcounter{tocdepth}{3}
\setcounter{secnumdepth}{3}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}
\newcommand{\bs}{\char'134}
\newcommand{\dq}{\char'42}
\newcommand{\tto}{\symbol{123}}
\newcommand{\ttc}{\symbol{125}}
\newcommand{\pound}{\char'43}
\newcommand{\vocabulary}[1]{\emph{Vocabulary:} \texttt{#1}&\\}
\newcommand{\parsingword}[2]{\index{\texttt{#1}}\emph{Parsing word:} \texttt{#2}&\\}
\newcommand{\ordinaryword}[2]{\index{\texttt{#1}}\emph{Word:} \texttt{#2}&\\}
\newcommand{\symbolword}[1]{\index{\texttt{#1}}\emph{Symbol:} \texttt{#1}&\\}
\newcommand{\classword}[1]{\index{\texttt{#1}}\emph{Class:} \texttt{#1}&\\}
\newcommand{\genericword}[2]{\index{\texttt{#1}}\emph{Generic word:} \texttt{#2}&\\}
\newcommand{\predword}[1]{\ordinaryword{#1}{#1~( object -- ?~)}}
\setlength{\tabcolsep}{1mm}
\newcommand{\wordtable}[1]{
%HEVEA\renewcommand{\index}[1]{}
%HEVEA\renewcommand{\glossary}[1]{}
\begin{tabularx}{12cm}{lX}
\hline
#1
\hline
\end{tabularx}
}
\makeatletter
\makeatother
\begin{document}
\title{The Factor compiler}
\author{Slava Pestov}
\maketitle
\tableofcontents{}
\section{The compiler}
The compiler can provide a substantial speed boost for words whose stack effect can be inferred. Words without a known stack effect cannot be compiled, and must be run in the interpreter. The compiler generates native code, and so far, x86 and PowerPC backends have been developed.
To compile a single word, call \texttt{compile}:
\begin{alltt}
\textbf{ok} \bs pref-size compile
\textbf{Compiling pref-size}
\end{alltt}
During bootstrap, all words in the library with a known stack effect are compiled. You can
skip this step by passing the \texttt{-no-compile} switch during
bootstrap:
\begin{alltt}
\textbf{bash\$} ./f boot.image.le32 -no-compile
\end{alltt}
The compiler has two limitations you must be aware of. First, if an exception is thrown in compiled code, the return stack will be incomplete, since compiled words do not push themselves there. Second, compiled code cannot be profiled. These limitations will be resolved in a future release.
The compiler works in several stages. First, a dataflow graph is inferred and various optimizations are done on this graph. The graph is then transformed into a linear representation, on which further optimizations are done. Finally, machine code is generated from the linear representation.
\subsection{Linear intermediate representation}
The linear IR is the second of the two intermediate
representations used by Factor. It is essentially a high-level
assembly language. Linear IR operations are called \emph{virtual operations}, or VOPs. The last stage of the compiler generates machine code instructions corresponding to each VOP in the linear IR.
To run every stage except machine code generation, use the \texttt{precompile} word. It dumps the optimized linear IR instead of generating code, which is useful for inspecting what the compiler produced for a word.
\begin{alltt}
\textbf{ok} \bs append precompile
\textbf{<< \%prologue << vop [ ] [ ] [ ] [ ] >> >>
<< \%peek-d << vop [ ] [ 1 ] [ << vreg ... 0 >> ] [ ] >> >>
<< \%peek-d << vop [ ] [ 0 ] [ << vreg ... 1 >> ] [ ] >> >>
<< \%replace-d << vop [ ] [ 0 << vreg ... 0 >> ] [ ] [ ] >> >>
<< \%replace-d << vop [ ] [ 1 << vreg ... 1 >> ] [ ] [ ] >> >>
<< \%inc-d << vop [ ] [ -1 ] [ ] [ ] >> >>
<< \%return << vop [ ] [ ] [ ] [ ] >> >>}
\end{alltt}
\subsubsection{Control flow}
\begin{description}
\item[\texttt{\%prologue}] On x86, this does nothing. On PowerPC, at the start of
each word that calls a subroutine, we store the link
register in r0, then push r0 on the C stack.
\item[\texttt{\%call-label}] On PowerPC, uses near calling convention, where the
caller pushes the return address.
\item[\texttt{\%call}] On PowerPC, if calling a primitive, compiles a sequence that loads a 32-bit literal and jumps to that address. For other compiled words, compiles an immediate branch with link, so all compiled word definitions must be within 64 megabytes of each other.
\item[\texttt{\%jump-label}] Like \texttt{\%call-label} except the return address is not saved. Used for tail calls.
\item[\texttt{\%jump}] Like \texttt{\%call} except the return address is not saved. Used for tail calls.
\item[\texttt{\%dispatch}] Compiles a piece of code that jumps to an offset in a
jump table indexed by an integer. The jump table consists of \texttt{\%target-label} and \texttt{\%target} VOPs, which must immediately follow this VOP.
\item[\texttt{\%target}] Not supported on PowerPC.
\item[\texttt{\%target-label}] A jump table entry.
\end{description}
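The range limit on \texttt{\%call} follows from the PowerPC branch encoding. As a worked check, assuming the standard I-form branch with a 24-bit \texttt{LI} field (the symbols $d_{\min}$ and $d_{\max}$ are introduced here for illustration only), the byte displacement is \texttt{LI} with two zero bits appended, sign-extended from 26 bits:
\[
d_{\min} = -2^{25} = -33\,554\,432, \qquad d_{\max} = 2^{25} - 4 = 33\,554\,428 .
\]
The reachable window is therefore $2^{26}$ bytes, or 64 megabytes, wide, and a target can lie at most about 32 megabytes from the branch instruction in either direction.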
\subsubsection{Slots and objects}
\begin{description}
\item[\texttt{\%slot}] The untagged object is in \texttt{vop-out-1}, and the tagged slot
number is in \texttt{vop-in-1}.
\item[\texttt{\%fast-slot}] The tagged object is in \texttt{vop-out-1}, and the pointer offset is
in \texttt{vop-in-1}. The offset already takes the type tag into
account, so it's just one instruction to load.
\item[\texttt{\%set-slot}] The new value is \texttt{vop-in-1}, the object is \texttt{vop-in-2}, and
the slot number is \texttt{vop-in-3}.
\item[\texttt{\%fast-set-slot}] The new value is \texttt{vop-in-1}, the object is \texttt{vop-in-2}, and
the slot offset is \texttt{vop-in-3}.
The offset already takes the type tag into account, so
it's just one instruction to load.
\item[\texttt{\%write-barrier}] Marks the card containing the object pointed to by \texttt{vop-in-1}.
\item[\texttt{\%untag}] Masks off the tag bits of \texttt{vop-in-1} and stores the result in
\texttt{vop-in-1} (which should equal \texttt{vop-out-1}!).
\item[\texttt{\%untag-fixnum}] Shifts \texttt{vop-in-1} to the right by 3 bits and stores the result in
\texttt{vop-in-1} (which should equal \texttt{vop-out-1}!).
\item[\texttt{\%type}] Intrinsic version of the \texttt{type} primitive. It outputs an
unboxed value in \texttt{vop-out-1}.
\end{description}
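The tag arithmetic behind \texttt{\%untag}, \texttt{\%untag-fixnum} and the fast slot VOPs can be written out as a small worked example. This is a sketch under stated assumptions: the 3-bit tag width is taken from the shift in \texttt{\%untag-fixnum}, while the symbols $p$, $t$, $n$ and $o$, the 8-byte alignment of objects, and the choice of tag $0$ for fixnums are assumptions made for illustration.

For an object at aligned address $p$ carrying tag $t$, \texttt{\%untag} recovers $p$ by clearing the low three bits:
\[
\mathit{tagged} = p + t, \qquad 0 \le t < 2^{3}, \qquad p \bmod 8 = 0, \qquad
\mathit{tagged} - (\mathit{tagged} \bmod 8) = p .
\]
For a fixnum $n$ with tag $0$, \texttt{\%untag-fixnum} recovers $n$ with an arithmetic right shift:
\[
\mathit{tagged} = 8n, \qquad \left\lfloor \frac{8n}{2^{3}} \right\rfloor = n,
\qquad \mbox{for example } n = 5:\ 40 \mapsto 5 .
\]
For a slot at byte offset $o$ from the start of the object, a fast slot VOP can use the precomputed offset $o - t$, so the address is formed directly from the tagged pointer without a separate untagging step:
\[
\mathit{tagged} + (o - t) = p + o .
\]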
\subsubsection{Alien interface}
\begin{description}
\item[\texttt{\%parameters}] Ignored on x86.
\item[\texttt{\%parameter}] Ignored on x86.
\item[\texttt{\%unbox}] An unboxer function takes a value from the data stack
and converts it into a C value.
\item[\texttt{\%box}] A boxer function takes a C value as a parameter,
converts it into a Factor value, and pushes it on the data
stack.
On x86, C functions return integers in EAX.
\item[\texttt{\%box-float}] On x86, C functions return floats on the FP stack.
\item[\texttt{\%box-double}] On x86, C functions return doubles on the FP stack.
\item[\texttt{\%cleanup}] Ignored on PowerPC.
On x86, in the cdecl ABI, the caller must pop input
parameters off the C stack. In stdcall, the callee does
it, so this node is not used in that case.
\end{description}
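As a worked example of the cdecl cleanup, assume a 32-bit x86 target where each \texttt{int} or pointer argument occupies 4 bytes of C stack and each \texttt{double} occupies 8 (the function and its argument list are hypothetical). After calling a function that takes two \texttt{int} arguments and one \texttt{double}, the caller must release
\[
4 + 4 + 8 = 16 \mbox{ bytes},
\]
so the cleanup amounts to adding 16 to \texttt{ESP}. Under stdcall the callee has already removed those 16 bytes when it returns.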
\end{document}