USING: help io parser sequences words ;

ARTICLE: "parser" "The parser"
"This section covers reflective access to the Factor parser and ways to extend it. The parser algorithm and standard syntax are described in " { $link "syntax" } "."
$nl
"The words making up the parser are found in the " { $vocab-link "parser" } " and " { $vocab-link "syntax" } " vocabularies."
$nl
"As documented in " { $link "vocabulary-search" } ", the parser looks up words in the vocabulary search path. New word definitions are added to the current vocabulary. These two parameters are stored in a pair of variables:"
{ $subsection use }
{ $subsection in }
"There are two simple ways to call the parser which use the vocabulary search path currently in scope, which is usually the default listener search path:"
{ $subsection parse }
{ $subsection eval }
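"As a minimal sketch of the difference between the two, assuming " { $link parse } " has stack effect " { $snippet "( str -- quot )" } " and " { $link eval } " calls the parsed quotation immediately, the following two lines should behave the same way:"
{ $code
    "\"2 3 + .\" parse call"
    "\"2 3 + .\" eval"
}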
"The parser can also take input from a sequence of lines, or a stream:"
{ $subsection parse-lines }
{ $subsection parse-stream }
"Note that the above words are low-level and are not the canonical way to read code from a source file. To learn about loading source files, see " { $link "sources" } "."
$nl
"The parser can be extended with new parsing word definitions."
{ $subsection "parsing-words" }
"The next level of customization above writing parsing words is extending the lexer:"
{ $subsection "parser-lexer" } ;

ARTICLE: "parsing-words" "Parsing words"
"The Factor parser follows a simple recursive-descent design. The parser reads successive tokens from the input; if the token identifies a number or an ordinary word, it is added to an accumulator vector. Otherwise, if the token identifies a parsing word, the parsing word is executed immediately."
$nl
"Parsing words are marked by suffixing the definition with a " { $link POSTPONE: parsing } " declaration."
$nl
"Parsing words must have stack effect " { $snippet "( accum -- accum )" } ", where " { $snippet "accum" } " is the accumulator vector supplied by the parser. Parsing words can read input, add word definitions to the dictionary, and do anything an ordinary word can."
$nl
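"As an illustrative sketch only, a parsing word with this stack effect might read the next token with " { $link scan-word } " and add the result to the accumulator twice. The name " { $snippet "TWICE:" } " is hypothetical and not part of the library, and the sketch assumes " { $link push } " has stack effect " { $snippet "( elt seq -- )" } ":"
{ $code
    ": TWICE: ( accum -- accum )"
    "    scan-word [ over push ] keep over push ; parsing"
}
"Under these assumptions, " { $snippet "[ TWICE: dup ]" } " would parse to the same quotation as " { $snippet "[ dup dup ]" } "."
$nl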
"Tools for implementing parsing words:"
{ $subsection "reading-ahead" }
{ $subsection "parsing-word-nest" }
{ $subsection "defining-words" }
{ $subsection "parsing-tokens" } ;

ARTICLE: "reading-ahead" "Reading ahead"
"Parsing words can consume input:"
{ $subsection scan }
{ $subsection scan-word }
"For example, the " { $link POSTPONE: HEX: } " word uses this feature to read hexadecimal literals:"
{ $see POSTPONE: HEX: }
"It is defined in terms of a lower-level word that takes the numerical base on the data stack, but reads the number from the parser and then adds it to the parse tree:"
{ $see parse-base }
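"By the same pattern, a parsing word for literals in another base could be sketched as follows. The name " { $snippet "BASE3:" } " is hypothetical, and the sketch assumes " { $link parse-base } " accepts any base that the number parser supports:"
{ $code ": BASE3: 3 parse-base ; parsing" }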
"Another simple example is the " { $link POSTPONE: \ } " word:"
{ $see POSTPONE: \ } ;

ARTICLE: "parsing-word-nest" "Nested structure"
"Recall that the parser loop calls parsing words with an accumulator vector on the stack. The parser loop can be invoked recursively with a new, empty accumulator; the result can then be added to the original accumulator. This is how parsing words for object literals are implemented; object literals can nest arbitrarily deep."
$nl
"A simple example is the parsing word that reads a quotation:"
{ $see POSTPONE: [ }
"This word uses a utility word which recursively invokes the parser, reading objects into a new accumulator until an occurrence of " { $link POSTPONE: ] } ":"
{ $subsection parse-literal }
"There is another, lower-level word for reading nested structure, which is also useful when called directly:"
{ $subsection parse-until }
"Note that " { $link POSTPONE: ] } " is just a dummy word; declaring it as a " { $link POSTPONE: delimiter } " causes it to throw an error when an unpaired occurrence is encountered:"
{ $see POSTPONE: ] }
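"As a sketch of how a new literal syntax could be built on " { $link parse-literal } ", the following hypothetical word reads objects until a closing " { $link POSTPONE: } } " and adds the resulting sequence, reversed, to the parse tree as a single literal. The name " { $snippet "REVERSED{" } " is illustrative only, and the sketch assumes " { $link parse-literal } " has stack effect " { $snippet "( accum end quot -- accum )" } ", as its use in " { $link POSTPONE: [ } " suggests:"
{ $code ": REVERSED{ \\ } [ reverse ] parse-literal ; parsing" }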
{ $see-also POSTPONE: { POSTPONE: H{ POSTPONE: V{ POSTPONE: W{ POSTPONE: T{ POSTPONE: } } ;

ARTICLE: "defining-words" "Defining words"
"Defining words add definitions to the dictionary without modifying the parse tree. The simplest example is the " { $link POSTPONE: SYMBOL: } " word."
{ $see POSTPONE: SYMBOL: }
"The key factor in the definition of " { $link POSTPONE: SYMBOL: } " is " { $link CREATE } ", which reads a token from the input and creates a word with that name. This word is then passed to " { $link define-symbol } "."
{ $subsection CREATE }
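"A hypothetical defining word that follows the same pattern, sketched here under the assumption that " { $link CREATE } " has stack effect " { $snippet "( -- word )" } " and " { $link define-symbol } " has stack effect " { $snippet "( word -- )" } ", could define two symbols at once. The name " { $snippet "2SYMBOLS:" } " is not part of the library:"
{ $code
    ": 2SYMBOLS: CREATE define-symbol CREATE define-symbol ; parsing"
    "2SYMBOLS: foo bar"
}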
"Colon definitions are defined in a more elaborate way:"
{ $subsection POSTPONE: : }
"The " { $link POSTPONE: : } " word first calls " { $link CREATE } ", and then reads input until reaching " { $link POSTPONE: ; } " using a utility word:"
{ $subsection parse-definition }
"The " { $link POSTPONE: ; } " word is just a delimiter; an unpaired occurrence throws a parse error:"
{ $see POSTPONE: ; }
"There are additional parsing words whose syntax is delimited by " { $link POSTPONE: ; } ", and they are all implemented by calling " { $link parse-definition } "."
{ $see-also POSTPONE: C: POSTPONE: G: POSTPONE: M: POSTPONE: PREDICATE: POSTPONE: UNION: } ;

ARTICLE: "parsing-tokens" "Parsing raw tokens"
"So far we have seen how to read individual tokens, or read a sequence of parsed objects until a delimiter. It is also possible to read raw tokens from the input and perform custom processing."
$nl
"One example is the " { $link POSTPONE: USING: } " parsing word."
{ $see POSTPONE: USING: }
"It reads a list of vocabularies terminated by " { $link POSTPONE: ; } ". However, the vocabulary names do not name words, except by coincidence; so " { $link parse-until } " cannot be used here. Instead, a lower-level word is called:"
{ $subsection parse-tokens } ;

ARTICLE: "parser-lexer" "The lexer"
"Two variables that encapsulate internal parser state:"
{ $subsection file }
{ $subsection lexer }
"Creating a default lexer:"
{ $subsection <lexer> }
"A word to test if the end of input has been reached:"
{ $subsection still-parsing? }
"A word to get the text of the current line:"
{ $subsection line-text }
"A word to advance the lexer to the next line:"
{ $subsection next-line }
"Two generic words to override the lexer's token boundary detection:"
{ $subsection skip-blank }
{ $subsection skip-word }
"A utility used when parsing string literals:"
{ $subsection parse-string }
"The parser can be invoked with a custom lexer:"
{ $subsection (parse-lines) }
{ $subsection with-parser } ;