! factor/doc/handbook/parser.facts

USING: help io parser sequences words ;

ARTICLE: "parser" "The parser"
"This section concerns itself with reflective access and extension of the Factor parser. The parser algorithm and standard syntax is described in " { $link "syntax" } "."
$terpri
"The set of words making up the parser are found in the " { $snippet "parser" } " and " { $snippet "syntax" } " vocabularies."
$terpri
"As documented in " { $link "vocabulary-search" } ", the parser looks up words in the vocabulary search path. New word definitions are added to the current vocabulary. These two parameters are stored in a pair of variables:"
{ $subsection use }
{ $subsection in }
"There are two simple ways to call the parser:"
{ $subsection parse }
{ $subsection eval }
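"A minimal sketch of both words, assuming " { $link parse } " takes a string and leaves a quotation, and " { $link eval } " parses a string and calls the result immediately. Under those assumptions, each of the following lines should leave " { $snippet "5" } " on the data stack:"
{ $code "\"2 3 +\" parse call" }
{ $code "\"2 3 +\" eval" }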
"More sophisticated facilities exist, too."
{ $subsection "parse-stream" }
"The parser can be extended with new parsing word definitions."
{ $subsection "parsing-words" } ;

ARTICLE: "parse-stream" "Parsing from streams"
"By convention, words for parsing input from streams use a certain default vocabulary search path:"
{ $subsection file-vocabs }
"The central word for parsing input from a stream:"
{ $subsection parse-stream }
"Utilities for working with files:"
{ $subsection parse-file }
{ $subsection run-file }
"Utilities for working with Factor libarary files:"
{ $subsection resource-path }
{ $subsection parse-resource }
{ $subsection run-resource } ;

ARTICLE: "parsing-words" "Parsing words"
"Parsing words execute at parse time, and therefore can access and modify the state of the parser, as well as add objects to the parse tree. Parsing words are a difficult concept to grasp, so this section has several examples and explains the workings of some of the parsing words provided in the library."
$terpri
"Parsing words are marked by suffixing the definition with a declaration:"
{ $subsection POSTPONE: parsing }
{ $subsection "parsing-word-nest" }
{ $subsection "reading-ahead" }
{ $subsection "defining-words" }
{ $subsection "string-mode" }
{ $subsection "parser-internals" } ;

ARTICLE: "parsing-word-nest" "Nested structure"
"The first thing to look at is how the parse tree is built. When parsing begins, " { $link f } " is pushed on the data stack; when a new object is parsed, " { $link ?push } " is called to add it to the parse tree. The first call to " { $link ?push } " creates a new one-element vector, and subsequent calls add elements to the end of this vector. When parsing is complete, the vector is converted into a quotation."
$terpri
"Lets look at a simple example; the parsing of " { $snippet "1 2 3" } ":"
{ $table
    { "Action" "Stack" }
    { "Initial stack after parsing begins" { $snippet "f" } }
    { { "Token: " { $snippet "1" } } { $snippet "V{ 1 }" } }
    { { "Token: " { $snippet "2" } } { $snippet "V{ 1 2 }" } }
    { { "Token: " { $snippet "3" } } { $snippet "V{ 1 2 3 }" } }
    { "Final stack upon completion:" { $snippet "[ 1 2 3 ]" } }
}
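"The steps in the table can be reproduced by hand in the listener. This sketch assumes " { $link ?push } " has the stack effect " { $snippet "( elt seq/f -- seq )" } ", which is not stated above; under that assumption the stack should end up holding " { $snippet "V{ 1 2 3 }" } ":"
{ $code "f 1 swap ?push 2 swap ?push 3 swap ?push" }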
"Nested structure is a bit more involved. The basic idea is that parsing words can push " { $link f } " on the stack to begin a new level of nesting, then all subsequent objects are pushed onto this sequence, until another parsing word adds this sequence to the vector underneath."
$terpri
"The parsing words that delimit the beginning and the end of a quotation illustrate the idiom:"
{ $subsection POSTPONE: [ }
{ $subsection POSTPONE: ] }
"Let us ponder, then, how one particular string will parse:"
{ $snippet "\"1 [ 2 3 ] 4\"" }
{ $table
    { "Action" "Stack" }
    { "Initial stack after parsing begins" { $snippet "f" } }
    { { "Token: " { $snippet "1" } } { $snippet "V{ 1 }" } }
    { { "Token: " { $snippet "[" } } { $snippet "V{ 1 } f" } }
    { { "Token: " { $snippet "2" } } { $snippet "V{ 1 } V{ 2 }" } }
    { { "Token: " { $snippet "3" } } { $snippet "V{ 1 } V{ 2 3 }" } }
    { { "Token: " { $snippet "]" } } { $snippet "V{ 1 [ 2 3 ] }" } }
    { { "Token: " { $snippet "4" } } { $snippet "V{ 1 [ 2 3 ] 4 }" } }
    { "Final stack upon completion" { $snippet "[ 1 [ 2 3 ] 4 ]" } }
}
"Notice how in the definition of the quotation parsing words, the final word " { $link POSTPONE: ] } " does all the work. A closely related set of parsing words for reading various other literal types implements another useful idiom."
$terpri
"The word set in question consists of various start delimiters, such a " { $link POSTPONE: { } " for arrays and " { $link POSTPONE: H{ } " for hashtables, together with one end delimiter " { $link POSTPONE: } } ". The start words push a quotation in addition to pushing " { $link f } "; the end word applies the quotation to the newly-parsed vector; the quotation converts it to the appropriate type of literal."
{ $subsection POSTPONE: { }
{ $subsection POSTPONE: H{ }
{ $subsection POSTPONE: V{ }
{ $subsection POSTPONE: W{ }
{ $subsection POSTPONE: T{ }
{ $subsection POSTPONE: } } ;

ARTICLE: "reading-ahead" "Reading ahead"
"Parsing words can consume input from the current line to implement various forms of custom syntax."
{ $subsection scan }
{ $subsection scan-word }
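"As a hedged sketch, a hypothetical parsing word " { $snippet "STR:" } " (not part of the library) could read the next token with " { $link scan } " and add it to the parse tree as a string. It assumes that the parse tree accumulator is on top of the data stack when the word runs, that " { $link ?push } " has the stack effect " { $snippet "( elt seq/f -- seq )" } ", and that a token is actually available on the current line:"
{ $code ": STR: scan swap ?push ; parsing" }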
"For example, the " { $link POSTPONE: HEX: } " word, for reading hexadecimal literals, uses this facility. It is defined in terms of a lower-level " { $link (BASE) } " word that takes the numerical base on the data stack, but reads the number from the parser and then adds it to the parse tree:"
{ $subsection POSTPONE: HEX: }
{ $subsection (BASE) }
"Another simple example is the " { $link POSTPONE: \ } " word:"
{ $subsection POSTPONE: \ } ;

ARTICLE: "defining-words" "Defining words"
"Defining words add definitions to the dictionary without modifying the parse tree. The simplest example is the " { $link POSTPONE: SYMBOL: } " word:"
{ $subsection POSTPONE: SYMBOL: }
"The key factor the above definition is " { $link CREATE } ", which reads a token from the input and creates a word with that name. This word is then passed to " { $link define-symbol } "."
{ $subsection CREATE }
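"A minimal sketch of a defining word built from these two pieces, using the hypothetical name " { $snippet "MY-SYMBOL:" } ". It assumes " { $link CREATE } " leaves the new word on the stack and " { $link define-symbol } " consumes it, so the parse tree is left untouched; the library definition of " { $link POSTPONE: SYMBOL: } " may do more than this:"
{ $code ": MY-SYMBOL: CREATE define-symbol ; parsing" }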
"Colon definitions are defined in a more elaborate way. The  definition of " { $link POSTPONE: : } " introduces the next idiom, and that is building a quotation and then adding a definition using " { $link POSTPONE: ; } "."
$terpri
"Recall the colon definition syntax. When the " { $link POSTPONE: : } " word executes, it reads ahead from the input and defines a word. Then, it places a quotation and " { $link f } " on the data stack. The parser builds up a parse tree, and the quotation pushed by " { $link POSTPONE: : } " is called by " { $link POSTPONE: ; } "."
{ $subsection POSTPONE: : }
{ $subsection POSTPONE: ; }
"There are additional parsing words whose syntax is delimited by  " { $link POSTPONE: ; } ", and they are all implemented in the same way -- first they read some input, then they leave a quotation followed by an empty list on the stack."
{ $subsection POSTPONE: C: }
{ $subsection POSTPONE: G: }
{ $subsection POSTPONE: M: }
{ $subsection POSTPONE: PREDICATE: }
{ $subsection POSTPONE: TUPLE: }
{ $subsection POSTPONE: UNION: }
{ $subsection POSTPONE: USING: } ;

ARTICLE: "string-mode" "String mode"
"String mode allows custom parsing of tokenized input. For even more esoteric situations, the input text can be accessed directly."
$terpri
"String mode is controlled by a boolean variable in the parser scope:"
{ $subsection string-mode }
"An illustration of this idiom is found in the " { $link POSTPONE: USING: } " parsing word. It reads a list of vocabularies, terminated by " { $link POSTPONE: ; } ". However, the vocabulary names do not name words, except by coincidence; so string mode is used to read them."
{ $subsection POSTPONE: USING: }
"Make note of the quotation that is left in position for " { $link POSTPONE: ; } " to call. It switches off string mode, so that normal parsing can resume, then adds the given vocabularies to the search path." ;

ARTICLE: "parser-internals" "Parser internals"
"Some variables that encapsulate internal parser state:"
{ $subsection file }
{ $subsection line-number }
{ $subsection line-text }
{ $subsection column }
"A utility used when parsing string literals:"
{ $subsection parse-string } ;