factor/doc/handbook/parser.facts

USING: help io parser words ;

ARTICLE: "parser" "The parser"
"This section concerns itself with reflective access and extension of the Factor parser. The parser algorithm and standard syntax is described in " { $link "syntax" } ". Before the parser proper is documented, we draw attention to a set of words for parsing numbers. They are called by the parser, and are useful in their own right."
$terpri
"The set of words making up the parser are found in the " { $snippet "parser" } " and " { $snippet "syntax" } " vocabularies."
$terpri
"As documented in " { $link "vocabulary-search" } ", the parser looks up words in the vocabulary search path. New word definitions are added to the current vocabulary. These two parameters are stored in a pair of variables:"
{ $subsection use }
{ $subsection in }
"There are two simple ways to call the parser:"
{ $subsection parse }
{ $subsection eval }
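"For example, when entered at the listener, the first line below should print the parsed quotation without calling it, while the second line parses the same input, calls the result, and should print " { $snippet "3" } ". This assumes that " { $link parse } ", " { $link eval } ", " { $snippet "+" } " and " { $snippet "." } " are all visible in the current search path:"
{ $code
    "! parse returns the quotation; eval parses it and calls it"
    "\"1 2 +\" parse ."
    "\"1 2 +\" eval ."
}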
"More sophisticated facilities exist, too."
{ $subsection "parse-stream" }
{ $subsection "parsing-words" } ;

ARTICLE: "parse-stream" "Parsing from streams"
"By convention, words for parsing input from streams use a certain default vocabulary search path:"
{ $subsection file-vocabs }
"The central word for parsing input from a stream:"
{ $subsection parse-stream }
"Utilities for working with files:"
{ $subsection parse-file }
{ $subsection run-file }
"Utilities for working with Factor libarary files:"
{ $subsection resource-path }
{ $subsection parse-resource }
{ $subsection run-resource } ;

ARTICLE: "parsing-words" "Parsing words"
"Parsing words execute at parse time, and therefore can access and modify the state of the parser, as well as add objects to the parse tree. Parsing words are a difficult concept to grasp, so this section has several examples and explains the workings of some of the parsing words provided in the library."
$terpri
"Parsing words are marked by suffixing the definition with a declaration:"
{ $subsection POSTPONE: parsing }
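"As a minimal sketch, consider the hypothetical word " { $snippet "HELLO" } " below. Because it is declared " { $link POSTPONE: parsing } ", its body runs while the input containing it is being parsed, not when the resulting quotation is later called. It leaves the parse tree on the stack untouched, so it adds nothing to the parsed quotation. When entered at the listener, the second line should therefore print the greeting during parsing, and then print the quotation parsed from " { $snippet "1 2 +" } ":"
{ $code
    "! HELLO is a hypothetical example; it prints at parse time"
    "! and leaves the parse tree on the stack as it found it"
    ": HELLO \"Hello from the parser\" print ; parsing"
    "\"HELLO 1 2 +\" parse ."
}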
{ $subsection "parsing-word-nest" }
{ $subsection "reading-ahead" }
{ $subsection "defining-words" }
{ $subsection "string-mode" } ;

ARTICLE: "parsing-word-nest" "Nested structure"
"The first thing to look at is how the parse tree is built. When parsing begins, the empty list is pushed on the data stack; whenever the parser algorithm appends an object to the parse tree, it conses the object onto the quotation at the top of the stack. This builds the quotation in reverse order, so when parsing is done, the quotation is reversed before it is called."
$terpri
"Lets look at a simple example; the parsing of " { $snippet "1 2 3" } ":"
$terpri
{ $list
    { "Token: " { $snippet "1" } " - stack: " { $snippet "[ 1 ]" } }
    { "Token: " { $snippet "2" } " - stack: " { $snippet "[ 2 1 ]" } }
    { "Token: " { $snippet "3" } " - stack: " { $snippet "[ 3 2 1 ]" } }
}
"Once the end of the string has been reached, the quotation is reversed, leaving the following output:"
{ $code "[ 1 2 3 ]" }
"Nested structure is a bit more involved. The basic idea is that parsing words can push an empty list on the stack, then all subsequent tokens are consed onto this list, until another parsing word adds this list to the list underneath."
$terpri
"The parsing words that delimit the beginning and the end of a quotation illustrate the idiom:"
{ $subsection POSTPONE: [ }
{ $subsection POSTPONE: ] }
"Let us ponder, then, how one particular string will parse::"
{ $snippet "\"1 [ 2 3 ] 4\"" }
{ $list
    { "Token: " { $snippet "1" } " - stack: " { $snippet "[ 1 ]" } }
    { "Token: " { $snippet "[" } " - stack: " { $snippet "[ ] [ 1 ]" } }
    { "Token: " { $snippet "2" } " - stack: " { $snippet "[ 2 ] [ 1 ]" } }
    { "Token: " { $snippet "3" } " - stack: " { $snippet "[ 3 2 ] [ 1 ]" } }
    { "Token: " { $snippet "]" } " - stack: " { $snippet "[ [ 2 3 ] 1 ]" } }
    { "Token: " { $snippet "4" } " - stack: " { $snippet "[ 4 [ 2 3 ] 1 ]" } }
}
"Having done all that, the parser reverses the original quotation, and the expected output is now on the stack:"
{ $code "[ 1 [ 2 3 ] 4 ]" }
"Notice how in the definition of the list parsing words, the final word " { $link POSTPONE: ] } " does all the work. A closely related set of parsing words for reading various other literal types implements another useful idiom."
$terpri
"The word set in question consists of various start delimiters, such a " { $link POSTPONE: { } " for arrays and " { $link POSTPONE: H{ } " for hashtables, together with one end delimiter " { $link POSTPONE: } } ". The start words push a quotation in addition to an empty list; the end word reverses the empty list, and applies the quotation to the newly-parsed list; the quotation converts it to the appropriate type of literal."
{ $subsection POSTPONE: { }
{ $subsection POSTPONE: H{ }
{ $subsection POSTPONE: } } ;

GLOSSARY: "reading ahead" "a parsing word reads ahead if it scans following tokens from the input string" ;

ARTICLE: "reading-ahead" "Reading ahead"
"Parsing words can consume input from the current line to implement various forms of custom syntax."
{ $subsection scan }
{ $subsection scan-word }
"For example, the " { $link POSTPONE: HEX: } " word, for reading hexadecimal literals, uses this facility. It is defined in terms of a lower-level " { $link (BASE) } " word that takes the numerical base on the data stack, but reads the number from the parser and then adds it to the parse tree:"
{ $subsection POSTPONE: HEX: }
{ $subsection (BASE) }
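"The same lower-level word can serve other bases. As a sketch, the word " { $snippet "TERNARY:" } " below is hypothetical; it assumes only what is stated above, namely that " { $link (BASE) } " takes the base from the data stack, reads the following token, and adds the resulting number to the parse tree. Since 1*9 + 0*3 + 2 = 11, the second line should print " { $snippet "11" } ":"
{ $code
    "! TERNARY: is a hypothetical base-3 literal syntax"
    ": TERNARY: 3 (BASE) ; parsing"
    "TERNARY: 102 ."
}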
"Another simple example is the " { $link POSTPONE: \ } " word:"
{ $subsection POSTPONE: \ } ;

ARTICLE: "defining-words" "Defining words"
"Defining words add definitions to the dictionary without modifying the parse tree. The simplest example is the " { $link POSTPONE: SYMBOL: } " word:"
{ $subsection POSTPONE: SYMBOL: }
"The key factor the above definition is " { $link CREATE } ", which reads a token from the input and creates a word with that name. This word is then passed to " { $link define-symbol } "."
{ $subsection CREATE }
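"Following that description, a defining word that behaves like " { $link POSTPONE: SYMBOL: } " can be sketched in one line. The name " { $snippet "MY-SYMBOL:" } " is hypothetical, and the library's own definition may do additional bookkeeping:"
{ $code
    "! MY-SYMBOL: is a hypothetical copy of SYMBOL:"
    "! CREATE reads the next token and makes a word; define-symbol turns it into a symbol"
    ": MY-SYMBOL: CREATE define-symbol ; parsing"
    "MY-SYMBOL: foo"
    "foo ."
}
"Because " { $snippet "MY-SYMBOL:" } " does not touch the parse tree, the line " { $snippet "MY-SYMBOL: foo" } " adds nothing to the parsed quotation; it only adds " { $snippet "foo" } " to the current vocabulary. A symbol pushes itself when executed, so the last line should print " { $snippet "foo" } "."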
"Colon definitions are defined in a more elaborate way. The  definition of " { $link POSTPONE: : } " introduces the next idiom, and that is building a quotation and then adding a definition using " { $link POSTPONE: ; } "."
$terpri
"Recall the colon definition syntax. When the " { $link POSTPONE: : } " word executes, it reads ahead from the input and defines a word. Then, it places a quotation and an empty list on the stack. The parser conses tokens onto the empty list, and the quotation is called by " { $link POSTPONE: ; } "."
{ $subsection POSTPONE: : }
{ $subsection POSTPONE: ; }
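"For example, when the first line below is parsed, " { $link POSTPONE: : } " reads the name " { $snippet "my-square" } " and creates the word, the tokens " { $snippet "dup" } " and " { $snippet "*" } " are consed onto the empty list, and " { $link POSTPONE: ; } " then calls the quotation left by " { $link POSTPONE: : } ", which should leave " { $snippet "my-square" } " defined with the body " { $snippet "[ dup * ]" } ". The word " { $snippet "my-square" } " is only an illustration, not part of the library; the second line should print " { $snippet "25" } ":"
{ $code
    "! my-square is a hypothetical word used only for illustration"
    ": my-square dup * ;"
    "5 my-square ."
}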
"There are additional parsing words whose syntax is delimited by  " { $link POSTPONE: ; } ", and they are all implemented in the same way -- first they read some input, then they leave a quotation followed by an empty list on the stack."
{ $subsection POSTPONE: M: }
{ $subsection POSTPONE: C: } ;

ARTICLE: "string-mode" "String mode and parser internals"
"String mode allows custom parsing of tokenized input. For even more esoteric situations, the input text can be accessed directly."
$terpri
"String mode is controlled by a boolean variable in the parser scope:"
{ $subsection string-mode }
"An illustration of this idiom is found in the " { $link POSTPONE: USING: } " parsing word. It reads a list of vocabularies, terminated by " { $link POSTPONE: ; } ". However, the vocabulary names do not name words, except by coincidence; so string mode is used to read them."
{ $subsection POSTPONE: USING: }
"Make note of the quotation that is left in position for " { $link POSTPONE: ; } " to call. It switches off string mode, so that normal parsing can resume, then adds the given vocabularies to the search path."
$terpri
"Some additional variables that encapsulate internal parser state:"
{ $subsection file }
{ $subsection line-number }
{ $subsection line-text }
{ $subsection column }
"Some utilities used when parsing comments:"
{ $subsection until }
{ $subsection until-eol }
"A utility used when parsing string literals:"
{ $subsection parse-string } ;