USING: help io parser sequences words ;
ARTICLE: "parser" "The parser"
"This section covers reflective access to the Factor parser and ways of extending it. The parser algorithm and standard syntax are described in " { $link "syntax" } "."
$terpri
"The words making up the parser are found in the " { $snippet "parser" } " and " { $snippet "syntax" } " vocabularies."
$terpri
"As documented in " { $link "vocabulary-search" } ", the parser looks up words in the vocabulary search path. New word definitions are added to the current vocabulary. These two parameters are stored in a pair of variables:"
{ $subsection use }
{ $subsection in }
"There are two simple ways to call the parser:"
{ $subsection parse }
{ $subsection eval }
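"As a minimal sketch, assuming " { $link parse } " takes a string and leaves the resulting quotation on the stack:"
{ $code "\"1 2 3\" parse" }
"Likewise, assuming " { $link eval } " takes a string and runs its contents immediately:"
{ $code "\"2 2 + .\" eval" }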
"More sophisticated facilities exist, too."
{ $subsection "parse-stream" }
"The parser can be extended with new parsing word definitions."
{ $subsection "parsing-words" } ;
ARTICLE: "parse-stream" "Parsing from streams"
"By convention, words for parsing input from streams use a certain default vocabulary search path:"
{ $subsection file-vocabs }
"The central word for parsing input from a stream:"
{ $subsection parse-stream }
"Utilities for working with files:"
{ $subsection parse-file }
{ $subsection run-file }
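"For instance, assuming " { $link run-file } " takes a path name string, a hypothetical source file " { $snippet "hello.factor" } " in the current directory could be loaded and run with:"
{ $code "\"hello.factor\" run-file" }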
"Utilities for working with Factor library files:"
{ $subsection resource-path }
{ $subsection parse-resource }
{ $subsection run-resource } ;
ARTICLE: "parsing-words" "Parsing words"
"Parsing words execute at parse time, and therefore can access and modify the state of the parser, as well as add objects to the parse tree. Parsing words are a difficult concept to grasp, so this section has several examples and explains the workings of some of the parsing words provided in the library."
$terpri
"Parsing words are marked by suffixing the definition with a declaration:"
{ $subsection POSTPONE: parsing }
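"As a minimal sketch, the following hypothetical parsing word " { $snippet "HELLO" } " prints a message each time the parser encounters it, and leaves the parse tree untouched:"
{ $code ": HELLO \"Hello from parse time\" print ; parsing" }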
{ $subsection "parsing-word-nest" }
{ $subsection "reading-ahead" }
{ $subsection "defining-words" }
{ $subsection "string-mode" }
{ $subsection "parser-internals" } ;
ARTICLE: "parsing-word-nest" "Nested structure"
"The first thing to look at is how the parse tree is built. When parsing begins, " { $link f } " is pushed on the data stack; when a new object is parsed, " { $link ?push } " is called to add it to the parse tree. The first call to " { $link ?push } " creates a new one-element vector, and subsequent calls add elements to the end of this vector. When parsing is complete, the vector is converted into a quotation."
$terpri
"Let's look at a simple example; the parsing of " { $snippet "1 2 3" } ":"
{ $table
    { "Action" "Stack" }
    { "Initial stack after parsing begins" { $snippet "f" } }
    { { "Token: " { $snippet "1" } } { $snippet "V{ 1 }" } }
    { { "Token: " { $snippet "2" } } { $snippet "V{ 1 2 }" } }
    { { "Token: " { $snippet "3" } } { $snippet "V{ 1 2 3 }" } }
    { "Final stack upon completion" { $snippet "[ 1 2 3 ]" } }
}
"Nested structure is a bit more involved. The basic idea is that parsing words can push " { $link f } " on the stack to begin a new level of nesting, then all subsequent objects are pushed onto this sequence, until another parsing word adds this sequence to the vector underneath."
$terpri
"The parsing words that delimit the beginning and the end of a quotation illustrate the idiom:"
{ $subsection POSTPONE: [ }
{ $subsection POSTPONE: ] }
"Let us ponder, then, how one particular string will parse:"
{ $snippet "\"1 [ 2 3 ] 4\"" }
{ $table
    { "Action" "Stack" }
    { "Initial stack after parsing begins" { $snippet "f" } }
    { { "Token: " { $snippet "1" } } { $snippet "V{ 1 }" } }
    { { "Token: " { $snippet "[" } } { $snippet "V{ 1 } f" } }
    { { "Token: " { $snippet "2" } } { $snippet "V{ 1 } V{ 2 }" } }
    { { "Token: " { $snippet "3" } } { $snippet "V{ 1 } V{ 2 3 }" } }
    { { "Token: " { $snippet "]" } } { $snippet "V{ 1 [ 2 3 ] }" } }
    { { "Token: " { $snippet "4" } } { $snippet "V{ 1 [ 2 3 ] 4 }" } }
    { "Final stack upon completion" { $snippet "[ 1 [ 2 3 ] 4 ]" } }
}
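"The table can be reproduced by hand in the listener. This is only a sketch: it assumes that " { $link ?push } " expects the new element underneath the sequence (or " { $link f } ") on the stack, and that a word named " { $snippet ">quotation" } " converts a vector to a quotation. The second " { $snippet "f" } " stands in for " { $link POSTPONE: [ } ", and the first " { $snippet ">quotation swap ?push" } " stands in for " { $link POSTPONE: ] } ":"
{ $code "f 1 swap ?push\nf 2 swap ?push 3 swap ?push\n>quotation swap ?push\n4 swap ?push >quotation" }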
"Notice how in the definition of the quotation parsing words, the final word " { $link POSTPONE: ] } " does all the work. A closely related set of parsing words for reading various other literal types implements another useful idiom."
$terpri
"The word set in question consists of various start delimiters, such as " { $link POSTPONE: { } " for arrays and " { $link POSTPONE: H{ } " for hashtables, together with one end delimiter " { $link POSTPONE: } } ". The start words push a quotation in addition to pushing " { $link f } "; the end word applies the quotation to the newly-parsed vector, and the quotation converts it to the appropriate type of literal."
{ $subsection POSTPONE: { }
{ $subsection POSTPONE: H{ }
{ $subsection POSTPONE: V{ }
{ $subsection POSTPONE: W{ }
{ $subsection POSTPONE: T{ }
{ $subsection POSTPONE: } } ;
ARTICLE: "reading-ahead" "Reading ahead"
"Parsing words can consume input from the current line to implement various forms of custom syntax."
{ $subsection scan }
{ $subsection scan-word }
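"As a minimal sketch, assuming " { $link scan } " leaves the next token on the stack as a string, a hypothetical parsing word " { $snippet "SKIP:" } " could read the token that follows it and discard it, so that " { $snippet "1 SKIP: ignored 2" } " would parse the same way as " { $snippet "1 2" } ":"
{ $code ": SKIP: scan drop ; parsing" }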
"For example, the " { $link POSTPONE: HEX: } " word, for reading hexadecimal literals, uses this facility. It is defined in terms of a lower-level " { $link (BASE) } " word that takes the numerical base on the data stack, but reads the number from the parser and then adds it to the parse tree:"
{ $subsection POSTPONE: HEX: }
{ $subsection (BASE) }
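"Words for other bases can follow the same pattern. As a sketch, a hypothetical " { $snippet "BASE3:" } " word would read the next token as a base 3 number, assuming " { $link (BASE) } " accepts any base it is given:"
{ $code ": BASE3: 3 (BASE) ; parsing" }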
"Another simple example is the " { $link POSTPONE: \ } " word:"
{ $subsection POSTPONE: \ } ;
ARTICLE: "defining-words" "Defining words"
"Defining words add definitions to the dictionary without modifying the parse tree. The simplest example is the " { $link POSTPONE: SYMBOL: } " word:"
{ $subsection POSTPONE: SYMBOL: }
"The key factor of the above definition is " { $link CREATE } ", which reads a token from the input and creates a word with that name. This word is then passed to " { $link define-symbol } "."
{ $subsection CREATE }
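"Put together, a defining word along these lines can be sketched as follows. " { $snippet "MY-SYMBOL:" } " is a hypothetical name, and the library definition of " { $link POSTPONE: SYMBOL: } " may take additional steps:"
{ $code ": MY-SYMBOL: CREATE define-symbol ; parsing" }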
"Colon definitions are defined in a more elaborate way. The definition of " { $link POSTPONE: : } " introduces the next idiom: building a quotation and then adding a definition using " { $link POSTPONE: ; } "."
$terpri
"Recall the colon definition syntax. When the " { $link POSTPONE: : } " word executes, it reads ahead from the input and defines a word. Then, it places a quotation and " { $link f } " on the data stack. The parser builds up a parse tree, and the quotation pushed by " { $link POSTPONE: : } " is called by " { $link POSTPONE: ; } "."
{ $subsection POSTPONE: : }
{ $subsection POSTPONE: ; }
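"To make this concrete, consider a colon definition of a hypothetical word " { $snippet "square" } ". Following the description above, " { $link POSTPONE: : } " reads " { $snippet "square" } " ahead and defines the word, the parser accumulates " { $snippet "dup *" } " as the parse tree, and " { $link POSTPONE: ; } " calls the quotation left by " { $link POSTPONE: : } " to complete the definition:"
{ $code ": square dup * ;" }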
"There are additional parsing words whose syntax is delimited by " { $link POSTPONE: ; } ", and they are all implemented in the same way: first they read some input, then they leave a quotation followed by " { $link f } " on the stack."
{ $subsection POSTPONE: C: }
{ $subsection POSTPONE: G: }
{ $subsection POSTPONE: M: }
{ $subsection POSTPONE: PREDICATE: }
{ $subsection POSTPONE: TUPLE: }
{ $subsection POSTPONE: UNION: }
{ $subsection POSTPONE: USING: } ;
ARTICLE: "string-mode" "String mode"
"String mode allows custom parsing of tokenized input. For even more esoteric situations, the input text can be accessed directly."
$terpri
"String mode is controlled by a boolean variable in the parser scope:"
{ $subsection string-mode }
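"As a sketch of the idiom, a hypothetical parsing word " { $snippet "NAMES:" } " could collect tokens up to " { $link POSTPONE: ; } " and print them at parse time. It assumes that tokens read in string mode are added to the parse tree as strings rather than being looked up as words, that " { $link POSTPONE: ; } " calls the quotation with the collected sequence on the stack, and that the variable can be toggled with words named " { $snippet "on" } " and " { $snippet "off" } ":"
{ $code ": NAMES: string-mode on [ string-mode off [ print ] each ] f ; parsing" }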
"An illustration of this idiom is found in the " { $link POSTPONE: USING: } " parsing word. It reads a list of vocabularies, terminated by " { $link POSTPONE: ; } ". However, the vocabulary names do not name words, except by coincidence; so string mode is used to read them."
{ $subsection POSTPONE: USING: }
"Make note of the quotation that is left in position for " { $link POSTPONE: ; } " to call. It switches off string mode, so that normal parsing can resume, then adds the given vocabularies to the search path." ;
ARTICLE: "parser-internals" "Parser internals"
"Some variables that encapsulate internal parser state:"
{ $subsection file }
{ $subsection line-number }
{ $subsection line-text }
{ $subsection column }
"A utility used when parsing string literals:"
{ $subsection parse-string } ;