USING: help io parser sequences words ;

ARTICLE: "parser" "The parser"
"This section concerns itself with reflective access to, and extension of, the Factor parser. The parser algorithm and standard syntax are described in " { $link "syntax" } "."
$nl
"The words making up the parser are found in the " { $vocab-link "parser" } " and " { $vocab-link "syntax" } " vocabularies."
$nl
"As documented in " { $link "vocabulary-search" } ", the parser looks up words in the vocabulary search path. New word definitions are added to the current vocabulary. These two parameters are stored in a pair of variables:"
{ $subsection use }
{ $subsection in }
"There are two simple ways to call the parser; both use the vocabulary search path currently in scope, which is usually the default listener search path:"
{ $subsection parse }
{ $subsection eval }
"The parser can also take input from a sequence of lines, or from a stream:"
{ $subsection parse-lines }
{ $subsection parse-stream }
"Note that the above words are low-level and are not the canonical way to read code from a source file. To learn about loading source files, see " { $link "sources" } "."
$nl
"The parser can be extended with new parsing word definitions:"
{ $subsection "parsing-words" }
"The next level of customization above writing parsing words is extending the lexer:"
{ $subsection "parser-lexer" } ;

ARTICLE: "parsing-words" "Parsing words"
"The Factor parser follows a simple recursive-descent design. The parser reads successive tokens from the input; if a token identifies a number or an ordinary word, it is added to an accumulator vector. Otherwise, if the token identifies a parsing word, the parsing word is executed immediately."
$nl
"Parsing words are marked by suffixing the definition with a " { $link POSTPONE: parsing } " declaration."
$nl
"Parsing words must have the stack effect " { $snippet "( accum -- accum )" } ", where " { $snippet "accum" } " is the accumulator vector supplied by the parser.
Parsing words can read input, add word definitions to the dictionary, and do anything an ordinary word can."
$nl
"Tools for implementing parsing words:"
{ $subsection "reading-ahead" }
{ $subsection "parsing-word-nest" }
{ $subsection "defining-words" }
{ $subsection "parsing-tokens" } ;

ARTICLE: "reading-ahead" "Reading ahead"
"Parsing words can consume input:"
{ $subsection scan }
{ $subsection scan-word }
"For example, the " { $link POSTPONE: HEX: } " word uses this feature to read hexadecimal literals:"
{ $see POSTPONE: HEX: }
"It is defined in terms of a lower-level word that takes the numerical base from the data stack, reads the number from the input, and then adds it to the parse tree:"
{ $see parse-base }
"Another simple example is the " { $link POSTPONE: \ } " word:"
{ $see POSTPONE: \ } ;

ARTICLE: "parsing-word-nest" "Nested structure"
"Recall that the parser loop calls parsing words with an accumulator vector on the stack. The parser loop can be invoked recursively with a new, empty accumulator; the result can then be added to the original accumulator. This is how parsing words for object literals are implemented, which is why object literals can nest arbitrarily deep."
$nl
"A simple example is the parsing word that reads a quotation:"
{ $see POSTPONE: [ }
"This word uses a utility word which recursively invokes the parser, reading objects into a new accumulator until it encounters " { $link POSTPONE: ] } ":"
{ $subsection parse-literal }
"There is another, lower-level word for reading nested structure, which is also useful when called directly:"
{ $subsection parse-until }
"Note that " { $link POSTPONE: ] } " is just a dummy word; declaring it as a " { $link POSTPONE: delimiter } " causes it to throw an error when an unpaired occurrence is encountered:"
{ $see POSTPONE: ] }
{ $see-also POSTPONE: { POSTPONE: H{ POSTPONE: V{ POSTPONE: W{ POSTPONE: T{ POSTPONE: } } ;

ARTICLE: "defining-words" "Defining words"
"Defining words add definitions to the dictionary without modifying the parse tree. The simplest example is the " { $link POSTPONE: SYMBOL: } " word."
{ $see POSTPONE: SYMBOL: }
"The key factor in the definition of " { $link POSTPONE: SYMBOL: } " is " { $link CREATE } ", which reads a token from the input and creates a word with that name. This word is then passed to " { $link define-symbol } "."
{ $subsection CREATE }
"Colon definitions are implemented in a more elaborate way:"
{ $subsection POSTPONE: : }
"The " { $link POSTPONE: : } " word first calls " { $link CREATE } ", and then reads input until reaching " { $link POSTPONE: ; } ", using a utility word:"
{ $subsection parse-definition }
"The " { $link POSTPONE: ; } " word is just a delimiter; an unpaired occurrence throws a parse error:"
{ $see POSTPONE: ; }
"There are additional parsing words whose syntax is delimited by " { $link POSTPONE: ; } ", and they are all implemented by calling " { $link parse-definition } "."
{ $see-also POSTPONE: C: POSTPONE: G: POSTPONE: M: POSTPONE: PREDICATE: POSTPONE: UNION: } ;

ARTICLE: "parsing-tokens" "Parsing raw tokens"
"So far we have seen how to read individual tokens, and how to read a sequence of parsed objects up to a delimiter.
It is also possible to read raw tokens from the input and perform custom processing."
$nl
"One example is the " { $link POSTPONE: USING: } " parsing word."
{ $see POSTPONE: USING: }
"It reads a list of vocabulary names terminated by " { $link POSTPONE: ; } ". However, vocabulary names do not name words, except by coincidence, so " { $link parse-until } " cannot be used here. Instead, a lower-level word is called:"
{ $subsection parse-tokens } ;

ARTICLE: "parser-lexer" "The lexer"
"Two variables that encapsulate internal parser state:"
{ $subsection file }
{ $subsection lexer }
"Creating a default lexer:"
{ $subsection }
"A word to test whether the end of input has been reached:"
{ $subsection still-parsing? }
"A word to get the text of the current line:"
{ $subsection line-text }
"A word to advance the lexer to the next line:"
{ $subsection next-line }
"Two generic words to override the lexer's token boundary detection:"
{ $subsection skip-blank }
{ $subsection skip-word }
"A utility used when parsing string literals:"
{ $subsection parse-string }
"The parser can be invoked with a custom lexer:"
{ $subsection (parse-lines) }
{ $subsection with-parser } ;