Update regexp and tools.deploy docs

db4
Slava Pestov 2009-03-13 23:12:56 -05:00
parent 99de526e8b
commit 46da224f8b
3 changed files with 113 additions and 28 deletions


@ -5,16 +5,32 @@ IN: regexp.combinators
ABOUT: "regexp.combinators"
ARTICLE: "regexp.combinators.intro" "Regular expression combinator rationale"
"Regular expression combinators are useful when part of the regular expression contains user input. For example, given a sequence of strings on the stack, a regular expression which matches any one of them can be constructed:"
{ $code
"[ <literal> ] map <or>"
}
"Without combinators, a naive approach would look as follows:"
{ $code
"\"|\" join <regexp>"
}
"However, this code is incorrect, because one of the strings in the sequence might contain characters which have special meaning inside a regular expression. Combinators avoid this problem by building a regular expression syntax tree directly, without any parsing." ;
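! Illustrative sketch only, not part of this commit. It assumes the
! regexp, regexp.combinators and sequences vocabularies, and that
! matches? has the stack effect ( string regexp -- ? ). The sample
! strings are made up. Paste the uncommented lines into the listener:
!
!   USING: regexp regexp.combinators sequences ;
!   { "a.b" "c|d" } [ <literal> ] map <or>   ! each string is matched literally
!   "c|d" swap matches?                      ! tests the input "c|d" against it
!
! With the naive approach, "a.b|c|d" would be parsed as regular
! expression syntax, so "." and "|" would act as operators and the
! pattern would also accept strings such as "axb" or "c".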
ARTICLE: "regexp.combinators" "Regular expression combinators"
"The " { $vocab-link "regexp.combinators" } " vocabulary defines combinators which can be used to build up regular expressions to match strings. This is in addition to the traditional syntax defined in the " { $vocab-link "regexp" } " vocabulary."
"The " { $vocab-link "regexp.combinators" } " vocabulary defines combinators which can be used to build up regular expressions to match strings. This complements the traditional syntax defined in the " { $vocab-link "regexp" } " vocabulary."
{ $subsection "regexp.combinators.intro" }
"Basic combinators:"
{ $subsection <literal> }
{ $subsection <nothing> }
"Higher-order combinators for building new regular expressions from existing ones:"
{ $subsection <or> }
{ $subsection <and> }
{ $subsection <not> }
{ $subsection <sequence> }
{ $subsection <zero-or-more> }
"Derived combinators implemented in terms of the above:"
{ $subsection <one-or-more> }
"Setting options:"
{ $subsection <option> } ;
HELP: <literal>


@ -1,34 +1,70 @@
! Copyright (C) 2008, 2009 Doug Coleman, Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel strings help.markup help.syntax math ;
USING: kernel strings help.markup help.syntax math regexp.parser regexp.ast ;
IN: regexp
ABOUT: "regexp"
ARTICLE: "regexp" "Regular expressions"
"The " { $vocab-link "regexp" } " vocabulary provides words for creating and using regular expressions."
{ $subsection { "regexp" "syntax" } }
{ $subsection { "regexp" "construction" } }
{ $vocab-subsection "regexp.combinators" "Regular expression combinators" }
{ $subsection { "regexp" "operations" } }
{ $subsection { "regexp" "intro" } }
"The class of regular expressions:"
{ $subsection regexp }
{ $subsection { "regexp" "theory" } } ;
"Basic usage:"
{ $subsection { "regexp" "syntax" } }
{ $subsection { "regexp" "options" } }
{ $subsection { "regexp" "construction" } }
{ $subsection { "regexp" "operations" } }
"Advanced topics:"
{ $vocab-subsection "Regular expression combinators" "regexp.combinators" }
{ $subsection { "regexp" "theory" } }
{ $subsection { "regexp" "deploy" } } ;
ARTICLE: { "regexp" "intro" } "A quick introduction to regular expressions"
;
ARTICLE: { "regexp" "construction" } "Constructing regular expressions"
"Words which are useful for creating regular expressions:"
"Most of the time, regular expressions are literals, and the parsing word should be used to construct them at parse time. This ensures that they are compiled only once, and provides parse-time syntax checking."
{ $subsection POSTPONE: R/ }
"Sometimes, regular expressions need to be constructed at run time instead; for example, in a text editor, the user might input a regular expression to search for in a document."
{ $subsection <regexp> }
{ $subsection <optioned-regexp> }
{ $heading "See also" }
{ $vocab-link "regexp.combinators" } ;
"Another approach is to use " { $vocab-link "regexp.combinators" } "." ;
ARTICLE: { "regexp" "syntax" } "Regular expression syntax"
"Regexp syntax is largely compatible with Perl, Java and extended POSIX regexps, but not completely." $nl
"A new addition is the inclusion of a negation operator, with the syntax " { $snippet "(?~foo)" } " to match everything that does not match " { $snippet "foo" } "." $nl
"Regexp syntax is largely compatible with Perl, Java and extended POSIX regexps, but not completely. A new addition is the inclusion of a negation operator, with the syntax " { $snippet "(?~foo)" } " to match everything that does not match " { $snippet "foo" } "."
{ $heading "Characters" }
{ $heading "Character classes" }
{ $heading "Predefined character classes" }
{ $heading "Boundaries" }
{ $heading "Greedy quantifiers" }
{ $heading "Reluctant quantifiers" }
{ $heading "Possessive quantifiers" }
{ $heading "Logical operations" }
{ $heading "Lookaround" }
{ $heading "Unsupported features" }
"One missing feature is backreferences. This is because of a design decision to allow only regular expressions following the formal theory of regular languages. For more information, see " { $link { "regexp" "theory" } } ". You can create a new regular expression to match a particular string using " { $vocab-link "regexp.combinators" } " and group capture is available to extract parts of a regular expression match." $nl
"A distinction from Perl is that " { $snippet "\\G" } ", which references the previous match, is not included. This is because that sequence is inherently stateful, and Factor regexps don't hold state." $nl
"Another missing feature is Perl's " { $snippet "\\G" } " syntax, which references the previous match. It is omitted because that sequence is inherently stateful, and Factor regexps don't hold state." $nl
"Additionally, none of the operations which embed code into a regexp are supported, as this would require the inclusion of the Factor parser and compiler in any application which wants to expose regexps to the user. None of the casing operations are included, for simplicity." ; ! Also describe syntax, from the beginning
ARTICLE: { "regexp" "options" } "Regular expression options"
"When " { $link { "regexp" "construction" } } ", various options can be provided. Options have single-character names. A string of options has one of the following two forms:"
{ $code "on" "on-off" }
"The latter syntax allows some options to be disabled. The " { $snippet "on" } " and " { $snippet "off" } " strings name options to be enabled and disabled, respectively."
$nl
"The following options are supported:"
{ $table
{ "i" { $link case-insensitive } }
{ "d" { $link unix-lines } }
{ "m" { $link multiline } }
{ "n" { $link multiline } }
{ "r" { $link reversed-regexp } }
{ "s" { $link dotall } }
{ "u" { $link unicode-case } }
{ "x" { $link comments } }
} ;
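! Illustrative sketch only, not part of this commit. It assumes the
! option string follows the closing slash of an R/ literal, as the
! help for R/ in this file shows, and that <optioned-regexp> has the
! stack effect ( string options -- regexp ). The patterns are made up.
! Paste the uncommented lines into the listener:
!
!   USING: regexp ;
!   "FOO" R/ foo/i matches?                      ! "on" form: i enabled
!   "FOO" R/ foo/i-s matches?                    ! "on-off" form: i enabled, s disabled
!   "FOO" "foo" "i" <optioned-regexp> matches?   ! same option, built at run time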
ARTICLE: { "regexp" "theory" } "The theory of regular expressions"
"Far from being just a practical tool invented by Unix hackers, regular expressions were studied formally before computer programs were written to process them." $nl
"A regular language is a set of strings that is matched by a regular expression, which is defined to have characters and the empty string, along with the operations concatenation, disjunction and Kleene star. Another way to define the class of regular languages is as the class of languages which can be recognized with constant space overhead, i.e., with a DFA. These two definitions are provably equivalent." $nl
@ -39,26 +75,41 @@ ARTICLE: { "regexp" "theory" } "The theory of regular expressions"
"The Factor regular expression engine was built with the design decision to support negation and intersection at the expense of backreferences. This lets us have a guaranteed linear-time matching algorithm. Systems like Ragel and Lex also use this algorithm, but in the Factor regular expression engine, all other features of regexps are still present." ;
ARTICLE: { "regexp" "operations" } "Matching operations with regular expressions"
"Testing if a string matches a regular expression:"
{ $subsection matches? }
"Finding a match inside a string:"
{ $subsection re-contains? }
{ $subsection first-match }
"Finding all matches inside a string:"
{ $subsection count-matches }
{ $subsection all-matching-slices }
{ $subsection all-matching-subseqs }
"Splitting a string into tokens delimited by a regular expression:"
{ $subsection re-split }
{ $subsection re-replace }
{ $subsection count-matches } ;
"Replacing occurrences of a regular expression with a string:"
{ $subsection re-replace } ;
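! Illustrative sketch only, not part of this commit. It assumes each
! word takes the string first and the regexp second, with re-replace
! taking the replacement string last. The inputs are made up. Paste
! the uncommented lines into the listener, one at a time:
!
!   USING: regexp ;
!   "abc" R/ [a-c]+/ matches?             ! the whole string must match
!   "xxabcxx" R/ abc/ re-contains?        ! a match anywhere inside
!   "a1b22c" R/ [0-9]+/ count-matches     ! number of matches
!   "a1b22c" R/ [0-9]+/ re-split          ! pieces between the matches
!   "a1b22c" R/ [0-9]+/ "-" re-replace    ! each match replaced by "-"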
ARTICLE: { "regexp" "deploy" } "Regular expressions and the deploy tool"
"The " { $link "tools.deploy" } " tool has the option to strip out the optimizing compiler from the resulting image. Since regular expressions compile to Factor code, this creates a minor performance-related caveat."
$nl
"Regular expressions constructed at runtime from a deployed application will be compiled with the non-optimizing compiler, which is always available because it is built into the Factor VM. This will result in lower performance than when using the optimizing compiler."
$nl
"Literal regular expressions constructed at parse time do not suffer from this restriction, since the deployed application is loaded and compiled before anything is stripped out."
$nl
"None of this applies to deployed applications which include the optimizing compiler, or code running inside a development image."
{ $see-also "compiler" { "regexp" "construction" } "deploy-flags" } ;
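! Illustrative sketch only, not part of this commit. The word names
! digits? and matches-pattern? are hypothetical. The sketch contrasts
! the two cases described in the article above:
!
!   USING: regexp ;
!   ! Literal: constructed at parse time, before anything is stripped.
!   : digits? ( string -- ? ) R/ [0-9]+/ matches? ;
!   ! Built at run time from a pattern string, such as user input; in
!   ! an application deployed without the optimizing compiler, this is
!   ! the case that falls back to the non-optimizing compiler.
!   : matches-pattern? ( string pattern -- ? ) <regexp> matches? ;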
HELP: <regexp>
{ $values { "string" string } { "regexp" regexp } }
{ $description "Creates a regular expression object, given a string in regular expression syntax. When it is first used for matching, a DFA is compiled, and this DFA is stored for reuse so it is only compiled once." } ;
HELP: <optioned-regexp>
{ $values { "string" string } { "options" string } { "regexp" regexp } }
{ $values { "string" string } { "options" "a string of " { $link { "regexp" "options" } } } { "regexp" regexp } }
{ $description "Given a string in regular expression syntax, and a string of options, creates a regular expression object. When it is first used for matching, a DFA is compiled, and this DFA is stored for reuse so it is only compiled once." } ;
HELP: R/
{ $syntax "R/ foo.*|[a-zA-Z]bar/i" }
{ $description "Literal syntax for a regular expression. When this syntax is used, the DFA is compiled at compile-time, rather than on first use." } ;
{ $syntax "R/ foo.*|[a-zA-Z]bar/options" }
{ $description "Literal syntax for a regular expression. When this syntax is used, the DFA is compiled at compile-time, rather than on first use. The syntax for the " { $snippet "options" } " string is documented in " { $link { "regexp" "options" } } "." } ;
HELP: regexp
{ $class-description "The class of regular expressions. To construct these, see " { $link { "regexp" "construction" } } "." } ;


@ -1,5 +1,5 @@
USING: help.markup help.syntax words alien.c-types assocs
kernel ;
kernel call call.private tools.deploy.config ;
IN: tools.deploy
ARTICLE: "prepare-deploy" "Preparing to deploy an application"
@ -7,25 +7,43 @@ ARTICLE: "prepare-deploy" "Preparing to deploy an application"
{ $subsection "deploy-config" }
{ $subsection "deploy-flags" } ;
ARTICLE: "tools.deploy" "Application deployment"
"The stand-alone application deployment tool compiles a vocabulary down to a native executable which runs the vocabulary's " { $link POSTPONE: MAIN: } " hook. Deployed executables do not depend on Factor being installed, and do not expose any source code, and thus are suitable for delivering commercial end-user applications."
$nl
"For example, we can deploy the " { $vocab-link "hello-world" } " demo which comes with Factor:"
ARTICLE: "tools.deploy.usage" "Deploy tool usage"
"Once the necessary deployment flags have been set, the application can be deployed:"
{ $subsection deploy }
"For example, you can deploy the " { $vocab-link "hello-ui" } " demo which comes with Factor. Note that this demo already has a deployment configuration, so nothing needs to be configured:"
{ $code "\"hello-ui\" deploy" }
{ $list
{ "On Mac OS X, this yields a program named " { $snippet "Hello world.app" } "." }
{ "On Windows, it yields a directory named " { $snippet "Hello world" } " containing a program named " { $snippet "hello-ui.exe" } "." }
{ "On Unix-like systems (Linux, BSD, Solaris, etc.), it yields a directory named " { $snippet "Hello world" } " containing a program named " { $snippet "hello-ui" } "." }
}
"In all cases, running the program displays a window with a message."
$nl
"On all platforms, running the program will display a window with a message." ;
ARTICLE: "tools.deploy.impl" "Deploy tool implementation"
"The deployment tool works by bootstrapping a fresh image, loading the vocabulary into this image, then applying various heuristics to strip the image down to minimal size."
$nl
"The deploy tool generates " { $emphasis "staging images" } " containing major subsystems, and uses the staging images to derive the final application image. The first time an application is deployed using a major subsystem, such as the UI, a new staging image is made, which can take a few minutes. Subsequent deployments of applications using this subsystem will be much faster." ;
ARTICLE: "tools.deploy.caveats" "Deploy tool caveats"
{ $heading "Behavior of " { $link boa } }
"In deployed applications, the " { $link boa } " word does not verify that the parameters on the stack satisfy the tuple's slot declarations, if any. This reduces deploy image size but can make bugs harder to track down. Make sure your program is fully debugged before deployment."
{ $heading "Behavior of " { $link POSTPONE: execute( } }
"Similarly, the " { $link POSTPONE: execute( } " word does not check word stack effects in deployed applications, since stack effects are stripped out, and so it behaves exactly like " { $link POSTPONE: execute-unsafe( } "."
{ $heading "Error reporting" }
"If the " { $link deploy-reflection } " level in the configuration is low enough, the debugger is stripped out, and error messages can be rather cryptic. Increase the reflection level to get readable error messages."
{ $heading "Choosing the right deploy flags" }
"Finding the correct deploy flags is a trial and error process; you must find a tradeoff between deployed image size and correctness. If your program uses dynamic language features, you may need to elect to strip out fewer subsystems in order to have full functionality." ;
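! Illustrative sketch only, not part of this commit. The point class
! and its slot declarations are hypothetical; the snippet assumes the
! kernel and math vocabularies. Paste the uncommented lines into the
! listener:
!
!   USING: kernel math ;
!   TUPLE: point { x integer } { y integer } ;
!   1 2 point boa        ! both values satisfy the integer declarations
!   1 "two" point boa    ! violates the declaration on y
!
! In a development image the second call is expected to signal an
! error. Per the caveat on boa above, a deployed application would
! skip the check and construct the tuple anyway.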
ARTICLE: "tools.deploy" "Application deployment"
"The stand-alone application deployment tool, implemented in the " { $vocab-link "tools.deploy" } " vocabulary, compiles a vocabulary down to a native executable which runs the vocabulary's " { $link POSTPONE: MAIN: } " hook. Deployed executables do not depend on Factor being installed, and do not expose any source code, and thus are suitable for delivering commercial end-user applications."
$nl
"Most of the time, the words in the " { $vocab-link "tools.deploy" } " vocabulary should not be used directly; instead, use " { $link "ui.tools.deploy" } "."
$nl
"You must explicitly specify major subsystems which are required, as well as the level of reflection support needed. This is done by modifying the deployment configuration prior to deployment."
{ $subsection "prepare-deploy" }
"Once the necessary deployment flags have been set, the application can be deployed:"
{ $subsection deploy }
{ $see-also "ui.tools.deploy" } ;
{ $subsection "tools.deploy.usage" }
{ $subsection "tools.deploy.impl" }
{ $subsection "tools.deploy.caveats" } ;
ABOUT: "tools.deploy"