Merge commit 'origin/master' into emacs

db4
Jose A. Ortega Ruiz 2009-03-13 23:12:35 +01:00
commit 121b3aa382
198 changed files with 3311 additions and 2178 deletions

View File

@ -24,7 +24,7 @@ The Factor runtime is written in GNU C99, and is built with GNU make and
gcc.
Factor supports various platforms. For an up-to-date list, see
<http://factorcode.org/getfactor.fhtml>.
<http://factorcode.org>.
Factor requires gcc 3.4 or later.
@ -36,17 +36,6 @@ arguments for make.
Run 'make' ('gmake' on *BSD) with no parameters to build the Factor VM.
Compilation will yield an executable named 'factor' on Unix,
'factor.exe' on Windows XP/Vista, and 'factor-ce.exe' on Windows CE.
* Libraries needed for compilation
For X11 support, you need recent development libraries for libc,
Freetype, X11, OpenGL and GLUT. On a Debian-derived Linux distribution
(like Ubuntu), you can use the following line to grab everything:
sudo apt-get install libc6-dev libfreetype6-dev libx11-dev glutg3-dev
* Bootstrapping the Factor image
Once you have compiled the Factor runtime, you must bootstrap the Factor
@ -69,6 +58,12 @@ machines.
On Unix, Factor can either run a graphical user interface using X11, or
a terminal listener.
For X11 support, you need recent development libraries for libc,
Pango, X11, OpenGL and GLUT. On a Debian-derived Linux distribution
(like Ubuntu), you can use the following line to grab everything:
sudo apt-get install libc6-dev libpango-1.0-dev libx11-dev glutg3-dev
If your DISPLAY environment variable is set, the UI will start
automatically:
@ -78,14 +73,6 @@ To run an interactive terminal listener:
./factor -run=listener
If you're inside a terminal session, you can start the UI with one of
the following two commands:
ui
[ ui ] in-thread
The latter keeps the terminal listener running.
* Running Factor on Mac OS X - Cocoa UI
On Mac OS X, a Cocoa UI is available in addition to the terminal
@ -110,7 +97,7 @@ When compiling Factor, pass the X11=1 parameter:
Then bootstrap with the following switches:
./factor -i=boot.<cpu>.image -ui-backend=x11
./factor -i=boot.<cpu>.image -ui-backend=x11 -ui-text-backend=pango
Now if $DISPLAY is set, running ./factor will start the UI.
@ -126,6 +113,12 @@ the command prompt using the console application:
factor.com -i=boot.<cpu>.image
Before bootstrapping, you will need to download the DLLs for the Pango
text rendering library. The required DLLs are listed in
build-support/dlls.txt and are available from the following location:
<http://factorcode.org/dlls>
Once bootstrapped, double-clicking factor.exe or factor.com starts
the Factor UI.
@ -135,7 +128,9 @@ To run the listener in the command prompt:
* The Factor FAQ
The Factor FAQ is available at <http://factorcode.org/faq.fhtml>.
The Factor FAQ is available at the following location:
<http://concatenative.org/wiki/view/Factor/FAQ>
* Command line usage

View File

@ -217,6 +217,8 @@ $nl
"Utilities for automatically freeing memory in conjunction with " { $link with-destructors } ":"
{ $subsection &free }
{ $subsection |free }
"The " { $link &free } " and " { $link |free } " words are generated using " { $link "alien.destructors" } "."
$nl
"You can unsafely copy a range of bytes from one memory location to another:"
{ $subsection memcpy }
"You can copy a range of bytes from memory into a byte array:"
@ -243,4 +245,6 @@ $nl
"New C types can be defined:"
{ $subsection "c-structs" }
{ $subsection "c-unions" }
"A utility for defining " { $link "destructors" } " for deallocating memory:"
{ $subsection "alien.destructors" }
{ $see-also "aliens" } ;

View File

@ -0,0 +1,30 @@
IN: alien.destructors
USING: help.markup help.syntax alien destructors ;
HELP: DESTRUCTOR:
{ $syntax "DESTRUCTOR: word" }
{ $description "Defines four things:"
{ $list
{ "a tuple named " { $snippet "word" } " with a single slot holding a " { $link c-ptr } }
{ "a " { $link dispose } " method on the tuple which calls " { $snippet "word" } " with the " { $link c-ptr } }
{ "a pair of words, " { $snippet "&word" } " and " { $snippet "|word" } ", which call " { $link &dispose } " and " { $link |dispose } " with a new instance of the tuple" }
}
"The " { $snippet "word" } " must be defined in the current vocabulary, and must have stack effect " { $snippet "( c-ptr -- )" } "."
}
{ $examples
"Suppose you are writing a binding to the GLib library, which as a " { $snippet "g_object_unref" } " function. Then you can define the function and destructor like so,"
{ $code
"FUNCTION: void g_object_unref ( gpointer object ) ;"
"DESTRUCTOR: g_object_unref"
}
"Now, memory management becomes easier:"
{ $code
"[ g_new_foo &g_object_unref ... ] with-destructors"
}
} ;
ARTICLE: "alien.destructors" "Alien destructors"
"The " { $vocab-link "alien.destructors" } " vocabulary defines a utility parsing word for defining new disposable classes."
{ $subsection POSTPONE: DESTRUCTOR: } ;
ABOUT: "alien.destructors"

View File

@ -10,7 +10,7 @@ IN: ascii
: LETTER? ( ch -- ? ) CHAR: A CHAR: Z between? ; inline
: digit? ( ch -- ? ) CHAR: 0 CHAR: 9 between? ; inline
: printable? ( ch -- ? ) CHAR: \s CHAR: ~ between? ; inline
: control? ( ch -- ? ) "\0\e\r\n\t\u000008\u00007f" member? ; inline
: control? ( ch -- ? ) { [ 0 HEX: 1F between? ] [ HEX: 7F = ] } 1|| ; inline
: quotable? ( ch -- ? ) { [ printable? ] [ "\"\\" member? not ] } 1&& ; inline
: Letter? ( ch -- ? ) { [ letter? ] [ LETTER? ] } 1|| ; inline
: alpha? ( ch -- ? ) { [ Letter? ] [ digit? ] } 1|| ; inline
@ -20,4 +20,4 @@ IN: ascii
: >upper ( str -- upper ) [ ch>upper ] map ;
HINTS: >lower string ;
HINTS: >upper string ;
HINTS: >upper string ;

View File

@ -1,4 +1,4 @@
! Copyright (C) 2008 Slava Pestov.
! Copyright (C) 2008, 2009 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel sequences sequences.private accessors math
math.order combinators hints arrays ;
@ -16,14 +16,19 @@ IN: binary-search
[ [ from>> ] [ midpoint@ ] bi + ] [ seq>> ] bi
[ drop ] [ dup ] [ ] tri* nth ; inline
DEFER: (search)
: keep-searching ( seq quot -- slice )
[ dup midpoint@ ] dip call collapse-slice slice boa (search) ; inline
: (search) ( quot: ( elt -- <=> ) seq -- i elt )
dup length 1 <= [
finish
] [
decide {
{ +eq+ [ finish ] }
{ +lt+ [ dup midpoint@ head-slice (search) ] }
{ +gt+ [ dup midpoint@ tail-slice (search) ] }
{ +lt+ [ [ (head) ] keep-searching ] }
{ +gt+ [ [ (tail) ] keep-searching ] }
} case
] if ; inline recursive

View File

@ -14,12 +14,20 @@ IN: call.tests
[ 1 2 \ + execute( x y -- z a ) ] must-fail
[ \ + execute( x y -- z ) ] must-infer
: compile-execute(-test-1 ( a b -- c ) \ + execute( a b -- c ) ;
[ t ] [ \ compile-execute(-test-1 optimized>> ] unit-test
[ 4 ] [ 1 3 compile-execute(-test-1 ] unit-test
: compile-execute(-test-2 ( a b w -- c ) execute( a b -- c ) ;
[ t ] [ \ compile-execute(-test-2 optimized>> ] unit-test
[ 4 ] [ 1 3 \ + compile-execute(-test-2 ] unit-test
[ 5 ] [ 1 4 \ + compile-execute(-test-2 ] unit-test
[ -3 ] [ 1 4 \ - compile-execute(-test-2 ] unit-test
[ 5 ] [ 1 4 \ + compile-execute(-test-2 ] unit-test
[ t ] [ \ + (( a b -- c )) execute-effect-unsafe? ] unit-test
[ t ] [ \ + (( a b c -- d e )) execute-effect-unsafe? ] unit-test
[ f ] [ \ + (( a b c -- d )) execute-effect-unsafe? ] unit-test
[ f ] [ \ call (( x -- )) execute-effect-unsafe? ] unit-test
: compile-execute(-test ( a b -- c ) \ + execute( a b -- c ) ;
[ t ] [ \ compile-execute(-test optimized>> ] unit-test
[ 4 ] [ 1 3 compile-execute(-test ] unit-test

View File

@ -1,7 +1,8 @@
! Copyright (C) 2009 Daniel Ehrenberg.
! Copyright (C) 2009 Daniel Ehrenberg, Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel macros fry summary sequences generalizations accessors
continuations effects effects.parser parser words ;
USING: kernel macros fry summary sequences sequences.private
generalizations accessors continuations effects effects.parser
parser words ;
IN: call
ERROR: wrong-values values quot length-required ;
@ -14,17 +15,9 @@ M: wrong-values summary
: firstn-safe ( array quot n -- ... )
3dup nip swap length = [ nip firstn ] [ wrong-values ] if ; inline
: execute-effect-unsafe ( word effect -- )
drop execute ;
: execute-effect-unsafe? ( word effect -- ? )
swap dup optimized>> [ stack-effect swap effect<= ] [ 2drop f ] if ; inline
: parse-call( ( accum word -- accum )
[ ")" parse-effect parsed ] dip parsed ;
: execute-unsafe( \ execute-effect-unsafe parse-call( ; parsing
PRIVATE>
MACRO: call-effect ( effect -- quot )
@ -33,10 +26,35 @@ MACRO: call-effect ( effect -- quot )
: call( \ call-effect parse-call( ; parsing
: execute-effect ( word effect -- )
2dup execute-effect-unsafe?
[ execute-effect-unsafe ]
[ [ [ execute ] curry ] dip call-effect ]
if ; inline
<PRIVATE
: execute-effect-unsafe ( word effect -- )
drop execute ;
: execute-unsafe( \ execute-effect-unsafe parse-call( ; parsing
: execute-effect-slow ( word effect -- )
[ [ execute ] curry ] dip call-effect ; inline
: cache-hit? ( word ic -- ? ) first-unsafe eq? ; inline
: cache-hit ( word effect ic -- ) drop execute-effect-unsafe ; inline
: execute-effect-unsafe? ( word effect -- ? )
over optimized>> [ [ stack-effect ] dip effect<= ] [ 2drop f ] if ; inline
: cache-miss ( word effect ic -- )
[ 2dup execute-effect-unsafe? ] dip
'[ [ drop _ set-first ] [ execute-effect-unsafe ] 2bi ]
[ execute-effect-slow ] if ; inline
: execute-effect-ic ( word effect ic -- )
#! ic is a mutable cell { effect }
3dup nip cache-hit? [ cache-hit ] [ cache-miss ] if ; inline
PRIVATE>
MACRO: execute-effect ( effect -- )
{ f } clone '[ _ _ execute-effect-ic ] ;
: execute( \ execute-effect parse-call( ; parsing

View File

@ -7,4 +7,34 @@ assocs cocoa.enumeration ;
[ V{ } ] [ H{ } >cf &CFRelease [ ] NSFastEnumeration-map ] unit-test
[ V{ "A" } ] [ { "A" } >cf &CFRelease plist> ] unit-test
[ H{ { "A" "B" } } ] [ "B" "A" associate >cf &CFRelease plist> ] unit-test
[ H{ { "A" "B" } } ] [ "B" "A" associate >cf &CFRelease plist> ] unit-test
[ t ] [
{
H{ { "DeviceUsagePage" 1 } { "DeviceUsage" 4 } }
H{ { "DeviceUsagePage" 1 } { "DeviceUsage" 5 } }
H{ { "DeviceUsagePage" 1 } { "DeviceUsage" 6 } }
} [ >cf &CFRelease ] [ >cf &CFRelease ] bi
[ plist> ] bi@ =
] unit-test
[ t ] [
{ "DeviceUsagePage" 1 }
[ >cf &CFRelease ] [ >cf &CFRelease ] bi
[ plist> ] bi@ =
] unit-test
[ V{ "DeviceUsagePage" "Yes" } ] [
{ "DeviceUsagePage" "Yes" }
>cf &CFRelease plist>
] unit-test
[ V{ 2.0 1.0 } ] [
{ 2.0 1.0 }
>cf &CFRelease plist>
] unit-test
[ 3.5 ] [
3.5 >cf &CFRelease plist>
] unit-test
] with-destructors

View File

@ -2,7 +2,7 @@ IN: colors.constants
USING: help.markup help.syntax strings colors ;
HELP: named-color
{ $values { "string" string } { "color" color } }
{ $values { "name" string } { "color" color } }
{ $description "Outputs a named color from the " { $snippet "rgb.txt" } " database." }
{ $notes "In most cases, " { $link POSTPONE: COLOR: } " should be used instead." }
{ $errors "Throws an error if the color is not listed in " { $snippet "rgb.txt" } "." } ;

View File

@ -27,7 +27,7 @@ PRIVATE>
ERROR: no-such-color name ;
: named-color ( name -- rgb )
: named-color ( name -- color )
dup rgb.txt at [ ] [ no-such-color ] ?if ;
: COLOR: scan named-color parsed ; parsing

View File

@ -1,15 +1,14 @@
! Copyright (C) 2004, 2009 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors kernel namespaces arrays sequences io words fry
continuations vocabs assocs dlists definitions math graphs
generic combinators deques search-deques io stack-checker
stack-checker.state stack-checker.inlining
combinators.short-circuit compiler.errors compiler.units
compiler.tree.builder compiler.tree.optimizer
compiler.cfg.builder compiler.cfg.optimizer
continuations vocabs assocs dlists definitions math graphs generic
combinators deques search-deques macros io stack-checker
stack-checker.state stack-checker.inlining combinators.short-circuit
compiler.errors compiler.units compiler.tree.builder
compiler.tree.optimizer compiler.cfg.builder compiler.cfg.optimizer
compiler.cfg.linearization compiler.cfg.two-operand
compiler.cfg.linear-scan compiler.cfg.stack-frame
compiler.codegen compiler.utilities ;
compiler.cfg.linear-scan compiler.cfg.stack-frame compiler.codegen
compiler.utilities ;
IN: compiler
SYMBOL: compile-queue
@ -50,8 +49,12 @@ SYMBOLS: +optimized+ +unoptimized+ ;
H{ } clone generic-dependencies set
f swap compiler-error ;
: ignore-error? ( word error -- ? )
[ [ inline? ] [ macro? ] bi or ]
[ compiler-error-type +warning+ eq? ] bi* and ;
: fail ( word error -- * )
[ swap compiler-error ]
[ 2dup ignore-error? [ 2drop ] [ swap compiler-error ] if ]
[
drop
[ compiled-unxref ]

View File

@ -514,4 +514,9 @@ cell-bits 32 = [
[ t ] [
[ { fixnum fixnum } declare = ]
\ both-fixnums? inlined?
] unit-test
[ t ] [
[ { integer integer } declare + drop ]
{ + +-integer-integer } inlined?
] unit-test

View File

@ -46,9 +46,6 @@ M: predicate finalize-word
[ drop ]
} cond ;
! M: math-partial finalize-word
! dup primitive? [ drop ] [ nip cached-expansion ] if ;
M: word finalize-word drop ;
M: #call finalize*

View File

@ -238,7 +238,7 @@ DEFER: (value-info-union)
: value-infos-union ( infos -- info )
[ null-info ]
[ dup first [ value-info-union ] reduce ] if-empty ;
[ unclip-slice [ value-info-union ] reduce ] if-empty ;
: literals<= ( info1 info2 -- ? )
{

View File

@ -655,3 +655,36 @@ MIXIN: empty-mixin
! [ t ] [ [ dup t xor and ] final-classes first false-class? ] unit-test
! [ t ] [ [ dup t xor swap and ] final-classes first false-class? ] unit-test
! generalize-counter-interval wasn't being called in all the right places.
! bug found by littledan
TUPLE: littledan-1 { a read-only } ;
: (littledan-1-test) ( a -- ) a>> 1+ littledan-1 boa (littledan-1-test) ; inline recursive
: littledan-1-test ( -- ) 0 littledan-1 boa (littledan-1-test) ; inline
[ ] [ [ littledan-1-test ] final-classes drop ] unit-test
TUPLE: littledan-2 { from read-only } { to read-only } ;
: (littledan-2-test) ( x -- i elt )
[ from>> ] [ to>> ] bi + dup littledan-2 boa (littledan-2-test) ; inline recursive
: littledan-2-test ( x -- i elt )
[ 0 ] dip { array-capacity } declare littledan-2 boa (littledan-2-test) ; inline
[ ] [ [ littledan-2-test ] final-classes drop ] unit-test
: (littledan-3-test) ( x -- )
length 1+ f <array> (littledan-3-test) ; inline recursive
: littledan-3-test ( x -- )
0 f <array> (littledan-3-test) ; inline
[ ] [ [ littledan-3-test ] final-classes drop ] unit-test
[ V{ 0 } ] [ [ { } length ] final-literals ] unit-test
[ V{ 1 } ] [ [ { } length 1+ f <array> length ] final-literals ] unit-test

View File

@ -34,9 +34,14 @@ IN: compiler.tree.propagation.recursive
} cond interval-union nip ;
: generalize-counter ( info' initial -- info )
2dup [ class>> null-class? ] either? [ drop ] [
[ drop clone ] [ [ interval>> ] bi@ ] 2bi
generalize-counter-interval >>interval
2dup [ not ] either? [ drop ] [
2dup [ class>> null-class? ] either? [ drop ] [
[ clone ] dip
[ [ drop ] [ [ interval>> ] bi@ generalize-counter-interval ] 2bi >>interval ]
[ [ drop ] [ [ slots>> ] bi@ [ generalize-counter ] 2map ] 2bi >>slots ]
[ [ drop ] [ [ length>> ] bi@ generalize-counter ] 2bi >>length ]
tri
] if
] if ;
: unify-recursive-stacks ( stacks initial -- infos )

View File

@ -1,6 +1,6 @@
! Copyright (C) 2008 Slava Pestov.
! Copyright (C) 2008, 2009 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: alien.syntax kernel sequences ;
USING: alien.syntax kernel sequences fry ;
IN: core-foundation.arrays
TYPEDEF: void* CFArrayRef
@ -17,6 +17,5 @@ FUNCTION: CFIndex CFArrayGetCount ( CFArrayRef array ) ;
dup CFArrayGetCount [ CFArrayGetValueAtIndex ] with map ;
: <CFArray> ( seq -- alien )
[ f swap length f CFArrayCreateMutable ] keep
[ length ] keep
[ [ dupd ] dip CFArraySetValueAtIndex ] 2each ;
f over length &: kCFTypeArrayCallBacks CFArrayCreateMutable
[ '[ [ _ ] 2dip swap CFArraySetValueAtIndex ] each-index ] keep ;

View File

@ -0,0 +1,2 @@
unportable
bindings

View File

@ -0,0 +1,2 @@
unportable
bindings

View File

@ -220,7 +220,7 @@ M: assert error.
5 line-limit set
[ expect>> [ [ "Expect:" write ] with-cell pprint-cell ] with-row ]
[ got>> [ [ "Got:" write ] with-cell pprint-cell ] with-row ] bi
] tabular-output ;
] tabular-output nl ;
M: immutable summary drop "Sequence is immutable" ;

View File

@ -13,8 +13,8 @@ HELP: PROTOCOL:
{ define-protocol POSTPONE: PROTOCOL: } related-words
HELP: define-consult
{ $values { "class" "a class" } { "group" "a protocol, generic word or tuple class" } { "quot" "a quotation" } }
{ $description "Defines a class to consult, using the given quotation, on the generic words contained in the group." }
{ $values { "consultation" consultation } }
{ $description "Defines a class to consult, using the quotation, on the generic words contained in the group." }
{ $notes "Usually, " { $link POSTPONE: CONSULT: } " should be used instead. This is only for runtime use." } ;
HELP: CONSULT:

View File

@ -1,7 +1,7 @@
USING: delegate kernel arrays tools.test words math definitions
compiler.units parser generic prettyprint io.streams.string
accessors eval multiline generic.standard delegate.protocols
delegate.private assocs ;
delegate.private assocs see ;
IN: delegate.tests
TUPLE: hello this that ;

View File

@ -99,6 +99,7 @@ link-no-follow? off
[ "<p><img src=\"lol.jpg\" alt=\"teh lol\"/></p>" ] [ "[[image:lol.jpg|teh lol]]" convert-farkup ] unit-test
[ "<p><a href=\"http://lol.com\">http://lol.com</a></p>" ] [ "[[http://lol.com]]" convert-farkup ] unit-test
[ "<p><a href=\"http://lol.com\">haha</a></p>" ] [ "[[http://lol.com|haha]]" convert-farkup ] unit-test
[ "<p><a href=\"http://lol.com/search?q=sex\">haha</a></p>" ] [ "[[http://lol.com/search?q=sex|haha]]" convert-farkup ] unit-test
[ "<p><a href=\"Foo/Bar\">Bar</a></p>" ] [ "[[Foo/Bar]]" convert-farkup ] unit-test
"/wiki/view/" relative-link-prefix [

View File

@ -165,12 +165,12 @@ CONSTANT: invalid-url "javascript:alert('Invalid URL in farkup');"
{ [ dup [ 127 > ] any? ] [ drop invalid-url ] }
{ [ dup first "/\\" member? ] [ drop invalid-url ] }
{ [ CHAR: : over member? ] [ dup absolute-url? [ drop invalid-url ] unless ] }
[ relative-link-prefix get prepend "" like ]
} cond url-encode ;
[ relative-link-prefix get prepend "" like url-encode ]
} cond ;
: write-link ( href text -- xml )
[ check-url link-no-follow? get "true" and ] dip
[XML <a href=<-> nofollow=<->><-></a> XML] ;
[ check-url link-no-follow? get "nofollow" and ] dip
[XML <a href=<-> rel=<->><-></a> XML] ;
: write-image-link ( href text -- xml )
disable-images? get [

View File

@ -14,5 +14,6 @@ USING: tools.test globs ;
[ f ] [ "foo.java" "*.{xml,txt}" glob-matches? ] unit-test
[ t ] [ "foo.txt" "*.{xml,txt}" glob-matches? ] unit-test
[ t ] [ "foo.xml" "*.{xml,txt}" glob-matches? ] unit-test
[ f ] [ "foo." "*.{,xml,txt}" glob-matches? ] unit-test
[ f ] [ "foo." "*.{xml,txt}" glob-matches? ] unit-test
[ t ] [ "foo." "*.{,xml,txt}" glob-matches? ] unit-test
[ t ] [ "foo.{" "*.{" glob-matches? ] unit-test

View File

@ -1,42 +1,42 @@
! Copyright (C) 2007 Slava Pestov.
! Copyright (C) 2007, 2009 Slava Pestov, Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: parser-combinators parser-combinators.regexp lists sequences kernel
promises strings unicode.case ;
USING: sequences kernel regexp.combinators strings unicode.case
peg.ebnf regexp arrays ;
IN: globs
<PRIVATE
EBNF: <glob>
: 'char' ( -- parser )
[ ",*?" member? not ] satisfy ;
Character = "\\" .:c => [[ c 1string <literal> ]]
| !(","|"}") . => [[ 1string <literal> ]]
: 'string' ( -- parser )
'char' <+> [ >lower token ] <@ ;
RangeCharacter = !("]") .
: 'escaped-char' ( -- parser )
"\\" token any-char-parser &> [ 1token ] <@ ;
Range = RangeCharacter:a "-" RangeCharacter:b => [[ a b <char-range> ]]
| RangeCharacter => [[ 1string <literal> ]]
: 'escaped-string' ( -- parser )
'string' 'escaped-char' <|> ;
StartRange = .:a "-" RangeCharacter:b => [[ a b <char-range> ]]
| . => [[ 1string <literal> ]]
DEFER: 'term'
Ranges = StartRange:s Range*:r => [[ r s prefix ]]
: 'glob' ( -- parser )
'term' <*> [ <and-parser> ] <@ ;
CharClass = "^"?:n Ranges:e => [[ e <or> n [ <not> ] when ]]
: 'union' ( -- parser )
'glob' "," token nonempty-list-of "{" "}" surrounded-by
[ <or-parser> ] <@ ;
AlternationBody = Concatenation:c "," AlternationBody:a => [[ a c prefix ]]
| Concatenation => [[ 1array ]]
LAZY: 'term' ( -- parser )
'union'
'character-class' <|>
"?" token [ drop any-char-parser ] <@ <|>
"*" token [ drop any-char-parser <*> ] <@ <|>
'escaped-string' <|> ;
Element = "*" => [[ R/ .*/ ]]
| "?" => [[ R/ ./ ]]
| "[" CharClass:c "]" => [[ c ]]
| "{" AlternationBody:b "}" => [[ b <or> ]]
| Character
PRIVATE>
Concatenation = Element* => [[ <sequence> ]]
: <glob> ( string -- glob ) 'glob' just parse-1 just ;
End = !(.)
Main = Concatenation End
;EBNF
: glob-matches? ( input glob -- ? )
[ >lower ] [ <glob> ] bi* parse nil? not ;
[ >case-fold ] bi@ <glob> matches? ;

View File

@ -1,6 +1,6 @@
USING: help.markup help.syntax io kernel math namespaces parser
prettyprint sequences vocabs.loader namespaces stack-checker
help command-line multiline ;
help command-line multiline see ;
IN: help.cookbook
ARTICLE: "cookbook-syntax" "Basic syntax cookbook"

View File

@ -1,6 +1,6 @@
USING: math definitions help.topics help tools.test
prettyprint parser io.streams.string kernel source-files
assocs namespaces words io sequences eval accessors ;
assocs namespaces words io sequences eval accessors see ;
IN: help.definitions.tests
[ ] [ \ + >link see ] unit-test

View File

@ -1,8 +1,8 @@
! Copyright (C) 2007, 2008 Slava Pestov.
! Copyright (C) 2007, 2009 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors definitions help help.topics help.syntax
prettyprint.backend prettyprint.custom prettyprint words kernel
effects ;
effects see ;
IN: help.definitions
! Definition protocol implementation

View File

@ -194,6 +194,7 @@ ARTICLE: "io" "Input and output"
ARTICLE: "tools" "Developer tools"
{ $subsection "tools.vocabs" }
"Exploratory tools:"
{ $subsection "see" }
{ $subsection "editor" }
{ $subsection "listener" }
{ $subsection "tools.crossref" }

View File

@ -1,6 +1,6 @@
USING: help.markup help.crossref help.stylesheet help.topics
help.syntax definitions io prettyprint summary arrays math
sequences vocabs strings ;
sequences vocabs strings see ;
IN: help
ARTICLE: "printing-elements" "Printing markup elements"

View File

@ -132,6 +132,11 @@ SYMBOL: vocabs-quot
[ check-descriptions ]
} cleave ;
: check-class-description ( word element -- )
[ class? not ]
[ { $class-description } swap elements empty? not ] bi* and
[ "A word that is not a class has a $class-description" throw ] when ;
: all-word-help ( words -- seq )
[ word-help ] filter ;
@ -153,7 +158,8 @@ M: help-error error.
dup '[
_ dup word-help
[ check-values ]
[ nip [ check-nulls ] [ check-see-also ] [ check-markup ] tri ] 2bi
[ check-class-description ]
[ nip [ check-nulls ] [ check-see-also ] [ check-markup ] tri ] 2tri
] check-something
] [ drop ] if ;

View File

@ -4,7 +4,7 @@ USING: accessors arrays definitions generic io kernel assocs
hashtables namespaces make parser prettyprint sequences strings
io.styles vectors words math sorting splitting classes slots fry
sets vocabs help.stylesheet help.topics vocabs.loader quotations
combinators call ;
combinators call see ;
IN: help.markup
PREDICATE: simple-element < array
@ -13,7 +13,6 @@ PREDICATE: simple-element < array
SYMBOL: last-element
SYMBOL: span
SYMBOL: block
SYMBOL: table
: last-span? ( -- ? ) last-element get span eq? ;
: last-block? ( -- ? ) last-element get block eq? ;
@ -44,7 +43,7 @@ M: f print-element drop ;
[ print-element ] with-default-style ;
: ($block) ( quot -- )
last-element get { f table } member? [ nl ] unless
last-element get [ nl ] when
span last-element set
call
block last-element set ; inline
@ -218,7 +217,7 @@ ALIAS: $slot $snippet
table-content-style get [
swap [ last-element off call ] tabular-output
] with-style
] ($block) table last-element set ; inline
] ($block) ; inline
: $list ( element -- )
list-style get [
@ -301,7 +300,7 @@ M: f ($instance)
] with-style
] ($block) ; inline
: $see ( element -- ) first [ see ] ($see) ;
: $see ( element -- ) first [ see* ] ($see) ;
: $synopsis ( element -- ) first [ synopsis write ] ($see) ;
@ -346,6 +345,8 @@ M: f ($instance)
drop
"Throws an error if the I/O operation fails." $errors ;
FROM: prettyprint.private => with-pprint ;
: $prettyprinting-note ( children -- )
drop {
"This word should only be called from inside the "

View File

@ -4,6 +4,8 @@ io.streams.null accessors inspector html.streams
html.components html.forms namespaces
xml.writer ;
\ render must-infer
[ ] [ begin-form ] unit-test
[ ] [ 3 "hi" set-value ] unit-test

View File

@ -9,14 +9,10 @@ IN: http.tests
[ "text/html" utf8 ] [ "text/html; charset=UTF-8" parse-content-type ] unit-test
[ "text/html" utf8 ] [ "text/html; charset=\"utf-8\"" parse-content-type ] unit-test
[ "application/octet-stream" binary ] [ "application/octet-stream" parse-content-type ] unit-test
[ { } ] [ "" parse-cookie ] unit-test
[ { } ] [ "" parse-set-cookie ] unit-test
! Make sure that totally invalid cookies don't confuse us
[ { } ] [ "hello world; how are you" parse-cookie ] unit-test
: lf>crlf "\n" split "\r\n" join ;
STRING: read-request-test-1

View File

@ -34,7 +34,7 @@ IN: http
: check-header-string ( str -- str )
#! http://en.wikipedia.org/wiki/HTTP_Header_Injection
dup "\r\n\"" intersects?
dup "\r\n" intersects?
[ "Header injection attack" throw ] when ;
: write-header ( assoc -- )
@ -213,7 +213,10 @@ TUPLE: post-data data params content-type content-encoding ;
swap >>content-type ;
: parse-content-type-attributes ( string -- attributes )
" " split harvest [ "=" split1 [ >lower ] dip ] { } map>assoc ;
" " split harvest [
"=" split1
[ >lower ] [ "\"" ?head drop "\"" ?tail drop ] bi*
] { } map>assoc ;
: parse-content-type ( content-type -- type encoding )
";" split1

View File

@ -0,0 +1,16 @@
IN: http.parsers.tests
USING: http http.parsers tools.test ;
[ { } ] [ "" parse-cookie ] unit-test
[ { } ] [ "" parse-set-cookie ] unit-test
! Make sure that totally invalid cookies don't confuse us
[ { } ] [ "hello world; how are you" parse-cookie ] unit-test
[ { T{ cookie { name "__s" } { value "12345567" } } } ]
[ "__s=12345567" parse-cookie ]
unit-test
[ { T{ cookie { name "__s" } { value "12345567" } } } ]
[ "__s=12345567;" parse-cookie ]
unit-test

View File

@ -162,7 +162,7 @@ PEG: (parse-set-cookie) ( string -- alist )
'value' ,
'space' ,
] seq*
[ ";,=" member? not ] satisfy repeat1 [ drop f ] action
[ ";,=" member? not ] satisfy repeat0 [ drop f ] action
2choice ;
PEG: (parse-cookie) ( string -- alist )

View File

@ -53,9 +53,9 @@ IN: http.server.cgi
"CGI output follows" >>message
swap '[
binary encode-output
_ output-stream get swap <cgi-process> binary <process-stream> [
output-stream get _ <cgi-process> binary <process-stream> [
post-request? [ request get post-data>> data>> write flush ] when
'[ _ write ] each-block
'[ _ stream-write ] each-block
] with-stream
] >>body ;

View File

@ -108,11 +108,6 @@ M: bitmap-image load-image* ( path bitmap -- bitmap )
load-bitmap-data process-bitmap-data
fill-image-slots ;
M: bitmap-image normalize-scan-line-order
dup dim>> '[
_ first 4 * <sliced-groups> reverse concat
] change-bitmap ;
MACRO: (nbits>bitmap) ( bits -- )
[ -3 shift ] keep '[
bitmap-image new
@ -121,6 +116,7 @@ MACRO: (nbits>bitmap) ( bits -- )
swap >>width
swap array-copy [ >>bitmap ] [ >>color-index ] bi
_ >>bit-count fill-image-slots
t >>upside-down?
] ;
: bgr>bitmap ( array height width -- bitmap )

View File

@ -27,7 +27,7 @@ R16G16B16 R32G32B32 R16G16B16A16 R32G32B32A32 ;
{ R32G32B32A32 [ 16 ] }
} case ;
TUPLE: image dim component-order bitmap ;
TUPLE: image dim component-order upside-down? bitmap ;
: <image> ( -- image ) image new ; inline
@ -82,11 +82,16 @@ M: ARGB normalize-component-order*
M: ABGR normalize-component-order*
drop ARGB>RGBA 4 BGR>RGB ;
GENERIC: normalize-scan-line-order ( image -- image )
M: image normalize-scan-line-order ;
: normalize-scan-line-order ( image -- image )
dup upside-down?>> [
dup dim>> first 4 * '[
_ <groups> reverse concat
] change-bitmap
f >>upside-down?
] when ;
: normalize-image ( image -- image )
[ >byte-array ] change-bitmap
normalize-component-order
normalize-scan-line-order ;
normalize-scan-line-order
RGBA >>component-order ;

View File

@ -503,7 +503,7 @@ ERROR: unknown-component-order ifd ;
: ifd>image ( ifd -- image )
{
[ [ image-width find-tag ] [ image-length find-tag ] bi 2array ]
[ ifd-component-order ]
[ ifd-component-order f ]
[ bitmap>> ]
} cleave tiff-image boa ;

View File

@ -8,7 +8,7 @@ f describe
H{ } describe
H{ } describe
[ "fixnum instance\n" ] [ [ 3 describe ] with-string-writer ] unit-test
[ "fixnum instance\n\n" ] [ [ 3 describe ] with-string-writer ] unit-test
[ ] [ H{ } clone inspect ] unit-test

View File

@ -9,7 +9,7 @@ IN: inspector
SYMBOL: +number-rows+
: summary. ( obj -- ) [ summary ] keep write-object nl ;
: print-summary ( obj -- ) [ summary ] keep write-object ;
<PRIVATE
@ -40,7 +40,7 @@ M: mirror fix-slot-names
: (describe) ( obj assoc -- keys )
t pprint-string-cells? [
[ summary. ] [
[ print-summary nl ] [
dup hashtable? [ sort-unparsed-keys ] when
[ fix-slot-names add-numbers simple-table. ] [ keys ] bi
] bi*

View File

@ -8,3 +8,13 @@ IN: io.directories.search.tests
current-temporary-directory get [ ] find-all-files
] with-unique-directory drop [ natural-sort ] bi@ =
] unit-test
[ f ] [
{ "omg you shoudnt have a directory called this" "or this" }
t
[ "asdfasdfasdfasdfasdf" tail? ] find-in-directories
] unit-test
[ f ] [
{ } t [ "asdfasdfasdfasdfasdf" tail? ] find-in-directories
] unit-test

View File

@ -61,8 +61,8 @@ PRIVATE>
ERROR: file-not-found ;
: find-in-directories ( directories bfs? quot: ( obj -- ? ) -- path'/f )
[
'[ _ _ find-file [ file-not-found ] unless* ] attempt-all
'[
_ [ _ _ find-file [ file-not-found ] unless* ] attempt-all
] [
drop f
] recover ;

View File

@ -3,8 +3,11 @@
USING: help.syntax help.markup ;
IN: io.encodings.euc-kr
ABOUT: euc-kr
HELP: euc-kr
{ $class-description "This encoding class implements Microsoft's code page #949 encoding, also called Unified Hangul Code or ks_c_5601-1987, UHC. CP949 is extended version of EUC-KR and downward-compatibility to EUC-KR, in practice." }
{ $class-description "This encoding class implements Microsoft's CP949 encoding, also called Unified Hangul Code or ks_c_5601-1987, UHC. CP949 is extended version of EUC-KR and downward-compatible with EUC-KR in practice." }
{ $see-also "encodings-introduction" } ;
ARTICLE: "io.encodings.euc-kr" "EUC-KR encoding"
{ $subsection euc-kr } ;
ABOUT: "io.encodings.euc-kr"

View File

@ -3,7 +3,10 @@
USING: help.syntax help.markup ;
IN: io.encodings.johab
ABOUT: johab
HELP: johab
{ $class-description "Korean Johab encoding (KSC5601-1992). This encoding is not commonly used anymore." } ;
ARTICLE: "io.encodings.johab" "Korean Johab encoding"
{ $subsection johab } ;
ABOUT: "io.encodings.johab"

View File

@ -97,7 +97,7 @@ M: plain-writer make-block-stream
nip <ignore-close-stream> ;
M: plain-writer stream-write-table
[ drop format-table [ print ] each ] with-output-stream* ;
[ drop format-table [ nl ] [ write ] interleave ] with-output-stream* ;
M: plain-writer make-cell-stream 2drop <string-writer> ;

View File

@ -84,7 +84,7 @@ SYMBOL: max-stack-items
bi
] with-row
] each
] tabular-output
] tabular-output nl
] unless-empty ;
: trimmed-stack. ( seq -- )

View File

@ -1,7 +1,7 @@
! Copyright (C) 2007, 2008 Slava Pestov, Eduardo Cavazos.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors definitions effects generic kernel locals
macros memoize prettyprint prettyprint.backend words ;
macros memoize prettyprint prettyprint.backend see words ;
IN: locals.definitions
PREDICATE: lambda-word < word "lambda" word-prop >boolean ;

View File

@ -1,5 +1,5 @@
USING: help.syntax help.markup kernel macros prettyprint
memoize combinators arrays generalizations ;
memoize combinators arrays generalizations see ;
IN: locals
HELP: [|

View File

@ -2,7 +2,7 @@ USING: locals math sequences tools.test hashtables words kernel
namespaces arrays strings prettyprint io.streams.string parser
accessors generic eval combinators combinators.short-circuit
combinators.short-circuit.smart math.order math.functions
definitions compiler.units fry lexer words.symbol ;
definitions compiler.units fry lexer words.symbol see ;
IN: locals.tests
:: foo ( a b -- a a ) a a ;

View File

@ -1,6 +1,6 @@
IN: macros.tests
USING: tools.test macros math kernel arrays
vectors io.streams.string prettyprint parser eval ;
vectors io.streams.string prettyprint parser eval see ;
MACRO: see-test ( a b -- c ) + ;

View File

@ -84,7 +84,7 @@ M: word integer-op-input-classes
: define-integer-op-word ( fix-word big-word triple -- )
[
[ 2nip integer-op-word ] [ integer-op-quot ] 3bi
[ 2nip integer-op-word dup make-foldable ] [ integer-op-quot ] 3bi
(( x y -- z )) define-declared
] [
2nip

View File

@ -1,7 +1,7 @@
! Copyright (C) 2007, 2009 Slava Pestov, Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: math kernel memoize tools.test parser generalizations
prettyprint io.streams.string sequences eval namespaces ;
prettyprint io.streams.string sequences eval namespaces see ;
IN: memoize.tests
MEMO: fib ( m -- n )

View File

@ -137,7 +137,7 @@ $nl
{ $subsection "models-delay" } ;
ARTICLE: "models-impl" "Implementing models"
"New types of models can be defined, for example see " { $vocab-link "models.filter" } "."
"New types of models can be defined, for example see " { $vocab-link "models.arrow" } "."
$nl
"Models can execute hooks when activated:"
{ $subsection model-activated }

View File

@ -5,15 +5,19 @@ images kernel namespaces ;
IN: opengl.textures.tests
[ ] [
{ 3 5 }
RGB
B{
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45
} image boa "image" set
T{ image
{ dim { 3 5 } }
{ component-order RGB }
{ bitmap
B{
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45
}
}
} "image" set
] unit-test
[

View File

@ -11,14 +11,16 @@ IN: opengl.textures
TUPLE: texture loc dim texture-coords texture display-list disposed ;
<PRIVATE
GENERIC: component-order>format ( component-order -- format type )
M: RGB component-order>format drop GL_RGB GL_UNSIGNED_BYTE ;
M: BGR component-order>format drop GL_BGR GL_UNSIGNED_BYTE ;
M: RGBA component-order>format drop GL_RGBA GL_UNSIGNED_BYTE ;
M: ARGB component-order>format drop GL_BGRA_EXT GL_UNSIGNED_INT_8_8_8_8_REV ;
M: BGRA component-order>format drop GL_BGRA_EXT GL_UNSIGNED_BYTE ;
<PRIVATE
: repeat-last ( seq n -- seq' )
over peek pad-tail concat ;

View File

@ -128,28 +128,28 @@ PEG: escaper ( string -- ast )
#! in the EBNF syntax itself.
[
{
[ dup blank? ]
[ dup CHAR: " = ]
[ dup CHAR: ' = ]
[ dup CHAR: | = ]
[ dup CHAR: { = ]
[ dup CHAR: } = ]
[ dup CHAR: = = ]
[ dup CHAR: ) = ]
[ dup CHAR: ( = ]
[ dup CHAR: ] = ]
[ dup CHAR: [ = ]
[ dup CHAR: . = ]
[ dup CHAR: ! = ]
[ dup CHAR: & = ]
[ dup CHAR: * = ]
[ dup CHAR: + = ]
[ dup CHAR: ? = ]
[ dup CHAR: : = ]
[ dup CHAR: ~ = ]
[ dup CHAR: < = ]
[ dup CHAR: > = ]
} 0|| not nip
[ blank? ]
[ CHAR: " = ]
[ CHAR: ' = ]
[ CHAR: | = ]
[ CHAR: { = ]
[ CHAR: } = ]
[ CHAR: = = ]
[ CHAR: ) = ]
[ CHAR: ( = ]
[ CHAR: ] = ]
[ CHAR: [ = ]
[ CHAR: . = ]
[ CHAR: ! = ]
[ CHAR: & = ]
[ CHAR: * = ]
[ CHAR: + = ]
[ CHAR: ? = ]
[ CHAR: : = ]
[ CHAR: ~ = ]
[ CHAR: < = ]
[ CHAR: > = ]
} 1|| not
] satisfy repeat1 [ >string <ebnf-non-terminal> ] action ;
: 'terminal' ( -- parser )
@ -161,9 +161,9 @@ PEG: escaper ( string -- ast )
#! Parse a valid foreign parser name
[
{
[ dup blank? ]
[ dup CHAR: > = ]
} 0|| not nip
[ blank? ]
[ CHAR: > = ]
} 1|| not
] satisfy repeat1 [ >string ] action ;
: 'foreign' ( -- parser )

View File

@ -1,6 +1,7 @@
USING: prettyprint.backend prettyprint.config prettyprint.custom
prettyprint.sections prettyprint.private help.markup help.syntax
io kernel words definitions quotations strings generic classes ;
io kernel words definitions quotations strings generic classes
prettyprint.private ;
IN: prettyprint
ARTICLE: "prettyprint-numbers" "Prettyprinting numbers"
@ -149,10 +150,6 @@ $nl
{ $subsection unparse-use }
"Utility for tabular output:"
{ $subsection pprint-cell }
"Printing a definition (see " { $link "definitions" } "):"
{ $subsection see }
"Printing the methods defined on a generic word or class (see " { $link "objects" } "):"
{ $subsection see-methods }
"More prettyprinter usage:"
{ $subsection "prettyprint-numbers" }
{ $subsection "prettyprint-stacks" }
@ -160,7 +157,7 @@ $nl
{ $subsection "prettyprint-variables" }
{ $subsection "prettyprint-extension" }
{ $subsection "prettyprint-limitations" }
{ $see-also "number-strings" } ;
{ $see-also "number-strings" "see" } ;
ABOUT: "prettyprint"
@ -232,51 +229,4 @@ HELP: .s
HELP: in.
{ $values { "vocab" "a vocabulary specifier" } }
{ $description "Prettyprints a " { $snippet "IN:" } " declaration." }
$prettyprinting-note ;
HELP: synopsis
{ $values { "defspec" "a definition specifier" } { "str" string } }
{ $contract "Prettyprints the prologue of a definition." } ;
HELP: synopsis*
{ $values { "defspec" "a definition specifier" } }
{ $contract "Adds sections to the current block corresponding to a the prologue of a definition, in source code-like form." }
{ $notes "This word should only be called from inside the " { $link with-pprint } " combinator. Client code should call " { $link synopsis } " instead." } ;
HELP: comment.
{ $values { "string" "a string" } }
{ $description "Prettyprints some text with the comment style." }
$prettyprinting-note ;
HELP: see
{ $values { "defspec" "a definition specifier" } }
{ $contract "Prettyprints a definition." } ;
HELP: see-methods
{ $values { "word" "a " { $link generic } " or a " { $link class } } }
{ $contract "Prettyprints the methods defined on a generic word or class." } ;
HELP: definer
{ $values { "defspec" "a definition specifier" } { "start" word } { "end" "a word or " { $link f } } }
{ $contract "Outputs the parsing words which delimit the definition." }
{ $examples
{ $example "USING: definitions prettyprint ;"
"IN: scratchpad"
": foo ; \\ foo definer . ."
";\nPOSTPONE: :"
}
{ $example "USING: definitions prettyprint ;"
"IN: scratchpad"
"SYMBOL: foo \\ foo definer . ."
"f\nPOSTPONE: SYMBOL:"
}
}
{ $notes "This word is used in the implementation of " { $link see } "." } ;
HELP: definition
{ $values { "defspec" "a definition specifier" } { "seq" "a sequence" } }
{ $contract "Outputs the body of a definition." }
{ $examples
{ $example "USING: definitions math prettyprint ;" "\\ sq definition ." "[ dup * ]" }
}
{ $notes "This word is used in the implementation of " { $link see } "." } ;
$prettyprinting-note ;

View File

@ -3,7 +3,7 @@ kernel math namespaces parser prettyprint prettyprint.config
prettyprint.sections sequences tools.test vectors words
effects splitting generic.standard prettyprint.private
continuations generic compiler.units tools.walker eval
accessors make vocabs.parser ;
accessors make vocabs.parser see ;
IN: prettyprint.tests
[ "4" ] [ 4 unparse ] unit-test

View File

@ -1,16 +1,14 @@
! Copyright (C) 2003, 2008 Slava Pestov.
! Copyright (C) 2003, 2009 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: arrays generic generic.standard assocs io kernel math
namespaces make sequences strings io.styles io.streams.string
vectors words words.symbol prettyprint.backend prettyprint.custom
prettyprint.sections prettyprint.config sorting splitting
grouping math.parser vocabs definitions effects classes.builtin
classes.tuple io.pathnames classes continuations hashtables
classes.mixin classes.union classes.intersection
classes.predicate classes.singleton combinators quotations sets
accessors colors parser summary vocabs.parser ;
USING: arrays accessors assocs colors combinators grouping io
io.streams.string io.styles kernel make math math.parser namespaces
parser prettyprint.backend prettyprint.config prettyprint.custom
prettyprint.sections quotations sequences sorting strings vocabs
vocabs.parser words sets ;
IN: prettyprint
<PRIVATE
: make-pprint ( obj quot -- block in use )
[
0 position set
@ -34,7 +32,7 @@ IN: prettyprint
[ \ IN: pprint-word pprint-vocab ] with-pprint ;
: in. ( vocab -- )
[ write-in nl ] when* ;
[ write-in ] when* ;
: use. ( seq -- )
[
@ -42,31 +40,39 @@ IN: prettyprint
\ USING: pprint-word
[ pprint-vocab ] each
\ ; pprint-word
] with-pprint nl
] with-pprint
] unless-empty ;
: use/in. ( in use -- )
dupd remove [ { "syntax" "scratchpad" } member? not ] filter
use. in. ;
over "syntax" 2array diff
[ nip use. ]
[ empty? not and [ nl ] when ]
[ drop in. ]
2tri ;
: vocab-names ( words -- vocabs )
dictionary get
[ [ words>> eq? nip ] with assoc-find 2drop ] curry map sift ;
: prelude. ( -- )
in get use get vocab-names use/in. ;
in get use get vocab-names prune in get ".private" append swap remove use/in. ;
[
nl
"Restarts were invoked adding vocabularies to the search path." print
"To avoid doing this in the future, add the following USING:" print
"and IN: forms at the top of the source file:" print nl
prelude.
nl
{ { font-style bold } { font-name "sans-serif" } } [
"Restarts were invoked adding vocabularies to the search path." print
"To avoid doing this in the future, add the following USING:" print
"and IN: forms at the top of the source file:" print nl
] with-style
{ { page-color T{ rgba f 0.8 0.8 0.8 1.0 } } } [ prelude. ] with-nesting
nl nl
] print-use-hook set-global
PRIVATE>
: with-use ( obj quot -- )
make-pprint use/in. do-pprint ; inline
make-pprint [ use/in. ] [ empty? not or [ nl ] when ] 2bi
do-pprint ; inline
: with-in ( obj quot -- )
make-pprint drop [ write-in bl ] when* do-pprint ; inline
@ -165,214 +171,4 @@ SYMBOL: pprint-string-cells?
] each
] with-row
] each
] tabular-output ;
GENERIC: see ( defspec -- )
: comment. ( string -- )
[ H{ { font-style italic } } styled-text ] when* ;
: seeing-word ( word -- )
vocabulary>> pprinter-in set ;
: definer. ( defspec -- )
definer drop pprint-word ;
: stack-effect. ( word -- )
[ [ parsing-word? ] [ symbol? ] bi or not ] [ stack-effect ] bi and
[ effect>string comment. ] when* ;
: word-synopsis ( word -- )
{
[ seeing-word ]
[ definer. ]
[ pprint-word ]
[ stack-effect. ]
} cleave ;
M: word synopsis* word-synopsis ;
M: simple-generic synopsis* word-synopsis ;
M: standard-generic synopsis*
{
[ definer. ]
[ seeing-word ]
[ pprint-word ]
[ dispatch# pprint* ]
[ stack-effect. ]
} cleave ;
M: hook-generic synopsis*
{
[ definer. ]
[ seeing-word ]
[ pprint-word ]
[ "combination" word-prop var>> pprint* ]
[ stack-effect. ]
} cleave ;
M: method-spec synopsis*
first2 method synopsis* ;
M: method-body synopsis*
[ definer. ]
[ "method-class" word-prop pprint-word ]
[ "method-generic" word-prop pprint-word ] tri ;
M: mixin-instance synopsis*
[ definer. ]
[ class>> pprint-word ]
[ mixin>> pprint-word ] tri ;
M: pathname synopsis* pprint* ;
: synopsis ( defspec -- str )
[
0 margin set
1 line-limit set
[ synopsis* ] with-in
] with-string-writer ;
M: word summary synopsis ;
GENERIC: declarations. ( obj -- )
M: object declarations. drop ;
: declaration. ( word prop -- )
[ nip ] [ name>> word-prop ] 2bi
[ pprint-word ] [ drop ] if ;
M: word declarations.
{
POSTPONE: parsing
POSTPONE: delimiter
POSTPONE: inline
POSTPONE: recursive
POSTPONE: foldable
POSTPONE: flushable
} [ declaration. ] with each ;
: pprint-; ( -- ) \ ; pprint-word ;
M: object see
[
12 nesting-limit set
100 length-limit set
<colon dup synopsis*
<block dup definition pprint-elements block>
dup definer nip [ pprint-word ] when* declarations.
block>
] with-use nl ;
M: method-spec see
first2 method see ;
GENERIC: see-class* ( word -- )
M: union-class see-class*
<colon \ UNION: pprint-word
dup pprint-word
members pprint-elements pprint-; block> ;
M: intersection-class see-class*
<colon \ INTERSECTION: pprint-word
dup pprint-word
participants pprint-elements pprint-; block> ;
M: mixin-class see-class*
<block \ MIXIN: pprint-word
dup pprint-word <block
dup members [
hard line-break
\ INSTANCE: pprint-word pprint-word pprint-word
] with each block> block> ;
M: predicate-class see-class*
<colon \ PREDICATE: pprint-word
dup pprint-word
"<" text
dup superclass pprint-word
<block
"predicate-definition" word-prop pprint-elements
pprint-; block> block> ;
M: singleton-class see-class* ( class -- )
\ SINGLETON: pprint-word pprint-word ;
GENERIC: pprint-slot-name ( object -- )
M: string pprint-slot-name text ;
M: array pprint-slot-name
<flow \ { pprint-word
f <inset unclip text pprint-elements block>
\ } pprint-word block> ;
: unparse-slot ( slot-spec -- array )
[
dup name>> ,
dup class>> object eq? [
dup class>> ,
initial: ,
dup initial>> ,
] unless
dup read-only>> [
read-only ,
] when
drop
] { } make ;
: pprint-slot ( slot-spec -- )
unparse-slot
dup length 1 = [ first ] when
pprint-slot-name ;
M: tuple-class see-class*
<colon \ TUPLE: pprint-word
dup pprint-word
dup superclass tuple eq? [
"<" text dup superclass pprint-word
] unless
<block "slots" word-prop [ pprint-slot ] each block>
pprint-; block> ;
M: word see-class* drop ;
M: builtin-class see-class*
drop "! Built-in class" comment. ;
: see-class ( class -- )
dup class? [
[
dup seeing-word dup see-class*
] with-use nl
] when drop ;
M: word see
[ see-class ]
[ [ class? ] [ symbol? not ] bi and [ nl ] when ]
[
dup [ class? ] [ symbol? ] bi and
[ drop ] [ call-next-method ] if
] tri ;
: see-all ( seq -- )
natural-sort [ nl ] [ see ] interleave ;
: (see-implementors) ( class -- seq )
dup implementors [ method ] with map natural-sort ;
: (see-methods) ( generic -- seq )
"methods" word-prop values natural-sort ;
: methods ( word -- seq )
[
dup class? [ dup (see-implementors) % ] when
dup generic? [ dup (see-methods) % ] when
drop
] { } make prune ;
: see-methods ( word -- )
methods see-all ;
] tabular-output nl ;

View File

@ -199,7 +199,7 @@ HELP: <flow
HELP: colon
{ $class-description "A " { $link block } " section. When printed as a " { $link long-section } ", indents every line except the first." }
{ $notes "Colon sections are used to enclose word definitions printed by " { $link see } "." } ;
{ $notes "Colon sections are used to enclose word definitions when " { $link "see" } "." } ;
HELP: <colon
{ $description "Begins a " { $link colon } " section." } ;

View File

@ -0,0 +1,65 @@
! Copyright (C) 2008, 2009 Doug Coleman, Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel arrays accessors fry sequences regexp.classes ;
FROM: math.ranges => [a,b] ;
IN: regexp.ast
TUPLE: negation term ;
C: <negation> negation
TUPLE: from-to n m ;
C: <from-to> from-to
TUPLE: at-least n ;
C: <at-least> at-least
TUPLE: tagged-epsilon tag ;
C: <tagged-epsilon> tagged-epsilon
CONSTANT: epsilon T{ tagged-epsilon { tag t } }
TUPLE: concatenation first second ;
: <concatenation> ( seq -- concatenation )
[ epsilon ] [ unclip [ concatenation boa ] reduce ] if-empty ;
TUPLE: alternation first second ;
: <alternation> ( seq -- alternation )
unclip [ alternation boa ] reduce ;
TUPLE: star term ;
C: <star> star
TUPLE: with-options tree options ;
C: <with-options> with-options
TUPLE: options on off ;
C: <options> options
SINGLETONS: unix-lines dotall multiline comments case-insensitive
unicode-case reversed-regexp ;
: <maybe> ( term -- term' )
f <concatenation> 2array <alternation> ;
: <plus> ( term -- term' )
dup <star> 2array <concatenation> ;
: repetition ( n term -- term' )
<array> <concatenation> ;
GENERIC: <times> ( term times -- term' )
M: at-least <times>
n>> swap [ repetition ] [ <star> ] bi 2array <concatenation> ;
M: from-to <times>
[ n>> ] [ m>> ] bi [a,b] swap '[ _ repetition ] map <alternation> ;
: char-class ( ranges ? -- term )
[ <or-class> ] dip [ <not-class> ] when ;
TUPLE: lookahead term ;
C: <lookahead> lookahead
TUPLE: lookbehind term ;
C: <lookbehind> lookbehind

View File

@ -1,27 +0,0 @@
! Copyright (C) 2008 Doug Coleman.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors hashtables kernel math vectors ;
IN: regexp.backend
TUPLE: regexp
raw
{ options hashtable }
stack
parse-tree
nfa-table
dfa-table
minimized-table
matchers
{ nfa-traversal-flags hashtable }
{ dfa-traversal-flags hashtable }
{ state integer }
{ new-states vector }
{ visited-states hashtable } ;
: reset-regexp ( regexp -- regexp )
0 >>state
V{ } clone >>stack
V{ } clone >>new-states
H{ } clone >>visited-states ;
SYMBOL: current-regexp

View File

@ -0,0 +1,60 @@
! Copyright (C) 2009 Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: regexp.classes tools.test arrays kernel ;
IN: regexp.classes.tests
! Class algebra
[ f ] [ { 1 2 } <and-class> ] unit-test
[ T{ or-class f { 1 2 } } ] [ { 1 2 } <or-class> ] unit-test
[ 3 ] [ { 1 2 } <and-class> 3 2array <or-class> ] unit-test
[ CHAR: A ] [ CHAR: A LETTER-class <primitive-class> 2array <and-class> ] unit-test
[ CHAR: A ] [ LETTER-class <primitive-class> CHAR: A 2array <and-class> ] unit-test
[ T{ primitive-class { class LETTER-class } } ] [ CHAR: A LETTER-class <primitive-class> 2array <or-class> ] unit-test
[ T{ primitive-class { class LETTER-class } } ] [ LETTER-class <primitive-class> CHAR: A 2array <or-class> ] unit-test
[ t ] [ { t 1 } <or-class> ] unit-test
[ t ] [ { 1 t } <or-class> ] unit-test
[ f ] [ { f 1 } <and-class> ] unit-test
[ f ] [ { 1 f } <and-class> ] unit-test
[ 1 ] [ { f 1 } <or-class> ] unit-test
[ 1 ] [ { 1 f } <or-class> ] unit-test
[ 1 ] [ { t 1 } <and-class> ] unit-test
[ 1 ] [ { 1 t } <and-class> ] unit-test
[ 1 ] [ 1 <not-class> <not-class> ] unit-test
[ 1 ] [ { 1 1 } <and-class> ] unit-test
[ 1 ] [ { 1 1 } <or-class> ] unit-test
[ t ] [ { t t } <or-class> ] unit-test
[ T{ primitive-class { class letter-class } } ] [ letter-class <primitive-class> dup 2array <and-class> ] unit-test
[ T{ primitive-class { class letter-class } } ] [ letter-class <primitive-class> dup 2array <or-class> ] unit-test
[ T{ or-class { seq { 1 2 3 } } } ] [ { 1 2 } <or-class> { 2 3 } <or-class> 2array <or-class> ] unit-test
[ T{ or-class { seq { 2 3 } } } ] [ { 2 3 } <or-class> 1 <not-class> 2array <and-class> ] unit-test
[ f ] [ t <not-class> ] unit-test
[ t ] [ f <not-class> ] unit-test
[ f ] [ 1 <not-class> 1 t answer ] unit-test
[ t ] [ { 1 2 } <or-class> <not-class> 1 2 3array <or-class> ] unit-test
[ f ] [ { 1 2 } <and-class> <not-class> 1 2 3array <and-class> ] unit-test
! Making classes into nested conditionals
[ V{ 1 2 3 4 } ] [ T{ and-class f { 1 T{ not-class f 2 } T{ or-class f { 3 4 } } 2 } } class>questions ] unit-test
[ { 3 } ] [ { { 3 t } } table>condition ] unit-test
[ { T{ primitive-class } } ] [ { { 1 t } { 2 T{ primitive-class } } } table>questions ] unit-test
[ { { 1 t } { 2 t } } ] [ { { 1 t } { 2 T{ primitive-class } } } T{ primitive-class } t assoc-answer ] unit-test
[ { { 1 t } } ] [ { { 1 t } { 2 T{ primitive-class } } } T{ primitive-class } f assoc-answer ] unit-test
[ T{ condition f T{ primitive-class } { 1 2 } { 1 } } ] [ { { 1 t } { 2 T{ primitive-class } } } table>condition ] unit-test
SYMBOL: foo
SYMBOL: bar
[ T{ condition f T{ primitive-class f bar } T{ condition f T{ primitive-class f foo } { 1 3 2 } { 1 3 } } T{ condition f T{ primitive-class f foo } { 1 2 } { 1 } } } ] [ { { 1 t } { 3 T{ primitive-class f bar } } { 2 T{ primitive-class f foo } } } table>condition ] unit-test
[ t ] [ foo <primitive-class> dup t answer ] unit-test
[ f ] [ foo <primitive-class> dup f answer ] unit-test
[ T{ primitive-class f foo } ] [ foo <primitive-class> bar <primitive-class> t answer ] unit-test
[ T{ primitive-class f foo } ] [ foo <primitive-class> bar <primitive-class> f answer ] unit-test
[ T{ primitive-class f foo } ] [ foo <primitive-class> bar <primitive-class> 2array <and-class> bar <primitive-class> t answer ] unit-test
[ T{ primitive-class f bar } ] [ foo <primitive-class> bar <primitive-class> 2array <and-class> foo <primitive-class> t answer ] unit-test
[ f ] [ foo <primitive-class> bar <primitive-class> 2array <and-class> foo <primitive-class> f answer ] unit-test
[ f ] [ foo <primitive-class> bar <primitive-class> 2array <and-class> bar <primitive-class> f answer ] unit-test
[ t ] [ foo <primitive-class> bar <primitive-class> 2array <or-class> bar <primitive-class> t answer ] unit-test
[ T{ primitive-class f foo } ] [ foo <primitive-class> bar <primitive-class> 2array <or-class> bar <primitive-class> f answer ] unit-test

View File

@ -1,7 +1,8 @@
! Copyright (C) 2008 Doug Coleman.
! Copyright (C) 2008, 2009 Doug Coleman, Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors kernel math math.order words regexp.utils
unicode.categories combinators.short-circuit ;
USING: accessors kernel math math.order words combinators locals
ascii unicode.categories combinators.short-circuit sequences
fry macros arrays assocs sets classes mirrors ;
IN: regexp.classes
SINGLETONS: any-char any-char-no-nl
@ -11,19 +12,18 @@ ascii-class punctuation-class java-printable-class blank-class
control-character-class hex-digit-class java-blank-class c-identifier-class
unmatchable-class terminator-class word-boundary-class ;
SINGLETONS: beginning-of-input beginning-of-line
end-of-input end-of-line ;
SINGLETONS: beginning-of-input ^ end-of-input $ end-of-file word-break ;
MIXIN: node
TUPLE: character-class-range from to ; INSTANCE: character-class-range node
TUPLE: range from to ;
C: <range> range
GENERIC: class-member? ( obj class -- ? )
M: t class-member? ( obj class -- ? ) 2drop f ;
M: t class-member? ( obj class -- ? ) 2drop t ;
M: integer class-member? ( obj class -- ? ) 2drop f ;
M: integer class-member? ( obj class -- ? ) = ;
M: character-class-range class-member? ( obj class -- ? )
M: range class-member? ( obj class -- ? )
[ from>> ] [ to>> ] bi between? ;
M: any-char class-member? ( obj class -- ? )
@ -47,16 +47,24 @@ M: ascii-class class-member? ( obj class -- ? )
M: digit-class class-member? ( obj class -- ? )
drop digit? ;
: c-identifier-char? ( ch -- ? )
{ [ alpha? ] [ CHAR: _ = ] } 1|| ;
M: c-identifier-class class-member? ( obj class -- ? )
drop
{ [ digit? ] [ Letter? ] [ CHAR: _ = ] } 1|| ;
drop c-identifier-char? ;
M: alpha-class class-member? ( obj class -- ? )
drop alpha? ;
: punct? ( ch -- ? )
"!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~" member? ;
M: punctuation-class class-member? ( obj class -- ? )
drop punct? ;
: java-printable? ( ch -- ? )
{ [ alpha? ] [ punct? ] } 1|| ;
M: java-printable-class class-member? ( obj class -- ? )
drop java-printable? ;
@ -64,11 +72,24 @@ M: non-newline-blank-class class-member? ( obj class -- ? )
drop { [ blank? ] [ CHAR: \n = not ] } 1&& ;
M: control-character-class class-member? ( obj class -- ? )
drop control-char? ;
drop control? ;
: hex-digit? ( ch -- ? )
{
[ CHAR: A CHAR: F between? ]
[ CHAR: a CHAR: f between? ]
[ CHAR: 0 CHAR: 9 between? ]
} 1|| ;
M: hex-digit-class class-member? ( obj class -- ? )
drop hex-digit? ;
: java-blank? ( ch -- ? )
{
CHAR: \s CHAR: \t CHAR: \n
HEX: b HEX: 7 CHAR: \r
} member? ;
M: java-blank-class class-member? ( obj class -- ? )
drop java-blank? ;
@ -76,16 +97,219 @@ M: unmatchable-class class-member? ( obj class -- ? )
2drop f ;
M: terminator-class class-member? ( obj class -- ? )
drop {
[ CHAR: \r = ]
[ CHAR: \n = ]
[ CHAR: \u000085 = ]
[ CHAR: \u002028 = ]
[ CHAR: \u002029 = ]
drop "\r\n\u000085\u002029\u002028" member? ;
M: ^ class-member? ( obj class -- ? )
2drop f ;
M: $ class-member? ( obj class -- ? )
2drop f ;
M: f class-member? 2drop f ;
TUPLE: primitive-class class ;
C: <primitive-class> primitive-class
TUPLE: not-class class ;
PREDICATE: not-integer < not-class class>> integer? ;
PREDICATE: not-primitive < not-class class>> primitive-class? ;
M: not-class class-member?
class>> class-member? not ;
TUPLE: or-class seq ;
M: or-class class-member?
seq>> [ class-member? ] with any? ;
TUPLE: and-class seq ;
M: and-class class-member?
seq>> [ class-member? ] with all? ;
DEFER: substitute
: flatten ( seq class -- newseq )
'[ dup _ instance? [ seq>> ] [ 1array ] if ] map concat ; inline
:: seq>instance ( seq empty class -- instance )
seq length {
{ 0 [ empty ] }
{ 1 [ seq first ] }
[ drop class new seq { } like >>seq ]
} case ; inline
TUPLE: class-partition integers not-integers primitives not-primitives and or other ;
: partition-classes ( seq -- class-partition )
prune
[ integer? ] partition
[ not-integer? ] partition
[ primitive-class? ] partition ! extend primitive-class to epsilon tags
[ not-primitive? ] partition
[ and-class? ] partition
[ or-class? ] partition
class-partition boa ;
: class-partition>seq ( class-partition -- seq )
make-mirror values concat ;
: repartition ( partition -- partition' )
! This could be made more efficient; only and and or are effected
class-partition>seq partition-classes ;
: filter-not-integers ( partition -- partition' )
dup
[ primitives>> ] [ not-primitives>> ] [ or>> ] tri
3append and-class boa
'[ [ class>> _ class-member? ] filter ] change-not-integers ;
: answer-ors ( partition -- partition' )
dup [ not-integers>> ] [ not-primitives>> ] [ primitives>> ] tri 3append
'[ [ _ [ t substitute ] each ] map ] change-or ;
: contradiction? ( partition -- ? )
{
[ [ primitives>> ] [ not-primitives>> ] bi intersects? ]
[ other>> f swap member? ]
} 1|| ;
M: beginning-of-line class-member? ( obj class -- ? )
2drop f ;
: make-and-class ( partition -- and-class )
answer-ors repartition
[ t swap remove ] change-other
dup contradiction?
[ drop f ]
[ filter-not-integers class-partition>seq prune t and-class seq>instance ] if ;
M: end-of-line class-member? ( obj class -- ? )
2drop f ;
: <and-class> ( seq -- class )
dup and-class flatten partition-classes
dup integers>> length {
{ 0 [ nip make-and-class ] }
{ 1 [ integers>> first [ '[ _ swap class-member? ] all? ] keep and ] }
[ 3drop f ]
} case ;
: filter-integers ( partition -- partition' )
dup
[ primitives>> ] [ not-primitives>> ] [ and>> ] tri
3append or-class boa
'[ [ _ class-member? not ] filter ] change-integers ;
: answer-ands ( partition -- partition' )
dup [ integers>> ] [ not-primitives>> ] [ primitives>> ] tri 3append
'[ [ _ [ f substitute ] each ] map ] change-and ;
: tautology? ( partition -- ? )
{
[ [ primitives>> ] [ not-primitives>> ] bi intersects? ]
[ other>> t swap member? ]
} 1|| ;
: make-or-class ( partition -- and-class )
answer-ands repartition
[ f swap remove ] change-other
dup tautology?
[ drop t ]
[ filter-integers class-partition>seq prune f or-class seq>instance ] if ;
: <or-class> ( seq -- class )
dup or-class flatten partition-classes
dup not-integers>> length {
{ 0 [ nip make-or-class ] }
{ 1 [ not-integers>> first [ class>> '[ _ swap class-member? ] any? ] keep or ] }
[ 3drop t ]
} case ;
GENERIC: <not-class> ( class -- inverse )
M: object <not-class>
not-class boa ;
M: not-class <not-class>
class>> ;
M: and-class <not-class>
seq>> [ <not-class> ] map <or-class> ;
M: or-class <not-class>
seq>> [ <not-class> ] map <and-class> ;
M: t <not-class> drop f ;
M: f <not-class> drop t ;
M: primitive-class class-member?
class>> class-member? ;
UNION: class primitive-class not-class or-class and-class range ;
TUPLE: condition question yes no ;
C: <condition> condition
GENERIC# answer 2 ( class from to -- new-class )
M:: object answer ( class from to -- new-class )
class from = to class ? ;
: replace-compound ( class from to -- seq )
[ seq>> ] 2dip '[ _ _ answer ] map ;
M: and-class answer
replace-compound <and-class> ;
M: or-class answer
replace-compound <or-class> ;
M: not-class answer
[ class>> ] 2dip answer <not-class> ;
GENERIC# substitute 1 ( class from to -- new-class )
M: object substitute answer ;
M: not-class substitute [ <not-class> ] bi@ answer ;
: assoc-answer ( table question answer -- new-table )
'[ _ _ substitute ] assoc-map
[ nip ] assoc-filter ;
: assoc-answers ( table questions answer -- new-table )
'[ _ assoc-answer ] each ;
DEFER: make-condition
: (make-condition) ( table questions question -- condition )
[ 2nip ]
[ swap [ t assoc-answer ] dip make-condition ]
[ swap [ f assoc-answer ] dip make-condition ] 3tri
2dup = [ 2nip ] [ <condition> ] if ;
: make-condition ( table questions -- condition )
[ keys ] [ unclip (make-condition) ] if-empty ;
GENERIC: class>questions ( class -- questions )
: compound-questions ( class -- questions ) seq>> [ class>questions ] gather ;
M: or-class class>questions compound-questions ;
M: and-class class>questions compound-questions ;
M: not-class class>questions class>> class>questions ;
M: object class>questions 1array ;
: table>questions ( table -- questions )
values [ class>questions ] gather >array t swap remove ;
: table>condition ( table -- condition )
! input table is state => class
>alist dup table>questions make-condition ;
: condition-map ( condition quot: ( obj -- obj' ) -- new-condition )
over condition? [
[ [ question>> ] [ yes>> ] [ no>> ] tri ] dip
'[ _ condition-map ] bi@ <condition>
] [ call ] if ; inline recursive
: condition-states ( condition -- states )
dup condition? [
[ yes>> ] [ no>> ] bi
[ condition-states ] bi@ append prune
] [ 1array ] if ;
: condition-at ( condition assoc -- new-condition )
'[ _ at ] condition-map ;

View File

@ -0,0 +1 @@
Daniel Ehrenberg

View File

@ -0,0 +1,54 @@
! Copyright (C) 2009 Daniel Ehrenberg
! See http://factorcode.org/license.txt for BSD license.
USING: help.syntax help.markup regexp strings ;
IN: regexp.combinators
ABOUT: "regexp.combinators"
ARTICLE: "regexp.combinators" "Regular expression combinators"
"The " { $vocab-link "regexp.combinators" } " vocabulary defines combinators which can be used to build up regular expressions to match strings. This is in addition to the traditional syntax defined in the " { $vocab-link "regexp" } " vocabulary."
{ $subsection <literal> }
{ $subsection <nothing> }
{ $subsection <or> }
{ $subsection <and> }
{ $subsection <not> }
{ $subsection <sequence> }
{ $subsection <zero-or-more> }
{ $subsection <one-or-more> }
{ $subsection <option> } ;
HELP: <literal>
{ $values { "string" string } { "regexp" regexp } }
{ $description "Creates a regular expression which matches the given literal string." } ;
HELP: <nothing>
{ $values { "value" regexp } }
{ $description "The empty regular language." } ;
HELP: <or>
{ $values { "regexps" "a sequence of regular expressions" } { "disjunction" regexp } }
{ $description "Creates a new regular expression which matches the union of what elements of the sequence match." } ;
HELP: <and>
{ $values { "regexps" "a sequence of regular expressions" } { "conjunction" regexp } }
{ $description "Creates a new regular expression which matches the intersection of what elements of the sequence match." } ;
HELP: <sequence>
{ $values { "regexps" "a sequence of regular expressions" } { "regexp" regexp } }
{ $description "Creates a new regular expression which matches strings that match each element of the sequence in order." } ;
HELP: <not>
{ $values { "regexp" regexp } { "not-regexp" regexp } }
{ $description "Creates a new regular expression which matches everything that the given regexp does not match." } ;
HELP: <one-or-more>
{ $values { "regexp" regexp } { "regexp+" regexp } }
{ $description "Creates a new regular expression which matches one or more copies of the given regexp." } ;
HELP: <option>
{ $values { "regexp" regexp } { "regexp?" regexp } }
{ $description "Creates a new regular expression which matches zero or one copies of the given regexp." } ;
HELP: <zero-or-more>
{ $values { "regexp" regexp } { "regexp*" regexp } }
{ $description "Creates a new regular expression which matches zero or more copies of the given regexp." } ;

View File

@ -0,0 +1,28 @@
! Copyright (C) 2009 Daniel Ehrenberg
! See http://factorcode.org/license.txt for BSD license.
USING: regexp.combinators tools.test regexp kernel sequences ;
IN: regexp.combinators.tests
: strings ( -- regexp )
{ "foo" "bar" "baz" } <any-of> ;
[ t t t ] [ "foo" "bar" "baz" [ strings matches? ] tri@ ] unit-test
[ f f f ] [ "food" "ibar" "ba" [ strings matches? ] tri@ ] unit-test
: conj ( -- regexp )
{ R' .*a' R' b.*' } <and> ;
[ t ] [ "bljhasflsda" conj matches? ] unit-test
[ f ] [ "bsdfdfs" conj matches? ] unit-test
[ f ] [ "fsfa" conj matches? ] unit-test
[ f ] [ "bljhasflsda" conj <not> matches? ] unit-test
[ t ] [ "bsdfdfs" conj <not> matches? ] unit-test
[ t ] [ "fsfa" conj <not> matches? ] unit-test
[ f f ] [ "" "hi" [ <nothing> matches? ] bi@ ] unit-test
[ t t ] [ "" "hi" [ <nothing> <not> matches? ] bi@ ] unit-test
[ { t t t f } ] [ { "" "a" "aaaaa" "aab" } [ "a" <literal> <zero-or-more> matches? ] map ] unit-test
[ { f t t f } ] [ { "" "a" "aaaaa" "aab" } [ "a" <literal> <one-or-more> matches? ] map ] unit-test
[ { t t f f } ] [ { "" "a" "aaaaa" "aab" } [ "a" <literal> <option> matches? ] map ] unit-test

View File

@ -0,0 +1,56 @@
! Copyright (C) 2009 Daniel Ehrenberg
! See http://factorcode.org/license.txt for BSD license.
USING: regexp sequences kernel regexp.negation regexp.ast
accessors fry regexp.classes ;
IN: regexp.combinators
<PRIVATE
: modify-regexp ( regexp raw-quot tree-quot -- new-regexp )
[ '[ raw>> @ ] ]
[ '[ parse-tree>> @ ] ] bi* bi
make-regexp ; inline
PRIVATE>
CONSTANT: <nothing> R/ (?~.*)/
: <literal> ( string -- regexp )
[ "\\Q" "\\E" surround ] [ <concatenation> ] bi make-regexp ; foldable
: <char-range> ( char1 char2 -- regexp )
[ [ "[" "-" surround ] [ "]" append ] bi* append ]
[ <range> ]
2bi make-regexp ;
: <or> ( regexps -- disjunction )
[ [ raw>> "(" ")" surround ] map "|" join ]
[ [ parse-tree>> ] map <alternation> ] bi
make-regexp ; foldable
: <any-of> ( strings -- regexp )
[ <literal> ] map <or> ; foldable
: <sequence> ( regexps -- regexp )
[ [ raw>> ] map concat ]
[ [ parse-tree>> ] map <concatenation> ] bi
make-regexp ; foldable
: <not> ( regexp -- not-regexp )
[ "(?~" ")" surround ]
[ <negation> ] modify-regexp ; foldable
: <and> ( regexps -- conjunction )
[ <not> ] map <or> <not> ; foldable
: <zero-or-more> ( regexp -- regexp* )
[ "(" ")*" surround ]
[ <star> ] modify-regexp ; foldable
: <one-or-more> ( regexp -- regexp+ )
[ "(" ")+" surround ]
[ <plus> ] modify-regexp ; foldable
: <option> ( regexp -- regexp? )
[ "(" ")?" surround ]
[ <maybe> ] modify-regexp ; foldable

View File

@ -0,0 +1 @@
Combinators for creating regular expressions

View File

@ -0,0 +1 @@
parsing

View File

@ -0,0 +1,142 @@
! Copyright (C) 2009 Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: regexp.classes kernel sequences regexp.negation
quotations assocs fry math locals combinators
accessors words compiler.units kernel.private strings
sequences.private arrays call namespaces unicode.breaks
regexp.transition-tables combinators.short-circuit ;
IN: regexp.compiler
GENERIC: question>quot ( question -- quot )
SYMBOL: shortest?
SYMBOL: backwards?
<PRIVATE
M: t question>quot drop [ 2drop t ] ;
M: f question>quot drop [ 2drop f ] ;
M: not-class question>quot
class>> question>quot [ not ] compose ;
M: beginning-of-input question>quot
drop [ drop zero? ] ;
M: end-of-input question>quot
drop [ length = ] ;
M: end-of-file question>quot
drop [
{
[ length swap - 2 <= ]
[ swap tail { "\n" "\r\n" "\r" "" } member? ]
} 2&&
] ;
M: $ question>quot
drop [ { [ length = ] [ ?nth "\r\n" member? ] } 2|| ] ;
M: ^ question>quot
drop [ { [ drop zero? ] [ [ 1- ] dip ?nth "\r\n" member? ] } 2|| ] ;
M: word-break question>quot
drop [ word-break-at? ] ;
: (execution-quot) ( next-state -- quot )
! The conditions here are for lookaround and anchors, etc
dup condition? [
[ question>> question>quot ] [ yes>> ] [ no>> ] tri
[ (execution-quot) ] bi@
'[ 2dup @ _ _ if ]
] [ '[ _ execute ] ] if ;
: execution-quot ( next-state -- quot )
dup sequence? [ first ] when
(execution-quot) ;
TUPLE: box contents ;
C: <box> box
: condition>quot ( condition -- quot )
! Conditions here are for different classes
dup condition? [
[ question>> ] [ yes>> ] [ no>> ] tri
[ condition>quot ] bi@
'[ dup _ class-member? _ _ if ]
] [
contents>>
[ [ 3drop ] ] [ execution-quot '[ drop @ ] ] if-empty
] if ;
: non-literals>dispatch ( literals non-literals -- quot )
[ swap ] assoc-map ! we want state => predicate, and get the opposite as input
swap keys f assoc-answers
table>condition [ <box> ] condition-map condition>quot ;
: literals>cases ( literal-transitions -- case-body )
[ execution-quot ] assoc-map ;
: split-literals ( transitions -- case default )
{ } assoc-like [ first integer? ] partition
[ [ literals>cases ] keep ] dip non-literals>dispatch ;
:: step ( last-match index str quot final? direction -- last-index/f )
final? index last-match ?
index str bounds-check? [
index direction + str
index str nth-unsafe
quot call
] when ; inline
: direction ( -- n )
backwards? get -1 1 ? ;
: transitions>quot ( transitions final-state? -- quot )
dup shortest? get and [ 2drop [ drop nip ] ] [
[ split-literals swap case>quot ] dip direction
'[ { array-capacity string } declare _ _ _ step ]
] if ;
: word>quot ( word dfa -- quot )
[ transitions>> at ]
[ final-states>> key? ] 2bi
transitions>quot ;
: states>code ( words dfa -- )
[ ! with-compilation-unit doesn't compile, so we need call( -- )
[
'[
dup _ word>quot
(( last-match index string -- ? ))
define-declared
] each
] with-compilation-unit
] call( words dfa -- ) ;
: states>words ( dfa -- words dfa )
dup transitions>> keys [ gensym ] H{ } map>assoc
[ transitions-at ]
[ values ]
bi swap ;
: dfa>main-word ( dfa -- word )
states>words [ states>code ] keep start-state>> ;
PRIVATE>
: simple-define-temp ( quot effect -- word )
[ [ define-temp ] with-compilation-unit ] call( quot effect -- word ) ;
: dfa>word ( dfa -- quot )
dfa>main-word execution-quot '[ drop [ f ] 2dip @ ]
(( start-index string regexp -- i/f )) simple-define-temp ;
: dfa>shortest-word ( dfa -- word )
t shortest? [ dfa>word ] with-variable ;
: dfa>reverse-word ( dfa -- word )
t backwards? [ dfa>word ] with-variable ;
: dfa>reverse-shortest-word ( dfa -- word )
t backwards? [ dfa>shortest-word ] with-variable ;

View File

@ -0,0 +1,3 @@
USING: regexp.dfa tools.test ;
IN: regexp.dfa.tests

View File

@ -1,84 +1,84 @@
! Copyright (C) 2008 Doug Coleman.
! Copyright (C) 2008, 2009 Doug Coleman, Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors arrays assocs combinators fry kernel locals
math math.order regexp.nfa regexp.transition-tables sequences
sets sorting vectors regexp.utils sequences.deep ;
USING: io prettyprint threads ;
sets sorting vectors regexp.ast regexp.classes ;
IN: regexp.dfa
: find-delta ( states transition regexp -- new-states )
nfa-table>> transitions>>
rot [ swap at at ] with with gather sift ;
: find-delta ( states transition nfa -- new-states )
transitions>> '[ _ swap _ at at ] gather sift ;
: (find-epsilon-closure) ( states regexp -- new-states )
eps swap find-delta ;
:: epsilon-loop ( state table nfa question -- )
state table at :> old-value
old-value question 2array <or-class> :> new-question
new-question old-value = [
new-question state table set-at
state nfa transitions>> at
[ drop tagged-epsilon? ] assoc-filter
[| trans to |
to [
table nfa
trans tag>> new-question 2array <and-class>
epsilon-loop
] each
] assoc-each
] unless ;
: find-epsilon-closure ( states regexp -- new-states )
'[ dup _ (find-epsilon-closure) union ] [ length ] while-changes
natural-sort ;
: epsilon-table ( states nfa -- table )
[ H{ } clone tuck ] dip
'[ _ _ t epsilon-loop ] each ;
: find-closure ( states transition regexp -- new-states )
[ find-delta ] 2keep nip find-epsilon-closure ;
: find-epsilon-closure ( states nfa -- dfa-state )
epsilon-table table>condition ;
: find-start-state ( regexp -- state )
[ nfa-table>> start-state>> 1vector ] keep find-epsilon-closure ;
: find-closure ( states transition nfa -- new-states )
[ find-delta ] keep find-epsilon-closure ;
: find-transitions ( seq1 regexp -- seq2 )
nfa-table>> transitions>>
[ at keys ] curry gather
eps swap remove ;
: find-start-state ( nfa -- state )
[ start-state>> 1array ] keep find-epsilon-closure ;
: add-todo-state ( state regexp -- )
2dup visited-states>> key? [
2drop
] [
[ visited-states>> conjoin ]
[ new-states>> push ] 2bi
: find-transitions ( dfa-state nfa -- next-dfa-state )
transitions>>
'[ _ at keys [ condition-states ] map concat ] gather
[ tagged-epsilon? not ] filter ;
: add-todo-state ( state visited-states new-states -- )
3dup drop key? [ 3drop ] [
[ conjoin ] [ push ] bi-curry* bi
] if ;
: new-transitions ( regexp -- )
dup new-states>> [
drop
] [
dupd pop dup pick find-transitions rot
[
[ [ find-closure ] 2keep nip dupd add-todo-state ] 3keep
[ swapd transition make-transition ] dip
dfa-table>> add-transition
] curry with each
new-transitions
: add-todo-states ( state/condition visited-states new-states -- )
[ condition-states ] 2dip
'[ _ _ add-todo-state ] each ;
:: new-transitions ( nfa dfa new-states visited-states -- nfa dfa )
new-states [ nfa dfa ] [
pop :> state
state dfa transitions>> maybe-initialize-key
state nfa find-transitions
[| trans |
state trans nfa find-closure :> new-state
new-state visited-states new-states add-todo-states
state new-state trans dfa set-transition
] each
nfa dfa new-states visited-states new-transitions
] if-empty ;
: states ( hashtable -- array )
[ keys ]
[ values [ values concat ] map concat append ] bi ;
: set-final-states ( nfa dfa -- )
[
[ final-states>> keys ]
[ transitions>> keys ] bi*
[ intersects? ] with filter
unique
] keep (>>final-states) ;
: set-final-states ( regexp -- )
dup
[ nfa-table>> final-states>> keys ]
[ dfa-table>> transitions>> states ] bi
[ intersects? ] with filter
: initialize-dfa ( nfa -- dfa )
<transition-table>
swap find-start-state >>start-state ;
swap dfa-table>> final-states>>
[ conjoin ] curry each ;
: set-initial-state ( regexp -- )
dup
[ dfa-table>> ] [ find-start-state ] bi
[ >>start-state drop ] keep
1vector >>new-states drop ;
: set-traversal-flags ( regexp -- )
dup
[ nfa-traversal-flags>> ]
[ dfa-table>> transitions>> keys ] bi
[ [ nip ] [ [ swap at ] with map concat ] 2bi ] with H{ } map>assoc
>>dfa-traversal-flags drop ;
: construct-dfa ( regexp -- )
{
[ set-initial-state ]
[ new-transitions ]
[ set-final-states ]
[ set-traversal-flags ]
} cleave ;
: construct-dfa ( nfa -- dfa )
dup initialize-dfa
dup start-state>> condition-states >vector
H{ } clone
new-transitions
[ set-final-states ] keep ;

View File

@ -0,0 +1,43 @@
! Copyright (C) 2009 Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel accessors regexp.classes math.bits assocs sequences
arrays sets regexp.dfa math fry regexp.minimize regexp.ast regexp.transition-tables ;
IN: regexp.disambiguate
TUPLE: parts in out ;
: make-partition ( choices classes -- partition )
zip [ first ] partition [ values ] bi@ parts boa ;
: powerset-partition ( classes -- partitions )
[ length [ 2^ ] keep ] keep '[
_ <bits> _ make-partition
] map rest ;
: partition>class ( parts -- class )
[ out>> [ <not-class> ] map ]
[ in>> <and-class> ] bi
prefix <and-class> ;
: get-transitions ( partition state-transitions -- next-states )
[ in>> ] dip '[ _ at ] gather sift ;
: new-transitions ( transitions -- assoc ) ! assoc is class, partition
values [ keys ] gather
[ tagged-epsilon? not ] filter
powerset-partition
[ [ partition>class ] keep ] { } map>assoc
[ drop ] assoc-filter ;
: preserving-epsilon ( state-transitions quot -- new-state-transitions )
[ [ drop tagged-epsilon? ] assoc-filter ] bi
assoc-union H{ } assoc-like ; inline
: disambiguate ( nfa -- nfa )
expand-ors [
dup new-transitions '[
[
_ swap '[ _ get-transitions ] assoc-map
[ nip empty? not ] assoc-filter
] preserving-epsilon
] assoc-map
] change-transitions ;

View File

@ -0,0 +1,58 @@
! Copyright (C) 2009 Daniel Ehrenberg
! See http://factorcode.org/license.txt for BSD license.
USING: tools.test regexp.minimize assocs regexp
accessors regexp.transition-tables regexp.parser
regexp.classes regexp.negation ;
IN: regexp.minimize.tests
[ t ] [ 1 2 H{ { { 1 2 } t } } same-partition? ] unit-test
[ t ] [ 2 1 H{ { { 1 2 } t } } same-partition? ] unit-test
[ f ] [ 2 3 H{ { { 1 2 } t } } same-partition? ] unit-test
[ H{ { 1 1 } { 2 1 } { 3 3 } { 4 3 } } ]
[ { { 1 1 } { 1 2 } { 2 2 } { 3 3 } { 3 4 } { 4 4 } } [ t ] H{ } map>assoc partition>classes ] unit-test
[ { { 1 2 } { 3 4 } } ] [ H{ { "elephant" 1 } { "tiger" 3 } } H{ { "elephant" 2 } { "tiger" 4 } } assemble-values ] unit-test
: regexp-states ( string -- n )
parse-regexp ast>dfa transitions>> assoc-size ;
[ 3 ] [ "ab|ac" regexp-states ] unit-test
[ 3 ] [ "a(b|c)" regexp-states ] unit-test
[ 1 ] [ "((aa*)*)*" regexp-states ] unit-test
[ 1 ] [ "a|((aa*)*)*" regexp-states ] unit-test
[ 2 ] [ "ab|((aa*)*)*b" regexp-states ] unit-test
[ 4 ] [ "ab|cd" regexp-states ] unit-test
[ 1 ] [ "(?i:[a-z]*|[A-Z]*)" regexp-states ] unit-test
[
T{ transition-table
{ transitions H{
{ 0 H{ { CHAR: a 1 } { CHAR: b 1 } } }
{ 1 H{ { CHAR: a 2 } { CHAR: b 2 } } }
{ 2 H{ { CHAR: c 3 } } }
{ 3 H{ } }
} }
{ start-state 0 }
{ final-states H{ { 3 3 } } }
}
] [
T{ transition-table
{ transitions H{
{ 0 H{ { CHAR: a 1 } { CHAR: b 4 } } }
{ 1 H{ { CHAR: a 2 } { CHAR: b 5 } } }
{ 2 H{ { CHAR: c 3 } } }
{ 3 H{ } }
{ 4 H{ { CHAR: a 2 } { CHAR: b 5 } } }
{ 5 H{ { CHAR: c 6 } } }
{ 6 H{ } }
} }
{ start-state 0 }
{ final-states H{ { 3 3 } { 6 6 } } }
} combine-states
] unit-test
[ [ ] [ ] while-changes ] must-infer
[ H{ { T{ or-class f { 2 1 } } 3 } { 4 5 } } ]
[ H{ { 1 3 } { 2 3 } { 4 5 } } combine-state-transitions ] unit-test

View File

@ -0,0 +1,100 @@
! Copyright (C) 2009 Daniel Ehrenberg
! See http://factorcode.org/license.txt for BSD license.
USING: kernel sequences regexp.transition-tables fry assocs
accessors locals math sorting arrays sets hashtables regexp.dfa
combinators.short-circuit regexp.classes ;
IN: regexp.minimize
: table>state-numbers ( table -- assoc )
transitions>> keys <enum> [ swap ] H{ } assoc-map-as ;
: number-states ( table -- newtable )
dup table>state-numbers transitions-at ;
: has-conditions? ( assoc -- ? )
values [ condition? ] any? ;
: initially-same? ( s1 s2 transition-table -- ? )
{
[ drop <= ]
[ transitions>> '[ _ at keys ] bi@ set= ]
[ final-states>> '[ _ key? ] bi@ = ]
} 3&& ;
:: initialize-partitions ( transition-table -- partitions )
! Partition table is sorted-array => ?
H{ } clone :> out
transition-table transitions>> keys :> states
states [| s1 |
states [| s2 |
s1 s2 transition-table initially-same?
[ s1 s2 2array out conjoin ] when
] each
] each out ;
: same-partition? ( s1 s2 partitions -- ? )
{ [ [ 2array natural-sort ] dip key? ] [ drop = ] } 3|| ;
: assemble-values ( assoc1 assoc2 -- values )
dup keys '[ _ swap [ at ] curry map ] bi@ zip ;
: stay-same? ( s1 s2 transition partitions -- ? )
[ '[ _ transitions>> at ] bi@ assemble-values ] dip
'[ _ same-partition? ] assoc-all? ;
: partition-more ( partitions transition-table -- partitions )
over '[ drop first2 _ _ stay-same? ] assoc-filter ;
: partition>classes ( partitions -- synonyms ) ! old-state => new-state
>alist sort-keys
[ drop first2 swap ] assoc-map
<reversed>
>hashtable ;
:: (while-changes) ( obj quot: ( obj -- obj' ) comp: ( obj -- key ) old-key -- obj )
obj quot call :> new-obj
new-obj comp call :> new-key
new-key old-key =
[ new-obj ]
[ new-obj quot comp new-key (while-changes) ]
if ; inline recursive
: while-changes ( obj quot pred -- obj' )
3dup nip call (while-changes) ; inline
: (state-classes) ( transition-table -- partition )
[ initialize-partitions ] keep
'[ _ partition-more ] [ assoc-size ] while-changes ;
: assoc>set ( assoc -- keys-set )
[ drop dup ] assoc-map ;
: state-classes ( transition-table -- synonyms )
clone [ [ nip has-conditions? ] assoc-partition ] change-transitions
[ assoc>set ] [ (state-classes) partition>classes ] bi* assoc-union ;
: canonical-state? ( state transitions state-classes -- ? )
'[ dup _ at = ] swap '[ _ at has-conditions? ] bi or ;
: delete-duplicates ( transitions state-classes -- new-transitions )
dupd '[ drop _ _ canonical-state? ] assoc-filter ;
: combine-states ( table -- smaller-table )
dup state-classes
[ transitions-at ] keep
'[ _ delete-duplicates ] change-transitions ;
: combine-state-transitions ( hash -- hash )
H{ } clone tuck '[
_ [ 2array <or-class> ] change-at
] assoc-each [ swap ] assoc-map ;
: combine-transitions ( table -- table )
[ [ combine-state-transitions ] assoc-map ] change-transitions ;
: minimize ( table -- minimal-table )
clone
number-states
combine-states
combine-transitions
expand-ors ;

View File

@ -0,0 +1,27 @@
! Copyright (C) 2009 Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: tools.test regexp.negation regexp.transition-tables regexp.classes ;
IN: regexp.negation.tests
[
! R/ |[^a]|.+/
T{ transition-table
{ transitions H{
{ 0 H{ { CHAR: a 1 } { T{ not-class f CHAR: a } -1 } } }
{ 1 H{ { t -1 } } }
{ -1 H{ { t -1 } } }
} }
{ start-state 0 }
{ final-states H{ { 0 0 } { -1 -1 } } }
}
] [
! R/ a/
T{ transition-table
{ transitions H{
{ 0 H{ { CHAR: a 1 } } }
{ 1 H{ } }
} }
{ start-state 0 }
{ final-states H{ { 1 1 } } }
} negate-table
] unit-test

View File

@ -0,0 +1,53 @@
! Copyright (C) 2009 Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: regexp.nfa regexp.disambiguate kernel sequences
assocs regexp.classes hashtables accessors fry vectors
regexp.ast regexp.transition-tables regexp.minimize
regexp.dfa namespaces ;
IN: regexp.negation
CONSTANT: fail-state -1
: add-default-transition ( state's-transitions -- new-state's-transitions )
clone dup
[ [ fail-state ] dip keys [ <not-class> ] map <and-class> ] keep set-at ;
: fail-state-recurses ( transitions -- new-transitions )
clone dup
[ fail-state t associate fail-state ] dip set-at ;
: add-fail-state ( transitions -- new-transitions )
[ add-default-transition ] assoc-map
fail-state-recurses ;
: inverse-final-states ( transition-table -- final-states )
[ transitions>> assoc>set ] [ final-states>> ] bi assoc-diff ;
: negate-table ( transition-table -- transition-table )
clone
[ add-fail-state ] change-transitions
dup inverse-final-states >>final-states ;
: renumber-states ( transition-table -- transition-table )
dup transitions>> keys [ next-state ] H{ } map>assoc
transitions-at ;
: box-transitions ( transition-table -- transition-table )
[ [ [ 1vector ] assoc-map ] assoc-map ] change-transitions ;
: unify-final-state ( transition-table -- transition-table )
dup [ final-states>> keys ] keep
'[ -2 epsilon _ set-transition ] each
H{ { -2 -2 } } >>final-states ;
: adjoin-dfa ( transition-table -- start end )
unify-final-state renumber-states box-transitions
[ start-state>> ]
[ final-states>> keys first ]
[ nfa-table get [ transitions>> ] bi@ swap update ] tri ;
: ast>dfa ( parse-tree -- minimal-dfa )
construct-nfa disambiguate construct-dfa minimize ;
M: negation nfa-node ( node -- start end )
term>> ast>dfa negate-table adjoin-dfa ;

View File

@ -1,235 +1,153 @@
! Copyright (C) 2008 Doug Coleman.
! Copyright (C) 2008, 2009 Doug Coleman, Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors arrays assocs grouping kernel regexp.backend
locals math namespaces regexp.parser sequences fry quotations
math.order math.ranges vectors unicode.categories regexp.utils
regexp.transition-tables words sets regexp.classes unicode.case.private ;
USING: accessors arrays assocs grouping kernel
locals math namespaces sequences fry quotations
math.order math.ranges vectors unicode.categories
regexp.transition-tables words sets hashtables combinators.short-circuit
unicode.case.private regexp.ast regexp.classes ;
IN: regexp.nfa
! This uses unicode.case.private for ch>upper and ch>lower
! but case-insensitive matching should be done by case-folding everything
! before processing starts
IN: regexp.nfa
ERROR: feature-is-broken feature ;
SYMBOL: option-stack
SYMBOL: negation-mode
: negated? ( -- ? ) negation-mode get 0 or odd? ;
SYMBOL: state
SINGLETON: eps
: next-state ( -- state )
state [ get ] [ inc ] bi ;
MIXIN: traversal-flag
SINGLETON: lookahead-on INSTANCE: lookahead-on traversal-flag
SINGLETON: lookahead-off INSTANCE: lookahead-off traversal-flag
SINGLETON: lookbehind-on INSTANCE: lookbehind-on traversal-flag
SINGLETON: lookbehind-off INSTANCE: lookbehind-off traversal-flag
SINGLETON: capture-group-on INSTANCE: capture-group-on traversal-flag
SINGLETON: capture-group-off INSTANCE: capture-group-off traversal-flag
SINGLETON: front-anchor INSTANCE: front-anchor traversal-flag
SINGLETON: back-anchor INSTANCE: back-anchor traversal-flag
SINGLETON: word-boundary INSTANCE: word-boundary traversal-flag
SYMBOL: nfa-table
: options ( -- obj ) current-regexp get options>> ;
: set-each ( keys value hashtable -- )
'[ _ swap _ set-at ] each ;
: option? ( obj -- ? ) options key? ;
: options>hash ( options -- hashtable )
H{ } clone [
[ [ on>> t ] dip set-each ]
[ [ off>> f ] dip set-each ] 2bi
] keep ;
: option-on ( obj -- ) options conjoin ;
: using-options ( options quot -- )
[ options>hash option-stack [ ?push ] change ] dip
call option-stack get pop* ; inline
: option-off ( obj -- ) options delete-at ;
: option? ( obj -- ? )
option-stack get assoc-stack ;
: next-state ( regexp -- state )
[ state>> ] [ [ 1+ ] change-state drop ] bi ;
GENERIC: nfa-node ( node -- start-state end-state )
: set-start-state ( regexp -- )
dup stack>> [
drop
] [
[ nfa-table>> ] [ pop first ] bi* >>start-state drop
] if-empty ;
: add-simple-entry ( obj -- start-state end-state )
[ next-state next-state 2dup ] dip
nfa-table get add-transition ;
GENERIC: nfa-node ( node -- )
: epsilon-transition ( source target -- )
epsilon nfa-table get add-transition ;
:: add-simple-entry ( obj class -- )
[let* | regexp [ current-regexp get ]
s0 [ regexp next-state ]
s1 [ regexp next-state ]
stack [ regexp stack>> ]
table [ regexp nfa-table>> ] |
negated? [
s0 f obj class make-transition table add-transition
s0 s1 <default-transition> table add-transition
] [
s0 s1 obj class make-transition table add-transition
] if
s0 s1 2array stack push
t s1 table final-states>> set-at ] ;
M:: star nfa-node ( node -- start end )
node term>> nfa-node :> s1 :> s0
next-state :> s2
next-state :> s3
s1 s0 epsilon-transition
s2 s0 epsilon-transition
s2 s3 epsilon-transition
s1 s3 epsilon-transition
s2 s3 ;
: add-traversal-flag ( flag -- )
stack peek second
current-regexp get nfa-traversal-flags>> push-at ;
GENERIC: modify-epsilon ( tag -- newtag )
! Potential off-by-one errors when lookaround nested in lookbehind
:: concatenate-nodes ( -- )
[let* | regexp [ current-regexp get ]
stack [ regexp stack>> ]
table [ regexp nfa-table>> ]
s2 [ stack peek first ]
s3 [ stack pop second ]
s0 [ stack peek first ]
s1 [ stack pop second ] |
s1 s2 eps <literal-transition> table add-transition
s1 table final-states>> delete-at
s0 s3 2array stack push ] ;
M: object modify-epsilon ;
:: alternate-nodes ( -- )
[let* | regexp [ current-regexp get ]
stack [ regexp stack>> ]
table [ regexp nfa-table>> ]
s2 [ stack peek first ]
s3 [ stack pop second ]
s0 [ stack peek first ]
s1 [ stack pop second ]
s4 [ regexp next-state ]
s5 [ regexp next-state ] |
s4 s0 eps <literal-transition> table add-transition
s4 s2 eps <literal-transition> table add-transition
s1 s5 eps <literal-transition> table add-transition
s3 s5 eps <literal-transition> table add-transition
s1 table final-states>> delete-at
s3 table final-states>> delete-at
t s5 table final-states>> set-at
s4 s5 2array stack push ] ;
M: $ modify-epsilon
multiline option? [ drop end-of-input ] unless ;
M: kleene-star nfa-node ( node -- )
term>> nfa-node
[let* | regexp [ current-regexp get ]
stack [ regexp stack>> ]
s0 [ stack peek first ]
s1 [ stack pop second ]
s2 [ regexp next-state ]
s3 [ regexp next-state ]
table [ regexp nfa-table>> ] |
s1 table final-states>> delete-at
t s3 table final-states>> set-at
s1 s0 eps <literal-transition> table add-transition
s2 s0 eps <literal-transition> table add-transition
s2 s3 eps <literal-transition> table add-transition
s1 s3 eps <literal-transition> table add-transition
s2 s3 2array stack push ] ;
M: ^ modify-epsilon
multiline option? [ drop beginning-of-input ] unless ;
M: concatenation nfa-node ( node -- )
seq>>
reversed-regexp option? [ <reversed> ] when
[ [ nfa-node ] each ]
[ length 1- [ concatenate-nodes ] times ] bi ;
M: tagged-epsilon nfa-node
clone [ modify-epsilon ] change-tag add-simple-entry ;
M: alternation nfa-node ( node -- )
seq>>
[ [ nfa-node ] each ]
[ length 1- [ alternate-nodes ] times ] bi ;
M: concatenation nfa-node ( node -- start end )
[ first>> ] [ second>> ] bi
reversed-regexp option? [ swap ] when
[ nfa-node ] bi@
[ epsilon-transition ] dip ;
M: constant nfa-node ( node -- )
:: alternate-nodes ( s0 s1 s2 s3 -- start end )
next-state :> s4
next-state :> s5
s4 s0 epsilon-transition
s4 s2 epsilon-transition
s1 s5 epsilon-transition
s3 s5 epsilon-transition
s4 s5 ;
M: alternation nfa-node ( node -- start end )
[ first>> ] [ second>> ] bi
[ nfa-node ] bi@
alternate-nodes ;
GENERIC: modify-class ( char-class -- char-class' )
M: object modify-class ;
M: integer modify-class
case-insensitive option? [
dup char>> [ ch>lower ] [ ch>upper ] bi
2dup = [
2drop
char>> literal-transition add-simple-entry
] [
[ literal-transition add-simple-entry ] bi@
alternate-nodes drop
] if
] [
char>> literal-transition add-simple-entry
] if ;
dup Letter? [
[ ch>lower ] [ ch>upper ] bi 2array <or-class>
] when
] when ;
M: epsilon nfa-node ( node -- )
drop eps literal-transition add-simple-entry ;
M: integer nfa-node ( node -- start end )
modify-class add-simple-entry ;
M: word nfa-node ( node -- ) class-transition add-simple-entry ;
M: primitive-class modify-class
class>> modify-class <primitive-class> ;
M: any-char nfa-node ( node -- )
[ dotall option? ] dip any-char-no-nl ?
class-transition add-simple-entry ;
M: or-class modify-class
seq>> [ modify-class ] map <or-class> ;
! M: beginning-of-text nfa-node ( node -- ) ;
M: not-class modify-class
class>> modify-class <not-class> ;
M: beginning-of-line nfa-node ( node -- ) class-transition add-simple-entry ;
M: any-char modify-class
drop dotall option? t any-char-no-nl ? ;
M: end-of-line nfa-node ( node -- ) class-transition add-simple-entry ;
: modify-letter-class ( class -- newclass )
case-insensitive option? [ drop Letter-class ] when ;
M: letter-class modify-class modify-letter-class ;
M: LETTER-class modify-class modify-letter-class ;
: choose-letter-class ( node -- node' )
case-insensitive option? Letter-class rot ? ;
: cased-range? ( range -- ? )
[ from>> ] [ to>> ] bi {
[ [ letter? ] bi@ and ]
[ [ LETTER? ] bi@ and ]
} 2|| ;
M: letter-class nfa-node ( node -- )
choose-letter-class class-transition add-simple-entry ;
M: LETTER-class nfa-node ( node -- )
choose-letter-class class-transition add-simple-entry ;
M: character-class-range nfa-node ( node -- )
M: range modify-class
case-insensitive option? [
! This should be implemented for Unicode by case-folding
! the input and all strings in the regexp.
dup [ from>> ] [ to>> ] bi
2dup [ Letter? ] bi@ and [
rot drop
[ [ ch>lower ] bi@ character-class-range boa ]
[ [ ch>upper ] bi@ character-class-range boa ] 2bi
[ class-transition add-simple-entry ] bi@
alternate-nodes
] [
2drop
class-transition add-simple-entry
] if
] [
class-transition add-simple-entry
] if ;
dup cased-range? [
[ from>> ] [ to>> ] bi
[ [ ch>lower ] bi@ <range> ]
[ [ ch>upper ] bi@ <range> ] 2bi
2array <or-class>
] when
] when ;
M: capture-group nfa-node ( node -- )
"capture-groups" feature-is-broken
eps literal-transition add-simple-entry
capture-group-on add-traversal-flag
term>> nfa-node
eps literal-transition add-simple-entry
capture-group-off add-traversal-flag
2 [ concatenate-nodes ] times ;
M: class nfa-node
modify-class add-simple-entry ;
! xyzzy
M: non-capture-group nfa-node ( node -- )
term>> nfa-node ;
M: with-options nfa-node ( node -- start end )
dup options>> [ tree>> nfa-node ] using-options ;
M: reluctant-kleene-star nfa-node ( node -- )
term>> <kleene-star> nfa-node ;
M: negation nfa-node ( node -- )
negation-mode inc
term>> nfa-node
negation-mode dec ;
M: lookahead nfa-node ( node -- )
"lookahead" feature-is-broken
eps literal-transition add-simple-entry
lookahead-on add-traversal-flag
term>> nfa-node
eps literal-transition add-simple-entry
lookahead-off add-traversal-flag
2 [ concatenate-nodes ] times ;
M: lookbehind nfa-node ( node -- )
"lookbehind" feature-is-broken
eps literal-transition add-simple-entry
lookbehind-on add-traversal-flag
term>> nfa-node
eps literal-transition add-simple-entry
lookbehind-off add-traversal-flag
2 [ concatenate-nodes ] times ;
M: option nfa-node ( node -- )
[ option>> ] [ on?>> ] bi [ option-on ] [ option-off ] if
eps literal-transition add-simple-entry ;
: construct-nfa ( regexp -- )
: construct-nfa ( ast -- nfa-table )
[
reset-regexp
negation-mode off
[ current-regexp set ]
[ parse-tree>> nfa-node ]
[ set-start-state ] tri
0 state set
<transition-table> nfa-table set
nfa-node
nfa-table get
swap dup associate >>final-states
swap >>start-state
] with-scope ;

View File

@ -1,34 +1,24 @@
USING: kernel tools.test regexp.backend regexp ;
IN: regexp.parser
USING: kernel tools.test regexp.parser fry sequences ;
IN: regexp.parser.tests
: test-regexp ( string -- )
default-regexp parse-regexp ;
: regexp-parses ( string -- )
[ [ ] ] dip '[ _ parse-regexp drop ] unit-test ;
! [ "(" ] [ unmatched-parentheses? ] must-fail-with
: regexp-fails ( string -- )
'[ _ parse-regexp ] must-fail ;
[ ] [ "a|b" test-regexp ] unit-test
[ ] [ "a.b" test-regexp ] unit-test
[ ] [ "a|b|c" test-regexp ] unit-test
[ ] [ "abc|b" test-regexp ] unit-test
[ ] [ "a|bcd" test-regexp ] unit-test
[ ] [ "a|(b)" test-regexp ] unit-test
[ ] [ "(a)|b" test-regexp ] unit-test
[ ] [ "(a|b)" test-regexp ] unit-test
[ ] [ "((a)|(b))" test-regexp ] unit-test
{
"a|b" "a.b" "a|b|c" "abc|b" "a|bcd" "a|(b)" "(?-i:a)" "||"
"(a)|b" "(a|b)" "((a)|(b))" "(?:a)" "(?i:a)" "|b" "b|"
"[abc]" "[a-c]" "[^a-c]" "[^]]" "[]a]" "[[]" "[]-a]" "[a-]" "[-]"
"[--a]" "foo*" "(foo)*" "(a|b)|c" "(foo){2,3}" "(foo){2,}"
"(foo){2}" "{2,3}" "{," "{,}" "}" "foo}" "[^]-a]" "[^-]a]"
"[a-]" "[^a-]" "[^a-]" "a{,2}" "(?#foobar)"
"\\p{Space}" "\\t" "\\[" "[\\]]" "\\P{Space}"
"\\ueeee" "\\0333" "\\xff" "\\\\" "\\w"
} [ regexp-parses ] each
[ ] [ "(?:a)" test-regexp ] unit-test
[ ] [ "(?i:a)" test-regexp ] unit-test
[ ] [ "(?-i:a)" test-regexp ] unit-test
[ "(?z:a)" test-regexp ] [ unknown-regexp-option? ] must-fail-with
[ "(?-z:a)" test-regexp ] [ unknown-regexp-option? ] must-fail-with
[ ] [ "(?=a)" test-regexp ] unit-test
[ ] [ "[abc]" test-regexp ] unit-test
[ ] [ "[a-c]" test-regexp ] unit-test
[ ] [ "[^a-c]" test-regexp ] unit-test
[ "[^]" test-regexp ] must-fail
[ ] [ "|b" test-regexp ] unit-test
[ ] [ "b|" test-regexp ] unit-test
[ ] [ "||" test-regexp ] unit-test
{
"[^]" "[]" "a{foo}" "a{,}" "a{}" "(?)" "\\p{foo}" "\\P{foo}"
"\\ueeeg" "\\0339" "\\xfg"
} [ regexp-fails ] each

View File

@ -1,437 +1,179 @@
! Copyright (C) 2008 Doug Coleman.
! Copyright (C) 2008, 2009 Doug Coleman, Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors arrays assocs combinators io io.streams.string
kernel math math.parser namespaces sets
quotations sequences splitting vectors math.order
strings regexp.backend regexp.utils
unicode.case unicode.categories words locals regexp.classes ;
USING: peg.ebnf kernel math.parser sequences assocs arrays fry math
combinators regexp.classes strings splitting peg locals accessors
regexp.ast ;
IN: regexp.parser
FROM: math.ranges => [a,b] ;
: allowed-char? ( ch -- ? )
".()|[*+?$^" member? not ;
TUPLE: concatenation seq ; INSTANCE: concatenation node
TUPLE: alternation seq ; INSTANCE: alternation node
TUPLE: kleene-star term ; INSTANCE: kleene-star node
ERROR: bad-number ;
! !!!!!!!!
TUPLE: possessive-question term ; INSTANCE: possessive-question node
TUPLE: possessive-kleene-star term ; INSTANCE: possessive-kleene-star node
: ensure-number ( n -- n )
[ bad-number ] unless* ;
! !!!!!!!!
TUPLE: reluctant-question term ; INSTANCE: reluctant-question node
TUPLE: reluctant-kleene-star term ; INSTANCE: reluctant-kleene-star node
:: at-error ( key assoc quot: ( key -- replacement ) -- value )
key assoc at* [ drop key quot call ] unless ; inline
TUPLE: negation term ; INSTANCE: negation node
TUPLE: constant char ; INSTANCE: constant node
TUPLE: range from to ; INSTANCE: range node
ERROR: bad-class name ;
MIXIN: parentheses-group
TUPLE: lookahead term ; INSTANCE: lookahead node
INSTANCE: lookahead parentheses-group
TUPLE: lookbehind term ; INSTANCE: lookbehind node
INSTANCE: lookbehind parentheses-group
TUPLE: capture-group term ; INSTANCE: capture-group node
INSTANCE: capture-group parentheses-group
TUPLE: non-capture-group term ; INSTANCE: non-capture-group node
INSTANCE: non-capture-group parentheses-group
TUPLE: independent-group term ; INSTANCE: independent-group node ! atomic group
INSTANCE: independent-group parentheses-group
TUPLE: comment-group term ; INSTANCE: comment-group node
INSTANCE: comment-group parentheses-group
: name>class ( name -- class )
{
{ "Lower" letter-class }
{ "Upper" LETTER-class }
{ "Alpha" Letter-class }
{ "ASCII" ascii-class }
{ "Digit" digit-class }
{ "Alnum" alpha-class }
{ "Punct" punctuation-class }
{ "Graph" java-printable-class }
{ "Print" java-printable-class }
{ "Blank" non-newline-blank-class }
{ "Cntrl" control-character-class }
{ "XDigit" hex-digit-class }
{ "Space" java-blank-class }
! TODO: unicode-character-class
} [ bad-class ] at-error ;
SINGLETON: epsilon INSTANCE: epsilon node
: lookup-escape ( char -- ast )
{
{ CHAR: t [ CHAR: \t ] }
{ CHAR: n [ CHAR: \n ] }
{ CHAR: r [ CHAR: \r ] }
{ CHAR: f [ HEX: c ] }
{ CHAR: a [ HEX: 7 ] }
{ CHAR: e [ HEX: 1b ] }
{ CHAR: \\ [ CHAR: \\ ] }
TUPLE: option option on? ; INSTANCE: option node
{ CHAR: w [ c-identifier-class <primitive-class> ] }
{ CHAR: W [ c-identifier-class <primitive-class> <not-class> ] }
{ CHAR: s [ java-blank-class <primitive-class> ] }
{ CHAR: S [ java-blank-class <primitive-class> <not-class> ] }
{ CHAR: d [ digit-class <primitive-class> ] }
{ CHAR: D [ digit-class <primitive-class> <not-class> ] }
SINGLETONS: unix-lines dotall multiline comments case-insensitive
unicode-case reversed-regexp ;
{ CHAR: z [ end-of-input <tagged-epsilon> ] }
{ CHAR: Z [ end-of-file <tagged-epsilon> ] }
{ CHAR: A [ beginning-of-input <tagged-epsilon> ] }
{ CHAR: b [ word-break <tagged-epsilon> ] }
{ CHAR: B [ word-break <not-class> <tagged-epsilon> ] }
[ ]
} case ;
SINGLETONS: beginning-of-character-class end-of-character-class
left-parenthesis pipe caret dash ;
: push1 ( obj -- ) input-stream get stream>> push ;
: peek1 ( -- obj ) input-stream get stream>> [ f ] [ peek ] if-empty ;
: pop3 ( seq -- obj1 obj2 obj3 ) [ pop ] [ pop ] [ pop ] tri spin ;
: drop1 ( -- ) read1 drop ;
: stack ( -- obj ) current-regexp get stack>> ;
: change-whole-stack ( quot -- )
current-regexp get
[ stack>> swap call ] keep (>>stack) ; inline
: push-stack ( obj -- ) stack push ;
: pop-stack ( -- obj ) stack pop ;
: cut-out ( vector n -- vector' vector ) cut rest ;
ERROR: cut-stack-error ;
: cut-stack ( obj vector -- vector' vector )
[ nip ] [ last-index ] 2bi [ cut-stack-error ] unless* cut-out swap ;
: <possessive-kleene-star> ( obj -- kleene ) possessive-kleene-star boa ;
: <reluctant-kleene-star> ( obj -- kleene ) reluctant-kleene-star boa ;
: <possessive-question> ( obj -- kleene ) possessive-question boa ;
: <reluctant-question> ( obj -- kleene ) reluctant-question boa ;
: <negation> ( obj -- negation ) negation boa ;
: <concatenation> ( seq -- concatenation )
>vector [ epsilon ] [ concatenation boa ] if-empty ;
: <alternation> ( seq -- alternation ) >vector alternation boa ;
: <capture-group> ( obj -- capture-group ) capture-group boa ;
: <kleene-star> ( obj -- kleene-star ) kleene-star boa ;
: <constant> ( obj -- constant ) constant boa ;
: first|concatenation ( seq -- first/concatenation )
dup length 1 = [ first ] [ <concatenation> ] if ;
: first|alternation ( seq -- first/alternation )
dup length 1 = [ first ] [ <alternation> ] if ;
: <character-class-range> ( from to -- obj )
2dup <
[ character-class-range boa ] [ 2drop unmatchable-class ] if ;
ERROR: unmatched-parentheses ;
ERROR: unknown-regexp-option option ;
: options-assoc ( -- assoc )
H{
{ CHAR: i case-insensitive }
{ CHAR: d unix-lines }
{ CHAR: m multiline }
{ CHAR: n multiline }
{ CHAR: r reversed-regexp }
{ CHAR: s dotall }
{ CHAR: u unicode-case }
{ CHAR: x comments }
} ;
: ch>option ( ch -- singleton )
{
{ CHAR: i [ case-insensitive ] }
{ CHAR: d [ unix-lines ] }
{ CHAR: m [ multiline ] }
{ CHAR: n [ multiline ] }
{ CHAR: r [ reversed-regexp ] }
{ CHAR: s [ dotall ] }
{ CHAR: u [ unicode-case ] }
{ CHAR: x [ comments ] }
[ unknown-regexp-option ]
} case ;
options-assoc at ;
: option>ch ( option -- string )
{
{ case-insensitive [ CHAR: i ] }
{ multiline [ CHAR: m ] }
{ reversed-regexp [ CHAR: r ] }
{ dotall [ CHAR: s ] }
[ unknown-regexp-option ]
} case ;
options-assoc value-at ;
: toggle-option ( ch ? -- )
[ ch>option ] dip option boa push-stack ;
: parse-options ( on off -- options )
[ [ ch>option ] { } map-as ] bi@ <options> ;
: (parse-options) ( string ? -- ) [ toggle-option ] curry each ;
: string>options ( string -- options )
"-" split1 parse-options ;
: options>string ( options -- string )
[ on>> ] [ off>> ] bi
[ [ option>ch ] map ] bi@
[ "-" glue ] unless-empty
"" like ;
: parse-options ( string -- )
"-" split1 [ t (parse-options) ] [ f (parse-options) ] bi* ;
! TODO: add syntax for various parenthized things,
! add greedy and nongreedy forms of matching
! (once it's all implemented)
ERROR: bad-special-group string ;
EBNF: parse-regexp
DEFER: (parse-regexp)
: nested-parse-regexp ( token ? -- )
[ push-stack (parse-regexp) pop-stack ] dip
[ <negation> ] when pop-stack new swap >>term push-stack ;
CharacterInBracket = !("}") Character
! non-capturing groups
: (parse-special-group) ( -- )
read1 {
{ [ dup CHAR: # = ] ! comment
[ drop comment-group f nested-parse-regexp pop-stack drop ] }
{ [ dup CHAR: : = ]
[ drop non-capture-group f nested-parse-regexp ] }
{ [ dup CHAR: = = ]
[ drop lookahead f nested-parse-regexp ] }
{ [ dup CHAR: ! = ]
[ drop lookahead t nested-parse-regexp ] }
{ [ dup CHAR: > = ]
[ drop non-capture-group f nested-parse-regexp ] }
{ [ dup CHAR: < = peek1 CHAR: = = and ]
[ drop drop1 lookbehind f nested-parse-regexp ] }
{ [ dup CHAR: < = peek1 CHAR: ! = and ]
[ drop drop1 lookbehind t nested-parse-regexp ] }
[
":)" read-until
[ swap prefix ] dip
{
{ CHAR: : [ parse-options non-capture-group f nested-parse-regexp ] }
{ CHAR: ) [ parse-options ] }
[ drop bad-special-group ]
} case
]
} cond ;
QuotedCharacter = !("\\E") .
: handle-left-parenthesis ( -- )
peek1 CHAR: ? =
[ drop1 (parse-special-group) ]
[ capture-group f nested-parse-regexp ] if ;
Escape = "p{" CharacterInBracket*:s "}" => [[ s >string name>class <primitive-class> ]]
| "P{" CharacterInBracket*:s "}" => [[ s >string name>class <primitive-class> <negation> ]]
| "Q" QuotedCharacter*:s "\\E" => [[ s <concatenation> ]]
| "u" Character:a Character:b Character:c Character:d
=> [[ { a b c d } hex> ensure-number ]]
| "x" Character:a Character:b
=> [[ { a b } hex> ensure-number ]]
| "0" Character:a Character:b Character:c
=> [[ { a b c } oct> ensure-number ]]
| . => [[ lookup-escape ]]
: handle-dot ( -- ) any-char push-stack ;
: handle-pipe ( -- ) pipe push-stack ;
: (handle-star) ( obj -- kleene-star )
peek1 {
{ CHAR: + [ drop1 <possessive-kleene-star> ] }
{ CHAR: ? [ drop1 <reluctant-kleene-star> ] }
[ drop <kleene-star> ]
} case ;
: handle-star ( -- ) stack pop (handle-star) push-stack ;
: handle-question ( -- )
stack pop peek1 {
{ CHAR: + [ drop1 <possessive-question> ] }
{ CHAR: ? [ drop1 <reluctant-question> ] }
[ drop epsilon 2array <alternation> ]
} case push-stack ;
: handle-plus ( -- )
stack pop dup (handle-star)
2array <concatenation> push-stack ;
EscapeSequence = "\\" Escape:e => [[ e ]]
ERROR: unmatched-brace ;
: parse-repetition ( -- start finish ? )
"}" read-until [ unmatched-brace ] unless
[ "," split1 [ string>number ] bi@ ]
[ CHAR: , swap index >boolean ] bi ;
Character = EscapeSequence
| "$" => [[ $ <tagged-epsilon> ]]
| "^" => [[ ^ <tagged-epsilon> ]]
| . ?[ allowed-char? ]?
: replicate/concatenate ( n obj -- obj' )
over zero? [ 2drop epsilon ]
[ <repetition> first|concatenation ] if ;
AnyRangeCharacter = EscapeSequence | .
: exactly-n ( n -- )
stack pop replicate/concatenate push-stack ;
RangeCharacter = !("]") AnyRangeCharacter
: at-least-n ( n -- )
stack pop
[ replicate/concatenate ] keep
<kleene-star> 2array <concatenation> push-stack ;
Range = RangeCharacter:a "-" RangeCharacter:b => [[ a b <range> ]]
| RangeCharacter
: at-most-n ( n -- )
1+
stack pop
[ replicate/concatenate ] curry map <alternation> push-stack ;
StartRange = AnyRangeCharacter:a "-" RangeCharacter:b => [[ a b <range> ]]
| AnyRangeCharacter
: from-m-to-n ( m n -- )
[a,b]
stack pop
[ replicate/concatenate ] curry map
<alternation> push-stack ;
Ranges = StartRange:s Range*:r => [[ r s prefix ]]
ERROR: invalid-range a b ;
CharClass = "^"?:n Ranges:e => [[ e n char-class ]]
: handle-left-brace ( -- )
parse-repetition
[ 2dup [ [ 0 < [ invalid-range ] when ] when* ] bi@ ] dip
[
2dup and [ from-m-to-n ]
[ [ nip at-most-n ] [ at-least-n ] if* ] if
] [ drop 0 max exactly-n ] if ;
Options = [idmsux]*
: handle-front-anchor ( -- ) beginning-of-line push-stack ;
: handle-back-anchor ( -- ) end-of-line push-stack ;
Parenthized = "?:" Alternation:a => [[ a ]]
| "?" Options:on "-"? Options:off ":" Alternation:a
=> [[ a on off parse-options <with-options> ]]
| "?#" [^)]* => [[ f ]]
| "?~" Alternation:a => [[ a <negation> ]]
| "?=" Alternation:a => [[ a <lookahead> <tagged-epsilon> ]]
| "?!" Alternation:a => [[ a <lookahead> <not-class> <tagged-epsilon> ]]
| "?<=" Alternation:a => [[ a <lookbehind> <tagged-epsilon> ]]
| "?<!" Alternation:a => [[ a <lookbehind> <not-class> <tagged-epsilon> ]]
| Alternation
ERROR: bad-character-class obj ;
ERROR: expected-posix-class ;
Element = "(" Parenthized:p ")" => [[ p ]]
| "[" CharClass:r "]" => [[ r ]]
| ".":d => [[ any-char <primitive-class> ]]
| Character
: parse-posix-class ( -- obj )
read1 CHAR: { = [ expected-posix-class ] unless
"}" read-until [ bad-character-class ] unless
{
{ "Lower" [ letter-class ] }
{ "Upper" [ LETTER-class ] }
{ "Alpha" [ Letter-class ] }
{ "ASCII" [ ascii-class ] }
{ "Digit" [ digit-class ] }
{ "Alnum" [ alpha-class ] }
{ "Punct" [ punctuation-class ] }
{ "Graph" [ java-printable-class ] }
{ "Print" [ java-printable-class ] }
{ "Blank" [ non-newline-blank-class ] }
{ "Cntrl" [ control-character-class ] }
{ "XDigit" [ hex-digit-class ] }
{ "Space" [ java-blank-class ] }
! TODO: unicode-character-class, fallthrough in unicode is bad-char-clss
[ bad-character-class ]
} case ;
Number = (!(","|"}").)* => [[ string>number ensure-number ]]
: parse-octal ( -- n ) 3 read oct> check-octal ;
: parse-short-hex ( -- n ) 2 read hex> check-hex ;
: parse-long-hex ( -- n ) 6 read hex> check-hex ;
: parse-control-character ( -- n ) read1 ;
Times = "," Number:n "}" => [[ 0 n <from-to> ]]
| Number:n ",}" => [[ n <at-least> ]]
| Number:n "}" => [[ n n <from-to> ]]
| "}" => [[ bad-number ]]
| Number:n "," Number:m "}" => [[ n m <from-to> ]]
ERROR: bad-escaped-literals seq ;
Repeated = Element:e "{" Times:t => [[ e t <times> ]]
| Element:e "??" => [[ e <maybe> ]]
| Element:e "*?" => [[ e <star> ]]
| Element:e "+?" => [[ e <plus> ]]
| Element:e "?" => [[ e <maybe> ]]
| Element:e "*" => [[ e <star> ]]
| Element:e "+" => [[ e <plus> ]]
| Element
: parse-til-E ( -- obj )
"\\E" read-until [ bad-escaped-literals ] unless ;
:: (parse-escaped-literals) ( quot: ( obj -- obj' ) -- obj )
parse-til-E
drop1
[ epsilon ] [
quot call [ <constant> ] V{ } map-as
first|concatenation
] if-empty ; inline
Concatenation = Repeated*:r => [[ r sift <concatenation> ]]
: parse-escaped-literals ( -- obj )
[ ] (parse-escaped-literals) ;
Alternation = Concatenation:c ("|" Concatenation)*:a
=> [[ a empty? [ c ] [ a values c prefix <alternation> ] if ]]
: lower-case-literals ( -- obj )
[ >lower ] (parse-escaped-literals) ;
End = !(.)
: upper-case-literals ( -- obj )
[ >upper ] (parse-escaped-literals) ;
: parse-escaped ( -- obj )
read1
{
{ CHAR: t [ CHAR: \t <constant> ] }
{ CHAR: n [ CHAR: \n <constant> ] }
{ CHAR: r [ CHAR: \r <constant> ] }
{ CHAR: f [ HEX: c <constant> ] }
{ CHAR: a [ HEX: 7 <constant> ] }
{ CHAR: e [ HEX: 1b <constant> ] }
{ CHAR: w [ c-identifier-class ] }
{ CHAR: W [ c-identifier-class <negation> ] }
{ CHAR: s [ java-blank-class ] }
{ CHAR: S [ java-blank-class <negation> ] }
{ CHAR: d [ digit-class ] }
{ CHAR: D [ digit-class <negation> ] }
{ CHAR: p [ parse-posix-class ] }
{ CHAR: P [ parse-posix-class <negation> ] }
{ CHAR: x [ parse-short-hex <constant> ] }
{ CHAR: u [ parse-long-hex <constant> ] }
{ CHAR: 0 [ parse-octal <constant> ] }
{ CHAR: c [ parse-control-character ] }
{ CHAR: Q [ parse-escaped-literals ] }
! { CHAR: b [ word-boundary-class ] }
! { CHAR: B [ word-boundary-class <negation> ] }
! { CHAR: A [ handle-beginning-of-input ] }
! { CHAR: z [ handle-end-of-input ] }
! { CHAR: Z [ handle-end-of-input ] } ! plus a final terminator
! m//g mode
! { CHAR: G [ end of previous match ] }
! Group capture
! { CHAR: 1 [ CHAR: 1 <constant> ] }
! { CHAR: 2 [ CHAR: 2 <constant> ] }
! { CHAR: 3 [ CHAR: 3 <constant> ] }
! { CHAR: 4 [ CHAR: 4 <constant> ] }
! { CHAR: 5 [ CHAR: 5 <constant> ] }
! { CHAR: 6 [ CHAR: 6 <constant> ] }
! { CHAR: 7 [ CHAR: 7 <constant> ] }
! { CHAR: 8 [ CHAR: 8 <constant> ] }
! { CHAR: 9 [ CHAR: 9 <constant> ] }
! Perl extensions
! can't do \l and \u because \u is already a 4-hex
{ CHAR: L [ lower-case-literals ] }
{ CHAR: U [ upper-case-literals ] }
[ <constant> ]
} case ;
: handle-escape ( -- ) parse-escaped push-stack ;
: handle-dash ( vector -- vector' )
H{ { dash CHAR: - } } substitute ;
: character-class>alternation ( seq -- alternation )
[ dup number? [ <constant> ] when ] map first|alternation ;
: handle-caret ( vector -- vector' )
dup [ length 2 >= ] [ first caret eq? ] bi and [
rest-slice character-class>alternation <negation>
] [
character-class>alternation
] if ;
: make-character-class ( -- character-class )
[ beginning-of-character-class swap cut-stack ] change-whole-stack
handle-dash handle-caret ;
: apply-dash ( -- )
stack [ pop3 nip <character-class-range> ] keep push ;
: apply-dash? ( -- ? )
stack dup length 3 >=
[ [ length 2 - ] keep nth dash eq? ] [ drop f ] if ;
ERROR: empty-negated-character-class ;
DEFER: handle-left-bracket
: (parse-character-class) ( -- )
read1 [ empty-negated-character-class ] unless* {
{ CHAR: [ [ handle-left-bracket t ] }
{ CHAR: ] [ make-character-class push-stack f ] }
{ CHAR: - [ dash push-stack t ] }
{ CHAR: \ [ parse-escaped push-stack t ] }
[ push-stack apply-dash? [ apply-dash ] when t ]
} case
[ (parse-character-class) ] when ;
: push-constant ( ch -- ) <constant> push-stack ;
: parse-character-class-second ( -- )
read1 {
{ CHAR: [ [ CHAR: [ push-constant ] }
{ CHAR: ] [ CHAR: ] push-constant ] }
{ CHAR: - [ CHAR: - push-constant ] }
[ push1 ]
} case ;
: parse-character-class-first ( -- )
read1 {
{ CHAR: ^ [ caret push-stack parse-character-class-second ] }
{ CHAR: [ [ CHAR: [ push-constant ] }
{ CHAR: ] [ CHAR: ] push-constant ] }
{ CHAR: - [ CHAR: - push-constant ] }
[ push1 ]
} case ;
: handle-left-bracket ( -- )
beginning-of-character-class push-stack
parse-character-class-first (parse-character-class) ;
: finish-regexp-parse ( stack -- obj )
{ pipe } split
[ first|concatenation ] map first|alternation ;
: handle-right-parenthesis ( -- )
stack dup [ parentheses-group "members" word-prop member? ] find-last
-rot cut rest
[ [ push ] keep current-regexp get (>>stack) ]
[ finish-regexp-parse push-stack ] bi* ;
: parse-regexp-token ( token -- ? )
{
{ CHAR: ( [ handle-left-parenthesis t ] } ! handle (?..) at beginning?
{ CHAR: ) [ handle-right-parenthesis f ] }
{ CHAR: . [ handle-dot t ] }
{ CHAR: | [ handle-pipe t ] }
{ CHAR: ? [ handle-question t ] }
{ CHAR: * [ handle-star t ] }
{ CHAR: + [ handle-plus t ] }
{ CHAR: { [ handle-left-brace t ] }
{ CHAR: [ [ handle-left-bracket t ] }
{ CHAR: \ [ handle-escape t ] }
[
dup CHAR: $ = peek1 f = and
[ drop handle-back-anchor f ]
[ push-constant t ] if
]
} case ;
: (parse-regexp) ( -- )
read1 [ parse-regexp-token [ (parse-regexp) ] when ] when* ;
: parse-regexp-beginning ( -- )
peek1 CHAR: ^ = [ drop1 handle-front-anchor ] when ;
: parse-regexp ( regexp -- )
dup current-regexp [
raw>> [
<string-reader> [
parse-regexp-beginning (parse-regexp)
] with-input-stream
] unless-empty
current-regexp get [ finish-regexp-parse ] change-stack
dup stack>> >>parse-tree drop
] with-variable ;
Main = Alternation End
;EBNF

View File

@ -1,8 +1,92 @@
! Copyright (C) 2008 Doug Coleman.
! Copyright (C) 2008, 2009 Doug Coleman, Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel strings help.markup help.syntax regexp.backend ;
USING: kernel strings help.markup help.syntax math ;
IN: regexp
ABOUT: "regexp"
ARTICLE: "regexp" "Regular expressions"
"The " { $vocab-link "regexp" } " vocabulary provides word for creating and using regular expressions."
{ $subsection { "regexp" "syntax" } }
{ $subsection { "regexp" "construction" } }
{ $vocab-subsection "regexp.combinators" "Regular expression combinators" }
{ $subsection { "regexp" "operations" } }
{ $subsection regexp }
{ $subsection { "regexp" "theory" } } ;
ARTICLE: { "regexp" "construction" } "Constructing regular expressions"
"Words which are useful for creating regular expressions:"
{ $subsection POSTPONE: R/ }
{ $subsection <regexp> }
{ $subsection <optioned-regexp> }
{ $heading "See also" }
{ $vocab-link "regexp.combinators" } ;
ARTICLE: { "regexp" "syntax" } "Regular expression syntax"
"Regexp syntax is largely compatible with Perl, Java and extended POSIX regexps, but not completely." $nl
"A new addition is the inclusion of a negation operator, with the syntax " { $snippet "(?~foo)" } " to match everything that does not match " { $snippet "foo" } "." $nl
"One missing feature is backreferences. This is because of a design decision to allow only regular expressions following the formal theory of regular languages. For more information, see " { $link { "regexp" "theory" } } ". You can create a new regular expression to match a particular string using " { $vocab-link "regexp.combinators" } " and group capture is available to extract parts of a regular expression match." $nl
"A distinction from Perl is that " { $snippet "\\G" } ", which references the previous match, is not included. This is because that sequence is inherently stateful, and Factor regexps don't hold state." $nl
"Additionally, none of the operations which embed code into a regexp are supported, as this would require the inclusion of the Factor parser and compiler in any application which wants to expose regexps to the user. None of the casing operations are included, for simplicity." ; ! Also describe syntax, from the beginning
ARTICLE: { "regexp" "theory" } "The theory of regular expressions"
"Far from being just a practical tool invented by Unix hackers, regular expressions were studied formally before computer programs were written to process them." $nl
"A regular language is a set of strings that is matched by a regular expression, which is defined to have characters and the empty string, along with the operations concatenation, disjunction and Kleene star. Another way to define the class of regular languages is as the class of languages which can be recognized with constant space overhead, ie with a DFA. These two definitions are provably equivalent." $nl
"One basic result in the theory of regular language is that the complement of a regular language is regular. In other words, for any regular expression, there exists another regular expression which matches exactly the strings that the first one doesn't match." $nl
"This implies, by DeMorgan's law, that, if you have two regular languages, their intersection is also regular. That is, for any two regular expressions, there exists a regular expression which matches strings that match both inputs." $nl
"Traditionally, regular expressions on computer support an additional operation: backreferences. For example, the Perl regexp " { $snippet "/(.*)$1/" } " matches a string repated twice. If a backreference refers to a string with a predetermined maximum length, then the resulting language is still regular." $nl
"But, if not, the language is not regular. There is strong evidence that there is no efficient way to parse with backreferences in the general case. Perl uses a naive backtracking algorithm which has pathological behavior in some cases, taking exponential time to match even if backreferences aren't used. Additionally, expressions with backreferences don't have the properties with negation and intersection described above." $nl
"The Factor regular expression engine was built with the design decision to support negation and intersection at the expense of backreferences. This lets us have a guaranteed linear-time matching algorithm. Systems like Ragel and Lex also use this algorithm, but in the Factor regular expression engine, all other features of regexps are still present." ;
ARTICLE: { "regexp" "operations" } "Matching operations with regular expressions"
{ $subsection matches? }
{ $subsection re-contains? }
{ $subsection first-match }
{ $subsection all-matching-slices }
{ $subsection all-matching-subseqs }
{ $subsection re-split }
{ $subsection re-replace }
{ $subsection count-matches } ;
HELP: <regexp>
{ $values { "string" string } { "regexp" regexp } }
{ $description "Compiles a regular expression into a DFA and returns this object. Regular expressions only have to be compiled once and can then be used multiple times to match input strings." } ;
{ $description "Creates a regular expression object, given a string in regular expression syntax. When it is first used for matching, a DFA is compiled, and this DFA is stored for reuse so it is only compiled once." } ;
HELP: <optioned-regexp>
{ $values { "string" string } { "options" string } { "regexp" regexp } }
{ $description "Given a string in regular expression syntax, and a string of options, creates a regular expression object. When it is first used for matching, a DFA is compiled, and this DFA is stored for reuse so it is only compiled once." } ;
HELP: R/
{ $syntax "R/ foo.*|[a-zA-Z]bar/i" }
{ $description "Literal syntax for a regular expression. When this syntax is used, the DFA is compiled at compile-time, rather than on first use." } ;
HELP: regexp
{ $class-description "The class of regular expressions. To construct these, see " { $link { "regexp" "construction" } } "." } ;
HELP: matches?
{ $values { "string" string } { "regexp" regexp } { "?" "a boolean" } }
{ $description "Tests if the string as a whole matches the given regular expression." } ;
HELP: all-matching-slices
{ $values { "string" string } { "regexp" regexp } { "seq" "a sequence of slices of the input" } }
{ $description "Finds a sequence of disjoint substrings which each match the pattern. It chooses this by finding the leftmost longest match, and then the leftmost longest match which starts after the end of the previous match, and so on." } ;
HELP: count-matches
{ $values { "string" string } { "regexp" regexp } { "n" integer } }
{ $description "Counts how many disjoint matches the regexp has in the string, as made unambiguous by " { $link all-matching-slices } "." } ;
HELP: re-split
{ $values { "string" string } { "regexp" regexp } { "seq" "a sequence of slices of the input" } }
{ $description "Splits the input string into chunks separated by the regular expression. Each chunk contains no match of the regexp. The chunks are chosen by the strategy of " { $link all-matching-slices } "." } ;
HELP: re-replace
{ $values { "string" string } { "regexp" regexp } { "replacement" string } { "result" string } }
{ $description "Replaces substrings which match the input regexp with the given replacement text. The boundaries of the substring are chosen by the strategy used by " { $link all-matching-slices } "." } ;
HELP: first-match
{ $values { "string" string } { "regexp" regexp } { "slice/f" "the match, if one exists" } }
{ $description "Finds the first match of the regular expression in the string, and returns it as a slice. If there is no match, then " { $link f } " is returned." } ;
HELP: re-contains?
{ $values { "string" string } { "regexp" regexp } { "?" "a boolean" } }
{ $description "Determines whether the string has a substring which matches the regular expression given." } ;

View File

@ -1,8 +1,11 @@
USING: regexp tools.test kernel sequences regexp.parser
regexp.traversal eval strings multiline ;
! Copyright (C) 2008, 2009 Doug Coleman, Daniel Ehrenberg
! See http://factorcode.org/license.txt for BSD license.
USING: regexp tools.test kernel sequences regexp.parser regexp.private
eval strings multiline accessors ;
IN: regexp-tests
\ <regexp> must-infer
\ compile-regexp must-infer
\ matches? must-infer
[ f ] [ "b" "a*" <regexp> matches? ] unit-test
@ -21,8 +24,8 @@ IN: regexp-tests
[ t ] [ "b" "b|" <regexp> matches? ] unit-test
[ t ] [ "" "b|" <regexp> matches? ] unit-test
[ t ] [ "" "b|" <regexp> matches? ] unit-test
[ f ] [ "" "|" <regexp> matches? ] unit-test
[ f ] [ "" "|||||||" <regexp> matches? ] unit-test
[ t ] [ "" "|" <regexp> matches? ] unit-test
[ t ] [ "" "|||||||" <regexp> matches? ] unit-test
[ f ] [ "aa" "a|b|c" <regexp> matches? ] unit-test
[ f ] [ "bb" "a|b|c" <regexp> matches? ] unit-test
@ -44,9 +47,9 @@ IN: regexp-tests
! Dotall mode -- when on, . matches newlines.
! Off by default.
[ f ] [ "\n" "." <regexp> matches? ] unit-test
[ t ] [ "\n" "(?s)." <regexp> matches? ] unit-test
[ t ] [ "\n" "(?s:.)" <regexp> matches? ] unit-test
[ t ] [ "\n" R/ ./s matches? ] unit-test
[ f ] [ "\n\n" "(?s).(?-s)." <regexp> matches? ] unit-test
[ f ] [ "\n\n" "(?s:.)." <regexp> matches? ] unit-test
[ f ] [ "" ".+" <regexp> matches? ] unit-test
[ t ] [ "a" ".+" <regexp> matches? ] unit-test
@ -76,8 +79,6 @@ IN: regexp-tests
[ t ] [ "bar" "foo|bar" <regexp> matches? ] unit-test
[ f ] [ "foobar" "foo|bar" <regexp> matches? ] unit-test
/*
! FIXME
[ f ] [ "" "(a)" <regexp> matches? ] unit-test
[ t ] [ "a" "(a)" <regexp> matches? ] unit-test
[ f ] [ "aa" "(a)" <regexp> matches? ] unit-test
@ -85,7 +86,6 @@ IN: regexp-tests
[ f ] [ "aababaaabbac" "(a|b)+" <regexp> matches? ] unit-test
[ t ] [ "ababaaabba" "(a|b)+" <regexp> matches? ] unit-test
*/
[ f ] [ "" "a{1}" <regexp> matches? ] unit-test
[ t ] [ "a" "a{1}" <regexp> matches? ] unit-test
@ -168,12 +168,9 @@ IN: regexp-tests
[ f ] [ "0" "[^\\d]" <regexp> matches? ] unit-test
[ t ] [ "a" "[^\\d]" <regexp> matches? ] unit-test
/*
! FIXME
[ t ] [ "a" "[a-z]{1,}|[A-Z]{2,4}|b*|c|(f|g)*" <regexp> matches? ] unit-test
[ t ] [ "a" "[a-z]{1,2}|[A-Z]{3,3}|b*|c|(f|g)*" <regexp> matches? ] unit-test
[ t ] [ "a" "[a-z]{1,2}|[A-Z]{3,3}" <regexp> matches? ] unit-test
*/
[ t ] [ "1000" "\\d{4,6}" <regexp> matches? ] unit-test
[ t ] [ "1000" "[0-9]{4,6}" <regexp> matches? ] unit-test
@ -185,7 +182,7 @@ IN: regexp-tests
[ f ] [ "abc" "[\\p{Upper}]{3}" <regexp> matches? ] unit-test
[ t ] [ "ABC" "[\\p{Upper}]{3}" <regexp> matches? ] unit-test
[ f ] [ "" "\\Q\\E" <regexp> matches? ] unit-test
[ t ] [ "" "\\Q\\E" <regexp> matches? ] unit-test
[ f ] [ "a" "\\Q\\E" <regexp> matches? ] unit-test
[ t ] [ "|*+" "\\Q|*+\\E" <regexp> matches? ] unit-test
[ f ] [ "abc" "\\Q|*+\\E" <regexp> matches? ] unit-test
@ -195,8 +192,8 @@ IN: regexp-tests
[ t ] [ "SXY" "\\0123XY" <regexp> matches? ] unit-test
[ t ] [ "x" "\\x78" <regexp> matches? ] unit-test
[ f ] [ "y" "\\x78" <regexp> matches? ] unit-test
[ t ] [ "x" "\\u000078" <regexp> matches? ] unit-test
[ f ] [ "y" "\\u000078" <regexp> matches? ] unit-test
[ t ] [ "x" "\\u0078" <regexp> matches? ] unit-test
[ f ] [ "y" "\\u0078" <regexp> matches? ] unit-test
[ t ] [ "ab" "a+b" <regexp> matches? ] unit-test
[ f ] [ "b" "a+b" <regexp> matches? ] unit-test
@ -214,8 +211,8 @@ IN: regexp-tests
[ f ] [ "aaaxb" "a+ab" <regexp> matches? ] unit-test
[ t ] [ "aaacb" "a+cb" <regexp> matches? ] unit-test
[ 3 ] [ "aaacb" "a*" <regexp> match-head ] unit-test
[ 2 ] [ "aaacb" "aa?" <regexp> match-head ] unit-test
[ "aaa" ] [ "aaacb" "a*" <regexp> first-match >string ] unit-test
[ "aa" ] [ "aaacb" "aa?" <regexp> first-match >string ] unit-test
[ t ] [ "aaa" R/ AAA/i matches? ] unit-test
[ f ] [ "aax" R/ AAA/i matches? ] unit-test
@ -226,15 +223,15 @@ IN: regexp-tests
[ t ] [ "c" R/ [A-Z]/i matches? ] unit-test
[ f ] [ "3" R/ [A-Z]/i matches? ] unit-test
[ t ] [ "a" "(?i)a" <regexp> matches? ] unit-test
[ t ] [ "a" "(?i)a" <regexp> matches? ] unit-test
[ t ] [ "A" "(?i)a" <regexp> matches? ] unit-test
[ t ] [ "A" "(?i)a" <regexp> matches? ] unit-test
[ t ] [ "a" "(?i:a)" <regexp> matches? ] unit-test
[ t ] [ "a" "(?i:a)" <regexp> matches? ] unit-test
[ t ] [ "A" "(?i:a)" <regexp> matches? ] unit-test
[ t ] [ "A" "(?i:a)" <regexp> matches? ] unit-test
[ t ] [ "a" R/ (?-i)a/i matches? ] unit-test
[ t ] [ "a" R/ (?-i)a/i matches? ] unit-test
[ f ] [ "A" R/ (?-i)a/i matches? ] unit-test
[ f ] [ "A" R/ (?-i)a/i matches? ] unit-test
[ t ] [ "a" R/ (?-i:a)/i matches? ] unit-test
[ t ] [ "a" R/ (?-i:a)/i matches? ] unit-test
[ f ] [ "A" R/ (?-i:a)/i matches? ] unit-test
[ f ] [ "A" R/ (?-i:a)/i matches? ] unit-test
[ f ] [ "A" "[a-z]" <regexp> matches? ] unit-test
[ t ] [ "A" R/ [a-z]/i matches? ] unit-test
@ -242,9 +239,11 @@ IN: regexp-tests
[ f ] [ "A" "\\p{Lower}" <regexp> matches? ] unit-test
[ t ] [ "A" R/ \p{Lower}/i matches? ] unit-test
[ t ] [ "abc" <reversed> R/ abc/r matches? ] unit-test
[ t ] [ "abc" <reversed> R/ a[bB][cC]/r matches? ] unit-test
! [ t ] [ "adcbe" R/ a(?r)(bcd)(?-r)e/ matches? ] unit-test ! FIXME
[ t ] [ "abc" R/ abc/r matches? ] unit-test
[ t ] [ "abc" R/ a[bB][cC]/r matches? ] unit-test
[ t ] [ 3 "xabc" R/ abc/r match-index-from >boolean ] unit-test
[ t ] [ 3 "xabc" R/ a[bB][cC]/r match-index-from >boolean ] unit-test
[ t ] [ "s@f" "[a-z.-]@[a-z]" <regexp> matches? ] unit-test
[ f ] [ "a" "[a-z.-]@[a-z]" <regexp> matches? ] unit-test
@ -253,8 +252,6 @@ IN: regexp-tests
[ t ] [ "abc*" "[^\\*]*\\*" <regexp> matches? ] unit-test
[ t ] [ "bca" "[^a]*a" <regexp> matches? ] unit-test
/*
! FIXME
[ ] [
"(0[lL]?|[1-9]\\d{0,9}(\\d{0,9}[lL])?|0[xX]\\p{XDigit}{1,8}(\\p{XDigit}{0,8}[lL])?|0[0-7]{1,11}([0-7]{0,11}[lL])?|([0-9]+\\.[0-9]*|\\.[0-9]+)([eE][+-]?[0-9]+)?[fFdD]?|[0-9]+([eE][+-]?[0-9]+[fFdD]?|([eE][+-]?[0-9]+)?[fFdD]))"
<regexp> drop
@ -278,11 +275,6 @@ IN: regexp-tests
[ "abc" ] [ "abc" "(ab|a)(bc)?" <regexp> first-match >string ] unit-test
[ "b" ] [ "aaaaaaaaaaaaaaaaaaaaaaab" "((a*)*b)*b" <regexp> first-match >string ] unit-test
*/
! [ t ] [ "a:b" ".+:?" <regexp> matches? ] unit-test
! [ 1 ] [ "hello" ".+?" <regexp> match length ] unit-test
[ { "1" "2" "3" "4" } ]
[ "1ABC2DEF3GHI4" R/ [A-Z]+/ re-split [ >string ] map ] unit-test
@ -295,7 +287,7 @@ IN: regexp-tests
[ { "a" "" } ] [ "a=" R/ =/ re-split [ >string ] map ] unit-test
[ { "ABC" "DEF" "GHI" } ]
[ "1ABC2DEF3GHI4" R/ [A-Z]+/ all-matches [ >string ] map ] unit-test
[ "1ABC2DEF3GHI4" R/ [A-Z]+/ all-matching-subseqs ] unit-test
[ 3 ]
[ "1ABC2DEF3GHI4" R/ [A-Z]+/ count-matches ] unit-test
@ -308,127 +300,173 @@ IN: regexp-tests
[ "-- title --" ] [ "== title ==" R/ =/ "-" re-replace ] unit-test
/*
! FIXME
[ f ] [ "ab" "a(?!b)" <regexp> first-match ] unit-test
[ "" ] [ "ab" "a(?!b)" <regexp> first-match >string ] unit-test
[ "a" ] [ "ac" "a(?!b)" <regexp> first-match >string ] unit-test
[ t ] [ "fxxbar" ".{3}(?!foo)bar" <regexp> matches? ] unit-test
[ t ] [ "foobar" ".{3}(?!foo)bar" <regexp> matches? ] unit-test
[ t ] [ "fxxbar" "(?!foo).{3}bar" <regexp> matches? ] unit-test
[ f ] [ "foobar" "(?!foo).{3}bar" <regexp> matches? ] unit-test
[ "a" ] [ "ab" "a(?=b)(?=b)" <regexp> first-match >string ] unit-test
[ "a" ] [ "ba" "a(?<=b)(?<=b)" <regexp> first-match >string ] unit-test
[ "a" ] [ "cab" "a(?=b)(?<=c)" <regexp> first-match >string ] unit-test
[ "a" ] [ "ba" "(?<=b)(?<=b)a" <regexp> first-match >string ] unit-test
[ "a" ] [ "cab" "(?<=c)a(?=b)" <regexp> first-match >string ] unit-test
[ 3 ] [ "foobar" "foo(?=bar)" <regexp> match-head ] unit-test
[ f ] [ "foobxr" "foo(?=bar)" <regexp> match-head ] unit-test
*/
[ 3 ] [ "foobar" "foo(?=bar)" <regexp> first-match length ] unit-test
[ f ] [ "foobxr" "foo(?=bar)" <regexp> first-match ] unit-test
! Bug in parsing word
[ t ] [ "a" R' a' matches? ] unit-test
! Convert to lowercase until E
[ f ] [ "AA" R/ \LAA\E/ matches? ] unit-test
[ t ] [ "aa" R/ \LAA\E/ matches? ] unit-test
! Testing negation
[ f ] [ "a" R/ (?~a)/ matches? ] unit-test
[ t ] [ "aa" R/ (?~a)/ matches? ] unit-test
[ t ] [ "bb" R/ (?~a)/ matches? ] unit-test
[ t ] [ "" R/ (?~a)/ matches? ] unit-test
! Convert to uppercase until E
[ t ] [ "AA" R/ \Uaa\E/ matches? ] unit-test
[ f ] [ "aa" R/ \Uaa\E/ matches? ] unit-test
[ f ] [ "a" R/ (?~a+|b)/ matches? ] unit-test
[ f ] [ "aa" R/ (?~a+|b)/ matches? ] unit-test
[ t ] [ "bb" R/ (?~a+|b)/ matches? ] unit-test
[ f ] [ "b" R/ (?~a+|b)/ matches? ] unit-test
[ t ] [ "" R/ (?~a+|b)/ matches? ] unit-test
! [ "{Lower}" <regexp> ] [ invalid-range? ] must-fail-with
! Intersecting classes
[ t ] [ "ab" R/ ac|\p{Lower}b/ matches? ] unit-test
[ t ] [ "ab" R/ ac|[a-z]b/ matches? ] unit-test
[ t ] [ "ac" R/ ac|\p{Lower}b/ matches? ] unit-test
[ t ] [ "ac" R/ ac|[a-z]b/ matches? ] unit-test
[ t ] [ "ac" R/ [a-zA-Z]c|\p{Lower}b/ matches? ] unit-test
[ t ] [ "ab" R/ [a-zA-Z]c|\p{Lower}b/ matches? ] unit-test
[ t ] [ "πb" R/ [a-zA-Z]c|\p{Lower}b/ matches? ] unit-test
[ f ] [ "πc" R/ [a-zA-Z]c|\p{Lower}b/ matches? ] unit-test
[ f ] [ "Ab" R/ [a-zA-Z]c|\p{Lower}b/ matches? ] unit-test
! [ t ] [ "a" R/ ^a/ matches? ] unit-test
! [ f ] [ "\na" R/ ^a/ matches? ] unit-test
! [ f ] [ "\r\na" R/ ^a/ matches? ] unit-test
! [ f ] [ "\ra" R/ ^a/ matches? ] unit-test
[ t ] [ "aaaa" R/ .*a./ matches? ] unit-test
! [ t ] [ "a" R/ a$/ matches? ] unit-test
! [ f ] [ "a\n" R/ a$/ matches? ] unit-test
! [ f ] [ "a\r" R/ a$/ matches? ] unit-test
! [ f ] [ "a\r\n" R/ a$/ matches? ] unit-test
[ f ] [ "ab" R/ (?~ac|\p{Lower}b)/ matches? ] unit-test
[ f ] [ "ab" R/ (?~ac|[a-z]b)/ matches? ] unit-test
[ f ] [ "ac" R/ (?~ac|\p{Lower}b)/ matches? ] unit-test
[ f ] [ "ac" R/ (?~ac|[a-z]b)/ matches? ] unit-test
[ f ] [ "ac" R/ (?~[a-zA-Z]c|\p{Lower}b)/ matches? ] unit-test
[ f ] [ "ab" R/ (?~[a-zA-Z]c|\p{Lower}b)/ matches? ] unit-test
[ f ] [ "πb" R/ (?~[a-zA-Z]c|\p{Lower}b)/ matches? ] unit-test
[ t ] [ "πc" R/ (?~[a-zA-Z]c|\p{Lower}b)/ matches? ] unit-test
[ t ] [ "Ab" R/ (?~[a-zA-Z]c|\p{Lower}b)/ matches? ] unit-test
! [ t ] [ "a" R/ a$|b$/ matches? ] unit-test
! [ t ] [ "b" R/ a$|b$/ matches? ] unit-test
! [ t ] [ "ab" R/ a$|b$/ matches? ] unit-test
! [ t ] [ "ba" R/ ba$|b$/ matches? ] unit-test
! DFA is compiled when needed, or when literal
[ regexp-initial-word ] [ "foo" <regexp> dfa>> ] unit-test
[ f ] [ R/ foo/ dfa>> \ regexp-initial-word = ] unit-test
! [ t ] [ "a" R/ \Aa/ matches? ] unit-test
! [ f ] [ "\na" R/ \Aaa/ matches? ] unit-test
! [ f ] [ "\r\na" R/ \Aa/ matches? ] unit-test
! [ f ] [ "\ra" R/ \Aa/ matches? ] unit-test
[ t ] [ "a" R/ ^a/ matches? ] unit-test
[ f ] [ "\na" R/ ^a/ matches? ] unit-test
[ f ] [ "\r\na" R/ ^a/ matches? ] unit-test
[ f ] [ "\ra" R/ ^a/ matches? ] unit-test
! [ t ] [ "a" R/ \Aa/m matches? ] unit-test
! [ f ] [ "\na" R/ \Aaa/m matches? ] unit-test
! [ f ] [ "\r\na" R/ \Aa/m matches? ] unit-test
! [ f ] [ "\ra" R/ \Aa/m matches? ] unit-test
[ 1 ] [ "a" R/ ^a/ count-matches ] unit-test
[ 0 ] [ "\na" R/ ^a/ count-matches ] unit-test
[ 0 ] [ "\r\na" R/ ^a/ count-matches ] unit-test
[ 0 ] [ "\ra" R/ ^a/ count-matches ] unit-test
! [ t ] [ "\r\n\n\n\nam" R/ ^am/m matches? ] unit-test
[ t ] [ "a" R/ a$/ matches? ] unit-test
[ f ] [ "a\n" R/ a$/ matches? ] unit-test
[ f ] [ "a\r" R/ a$/ matches? ] unit-test
[ f ] [ "a\r\n" R/ a$/ matches? ] unit-test
! [ t ] [ "a" R/ \Aa\z/m matches? ] unit-test
! [ f ] [ "a\n" R/ \Aa\z/m matches? ] unit-test
[ 1 ] [ "a" R/ a$/ count-matches ] unit-test
[ 0 ] [ "a\n" R/ a$/ count-matches ] unit-test
[ 0 ] [ "a\r" R/ a$/ count-matches ] unit-test
[ 0 ] [ "a\r\n" R/ a$/ count-matches ] unit-test
! [ t ] [ "a\r\n" R/ \Aa\Z/m matches? ] unit-test
! [ t ] [ "a\n" R/ \Aa\Z/m matches? ] unit-test
[ t ] [ "a" R/ a$|b$/ matches? ] unit-test
[ t ] [ "b" R/ a$|b$/ matches? ] unit-test
[ f ] [ "ab" R/ a$|b$/ matches? ] unit-test
[ t ] [ "ba" R/ ba$|b$/ matches? ] unit-test
! [ t ] [ "a" R/ \Aa\Z/m matches? ] unit-test
! [ f ] [ "\na" R/ \Aaa\Z/m matches? ] unit-test
! [ f ] [ "\r\na" R/ \Aa\Z/m matches? ] unit-test
! [ f ] [ "\ra" R/ \Aa\Z/m matches? ] unit-test
[ t ] [ "a" R/ \Aa/ matches? ] unit-test
[ f ] [ "\na" R/ \Aaa/ matches? ] unit-test
[ f ] [ "\r\na" R/ \Aa/ matches? ] unit-test
[ f ] [ "\ra" R/ \Aa/ matches? ] unit-test
! [ t ] [ "a" R/ ^a/m matches? ] unit-test
! [ t ] [ "\na" R/ ^a/m matches? ] unit-test
! [ t ] [ "\r\na" R/ ^a/m matches? ] unit-test
! [ t ] [ "\ra" R/ ^a/m matches? ] unit-test
[ t ] [ "a" R/ \Aa/m matches? ] unit-test
[ f ] [ "\na" R/ \Aaa/m matches? ] unit-test
[ f ] [ "\r\na" R/ \Aa/m matches? ] unit-test
[ f ] [ "\ra" R/ \Aa/m matches? ] unit-test
[ 0 ] [ "\ra" R/ \Aa/m count-matches ] unit-test
! [ t ] [ "a" "a$" R/ a$/m matches? ] unit-test
! [ t ] [ "a\n" "a$" R/ a$/m matches? ] unit-test
! [ t ] [ "a\r" "a$" R/ a$/m matches? ] unit-test
! [ t ] [ "a\r\n" "a$" R/ a$/m matches? ] unit-test
[ f ] [ "\r\n\n\n\nam" R/ ^am/m matches? ] unit-test
[ 1 ] [ "\r\n\n\n\nam" R/ ^am/m count-matches ] unit-test
! [ f ] [ "foobxr" "foo\\z" <regexp> match-head ] unit-test
! [ 3 ] [ "foo" "foo\\z" <regexp> match-head ] unit-test
[ t ] [ "a" R/ \Aa\z/m matches? ] unit-test
[ f ] [ "a\n" R/ \Aa\z/m matches? ] unit-test
! [ t ] [ "foo" "\\bfoo\\b" <regexp> matches? ] unit-test
! [ t ] [ "afoob" "\\Bfoo\\B" <regexp> matches? ] unit-test
! [ t ] [ "afoob" "\\bfoo\\b" <regexp> matches? ] unit-test
! [ f ] [ "foo" "\\Bfoo\\B" <regexp> matches? ] unit-test
[ f ] [ "a\r\n" R/ \Aa\Z/m matches? ] unit-test
[ f ] [ "a\n" R/ \Aa\Z/m matches? ] unit-test
[ 1 ] [ "a\r\n" R/ \Aa\Z/m count-matches ] unit-test
[ 1 ] [ "a\n" R/ \Aa\Z/m count-matches ] unit-test
! [ 3 ] [ "foo bar" "foo\\b" <regexp> match-head ] unit-test
! [ f ] [ "fooxbar" "foo\\b" <regexp> matches? ] unit-test
! [ t ] [ "foo" "foo\\b" <regexp> matches? ] unit-test
! [ t ] [ "foo bar" "foo\\b bar" <regexp> matches? ] unit-test
! [ f ] [ "fooxbar" "foo\\bxbar" <regexp> matches? ] unit-test
! [ f ] [ "foo" "foo\\bbar" <regexp> matches? ] unit-test
[ t ] [ "a" R/ \Aa\Z/m matches? ] unit-test
[ f ] [ "\na" R/ \Aaa\Z/m matches? ] unit-test
[ f ] [ "\r\na" R/ \Aa\Z/m matches? ] unit-test
[ f ] [ "\ra" R/ \Aa\Z/m matches? ] unit-test
! [ f ] [ "foo bar" "foo\\B" <regexp> matches? ] unit-test
! [ 3 ] [ "fooxbar" "foo\\B" <regexp> match-head ] unit-test
! [ t ] [ "foo" "foo\\B" <regexp> matches? ] unit-test
! [ f ] [ "foo bar" "foo\\B bar" <regexp> matches? ] unit-test
! [ t ] [ "fooxbar" "foo\\Bxbar" <regexp> matches? ] unit-test
! [ f ] [ "foo" "foo\\Bbar" <regexp> matches? ] unit-test
[ 1 ] [ "a" R/ \Aa\Z/m count-matches ] unit-test
[ 0 ] [ "\na" R/ \Aaa\Z/m count-matches ] unit-test
[ 0 ] [ "\r\na" R/ \Aa\Z/m count-matches ] unit-test
[ 0 ] [ "\ra" R/ \Aa\Z/m count-matches ] unit-test
! [ 1 ] [ "aaacb" "a+?" <regexp> match-head ] unit-test
! [ 1 ] [ "aaacb" "aa??" <regexp> match-head ] unit-test
! [ f ] [ "aaaab" "a++ab" <regexp> matches? ] unit-test
! [ t ] [ "aaacb" "a++cb" <regexp> matches? ] unit-test
! [ 3 ] [ "aacb" "aa?c" <regexp> match-head ] unit-test
! [ 3 ] [ "aacb" "aa??c" <regexp> match-head ] unit-test
[ t ] [ "a" R/ ^a/m matches? ] unit-test
[ f ] [ "\na" R/ ^a/m matches? ] unit-test
[ 1 ] [ "\na" R/ ^a/m count-matches ] unit-test
[ 1 ] [ "\r\na" R/ ^a/m count-matches ] unit-test
[ 1 ] [ "\ra" R/ ^a/m count-matches ] unit-test
! "ab" "a(?=b*)" <regexp> match
! "abbbbbc" "a(?=b*c)" <regexp> match
! "ab" "a(?=b*)" <regexp> match
[ t ] [ "a" R/ a$/m matches? ] unit-test
[ f ] [ "a\n" R/ a$/m matches? ] unit-test
[ 1 ] [ "a\n" R/ a$/m count-matches ] unit-test
[ 1 ] [ "a\r" R/ a$/m count-matches ] unit-test
[ 1 ] [ "a\r\n" R/ a$/m count-matches ] unit-test
! "baz" "(az)(?<=b)" <regexp> first-match
! "cbaz" "a(?<=b*)" <regexp> first-match
! "baz" "a(?<=b)" <regexp> first-match
[ f ] [ "foobxr" "foo\\z" <regexp> first-match ] unit-test
[ 3 ] [ "foo" "foo\\z" <regexp> first-match length ] unit-test
! "baz" "a(?<!b)" <regexp> first-match
! "caz" "a(?<!b)" <regexp> first-match
[ t ] [ "a foo b" R/ foo/ re-contains? ] unit-test
[ f ] [ "a bar b" R/ foo/ re-contains? ] unit-test
[ t ] [ "foo" R/ foo/ re-contains? ] unit-test
! "abcdefg" "a(?=bcdefg)bcd" <regexp> first-match
! "abcdefg" "a(?#bcdefg)bcd" <regexp> first-match
! "abcdefg" "a(?:bcdefg)" <regexp> first-match
[ { "foo" "fxx" "fab" } ] [ "fab fxx foo" R/ f../r all-matching-subseqs ] unit-test
! "caba" "a(?<=b)" <regexp> first-match
[ t ] [ "foo" "\\bfoo\\b" <regexp> re-contains? ] unit-test
[ t ] [ "afoob" "\\Bfoo\\B" <regexp> re-contains? ] unit-test
[ f ] [ "afoob" "\\bfoo\\b" <regexp> re-contains? ] unit-test
[ f ] [ "foo" "\\Bfoo\\B" <regexp> re-contains? ] unit-test
! capture group 1: "aaaa" 2: ""
! "aaaa" "(a*)(a*)" <regexp> match*
! "aaaa" "(a*)(a+)" <regexp> match*
[ 3 ] [ "foo bar" "foo\\b" <regexp> first-match length ] unit-test
[ f ] [ "fooxbar" "foo\\b" <regexp> re-contains? ] unit-test
[ t ] [ "foo" "foo\\b" <regexp> re-contains? ] unit-test
[ t ] [ "foo bar" "foo\\b bar" <regexp> matches? ] unit-test
[ f ] [ "fooxbar" "foo\\bxbar" <regexp> matches? ] unit-test
[ f ] [ "foo" "foo\\bbar" <regexp> matches? ] unit-test
[ f ] [ "foo bar" "foo\\B" <regexp> re-contains? ] unit-test
[ 3 ] [ "fooxbar" "foo\\B" <regexp> first-match length ] unit-test
[ f ] [ "foo" "foo\\B" <regexp> re-contains? ] unit-test
[ f ] [ "foo bar" "foo\\B bar" <regexp> matches? ] unit-test
[ t ] [ "fooxbar" "foo\\Bxbar" <regexp> matches? ] unit-test
[ f ] [ "foo" "foo\\Bbar" <regexp> matches? ] unit-test
[ t ] [ "ab" "a(?=b*)" <regexp> re-contains? ] unit-test
[ t ] [ "abbbbbc" "a(?=b*c)" <regexp> re-contains? ] unit-test
[ f ] [ "abbbbb" "a(?=b*c)" <regexp> re-contains? ] unit-test
[ t ] [ "ab" "a(?=b*)" <regexp> re-contains? ] unit-test
[ "az" ] [ "baz" "(?<=b)(az)" <regexp> first-match >string ] unit-test
[ f ] [ "chaz" "(?<=b)(az)" <regexp> re-contains? ] unit-test
[ "a" ] [ "cbaz" "(?<=b*)a" <regexp> first-match >string ] unit-test
[ f ] [ "baz" "a(?<=b)" <regexp> re-contains? ] unit-test
[ f ] [ "baz" "(?<!b)a" <regexp> re-contains? ] unit-test
[ t ] [ "caz" "(?<!b)a" <regexp> re-contains? ] unit-test
[ "abcd" ] [ "abcdefg" "a(?=bcdefg)bcd" <regexp> first-match >string ] unit-test
[ t ] [ "abcdefg" "a(?#bcdefg)bcd" <regexp> re-contains? ] unit-test
[ t ] [ "abcdefg" "a(?:bcdefg)" <regexp> matches? ] unit-test
[ 3 ] [ "caba" "(?<=b)a" <regexp> first-match from>> ] unit-test

View File

@ -1,87 +1,182 @@
! Copyright (C) 2008 Doug Coleman.
! Copyright (C) 2008, 2009 Doug Coleman, Daniel Ehrenberg.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors combinators kernel math sequences strings sets
assocs prettyprint.backend prettyprint.custom make lexer
namespaces parser arrays fry regexp.backend regexp.utils
regexp.parser regexp.nfa regexp.dfa regexp.traversal
regexp.transition-tables splitting sorting ;
USING: accessors combinators kernel kernel.private math sequences
sequences.private strings sets assocs prettyprint.backend
prettyprint.custom make lexer namespaces parser arrays fry locals
regexp.parser splitting sorting regexp.ast regexp.negation
regexp.compiler words call call.private math.ranges ;
IN: regexp
: default-regexp ( string -- regexp )
regexp new
swap >>raw
<transition-table> >>nfa-table
<transition-table> >>dfa-table
<transition-table> >>minimized-table
H{ } clone >>nfa-traversal-flags
H{ } clone >>dfa-traversal-flags
H{ } clone >>options
H{ } clone >>matchers
reset-regexp ;
TUPLE: regexp
{ raw read-only }
{ parse-tree read-only }
{ options read-only }
dfa next-match ;
: construct-regexp ( regexp -- regexp' )
{
[ parse-regexp ]
[ construct-nfa ]
[ construct-dfa ]
[ ]
} cleave ;
: (match) ( string regexp -- dfa-traverser )
<dfa-traverser> do-match ; inline
: match ( string regexp -- slice/f )
(match) return-match ;
: match* ( string regexp -- slice/f captured-groups )
(match) [ return-match ] [ captured-groups>> ] bi ;
: matches? ( string regexp -- ? )
dupd match
[ [ length ] bi@ = ] [ drop f ] if* ;
: match-head ( string regexp -- end/f ) match [ length ] [ f ] if* ;
: match-at ( string m regexp -- n/f finished? )
[
2dup swap length > [ 2drop f f ] [ tail-slice t ] if
] dip swap [ match-head f ] [ 2drop f t ] if ;
: match-range ( string m regexp -- a/f b/f )
3dup match-at over [
drop nip rot drop dupd +
] [
[ 3drop drop f f ] [ drop [ 1+ ] dip match-range ] if
] if ;
: first-match ( string regexp -- slice/f )
dupd 0 swap match-range rot over [ <slice> ] [ 3drop f ] if ;
: re-cut ( string regexp -- end/f start )
dupd first-match
[ split1-slice swap ] [ "" like f swap ] if* ;
: (re-split) ( string regexp -- )
over [ [ re-cut , ] keep (re-split) ] [ 2drop ] if ;
: re-split ( string regexp -- seq )
[ (re-split) ] { } make ;
: re-replace ( string regexp replacement -- result )
[ re-split ] dip join ;
: next-match ( string regexp -- end/f match/f )
dupd first-match dup
[ [ split1-slice nip ] keep ] [ 2drop f f ] if ;
: all-matches ( string regexp -- seq )
[ dup ] swap '[ _ next-match ] produce nip harvest ;
: count-matches ( string regexp -- n )
all-matches length ;
TUPLE: reverse-regexp < regexp ;
<PRIVATE
M: lookahead question>quot ! Returns ( index string -- ? )
term>> ast>dfa dfa>shortest-word '[ f _ execute ] ;
: <reversed-option> ( ast -- reversed )
"r" string>options <with-options> ;
M: lookbehind question>quot ! Returns ( index string -- ? )
term>> <reversed-option>
ast>dfa dfa>reverse-shortest-word
'[ [ 1- ] dip f _ execute ] ;
: check-string ( string -- string )
! Make this configurable
dup string? [ "String required" throw ] unless ;
: match-index-from ( i string regexp -- index/f )
! This word is unsafe. It assumes that i is a fixnum
! and that string is a string.
dup dfa>> execute-unsafe( index string regexp -- i/f ) ;
GENERIC: end/start ( string regexp -- end start )
M: regexp end/start drop length 0 ;
M: reverse-regexp end/start drop length 1- -1 swap ;
PRIVATE>
: matches? ( string regexp -- ? )
[ check-string ] dip
[ end/start ] 2keep
match-index-from
[ = ] [ drop f ] if* ;
<PRIVATE
:: (next-match) ( i string regexp quot: ( i string regexp -- j ) reverse? -- i start end ? )
i string regexp quot call dup [| j |
j i j
reverse? [ swap [ 1+ ] bi@ ] when
string
] [ drop f f f f ] if ; inline
: search-range ( i string reverse? -- seq )
[ drop dup 1+ -1 ] [ length 1 ] if range boa ; inline
:: next-match ( i string regexp quot: ( i string regexp -- j ) reverse? -- i start end ? )
f f f f
i string reverse? search-range
[ [ 2drop 2drop ] dip string regexp quot reverse? (next-match) dup ] find 2drop ; inline
: do-next-match ( i string regexp -- i start end ? )
dup next-match>>
execute-unsafe( i string regexp -- i start end ? ) ; inline
:: (each-match) ( i string regexp quot: ( start end string -- ) -- )
i string regexp do-next-match [| i' start end |
start end string quot call
i' string regexp quot (each-match)
] [ 3drop ] if ; inline recursive
: prepare-match-iterator ( string regexp -- i string regexp )
[ check-string ] dip [ end/start nip ] 2keep ; inline
PRIVATE>
: each-match ( string regexp quot: ( start end string -- ) -- )
[ prepare-match-iterator ] dip (each-match) ; inline
: map-matches ( string regexp quot: ( start end string -- obj ) -- seq )
accumulator [ each-match ] dip >array ; inline
: all-matching-slices ( string regexp -- seq )
[ slice boa ] map-matches ;
: all-matching-subseqs ( string regexp -- seq )
[ subseq ] map-matches ;
: count-matches ( string regexp -- n )
[ 0 ] 2dip [ 3drop 1+ ] each-match ;
<PRIVATE
:: (re-split) ( string regexp quot -- new-slices )
0 string regexp [| end start end' string |
end' ! leave it on the stack for the next iteration
end start string quot call
] map-matches
! Final chunk
swap string length string quot call suffix ; inline
PRIVATE>
: first-match ( string regexp -- slice/f )
[ prepare-match-iterator do-next-match ] [ drop ] 2bi
'[ _ slice boa nip ] [ 3drop f ] if ;
: re-contains? ( string regexp -- ? )
prepare-match-iterator do-next-match [ 3drop ] dip >boolean ;
: re-split ( string regexp -- seq )
[ slice boa ] (re-split) ;
: re-replace ( string regexp replacement -- result )
[ [ subseq ] (re-split) ] dip join ;
<PRIVATE
: get-ast ( regexp -- ast )
[ parse-tree>> ] [ options>> ] bi <with-options> ;
GENERIC: compile-regexp ( regex -- regexp )
: regexp-initial-word ( i string regexp -- i/f )
compile-regexp match-index-from ;
: do-compile-regexp ( regexp -- regexp )
dup '[
dup \ regexp-initial-word =
[ drop _ get-ast ast>dfa dfa>word ] when
] change-dfa ;
M: regexp compile-regexp ( regexp -- regexp )
do-compile-regexp ;
M: reverse-regexp compile-regexp ( regexp -- regexp )
t backwards? [ do-compile-regexp ] with-variable ;
DEFER: compile-next-match
: next-initial-word ( i string regexp -- i start end string )
compile-next-match do-next-match ;
: compile-next-match ( regexp -- regexp )
dup '[
dup \ next-initial-word = [
drop _ [ compile-regexp dfa>> def>> ] [ reverse-regexp? ] bi
'[ { array-capacity string regexp } declare _ _ next-match ]
(( i string regexp -- i start end string )) simple-define-temp
] when
] change-next-match ;
PRIVATE>
: new-regexp ( string ast options class -- regexp )
[ \ regexp-initial-word \ next-initial-word ] dip boa ; inline
: make-regexp ( string ast -- regexp )
f f <options> regexp new-regexp ;
: <optioned-regexp> ( string options -- regexp )
[ dup parse-regexp ] [ string>options ] bi*
dup on>> reversed-regexp swap member?
[ reverse-regexp new-regexp ]
[ regexp new-regexp ] if ;
: <regexp> ( string -- regexp ) "" <optioned-regexp> ;
<PRIVATE
! The following two should do some caching
: find-regexp-syntax ( string -- prefix suffix )
{
{ "R/ " "/" }
@ -97,28 +192,19 @@ IN: regexp
{ "R| " "|" }
} swap [ subseq? not nip ] curry assoc-find drop ;
: string>options ( string -- options )
[ ch>option dup ] H{ } map>assoc ;
: take-until ( end lexer -- string )
dup skip-blank [
[ index-from ] 2keep
[ swapd subseq ]
[ 2drop 1+ ] 3bi
] change-lexer-column ;
: options>string ( options -- string )
keys [ option>ch ] map natural-sort >string ;
PRIVATE>
: <optioned-regexp> ( string option-string -- regexp )
[ default-regexp ] [ string>options ] bi* >>options
construct-regexp ;
: <regexp> ( string -- regexp ) "" <optioned-regexp> ;
<PRIVATE
: parse-noblank-token ( lexer -- str/f )
dup still-parsing-line? [ (parse-token) ] [ drop f ] if ;
: parsing-regexp ( accum end -- accum )
lexer get dup skip-blank
[ [ index-from dup 1+ swap ] 2keep swapd subseq swap ] change-lexer-column
lexer get dup still-parsing-line?
[ (parse-token) ] [ drop f ] if
<optioned-regexp> parsed ;
lexer get [ take-until ] [ parse-noblank-token ] bi
<optioned-regexp> compile-next-match parsed ;
PRIVATE>
@ -141,3 +227,4 @@ M: regexp pprint*
[ options>> options>string % ] bi
] "" make
] keep present-text ;

View File

@ -1,32 +1,9 @@
! Copyright (C) 2008 Doug Coleman.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors arrays assocs fry hashtables kernel sequences
vectors regexp.utils ;
vectors locals regexp.classes ;
IN: regexp.transition-tables
TUPLE: transition from to obj ;
TUPLE: literal-transition < transition ;
TUPLE: class-transition < transition ;
TUPLE: default-transition < transition ;
TUPLE: literal obj ;
TUPLE: class obj ;
TUPLE: default ;
: make-transition ( from to obj class -- obj )
new
swap >>obj
swap >>to
swap >>from ;
: <literal-transition> ( from to obj -- transition )
literal-transition make-transition ;
: <class-transition> ( from to obj -- transition )
class-transition make-transition ;
: <default-transition> ( from to -- transition )
t default-transition make-transition ;
TUPLE: transition-table transitions start-state final-states ;
: <transition-table> ( -- transition-table )
@ -35,14 +12,50 @@ TUPLE: transition-table transitions start-state final-states ;
H{ } clone >>final-states ;
: maybe-initialize-key ( key hashtable -- )
! Why do we have to do this?
2dup key? [ 2drop ] [ [ H{ } clone ] 2dip set-at ] if ;
: set-transition ( transition hash -- )
#! set the state as a key
2dup [ to>> ] dip maybe-initialize-key
[ [ to>> ] [ obj>> ] [ from>> ] tri ] dip
2dup at* [ 2nip insert-at ]
[ drop [ H{ } clone [ insert-at ] keep ] 2dip set-at ] if ;
:: (set-transition) ( from to obj hash -- )
to condition? [ to hash maybe-initialize-key ] unless
from hash at
[ [ to obj ] dip set-at ]
[ to obj associate from hash set-at ] if* ;
: add-transition ( transition transition-table -- )
transitions>> set-transition ;
: set-transition ( from to obj transition-table -- )
transitions>> (set-transition) ;
:: (add-transition) ( from to obj hash -- )
to hash maybe-initialize-key
from hash at
[ [ to obj ] dip push-at ]
[ to 1vector obj associate from hash set-at ] if* ;
: add-transition ( from to obj transition-table -- )
transitions>> (add-transition) ;
: map-set ( assoc quot -- new-assoc )
'[ drop @ dup ] assoc-map ; inline
: number-transitions ( transitions numbering -- new-transitions )
dup '[
[ _ at ]
[ [ _ condition-at ] assoc-map ] bi*
] assoc-map ;
: transitions-at ( transition-table assoc -- transition-table )
[ clone ] dip
[ '[ _ condition-at ] change-start-state ]
[ '[ [ _ at ] map-set ] change-final-states ]
[ '[ _ number-transitions ] change-transitions ] tri ;
: expand-one-or ( or-class transition -- alist )
[ seq>> ] dip '[ _ 2array ] map ;
: expand-or ( state-transitions -- new-transitions )
>alist [
first2 over or-class?
[ expand-one-or ] [ 2array 1array ] if
] map concat >hashtable ;
: expand-ors ( transition-table -- transition-table )
[ [ expand-or ] assoc-map ] change-transitions ;

View File

@ -1,195 +0,0 @@
! Copyright (C) 2008 Doug Coleman.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors assocs combinators kernel math
quotations sequences regexp.parser regexp.classes fry arrays
combinators.short-circuit regexp.utils prettyprint regexp.nfa ;
IN: regexp.traversal
TUPLE: dfa-traverser
dfa-table
traversal-flags
traverse-forward
lookahead-counters
lookbehind-counters
capture-counters
captured-groups
capture-group-index
last-state current-state
text
match-failed?
start-index current-index
matches ;
: <dfa-traverser> ( text regexp -- match )
[ dfa-table>> ] [ dfa-traversal-flags>> ] bi
dfa-traverser new
swap >>traversal-flags
swap [ start-state>> >>current-state ] [ >>dfa-table ] bi
swap >>text
t >>traverse-forward
0 >>start-index
0 >>current-index
0 >>capture-group-index
V{ } clone >>matches
V{ } clone >>capture-counters
V{ } clone >>lookbehind-counters
V{ } clone >>lookahead-counters
H{ } clone >>captured-groups ;
: final-state? ( dfa-traverser -- ? )
[ current-state>> ]
[ dfa-table>> final-states>> ] bi key? ;
: beginning-of-text? ( dfa-traverser -- ? )
current-index>> 0 <= ; inline
: end-of-text? ( dfa-traverser -- ? )
[ current-index>> ] [ text>> length ] bi >= ; inline
: text-finished? ( dfa-traverser -- ? )
{
[ current-state>> empty? ]
[ end-of-text? ]
[ match-failed?>> ]
} 1|| ;
: save-final-state ( dfa-straverser -- )
[ current-index>> ] [ matches>> ] bi push ;
: match-done? ( dfa-traverser -- ? )
dup final-state? [
dup save-final-state
] when text-finished? ;
: previous-text-character ( dfa-traverser -- ch )
[ text>> ] [ current-index>> 1- ] bi nth ;
: current-text-character ( dfa-traverser -- ch )
[ text>> ] [ current-index>> ] bi nth ;
: next-text-character ( dfa-traverser -- ch )
[ text>> ] [ current-index>> 1+ ] bi nth ;
GENERIC: flag-action ( dfa-traverser flag -- )
M: beginning-of-input flag-action ( dfa-traverser flag -- )
drop
dup beginning-of-text? [ t >>match-failed? ] unless drop ;
M: end-of-input flag-action ( dfa-traverser flag -- )
drop
dup end-of-text? [ t >>match-failed? ] unless drop ;
M: beginning-of-line flag-action ( dfa-traverser flag -- )
drop
dup {
[ beginning-of-text? ]
[ previous-text-character terminator-class class-member? ]
} 1|| [ t >>match-failed? ] unless drop ;
M: end-of-line flag-action ( dfa-traverser flag -- )
drop
dup {
[ end-of-text? ]
[ next-text-character terminator-class class-member? ]
} 1|| [ t >>match-failed? ] unless drop ;
M: word-boundary flag-action ( dfa-traverser flag -- )
drop
dup {
[ end-of-text? ]
[ current-text-character terminator-class class-member? ]
} 1|| [ t >>match-failed? ] unless drop ;
M: lookahead-on flag-action ( dfa-traverser flag -- )
drop
lookahead-counters>> 0 swap push ;
M: lookahead-off flag-action ( dfa-traverser flag -- )
drop
dup lookahead-counters>>
[ drop ] [ pop '[ _ - ] change-current-index drop ] if-empty ;
M: lookbehind-on flag-action ( dfa-traverser flag -- )
drop
f >>traverse-forward
[ 2 - ] change-current-index
lookbehind-counters>> 0 swap push ;
M: lookbehind-off flag-action ( dfa-traverser flag -- )
drop
t >>traverse-forward
dup lookbehind-counters>>
[ drop ] [ pop '[ _ + 2 + ] change-current-index drop ] if-empty ;
M: capture-group-on flag-action ( dfa-traverser flag -- )
drop
[ current-index>> 0 2array ]
[ capture-counters>> ] bi push ;
M: capture-group-off flag-action ( dfa-traverser flag -- )
drop
dup capture-counters>> empty? [
drop
] [
{
[ capture-counters>> pop first2 dupd + ]
[ text>> <slice> ]
[ [ 1+ ] change-capture-group-index capture-group-index>> ]
[ captured-groups>> set-at ]
} cleave
] if ;
: process-flags ( dfa-traverser -- )
[ [ 1+ ] map ] change-lookahead-counters
[ [ 1+ ] map ] change-lookbehind-counters
[ [ first2 1+ 2array ] map ] change-capture-counters
! dup current-state>> .
dup [ current-state>> ] [ traversal-flags>> ] bi
at [ flag-action ] with each ;
: increment-state ( dfa-traverser state -- dfa-traverser )
[
dup traverse-forward>>
[ [ 1+ ] change-current-index ]
[ [ 1- ] change-current-index ] if
dup current-state>> >>last-state
] [ first ] bi* >>current-state ;
: match-literal ( transition from-state table -- to-state/f )
transitions>> at at ;
: match-class ( transition from-state table -- to-state/f )
transitions>> at* [
[ drop class-member? ] assoc-with assoc-find [ nip ] [ drop ] if
] [ drop ] if ;
: match-default ( transition from-state table -- to-state/f )
[ drop ] 2dip transitions>> at t swap at ;
: match-transition ( obj from-state dfa -- to-state/f )
{ [ match-literal ] [ match-class ] [ match-default ] } 3|| ;
: setup-match ( match -- obj state dfa-table )
[ [ current-index>> ] [ text>> ] bi nth ]
[ current-state>> ]
[ dfa-table>> ] tri ;
: do-match ( dfa-traverser -- dfa-traverser )
dup process-flags
dup match-done? [
dup setup-match match-transition
[ increment-state do-match ] when*
] unless ;
: return-match ( dfa-traverser -- slice/f )
dup matches>>
[ drop f ]
[
[ [ text>> ] [ start-index>> ] bi ]
[ peek ] bi* rot <slice>
] if-empty ;

View File

@ -1,4 +0,0 @@
USING: regexp.utils tools.test ;
IN: regexp.utils.tests
[ [ ] [ ] while-changes ] must-infer

View File

@ -1,64 +0,0 @@
! Copyright (C) 2008 Doug Coleman.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors arrays assocs io kernel math math.order
namespaces regexp.backend sequences unicode.categories
math.ranges fry combinators.short-circuit vectors ;
IN: regexp.utils
: (while-changes) ( obj quot: ( obj -- obj' ) pred: ( obj -- <=> ) pred-ret -- obj )
[ [ dup slip ] dip pick over call ] dip dupd =
[ 3drop ] [ (while-changes) ] if ; inline recursive
: while-changes ( obj quot pred -- obj' )
pick over call (while-changes) ; inline
: assoc-with ( param assoc quot -- assoc curry )
swapd [ [ -rot ] dip call ] 2curry ; inline
: insert-at ( value key hash -- )
2dup at* [
2nip push
] [
drop
[ dup vector? [ 1vector ] unless ] 2dip set-at
] if ;
: ?insert-at ( value key hash/f -- hash )
[ H{ } clone ] unless* [ insert-at ] keep ;
ERROR: bad-octal number ;
ERROR: bad-hex number ;
: check-octal ( octal -- octal ) dup 255 > [ bad-octal ] when ;
: check-hex ( hex -- hex ) dup number? [ bad-hex ] unless ;
: ascii? ( n -- ? ) 0 HEX: 7f between? ;
: octal-digit? ( n -- ? ) CHAR: 0 CHAR: 7 between? ;
: decimal-digit? ( n -- ? ) CHAR: 0 CHAR: 9 between? ;
: hex-digit? ( n -- ? )
[
[ decimal-digit? ]
[ CHAR: a CHAR: f between? ]
[ CHAR: A CHAR: F between? ]
] 1|| ;
: control-char? ( n -- ? )
[
[ 0 HEX: 1f between? ]
[ HEX: 7f = ]
] 1|| ;
: punct? ( n -- ? )
"!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~" member? ;
: c-identifier-char? ( ch -- ? )
[ [ alpha? ] [ CHAR: _ = ] ] 1|| ;
: java-blank? ( n -- ? )
{
CHAR: \s CHAR: \t CHAR: \n
HEX: b HEX: 7 CHAR: \r
} member? ;
: java-printable? ( n -- ? )
[ [ alpha? ] [ punct? ] ] 1|| ;

1
basis/see/authors.txt Normal file
View File

@ -0,0 +1 @@
Slava Pestov

55
basis/see/see-docs.factor Normal file
View File

@ -0,0 +1,55 @@
IN: see
USING: help.markup help.syntax strings prettyprint.private
definitions generic words classes ;
HELP: synopsis
{ $values { "defspec" "a definition specifier" } { "str" string } }
{ $contract "Prettyprints the prologue of a definition." } ;
HELP: synopsis*
{ $values { "defspec" "a definition specifier" } }
{ $contract "Adds sections to the current block corresponding to a the prologue of a definition, in source code-like form." }
{ $notes "This word should only be called from inside the " { $link with-pprint } " combinator. Client code should call " { $link synopsis } " instead." } ;
HELP: see
{ $values { "defspec" "a definition specifier" } }
{ $contract "Prettyprints a definition." } ;
HELP: see-methods
{ $values { "word" "a " { $link generic } " or a " { $link class } } }
{ $contract "Prettyprints the methods defined on a generic word or class." } ;
HELP: definer
{ $values { "defspec" "a definition specifier" } { "start" word } { "end" "a word or " { $link f } } }
{ $contract "Outputs the parsing words which delimit the definition." }
{ $examples
{ $example "USING: definitions prettyprint ;"
"IN: scratchpad"
": foo ; \\ foo definer . ."
";\nPOSTPONE: :"
}
{ $example "USING: definitions prettyprint ;"
"IN: scratchpad"
"SYMBOL: foo \\ foo definer . ."
"f\nPOSTPONE: SYMBOL:"
}
}
{ $notes "This word is used in the implementation of " { $link see } "." } ;
HELP: definition
{ $values { "defspec" "a definition specifier" } { "seq" "a sequence" } }
{ $contract "Outputs the body of a definition." }
{ $examples
{ $example "USING: definitions math prettyprint ;" "\\ sq definition ." "[ dup * ]" }
}
{ $notes "This word is used in the implementation of " { $link see } "." } ;
ARTICLE: "see" "Printing definitions"
"The " { $vocab-link "see" } " vocabulary implements support for printing out " { $link "definitions" } " in the image."
$nl
"Printing a definition:"
{ $subsection see }
"Printing the methods defined on a generic word or class (see " { $link "objects" } "):"
{ $subsection see-methods } ;
ABOUT: "see"

227
basis/see/see.factor Normal file
View File

@ -0,0 +1,227 @@
! Copyright (C) 2009 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors arrays assocs classes classes.builtin
classes.intersection classes.mixin classes.predicate
classes.singleton classes.tuple classes.union combinators
definitions effects generic generic.standard io io.pathnames
io.streams.string io.styles kernel make namespaces prettyprint
prettyprint.backend prettyprint.config prettyprint.custom
prettyprint.sections sequences sets sorting strings summary
words words.symbol ;
IN: see
GENERIC: see* ( defspec -- )
: see ( defspec -- ) see* nl ;
: synopsis ( defspec -- str )
[
0 margin set
1 line-limit set
[ synopsis* ] with-in
] with-string-writer ;
: definer. ( defspec -- )
definer drop pprint-word ;
: comment. ( text -- )
H{ { font-style italic } } styled-text ;
: stack-effect. ( word -- )
[ [ parsing-word? ] [ symbol? ] bi or not ] [ stack-effect ] bi and
[ effect>string comment. ] when* ;
<PRIVATE
: seeing-word ( word -- )
vocabulary>> pprinter-in set ;
: word-synopsis ( word -- )
{
[ seeing-word ]
[ definer. ]
[ pprint-word ]
[ stack-effect. ]
} cleave ;
M: word synopsis* word-synopsis ;
M: simple-generic synopsis* word-synopsis ;
M: standard-generic synopsis*
{
[ definer. ]
[ seeing-word ]
[ pprint-word ]
[ dispatch# pprint* ]
[ stack-effect. ]
} cleave ;
M: hook-generic synopsis*
{
[ definer. ]
[ seeing-word ]
[ pprint-word ]
[ "combination" word-prop var>> pprint* ]
[ stack-effect. ]
} cleave ;
M: method-spec synopsis*
first2 method synopsis* ;
M: method-body synopsis*
[ definer. ]
[ "method-class" word-prop pprint-word ]
[ "method-generic" word-prop pprint-word ] tri ;
M: mixin-instance synopsis*
[ definer. ]
[ class>> pprint-word ]
[ mixin>> pprint-word ] tri ;
M: pathname synopsis* pprint* ;
M: word summary synopsis ;
GENERIC: declarations. ( obj -- )
M: object declarations. drop ;
: declaration. ( word prop -- )
[ nip ] [ name>> word-prop ] 2bi
[ pprint-word ] [ drop ] if ;
M: word declarations.
{
POSTPONE: parsing
POSTPONE: delimiter
POSTPONE: inline
POSTPONE: recursive
POSTPONE: foldable
POSTPONE: flushable
} [ declaration. ] with each ;
: pprint-; ( -- ) \ ; pprint-word ;
M: object see*
[
12 nesting-limit set
100 length-limit set
<colon dup synopsis*
<block dup definition pprint-elements block>
dup definer nip [ pprint-word ] when* declarations.
block>
] with-use ;
M: method-spec see*
first2 method see* ;
GENERIC: see-class* ( word -- )
M: union-class see-class*
<colon \ UNION: pprint-word
dup pprint-word
members pprint-elements pprint-; block> ;
M: intersection-class see-class*
<colon \ INTERSECTION: pprint-word
dup pprint-word
participants pprint-elements pprint-; block> ;
M: mixin-class see-class*
<block \ MIXIN: pprint-word
dup pprint-word <block
dup members [
hard line-break
\ INSTANCE: pprint-word pprint-word pprint-word
] with each block> block> ;
M: predicate-class see-class*
<colon \ PREDICATE: pprint-word
dup pprint-word
"<" text
dup superclass pprint-word
<block
"predicate-definition" word-prop pprint-elements
pprint-; block> block> ;
M: singleton-class see-class* ( class -- )
\ SINGLETON: pprint-word pprint-word ;
GENERIC: pprint-slot-name ( object -- )
M: string pprint-slot-name text ;
M: array pprint-slot-name
<flow \ { pprint-word
f <inset unclip text pprint-elements block>
\ } pprint-word block> ;
: unparse-slot ( slot-spec -- array )
[
dup name>> ,
dup class>> object eq? [
dup class>> ,
initial: ,
dup initial>> ,
] unless
dup read-only>> [
read-only ,
] when
drop
] { } make ;
: pprint-slot ( slot-spec -- )
unparse-slot
dup length 1 = [ first ] when
pprint-slot-name ;
M: tuple-class see-class*
<colon \ TUPLE: pprint-word
dup pprint-word
dup superclass tuple eq? [
"<" text dup superclass pprint-word
] unless
<block "slots" word-prop [ pprint-slot ] each block>
pprint-; block> ;
M: word see-class* drop ;
M: builtin-class see-class*
drop "! Built-in class" comment. ;
: see-class ( class -- )
dup class? [
[
[ seeing-word ] [ see-class* ] bi
] with-use
] [ drop ] if ;
M: word see*
[ see-class ]
[ [ class? ] [ symbol? not ] bi and [ nl nl ] when ]
[
dup [ class? ] [ symbol? ] bi and
[ drop ] [ call-next-method ] if
] tri ;
: seeing-implementors ( class -- seq )
dup implementors [ method ] with map natural-sort ;
: seeing-methods ( generic -- seq )
"methods" word-prop values natural-sort ;
PRIVATE>
: see-all ( seq -- )
natural-sort [ nl nl ] [ see* ] interleave ;
: methods ( word -- seq )
[
dup class? [ dup seeing-implementors % ] when
dup generic? [ dup seeing-methods % ] when
drop
] { } make prune ;
: see-methods ( word -- )
methods see-all nl ;

1
basis/see/summary.txt Normal file
View File

@ -0,0 +1 @@
Printing loaded definitions as source code

View File

@ -7,10 +7,13 @@ IN: simple-flat-file
[ "#" split1 drop ] map harvest ;
: split-column ( line -- columns )
" \t" split harvest 2 head ;
" \t" split harvest 2 short head 2 f pad-tail ;
: parse-hex ( s -- n )
2 short tail hex> ;
dup [
"0x" ?head [ "U+" ?head [ "Missing 0x or U+" throw ] unless ] unless
hex>
] when ;
: parse-line ( line -- code-unicode )
split-column [ parse-hex ] map ;

View File

@ -155,7 +155,7 @@ M: object apply-object push-literal ;
"cannot-infer" word-prop rethrow ;
: maybe-cannot-infer ( word quot -- )
[ [ "cannot-infer" set-word-prop ] keep throw ] recover ; inline
[ [ "cannot-infer" set-word-prop ] keep rethrow ] recover ; inline
: infer-word ( word -- effect )
[

View File

@ -7,7 +7,7 @@ sorting assocs definitions prettyprint io inspector
classes.tuple classes.union classes.predicate debugger
threads.private io.streams.string io.timeouts io.thread
sequences.private destructors combinators eval locals.backend
system ;
system compiler.units ;
IN: stack-checker.tests
\ infer. must-infer
@ -580,4 +580,11 @@ DEFER: eee'
[ [ ] debugging-curry-folding ] must-infer
[ [ exit ] [ 1 2 3 ] if ] must-infer
[ [ exit ] [ 1 2 3 ] if ] must-infer
! Stack effects are required now but FORGET: clears them...
: forget-test ( -- ) ;
[ forget-test ] must-infer
[ ] [ [ \ forget-test forget ] with-compilation-unit ] unit-test
[ forget-test ] must-infer

View File

@ -10,10 +10,11 @@ stack-checker.recursive-state ;
IN: stack-checker.transforms
: give-up-transform ( word -- )
dup recursive-word?
[ call-recursive-word ]
[ dup infer-word apply-word/effect ]
if ;
{
{ [ dup "inferred-effect" word-prop ] [ cached-infer ] }
{ [ dup recursive-word? ] [ call-recursive-word ] }
[ dup infer-word apply-word/effect ]
} cond ;
:: ((apply-transform)) ( word quot values stack rstate -- )
rstate recursive-state

Some files were not shown because too many files have changed in this diff Show More