Slava Pestov
e36a0d7ef4
compiler: clean up code generation for alien boxing/unboxing a bit
2009-09-03 21:22:43 -05:00
Slava Pestov
4d5a4222b6
More SIMD work
...
- Rename SIMD types and register representations: <type>-<count> rather than <count><type>-array
- Make a functor to define 256-bit vector types, use it to define float-8 type
- Make SIMD instructions pure-insns so that they participate in value numbering
2009-09-03 20:58:56 -05:00
Slava Pestov
bf81cb4259
math.vectors.simd: split off intrinsics into a sub-vocabulary, to avoid loading most of the SIMD code on bootstrap
2009-09-03 03:43:43 -05:00
Slava Pestov
ff8c70dbe0
Initial implementation of SSE vector intrinsics:
...
- cpu.architecture: add SSE vector representations
- compiler.cfg.intrinsics.alien: remove an attempt at optimization that value numbering handles now
- compiler.cfg.representations: support instructions where the representation is set in the 'rep' slot, and support conversions between single and double floats
- alien-float, set-alien-float now use the single float representation, and the conversion is implicit; this fixes a long-standing bug where a register could get clobbered because of how %set-alien-float was defined on x86
- math.vectors.specialization: add support for SIMD specialization (where the vector word's body is replaced by another quotation), also specialize the 'sum' word
- math.vectors.simd: 4float-array, 2double-array, 4double-array types, and specializers for the math.vectors words
2009-09-03 02:33:07 -05:00
Slava Pestov
85a2bfab6c
compiler: eliminate boilerplate by centralizing info in declarative INSN: syntax
2009-09-02 06:22:37 -05:00
Slava Pestov
f91b539c31
cpu.ppc: implement fast float function calls; 3x speedup on benchmark.struct-arrays on PowerPC
2009-09-01 15:19:26 -05:00
Slava Pestov
868009aaee
compiler.cfg.intrinsics: cleanup: the "intrinsic" word property is now a quotation, not a boolean, making this mechanism more extensible
2009-08-30 22:20:49 -05:00
Slava Pestov
447c5fbf7a
compiler.cfg.linear-scan.live-intervals: dead-value-error is never thrown anymore
2009-08-30 05:15:18 -05:00
Slava Pestov
9595be4bf9
%box-displaced-alien: fix clobberage found by Doug
2009-08-30 05:11:08 -05:00
Slava Pestov
0db01f6d5f
compiler.cfg.linear-scan now supports partial sync-points where all registers are spilled; taking advantage of this, there are new trigonometric intrinsics which yield a 2x performance boost on benchmark.struct-arrays and a 25% boost on benchmark.partial-sums
2009-08-30 04:52:01 -05:00
Slava Pestov
908b4742c5
compiler.cfg.value-numbering: fix ##box-displaced-alien simplification
2009-08-28 19:05:49 -05:00
Slava Pestov
2bb6293217
compiler: add fixnum-min/max intrinsics; ~10% speedup on benchmark.yuv-to-rgb
2009-08-28 19:02:59 -05:00
Slava Pestov
d957ae4e44
Performance improvements to make struct-arrays benchmark faster
...
- improved optimization of ##unbox-any-c-ptr on ##box-displaced-alien; convert it to ##unbox-c-ptr where possible using class info stored in the ##bda instruction
- make fcos, fsin, etc inline again; everything in math.libm inline again, except for fsqrt which is an intrinsic
- convert min and max on floats to float-min and float-max
- make min and max not inline, so that the above can work
- struct-arrays: rice a bit so that more fixnums come up
2009-08-28 05:21:16 -05:00
Slava Pestov
8f19f14c1f
compiler.cfg.instructions: forgot that ##box-displaced-alien needs a GC check; fixes segfault in benchmark.mandel
2009-08-27 04:09:35 -05:00
Slava Pestov
f662e6403a
compiler: new inline intrinsic for <displaced-alien> where the inputs have known types; value numbering now eliminates unnecessary allocation of displaced aliens if the result is immediately unboxed again
2009-08-27 00:06:19 -05:00
Slava Pestov
8bf709acd0
compiler.cfg.linear-scan: fix unit tests for new fake-representations
2009-08-26 08:58:00 -05:00
Slava Pestov
8d55616d34
compiler.cfg.debugger: fix fake-representations so that low-level-ir tests can pass on x86
2009-08-25 23:44:01 -05:00
Slava Pestov
0df8aadce2
cpu.x86: use SQRTSD instruction for math.libm:fsqrt word
2009-08-25 23:22:15 -05:00
Slava Pestov
81b72cb5c5
Add some unit tests
2009-08-22 17:15:10 -05:00
Slava Pestov
5197aca215
compiler.cfg.dataflow-analysis: when intersecting sets, treat uninitialized sets as universal rather than empty; reduces number of stack instructions generated by 1%
2009-08-20 18:15:41 -05:00
Slava Pestov
fd2f0a602d
compiler.cfg.stacks.local: more accurate local replace set computation; optimizes out 'swap swap'
2009-08-19 22:00:21 -05:00
Slava Pestov
a598cc35a5
compiler: add unit tests for new bugs
2009-08-19 16:56:26 -05:00
Daniel Ehrenberg
478b960560
Merge branch 'master' of git://factorcode.org/git/factor
2009-08-14 20:11:54 -05:00
Daniel Ehrenberg
3cec74867d
Improving write barrier elimination; change to compiler.cfg.utilities to support this
2009-08-14 19:41:41 -05:00
Daniel Ehrenberg
8197d9356b
Write barriers are hoisted out of loops when their target is slot-available
2009-08-13 20:26:44 -05:00
Doug Coleman
2ed4425b7a
Merge branch 'master' of git://factorcode.org/git/factor
...
Conflicts:
basis/calendar/calendar.factor
2009-08-13 19:40:02 -05:00
Doug Coleman
3f3d57032b
Delete empty unit tests files, remove 1- and 1+, reorder IN: lines in a lot of places, minor refactoring
2009-08-13 19:21:44 -05:00
Daniel Ehrenberg
5a3e350490
Global write barrier elimination tracks newly allocated objects
2009-08-13 15:18:47 -05:00
Daniel Ehrenberg
d35e1eb76c
Fixing write-barrier elimination; adding bb as a parameter to join-sets in dataflow analysis
2009-08-12 23:52:29 -05:00
Daniel Ehrenberg
1a7ab59f56
Making write barrier elimination global
2009-08-11 21:21:21 -05:00
Slava Pestov
8a9c15ab0b
compiler.tree.escape-analysis: if the output of an #introduce node has an immutable tuple class type declaration, and it is not passed to any subroutine calls, or returned from the word, then unbox it. This speeds up vector arithmetic words on specialized arrays, because the specialized array is unboxed up-front, eliminating an indirection on every loop iteration
2009-08-09 16:29:21 -05:00
Slava Pestov
cc5476c823
_gc instruction doesn't need slot to hold GC root area size, since that's just tagged-values>> length
2009-08-09 03:08:13 -05:00
Slava Pestov
687454878a
compiler.cfg.linearization: change order to fit older unit tests
2009-08-08 23:06:57 -05:00
Slava Pestov
24a50c8006
compiler.cfg.two-operand: sometimes we can eliminate a copy in the x = y <op> y case
2009-08-08 20:03:42 -05:00
Slava Pestov
55acddef3f
compiler.cfg.representation: OK to unbox output of ##load-reference globally
2009-08-08 20:03:13 -05:00
Slava Pestov
d0c393aa60
compiler.cfg: new system to track when results of analyses need to be recomputed (reverse post order, linear order, predecessors, dominance, loops). Passes can now call needs-predecessors, needs-dominance, needs-loops at the beginning, and cfg-changed, predecessors-changd at the end. Linearization order now takes loop nesting into account, and linear scan now uses linearization order instead of RPO.
2009-08-08 20:02:56 -05:00
Slava Pestov
11dc0a23a8
compiler.cfg.ssa.liveness: fix tests
2009-08-08 16:15:45 -05:00
Slava Pestov
1bf8a0cac7
compiler.cfg.representations: emit-conversion should not be private since CSSA construction uses it
2009-08-08 04:13:30 -05:00
Slava Pestov
4b7ba38aab
compiler.cfg: virtual registers are integers now, and representations are stored off to the side. Fix bug in representation selection that would manifest if a value was used as a float and a fixnum in different branches; cannot globally unbox float in this case
2009-08-08 04:02:18 -05:00
Slava Pestov
e21ca289c3
compiler.cfg.representations: new pass to make global unboxing decisions, relies on new compiler.cfg.loop-detection pass for loop nesting information
2009-08-08 00:24:46 -05:00
Slava Pestov
725280d424
Split off the notion of a register representation from a register class
2009-08-07 17:44:50 -05:00
Slava Pestov
370f4c081d
compiler.cfg: convert code into two-operand form before SSA destruction; SSA destruction now operates on a relaxed SSA form where multiple defs of the same vreg are allowed, but only within a single basic block. This makes linear scan's coalescing redundant, allowing it to be removed completely
2009-08-05 18:57:46 -05:00
Slava Pestov
b49e4c9c9b
Merge branch 'master' of git://factorcode.org/git/factor
2009-08-03 10:31:27 -05:00
Slava Pestov
d286a7f426
compiler.cfg.critical-edges: no longer neededed
2009-08-03 10:31:00 -05:00
Slava Pestov
d20d335447
compiler.cfg.stacks: more accurate deconcatenatization inserts fewer partially redundant ##peeks. 11% improvement on benchmark.beust2, 2% reduction in ##peek and ##replace instructions inserted
2009-08-03 07:08:28 -05:00
Slava Pestov
720bfe378f
compiler.cfg.stacks.uninitialized: use bitand instead of min
2009-08-03 06:03:38 -05:00
Joe Groff
63e5b0d6d8
Merge branch 'master' of git://factorcode.org/git/factor
2009-08-02 23:16:52 -05:00
Joe Groff
7c5ef08aab
[ [ ... ] compare ] sort => [ ... ] sort-with
2009-08-02 20:09:23 -05:00
Slava Pestov
dac7edd2ab
compiler.cfg.def-use: remove compute-def-use word, passes have to call compute-defs or compute-uses now; compiler.cfg.ssa.liveness: don't compute dominance and def-use first since destruction does already
2009-08-02 19:12:32 -05:00
Slava Pestov
df6c87d350
Merge branch 'master' of git://factorcode.org/git/factor
2009-08-02 18:46:27 -05:00