Commit Graph

456 Commits (1fc809b64393ff84fc8ff926fb6dfc9ad7402c30)

Author SHA1 Message Date
Slava Pestov e36a0d7ef4 compiler: clean up code generation for alien boxing/unboxing a bit 2009-09-03 21:22:43 -05:00
Slava Pestov 4d5a4222b6 More SIMD work
- Rename SIMD types and register representations: <type>-<count> rather than <count><type>-array
- Make a functor to define 256-bit vector types, use it to define float-8 type
- Make SIMD instructions pure-insns so that they participate in value numbering
2009-09-03 20:58:56 -05:00
Slava Pestov bf81cb4259 math.vectors.simd: split off intrinsics into a sub-vocabulary, to avoid loading most of the SIMD code on bootstrap 2009-09-03 03:43:43 -05:00
Slava Pestov ff8c70dbe0 Initial implementation of SSE vector intrinsics:
- cpu.architecture: add SSE vector representations
- compiler.cfg.intrinsics.alien: remove an attempt at optimization that value numbering handles now
- compiler.cfg.representations: support instructions where the representation is set in the 'rep' slot, and support conversions between single and double floats
- alien-float, set-alien-float now use the single float representation, and the conversion is implicit; this fixes a long-standing bug where a register could get clobbered because of how %set-alien-float was defined on x86
- math.vectors.specialization: add support for SIMD specialization (where the vector word's body is replaced by another quotation), also specialize the 'sum' word
- math.vectors.simd: 4float-array, 2double-array, 4double-array types, and specializers for the math.vectors words
2009-09-03 02:33:07 -05:00
Slava Pestov 85a2bfab6c compiler: eliminate boilerplate by centralizing info in declarative INSN: syntax 2009-09-02 06:22:37 -05:00
Slava Pestov f91b539c31 cpu.ppc: implement fast float function calls; 3x speedup on benchmark.struct-arrays on PowerPC 2009-09-01 15:19:26 -05:00
Slava Pestov 868009aaee compiler.cfg.intrinsics: cleanup: the "intrinsic" word property is now a quotation, not a boolean, making this mechanism more extensible 2009-08-30 22:20:49 -05:00
Slava Pestov 447c5fbf7a compiler.cfg.linear-scan.live-intervals: dead-value-error is never thrown anymore 2009-08-30 05:15:18 -05:00
Slava Pestov 9595be4bf9 %box-displaced-alien: fix clobberage found by Doug 2009-08-30 05:11:08 -05:00
Slava Pestov 0db01f6d5f compiler.cfg.linear-scan now supports partial sync-points where all registers are spilled; taking advantage of this, there are new trigonometric intrinsics which yield a 2x performance boost on benchmark.struct-arrays and a 25% boost on benchmark.partial-sums 2009-08-30 04:52:01 -05:00
Slava Pestov 908b4742c5 compiler.cfg.value-numbering: fix ##box-displaced-alien simplification 2009-08-28 19:05:49 -05:00
Slava Pestov 2bb6293217 compiler: add fixnum-min/max intrinsics; ~10% speedup on benchmark.yuv-to-rgb 2009-08-28 19:02:59 -05:00
Slava Pestov d957ae4e44 Performance improvements to make struct-arrays benchmark faster
- improved optimization of ##unbox-any-c-ptr on ##box-displaced-alien; convert it to ##unbox-c-ptr where possible using class info stored in the ##bda instruction
- make fcos, fsin, etc inline again; everything in math.libm inline again, except for fsqrt which is an intrinsic
- convert min and max on floats to float-min and float-max
- make min and max not inline, so that the above can work
- struct-arrays: rice a bit so that more fixnums come up
2009-08-28 05:21:16 -05:00
Slava Pestov 8f19f14c1f compiler.cfg.instructions: forgot that ##box-displaced-alien needs a GC check; fixes segfault in benchmark.mandel 2009-08-27 04:09:35 -05:00
Slava Pestov f662e6403a compiler: new inline intrinsic for <displaced-alien> where the inputs have known types; value numbering now eliminates unnecessary allocation of displaced aliens if the result is immediately unboxed again 2009-08-27 00:06:19 -05:00
Slava Pestov 8bf709acd0 compiler.cfg.linear-scan: fix unit tests for new fake-representations 2009-08-26 08:58:00 -05:00
Slava Pestov 8d55616d34 compiler.cfg.debugger: fix fake-representations so that low-level-ir tests can pass on x86 2009-08-25 23:44:01 -05:00
Slava Pestov 0df8aadce2 cpu.x86: use SQRTSD instruction for math.libm:fsqrt word 2009-08-25 23:22:15 -05:00
Slava Pestov 81b72cb5c5 Add some unit tests 2009-08-22 17:15:10 -05:00
Slava Pestov 5197aca215 compiler.cfg.dataflow-analysis: when intersecting sets, treat uninitialized sets as universal rather than empty; reduces number of stack instructions generated by 1% 2009-08-20 18:15:41 -05:00
Slava Pestov fd2f0a602d compiler.cfg.stacks.local: more accurate local replace set computation; optimizes out 'swap swap' 2009-08-19 22:00:21 -05:00
Slava Pestov a598cc35a5 compiler: add unit tests for new bugs 2009-08-19 16:56:26 -05:00
Daniel Ehrenberg 478b960560 Merge branch 'master' of git://factorcode.org/git/factor 2009-08-14 20:11:54 -05:00
Daniel Ehrenberg 3cec74867d Improving write barrier elimination; change to compiler.cfg.utilities to support this 2009-08-14 19:41:41 -05:00
Daniel Ehrenberg 8197d9356b Write barriers are hoisted out of loops when their target is slot-available 2009-08-13 20:26:44 -05:00
Doug Coleman 2ed4425b7a Merge branch 'master' of git://factorcode.org/git/factor
Conflicts:
	basis/calendar/calendar.factor
2009-08-13 19:40:02 -05:00
Doug Coleman 3f3d57032b Delete empty unit tests files, remove 1- and 1+, reorder IN: lines in a lot of places, minor refactoring 2009-08-13 19:21:44 -05:00
Daniel Ehrenberg 5a3e350490 Global write barrier elimination tracks newly allocated objects 2009-08-13 15:18:47 -05:00
Daniel Ehrenberg d35e1eb76c Fixing write-barrier elimination; adding bb as a parameter to join-sets in dataflow analysis 2009-08-12 23:52:29 -05:00
Daniel Ehrenberg 1a7ab59f56 Making write barrier elimination global 2009-08-11 21:21:21 -05:00
Slava Pestov 8a9c15ab0b compiler.tree.escape-analysis: if the output of an #introduce node has an immutable tuple class type declaration, and it is not passed to any subroutine calls, or returned from the word, then unbox it. This speeds up vector arithmetic words on specialized arrays, because the specialized array is unboxed up-front, eliminating an indirection on every loop iteration 2009-08-09 16:29:21 -05:00
Slava Pestov cc5476c823 _gc instruction doesn't need slot to hold GC root area size, since that's just tagged-values>> length 2009-08-09 03:08:13 -05:00
Slava Pestov 687454878a compiler.cfg.linearization: change order to fit older unit tests 2009-08-08 23:06:57 -05:00
Slava Pestov 24a50c8006 compiler.cfg.two-operand: sometimes we can eliminate a copy in the x = y <op> y case 2009-08-08 20:03:42 -05:00
Slava Pestov 55acddef3f compiler.cfg.representation: OK to unbox output of ##load-reference globally 2009-08-08 20:03:13 -05:00
Slava Pestov d0c393aa60 compiler.cfg: new system to track when results of analyses need to be recomputed (reverse post order, linear order, predecessors, dominance, loops). Passes can now call needs-predecessors, needs-dominance, needs-loops at the beginning, and cfg-changed, predecessors-changd at the end. Linearization order now takes loop nesting into account, and linear scan now uses linearization order instead of RPO. 2009-08-08 20:02:56 -05:00
Slava Pestov 11dc0a23a8 compiler.cfg.ssa.liveness: fix tests 2009-08-08 16:15:45 -05:00
Slava Pestov 1bf8a0cac7 compiler.cfg.representations: emit-conversion should not be private since CSSA construction uses it 2009-08-08 04:13:30 -05:00
Slava Pestov 4b7ba38aab compiler.cfg: virtual registers are integers now, and representations are stored off to the side. Fix bug in representation selection that would manifest if a value was used as a float and a fixnum in different branches; cannot globally unbox float in this case 2009-08-08 04:02:18 -05:00
Slava Pestov e21ca289c3 compiler.cfg.representations: new pass to make global unboxing decisions, relies on new compiler.cfg.loop-detection pass for loop nesting information 2009-08-08 00:24:46 -05:00
Slava Pestov 725280d424 Split off the notion of a register representation from a register class 2009-08-07 17:44:50 -05:00
Slava Pestov 370f4c081d compiler.cfg: convert code into two-operand form before SSA destruction; SSA destruction now operates on a relaxed SSA form where multiple defs of the same vreg are allowed, but only within a single basic block. This makes linear scan's coalescing redundant, allowing it to be removed completely 2009-08-05 18:57:46 -05:00
Slava Pestov b49e4c9c9b Merge branch 'master' of git://factorcode.org/git/factor 2009-08-03 10:31:27 -05:00
Slava Pestov d286a7f426 compiler.cfg.critical-edges: no longer neededed 2009-08-03 10:31:00 -05:00
Slava Pestov d20d335447 compiler.cfg.stacks: more accurate deconcatenatization inserts fewer partially redundant ##peeks. 11% improvement on benchmark.beust2, 2% reduction in ##peek and ##replace instructions inserted 2009-08-03 07:08:28 -05:00
Slava Pestov 720bfe378f compiler.cfg.stacks.uninitialized: use bitand instead of min 2009-08-03 06:03:38 -05:00
Joe Groff 63e5b0d6d8 Merge branch 'master' of git://factorcode.org/git/factor 2009-08-02 23:16:52 -05:00
Joe Groff 7c5ef08aab [ [ ... ] compare ] sort => [ ... ] sort-with 2009-08-02 20:09:23 -05:00
Slava Pestov dac7edd2ab compiler.cfg.def-use: remove compute-def-use word, passes have to call compute-defs or compute-uses now; compiler.cfg.ssa.liveness: don't compute dominance and def-use first since destruction does already 2009-08-02 19:12:32 -05:00
Slava Pestov df6c87d350 Merge branch 'master' of git://factorcode.org/git/factor 2009-08-02 18:46:27 -05:00