Commit Graph

24 Commits (01b5430fbfb29aa6000ed90080c4328f127d0b23)

Author SHA1 Message Date
Slava Pestov 4d5a4222b6 More SIMD work
- Rename SIMD types and register representations: <type>-<count> rather than <count><type>-array
- Make a functor to define 256-bit vector types, use it to define float-8 type
- Make SIMD instructions pure-insns so that they participate in value numbering
2009-09-03 20:58:56 -05:00
Slava Pestov ff8c70dbe0 Initial implementation of SSE vector intrinsics:
- cpu.architecture: add SSE vector representations
- compiler.cfg.intrinsics.alien: remove an attempt at optimization that value numbering handles now
- compiler.cfg.representations: support instructions where the representation is set in the 'rep' slot, and support conversions between single and double floats
- alien-float, set-alien-float now use the single float representation, and the conversion is implicit; this fixes a long-standing bug where a register could get clobbered because of how %set-alien-float was defined on x86
- math.vectors.specialization: add support for SIMD specialization (where the vector word's body is replaced by another quotation), also specialize the 'sum' word
- math.vectors.simd: 4float-array, 2double-array, 4double-array types, and specializers for the math.vectors words
2009-09-03 02:33:07 -05:00
Slava Pestov 85a2bfab6c compiler: eliminate boilerplate by centralizing info in declarative INSN: syntax 2009-09-02 06:22:37 -05:00
Slava Pestov 2bb6293217 compiler: add fixnum-min/max intrinsics; ~10% speedup on benchmark.yuv-to-rgb 2009-08-28 19:02:59 -05:00
Slava Pestov d957ae4e44 Performance improvements to make struct-arrays benchmark faster
- improved optimization of ##unbox-any-c-ptr on ##box-displaced-alien; convert it to ##unbox-c-ptr where possible using class info stored in the ##bda instruction
- make fcos, fsin, etc inline again; everything in math.libm inline again, except for fsqrt which is an intrinsic
- convert min and max on floats to float-min and float-max
- make min and max not inline, so that the above can work
- struct-arrays: rice a bit so that more fixnums come up
2009-08-28 05:21:16 -05:00
Doug Coleman 3f3d57032b Delete empty unit tests files, remove 1- and 1+, reorder IN: lines in a lot of places, minor refactoring 2009-08-13 19:21:44 -05:00
Slava Pestov 24a50c8006 compiler.cfg.two-operand: sometimes we can eliminate a copy in the x = y <op> y case 2009-08-08 20:03:42 -05:00
Slava Pestov 4b7ba38aab compiler.cfg: virtual registers are integers now, and representations are stored off to the side. Fix bug in representation selection that would manifest if a value was used as a float and a fixnum in different branches; cannot globally unbox float in this case 2009-08-08 04:02:18 -05:00
Slava Pestov 725280d424 Split off the notion of a register representation from a register class 2009-08-07 17:44:50 -05:00
Slava Pestov 370f4c081d compiler.cfg: convert code into two-operand form before SSA destruction; SSA destruction now operates on a relaxed SSA form where multiple defs of the same vreg are allowed, but only within a single basic block. This makes linear scan's coalescing redundant, allowing it to be removed completely 2009-08-05 18:57:46 -05:00
Slava Pestov c1c8424605 Compiler speedups 2009-08-02 09:16:21 -05:00
Slava Pestov 9bde92220b compiler.cfg.two-operand: if last instruction in a basic block is an overflowing arithmetic op of the form x = y op x, we now convert it correctly. This fixes compiler regression with benchmark.dawes after recent coalescing changes 2009-08-01 23:50:47 -05:00
Slava Pestov e8cf50ac3e compiler.cfg.two-operand: make it work in more cases 2009-07-27 22:28:29 -05:00
Slava Pestov 21a012e3d7 compiler.cfg: Major restructuring -- do not compute liveness before local optimization, and instead change local optimizations to be more permissive of undefined values. Now, liveness is only computed once, after phi elimination and before register allocation. This means liveness analysis does not need to take phi nodes into account and can now use the new compiler.cfg.dataflow-analysis framework 2009-07-22 03:08:28 -05:00
Slava Pestov e88e7f70be Merge branch 'master' of git://factorcode.org/git/factor 2009-07-17 00:03:13 -05:00
Slava Pestov 3fb4fc1bde Improve code generation for shift word: add intrinsics for fixnum-shift-fast in the case where the shift count is not constant, transform 1 swap shift into a more overflow check with open-coded fast case, transform bitand into fixnum-bitand in more cases 2009-07-16 23:50:48 -05:00
Daniel Ehrenberg 8ea2996438 Removing two unused words in compiler.cfg.two-operand 2009-07-16 22:59:38 -05:00
Slava Pestov e76dce8aff Overflowing fixnum intrinsics now expand into several CFG nodes. This speeds up the common case since only the uncommon case is now a stack syncpoint 2009-07-16 18:29:40 -05:00
Slava Pestov 45a2105449 cpu.x86.assembler: IMUL2 instruction was busted for immediate operands
When given a register and an immediate, it would generate imul imm,dst,dst however the 64-bit prefix was generated wrong and if dst was an extended register only the first operand would be an extended register. To fix this, change IMUL2 to not work on immediates anymore, and added a new IMUL3 that takes a destination register, source register, and immediate. Also, change compiler.cfg.two-operand to not two-operandize %mul-imm, since this isn't needed anymore.
This fixes the sporadic benchmark.tuple-arrays crash on 64-bit machines.
2009-06-08 21:15:52 -05:00
Slava Pestov 3a9922d161 Fix compiler errors 2009-06-01 03:00:10 -05:00
Slava Pestov e04df76f60 Various codegen improvements:
- new-insn word to construct instructions
- cache RPO in the CFG
- re-organize low-level optimizer so that MR is built after register allocation
- register allocation now stores instruction numbers in the instructions themselves
- split defs-vregs into defs-vregs and temp-vregs
2009-05-29 13:11:34 -05:00
Slava Pestov 6b25e99470 Add summary for heaps more vocabs 2009-02-16 21:05:13 -06:00
Slava Pestov 145b635eb6 More optimization intended to reduce compile time. Another 10% speedup on compiling empty PEG parser
- new map-flat combinator replaces usages of 'map flatten' in compiler
- compiler.tree.def-use.simplified uses an explicit accumulator instead of flatten
- compiler.tree.tuple-unboxing uses an explicit accumulator instead of flatten
- fix inlining regression from last time: custom inlining results would sometimes be discarded
- compiler.tree's 3each and 3map combinators rewritten to not use flip
- rewrite math.partial-dispatch without locals (purely stylistic, no performance increase)
- hand-optimize flip for common arrays-of-arrays case
- don't run escape analysis and tuple unboxing if there are no allocations in the IR
2008-12-06 11:17:19 -06:00
Slava Pestov db4db19cd9 Start working on coalescing 2008-10-28 02:38:37 -07:00