factor

Author	SHA1	Message	Date
Joe Groff	a7a77cd03e	fix x86 uchar %scalar>integer	2009-10-10 10:39:23 -05:00
Joe Groff	5158a12d32	rename ##shuffle-vector to ##shuffle-vector-imm, and add a new ##shuffle-vector for dynamic shuffles. have vshuffle use ##shuffle-vector to do word and byte shuffles on x86	2009-10-09 21:26:27 -05:00
Joe Groff	98836a9e2e	break vector compare intrinsics into %compare, %or, and %not instructions that map directly to cpu instructions	2009-10-07 15:27:03 -05:00
Joe Groff	43b51ef2eb	decompose %unpack-vector-head/tail into %compare-vector/%merge-vector-head/tail or %tail>head-vector/%unpack-vector-head insns when there isn't an actual unpack insn; get rid of fake x86 implementations	2009-10-07 14:09:46 -05:00
Joe Groff	5152c3b06d	sse doesn't actually have an unsigned->unsigned pack instruction	2009-10-07 12:00:31 -05:00
Joe Groff	a13e75f4f4	don't generate a ##not-vector instruction if the cpu doesn't have one; instead, fall back to a ##fill-vector/##xor-vector combo. get rid of pretend %not-vector in cpu.x86	2009-10-07 11:59:36 -05:00
Joe Groff	444624e79f	fix x86 %unpack-vector insns	2009-10-06 20:38:51 -05:00
Joe Groff	2edccca0bb	oops...PACKUSDW is sse4 only	2009-10-06 20:09:50 -05:00
Joe Groff	425ea05529	%float>integer-vector should truncate	2009-10-06 13:57:54 -05:00
Joe Groff	84ecb1266d	add insns for vector pack, unpack, integer>float, and float>integer	2009-10-05 22:34:14 -05:00
Slava Pestov	931107397c	compiler.cfg: remove _gc instruction, it doesn't need to exist, and change GC checks to ensure that the right amount of space is available instead of blindly checking for 1Kb	2009-10-05 05:27:49 -05:00
Joe Groff	dca9d3e535	add %merge-vector-head and %merge-vector-tail instructions to back vmerge	2009-10-03 21:48:53 -05:00
Joe Groff	335df20713	add intrinsics for v<=, v<, v=, v>, v>=, vunordered?	2009-10-03 11:29:34 -05:00
Joe Groff	b1ec36a324	extend x86 %compare-vector to cover all comparison codes, sometimes stupidly for now	2009-10-02 23:19:56 -05:00
Joe Groff	e2e75c6b3a	add intrinsic for vnot/vbitnot	2009-10-02 20:04:28 -05:00
Joe Groff	9d424a1092	Merge branch 'master' of git://factorcode.org/git/factor Conflicts: basis/compiler/codegen/codegen.factor	2009-10-01 23:14:16 -05:00
Joe Groff	7b13fa4283	fold test-vector/branch sequences into a test-vector-branch instruction	2009-10-01 19:53:30 -05:00
Joe Groff	228ad950bb	%test-vector instruction for vany?, vall?, vnone?	2009-10-01 15:35:38 -05:00
Joe Groff	94070c11aa	%compare-vector instruction (only does v= for now)	2009-10-01 14:31:37 -05:00
Joe Groff	3ba79be651	Revert "add a %blend-vector intrinsic for v?" This reverts commit `21e4b28b67`.	2009-09-30 23:40:37 -05:00
Joe Groff	37a091a188	Merge branch 'master' of git://factorcode.org/git/factor	2009-09-30 23:04:04 -05:00
Joe Groff	21e4b28b67	add a %blend-vector intrinsic for v?	2009-09-30 23:03:59 -05:00
Slava Pestov	65421b111b	math.vectors.simd: use fallbacks for hlshift, hrshift, vshuffle if parameter is not a literal; element access in int-4 on x86-64 now sign-extends the value; don't throw error at compile time if parameter for vshuffle does not have enough elements	2009-09-30 20:04:37 -05:00
Slava Pestov	8e201ca4b7	Various minor compiler tweaks: Combine address calculation with dereferencing in alien accessors; convert SIMD XOR of a vector with itself into an XOR of the destination with itself; convert SIMD unbox of zero vector into XOR of the destination with itself; fix SIMD indexing on x86-64	2009-09-30 05:00:36 -05:00
Slava Pestov	2b13245704	math.vectors.simd: add fast intrinsic for 'nth', replace broadcast primitive with shuffles	2009-09-29 04:48:11 -05:00
Slava Pestov	a6e8277b2c	math.vectors.simd: add vshuffle intrinsic	2009-09-28 23:12:13 -05:00
Slava Pestov	db217295b0	Work in progress	2009-09-28 17:31:34 -05:00
Slava Pestov	49dba53760	cpu.x86: cleanups	2009-09-28 16:38:35 -05:00
Joe Groff	4e2e45b70d	use PSHUFD for longlong-2 broadcast when dst != src to avoid a %copy	2009-09-28 12:04:08 -05:00
Joe Groff	3f90473f09	use MOVDDUP for double-2 broadcast to eliminate a %copy	2009-09-28 12:00:03 -05:00
Joe Groff	467c389948	cpu.x86.assembler: make SSE shuffle instructions accept an array of indexes so they're easier to use	2009-09-28 11:45:45 -05:00
Joe Groff	f7d416a4e4	SSE integer gather and broadcast	2009-09-28 11:24:08 -05:00
Slava Pestov	f08521bf83	Fixing various test failures caused by C type parser change, and clarify C type docs some more	2009-09-28 08:48:39 -05:00
Slava Pestov	1109fb5725	math.vectors.simd: add intrinsic for int-4-boa, uint-4-boa, fix tests for C type parser change, fix software fallback for horizontal shifts	2009-09-28 06:34:22 -05:00
Slava Pestov	dc1b6043dc	cpu.x86: shifts didn't work if dst != src1; re-organize file a bit	2009-09-28 05:39:53 -05:00
Slava Pestov	daf8f0ebba	cpu.x86: fix regression: fsqrt intrinsic wasn't used	2009-09-28 02:27:55 -05:00
Slava Pestov	10c5fe5933	math.vectors.simd: add hlshift, hrshift (128-bit shift), vbitandn intrinsics	2009-09-28 02:17:46 -05:00
Slava Pestov	e8cfaccef0	compiler.cfg: nuke ##bignum>integer and ##integer>bignum since they were unused	2009-09-27 20:36:05 -05:00
Slava Pestov	6dd8e4657e	Merge branch 'master' into more_aggressive_coalescing	2009-09-27 19:29:50 -05:00
Slava Pestov	6f2a4eba51	compiler.cfg.linear-scan: fix partial sync point logic in case where dst == src, and clean up spilling code	2009-09-27 19:28:20 -05:00
Slava Pestov	2efab6efad	cpu.x86.32: implement %unary-float-function and %binary-float-function; speeds up partial-sums and struct-arrays benchmarks	2009-09-27 18:06:30 -05:00
Slava Pestov	a267100781	compiler.cfg.ssa.destruction: more aggressive coalescing work in progress	2009-09-27 17:17:26 -05:00
sheeple	2b35f52ed2	Merge branch 'slots' of git://factorcode.org/git/factor into slots Conflicts: basis/cpu/x86/x86.factor	2009-09-26 03:12:42 -05:00
Daniel Ehrenberg	fb7f6ab455	Making ##slot and ##set-slot not have a temporary parameter	2009-09-26 00:28:14 -05:00
Phil Dawes	baa41f451f	removed param-reg-* HOOKs	2009-09-25 18:58:55 +01:00
Phil Dawes	aa71248937	made inline_gc a VM_C_API function	2009-09-25 18:29:07 +01:00
Phil Dawes	8b005f5b1d	make inline_gc regparm(3) and cleaned up %call-gc stack alignment	2009-09-24 21:45:56 +01:00
Slava Pestov	1b30310a35	cpu.x86: don't generate SSE2 instructions if only SSE1 is available	2009-09-24 04:07:15 -05:00
Slava Pestov	24039cb56a	math.vectors.simd: add v<< and v>> intrinsics for bitwise shifts on elements	2009-09-24 03:32:39 -05:00
Slava Pestov	3581d0b09b	cpu.x86/ppc: unify register-to-register moves using %copy so that better coalescing can eliminate more moves later	2009-09-23 22:49:54 -05:00
Slava Pestov	165496d2f2	Add longlong-2, ulonglong-2, longlong-4, ulonglong-4 SIMD types, fix int-4 multiplication on SSE2	2009-09-23 20:23:25 -05:00
Slava Pestov	abac963882	math.vectors.simd: new operations: vabs vsqrt vbitand vbitor vbitxor	2009-09-23 02:47:14 -05:00
Slava Pestov	e4872212b1	cpu.x86: fix using list	2009-09-20 23:24:30 -05:00
Slava Pestov	e04fba6bc7	Fix conflict	2009-09-20 23:18:07 -05:00
Slava Pestov	66871995c9	math.vectors.simd: add saturated arithmetic operations	2009-09-20 23:16:02 -05:00
Slava Pestov	78c949b9b7	math.vectors: add v+- word which is accelerated by SSE3	2009-09-20 17:43:16 -05:00
Slava Pestov	dfb43bd2ca	More integer SIMD work - move generated vocab support from specialized-arrays to vocabs.generated - add fuzz testing to math.vectors.simd - add alien type support for integer SIMD vectors - SIMD: parsing word generates a SIMD type, instead of pre-generating them all in math.vectors.simd	2009-09-20 16:48:17 -05:00
Slava Pestov	0d77efef29	cpu.x86: cleanup	2009-09-20 04:17:34 -05:00
Slava Pestov	fc5fe2bd2a	Merge Phil Dawes' VM work	2009-09-20 03:48:08 -05:00
Slava Pestov	ea2bcd69c7	math.vectors.simd: redesign to be more flexible, integer SIMD work in progress	2009-09-20 02:08:32 -05:00
Phil Dawes	f5e6d43e1e	separated vm-1st-arg and vm-3rd-arg asm invoke words (needed for ppc & x86.64)	2009-09-16 08:20:09 +01:00
Phil Dawes	6e5ddc0c33	vm pointer passed to nest_stacks and unnest_stacks (win32)	2009-09-16 08:17:26 +01:00
Phil Dawes	780415b159	added code to pass vm ptr to some unboxers	2009-09-16 08:16:32 +01:00
Phil Dawes	2a1a4ccf27	fixed up getenv compiler intrinsic to use vm struct userenv	2009-09-16 08:16:32 +01:00
Phil Dawes	cb3df86491	moved cards_offset and decks_offset into vm struct (for x86)	2009-09-16 08:16:31 +01:00
Phil Dawes	fd72e140d2	nursery global variable moved into vm	2009-09-16 08:16:31 +01:00
Phil Dawes	6da959ff3b	renamed to vm-field-offset. Slava's better at naming than me	2009-09-16 08:16:31 +01:00
Phil Dawes	77a13b1b6a	Added a vm C-STRUCT, using it for struct offsets in x86 asm	2009-09-16 08:16:31 +01:00
Phil Dawes	f9f1031dd8	moved stack_chain into vm struct	2009-09-16 08:16:31 +01:00
Phil Dawes	1fda8af73b	Added %vm-invoke to pass vm ptr to vm functions (x86.32 only, otherwise uses singleton vm)	2009-09-16 08:16:30 +01:00
Joe Groff	e5145b5a48	convert compiler cpu backends to use c-type words	2009-09-15 16:08:42 -05:00
Slava Pestov	19a5f58b53	cpu.x86: tweak SIMD intrinsics	2009-09-08 22:34:01 -05:00
Slava Pestov	092b31910d	compiler: separate ##save-context instruction from ##alien-invoke, generate a ##save-context for libm calls, and add a pass to combine multiple context saves within a basic block. Fixes crashes with FP traps thrown by libm functions on x86-32	2009-09-08 21:50:55 -05:00
Joe Groff	025a5b7b15	split unordered and ordered float comparison intrinsics in compiler; generate only unordered comparisons for now	2009-09-08 17:04:26 -05:00
Slava Pestov	ef09991500	Fixes	2009-09-08 00:13:18 -05:00
Slava Pestov	17821626c3	Fix conflicts	2009-09-07 23:51:25 -05:00
Joe Groff	9430fdc4b6	i had comisd/ucomisd backwards on x86	2009-09-04 12:30:30 -05:00
Slava Pestov	1f5193198b	compiler: clean up code generation for alien boxing/unboxing a bit	2009-09-03 21:22:43 -05:00
Joe Groff	b1ba82c84f	convert comparison branch code in compiler to use locals	2009-09-03 21:19:39 -05:00
Slava Pestov	20dfbf7ac8	More SIMD work - Rename SIMD types and register representations: <type>-<count> rather than <count><type>-array - Make a functor to define 256-bit vector types, use it to define float-8 type - Make SIMD instructions pure-insns so that they participate in value numbering	2009-09-03 20:58:56 -05:00
Joe Groff	0b9e5c034a	add compiler comparison codes for floating-point unordered comparisons; update x86 backend to generate proper code for all floating-point comparisons	2009-09-03 20:32:05 -05:00
Slava Pestov	f811208271	Detect SSE version and enable the correct set of SIMD intrinsics	2009-09-03 03:28:38 -05:00
Slava Pestov	52b99c050e	Initial implementation of SSE vector intrinsics: - cpu.architecture: add SSE vector representations - compiler.cfg.intrinsics.alien: remove an attempt at optimization that value numbering handles now - compiler.cfg.representations: support instructions where the representation is set in the 'rep' slot, and support conversions between single and double floats - alien-float, set-alien-float now use the single float representation, and the conversion is implicit; this fixes a long-standing bug where a register could get clobbered because of how %set-alien-float was defined on x86 - math.vectors.specialization: add support for SIMD specialization (where the vector word's body is replaced by another quotation), also specialize the 'sum' word - math.vectors.simd: 4float-array, 2double-array, 4double-array types, and specializers for the math.vectors words	2009-09-03 02:33:07 -05:00
Slava Pestov	775b9af2f7	compiler: eliminate boilerplate by centralizing info in declarative INSN: syntax	2009-09-02 06:22:37 -05:00
Slava Pestov	b35a01879e	%box-displaced-alien: fix clobberage found by Doug	2009-08-30 05:11:08 -05:00
Slava Pestov	f30aa5d20e	compiler: add fixnum-min/max intrinsics; ~10% speedup on benchmark.yuv-to-rgb	2009-08-28 19:02:59 -05:00
Slava Pestov	99bf9fadfb	Performance improvements to make struct-arrays benchmark faster - improved optimization of ##unbox-any-c-ptr on ##box-displaced-alien; convert it to ##unbox-c-ptr where possible using class info stored in the ##bda instruction - make fcos, fsin, etc inline again; everything in math.libm inline again, except for fsqrt which is an intrinsic - convert min and max on floats to float-min and float-max - make min and max not inline, so that the above can work - struct-arrays: rice a bit so that more fixnums come up	2009-08-28 05:21:16 -05:00
sheeple	8970cbc961	cpu.ppc: fix ##box-displaced-alien	2009-08-27 04:43:45 -05:00
Slava Pestov	9caf3f9248	compiler: new inline intrinsic for <displaced-alien> where the inputs have known types; value numbering now eliminates unnecessary allocation of displaced aliens if the result is immediately unboxed again	2009-08-27 00:06:19 -05:00
Slava Pestov	4fe0257169	cpu.x86: use SQRTSD instruction for math.libm:fsqrt word	2009-08-25 23:22:15 -05:00
Doug Coleman	d1ce837569	Delete empty unit tests files, remove 1- and 1+, reorder IN: lines in a lot of places, minor refactoring	2009-08-13 19:21:44 -05:00
Slava Pestov	4d2160799f	Split off the notion of a register representation from a register class	2009-08-07 17:44:50 -05:00
Slava Pestov	db55a031df	Move a bunch of GC check generation logic to platform-independent side	2009-07-30 21:28:27 -05:00
Slava Pestov	99216b8435	compiler.cfg: Get inline GC checks working again, using a dataflow analysis to compute uninitialized stack locations in compiler.cfg.stacks.uninitialized. Re-enable intrinsics which use inline allocation	2009-07-30 09:19:44 -05:00
Slava Pestov	c9feb6f012	cpu.x86: Fix shuffle bug. Shuffling bugs occurring in code that runs before optimizer/stack checker is online are only caught at runtime during bootstrap, what a pain	2009-07-30 05:12:40 -05:00
Slava Pestov	4842641e75	cpu.x86: fix a bug in small-register logic on 32-bit. Also, on 32-bit, we don't need to do any special register shuffling to work with 16-bit operands since all registers have 16-bit variants. So now only 8-bit operands on x86-32 require special treatment	2009-07-30 05:04:46 -05:00
Slava Pestov	0899934220	cpu.x86: use full set of 8-bit, 16-bit and 32-bit registers on x86-64 to avoid clumsy save/restore logic	2009-07-29 21:56:37 -05:00
Slava Pestov	7831293fda	cpu.x86.assembler: move operands to operands sub-vocabulary, clean up small-reg-* code in compiler backend	2009-07-29 21:44:08 -05:00
Slava Pestov	f0a5ac3fbb	cpu.x86: compile a load of zero, and adds, subs where dst = src1 more efficiently	2009-07-27 22:27:54 -05:00
Slava Pestov	39a70db831	Improve code generation for shift word: add intrinsics for fixnum-shift-fast in the case where the shift count is not constant, transform 1 swap shift into a more overflow check with open-coded fast case, transform bitand into fixnum-bitand in more cases	2009-07-16 23:50:48 -05:00
Slava Pestov	99faf3c79f	Overflowing fixnum intrinsics now expand into several CFG nodes. This speeds up the common case since only the uncommon case is now a stack syncpoint	2009-07-16 18:29:40 -05:00
Slava Pestov	1eae4286cd	compiler.cfg: split off condition codes into a comparisons sub-vocabulary	2009-07-13 14:42:52 -05:00
Slava Pestov	a61a992bfd	cpu.x86.assembler: IMUL2 instruction was busted for immediate operands When given a register and an immediate, it would generate imul imm,dst,dst however the 64-bit prefix was generated wrong and if dst was an extended register only the first operand would be an extended register. To fix this, change IMUL2 to not work on immediates anymore, and added a new IMUL3 that takes a destination register, source register, and immediate. Also, change compiler.cfg.two-operand to not two-operandize %mul-imm, since this isn't needed anymore. This fixes the sporadic benchmark.tuple-arrays crash on 64-bit machines.	2009-06-08 21:15:52 -05:00
Slava Pestov	0d265fe016	Remove %dispatch-label since it's the same on all platforms; fix %gc on PowerPC	2009-06-07 21:46:28 -05:00
Slava Pestov	fd710385e5	cpu.x86: fix small register intrinsics on x86-64	2009-06-03 03:22:46 -05:00
Slava Pestov	7aca076408	GC checks now save and restore registers	2009-06-02 18:23:47 -05:00
Slava Pestov	096803e58f	Redo compiler.codegen.fixup and get %dispatch to work	2009-06-01 02:32:36 -05:00
Slava Pestov	64114947d2	Various improvements aimed at getting local optimization regressions fixed: - Rename _gc to ##gc - Absolute labels are now supported - Generate _dispatch-label	2009-05-31 23:28:08 -05:00
Slava Pestov	40949800bf	Fixing various bugs; alias analysis wasn't handling ##phi nodes, stack analysis incorrectly handled height-changing back edges and ##fixnum-*, clean up ##dispatch generation	2009-05-29 01:39:14 -05:00
Slava Pestov	74094142fe	Fix tail call PICs on x86-64	2009-05-06 22:44:30 -05:00
Slava Pestov	d3b85c14c9	Working on inline caching for tail call sites	2009-05-06 19:22:22 -05:00
Slava Pestov	478d29a175	Better separation of concerns: cpu.{x86,ppc}.assembler no longer depends on compiler.codegen.fixup and cpu.architecture. Rename rt-xt-direct to rt-xt-pic to better explain its purpose	2009-05-06 16:14:53 -05:00
Slava Pestov	44bfff7c7b	Rename ##load-indirect to ##load-reference since this is more descriptive; value numbering doesn't assign expressions to ##load-reference nodes since this would end up folding literals which were eq? but not =	2009-01-29 01:44:58 -06:00
Slava Pestov	8a8f0c925c	Use BSR instruction to implement fixnum-log2 intrinsic	2008-12-06 15:31:17 -06:00
Slava Pestov	a56d480aa6	Various optimizations leading to a 10% speedup on compiling empty EBNF parser: - open-code getenv primitive - inline tuple predicates in finalization - faster partial dispatch - faster built-in type predicates - faster tuple predicates - faster lo-tag dispatch - compile V{ } clone and H{ } clone more efficiently - add fixnum fast-path to =; avoid indirect branch if two fixnums not eq - faster >alist on hashtables	2008-12-06 09:16:29 -06:00
Slava Pestov	82cf6530c6	set-string-nth-fast intrinsic was busted	2008-12-05 23:52:09 -06:00
Slava Pestov	e256846acd	Tweak string representation; high bit indicates if character has high bits in aux vector. Avoids memory access in common case. Split set-string-nth into two primitives; set-string-nth-fast is open-coded by optimizing compiler. 13% improvement on reverse-complement	2008-12-05 06:38:51 -06:00
Slava Pestov	e5ed7447ed	Removing more >r/r> usages	2008-12-03 08:46:16 -06:00
Slava Pestov	e7f4563374	fixnum* intrinsic for x86	2008-11-30 07:26:49 -06:00
Slava Pestov	f44506089d	More work on overflow instructions: don't need temp register anymore, add -tail variants which don't need stack frame	2008-11-28 06:36:30 -06:00
Slava Pestov	5634becda1	##fixnum-add, ##fixnum-sub instructions open-code overflow check	2008-11-28 05:33:58 -06:00
Slava Pestov	ab689c098b	Clean up direct literal code and make a first attempt at PowerPC support	2008-11-24 08:16:14 -06:00
Slava Pestov	2aaf860f47	Experimental optimizations	2008-11-24 06:40:51 -06:00
Slava Pestov	20f5541d35	Refactoring FFI for Win64	2008-11-17 13:34:37 -06:00
Slava Pestov	eb05dd3a12	Optimize a ##dispatch that is applied to the result of a ##sub-imm or ##add-imm; this eliminates an instruction from the common 1 fixnum-fast { ... } dispatch and 8 fixnum-fast { ... } dispatch code sequences appearing in generic word expansions	2008-11-13 04:16:08 -06:00
unknown	f7fe84e563	Working on Win64 FFI	2008-11-08 21:40:47 -06:00
unknown	7365959f01	Starting work on Win64 port	2008-11-07 20:33:32 -06:00
sheeple	d2ec46e38f	PowerPC backend almost functional; some new compiler unit tests added, better compilation of 'f eq?'; f becomes an immediate operand move aux-offset to compiler.constants	2008-11-06 06:27:27 -06:00
Slava Pestov	53cd75b06c	Add string-nth intrinsic	2008-11-06 01:11:28 -06:00
Slava Pestov	8b7c47a68b	Clean up x86 backend: move cpu.x86.architecture to cpu.x86, use branchless arithmetic in some intrinsics	2008-11-05 04:15:48 -06:00

230 Commits (ed5cea57eae7412cbd52129ee757f21c89b9fb9c)