I’ve not written a nontrivial amount of assembler in over 15 years, but have over the past few days started doing some for a future version of my Mandelbrot set program.
A few random thoughts:
- I did most of my assembler programming on the 68K, so the AT&T syntax preferred by
gas (i.e. op src,dest) seems natural to me. But the AMD documentation uses
the Intel syntax, and turning everything back to front is mindbending, so I quickly
gave up and switched to Intel syntax as well.
Fortunately there’s a .intel_syntax directive, and I did some Z80 long ago, so the Intel order isn’t wholly alien.
- The MUL instruction’s a bit lame, isn’t it? The 68K could multiply any pair of registers (or in fact memory, for the source) and store the result in any register. Intel/AMD’s MUL fixes the destination and one of the inputs.
- I was disappointed to find that there doesn’t seem to be a high precision integer multiply (128*128->256 would be just the ticket) in the SSEn instructions.
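For what it's worth, a 128*128->256 multiply can be assembled from four 64*64->128 partial products, which is the width the one-operand MUL delivers in RDX:RAX. Below is a minimal sketch of that schoolbook scheme, modelled in Python rather than assembler so the arithmetic is easy to check. The names `mul64` and `mul128` are made up for illustration, and a real version would presumably chain MUL with ADD/ADC instead of leaning on Python's big integers.

```python
MASK64 = (1 << 64) - 1

def mul64(a, b):
    """Model of the one-operand MUL: 64x64 -> (high, low) 64-bit halves."""
    p = (a & MASK64) * (b & MASK64)
    return p >> 64, p & MASK64

def mul128(a, b):
    """Schoolbook 128x128 -> 256 built from four 64x64 -> 128 products.

    Returns four 64-bit words, least significant first.
    """
    a_hi, a_lo = a >> 64, a & MASK64
    b_hi, b_lo = b >> 64, b & MASK64

    ll_hi, ll_lo = mul64(a_lo, b_lo)  # weight 2^0
    lh_hi, lh_lo = mul64(a_lo, b_hi)  # weight 2^64
    hl_hi, hl_lo = mul64(a_hi, b_lo)  # weight 2^64
    hh_hi, hh_lo = mul64(a_hi, b_hi)  # weight 2^128

    # Add up each 64-bit column, carrying into the next one
    # (this is the job ADD/ADC would do in assembler).
    w0 = ll_lo
    t = ll_hi + lh_lo + hl_lo
    w1, carry = t & MASK64, t >> 64
    t = lh_hi + hl_hi + hh_lo + carry
    w2, carry = t & MASK64, t >> 64
    w3 = hh_hi + carry
    return [w0, w1, w2, w3]
```

Reassembling the words with `sum(w << (64 * i) for i, w in enumerate(mul128(a, b)))` should give the same value as `a * b` for any 128-bit `a` and `b`.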
(no subject)
Date: 2010-11-10 08:32 am (UTC)
(no subject)
Date: 2010-11-10 09:17 am (UTC)
The MUL instruction is indeed feeble, but if I remember rightly there's a more recent (i.e. after the original 8086) IMUL which makes up for its shortcomings and has a sensibly diverse set of source and destination options. If you're targeting x86-64 only, you ought to be able to use that with a clear conscience.
(no subject)
Date: 2010-11-10 09:33 am (UTC)
(no subject)
Date: 2010-11-10 09:48 am (UTC)
(no subject)
Date: 2010-11-10 09:54 am (UTC)
In x86 or ARM, with dest,src order, it's obvious how the compare instruction works with the subsequent conditional branch. If you CMP x,y and then branch-if-greater, it's clear that the 'greater' gets mentally inserted between the two compare operands, so you're branching if x > y. But on 68k, they flip the compare instruction's operands but don't flip the names of the branch conditions, so you always have to remember that comparing x,y followed by branch-if-greater means that you're branching if x is less than y.
Of course it makes sense if you're thinking of CMP as a trial subtraction, and in really complicated cases where you're abusing the condition codes to do fun things, that's the only way you can think of it. But for normal workaday code that isn't doing anything exciting, you really don't want to put your brain into that mode every time. You just want to say to yourself "now check if x > y and branch somewhere else if so", and then translate that thought into a CMP and conditional branch in a basically trivial and mechanical manner. Mentally inverting the condition every time I did that was something I never got used to on 68k.
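The operand-order trap can be shown with a toy model that treats CMP as a trial subtraction followed by a signed "greater than" test. This is only a sketch: the function names are hypothetical, and it ignores overflow and the actual flag bits, capturing nothing more than which operand is subtracted from which.

```python
def x86_cmp_then_jg(x, y):
    """Intel-syntax `CMP x, y` then JG.

    The flags reflect x - y, so the branch is taken when x > y.
    """
    return (x - y) > 0

def m68k_cmp_then_bgt(x, y):
    """68k `CMP x, y` then BGT.

    The operands are src, dest, so the flags reflect y - x
    and the branch is taken when y > x, i.e. when x < y.
    """
    return (y - x) > 0
```

With the same operands as written in the source, `x86_cmp_then_jg(5, 3)` is true while `m68k_cmp_then_bgt(5, 3)` is false, which is the mental inversion described above.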
(no subject)
Date: 2010-11-10 01:53 pm (UTC)
(no subject)
Date: 2010-11-10 10:57 am (UTC)
Hence SSE's shallow vectorisation: several narrow multiplies in parallel rather than wide multiplies. I'm guessing you're expected to parallelise your algorithm a smidgen (doing as much in parallel as possible without reaching divergent decision logic) so you can exploit the ability to do two or four narrow integer multiplies simultaneously.
Given you're playing with a Mandelbrot set, a prime candidate might be manipulating real and imaginary portions of a number in parallel?
(Or am I teaching my grandmother to suck eggs, here…)
(no subject)
Date: 2010-11-10 02:08 pm (UTC)
PMULUDQ (two parallel 32x32->64) looks like the best available for integer work; not very compelling when I already have a 64x64->128. For floating point the situation is somewhat better but that's not what I'm after right now.
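To put rough numbers on that comparison, here is a small Python model of what PMULUDQ computes on a 128-bit register, plus a count of the limb-by-limb products a schoolbook multiply needs at each limb width. The helper names are invented for illustration, and the count deliberately leaves out the cost of carry propagation.

```python
MASK32 = (1 << 32) - 1

def pmuludq(a, b):
    """Model of PMULUDQ on two 128-bit values.

    In each 64-bit half, the low 32 bits of a and b are multiplied
    as unsigned integers, giving a 64-bit product in that half.
    The upper 32 bits of each half are ignored.
    """
    lo = (a & MASK32) * (b & MASK32)
    hi = ((a >> 64) & MASK32) * ((b >> 64) & MASK32)
    return (hi << 64) | lo

def partial_products(bits, limb_bits):
    """Number of limb products in a schoolbook bits x bits multiply."""
    limbs = bits // limb_bits
    return limbs * limbs
```

For a 128-bit multiply, `partial_products(128, 64)` is 4 while `partial_products(128, 32)` is 16. Even at two products per PMULUDQ that is eight multiply instructions against four MULs, before counting the extra work of combining the pieces without a carry flag.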
(no subject)
Date: 2010-11-10 12:04 pm (UTC)
(no subject)
Date: 2010-11-10 06:47 pm (UTC)
(Having said that, amd64 has been allocated to someone who left three years ago.)