ewx: (geek)
[personal profile] ewx
#if __GNUC__ && __i386__
/* This is not entirely satisfactory: the xchgl would be unnecessary, if only
 * we had some way of communicating the detailed input and output assignments
 * of the registers to the compiler. */
#define BSWAP64(N)                                      \
({uint64_t __n = (N); __asm__("xchgl %%eax,%%edx\n"     \
                              "\tbswap %%eax\n"         \
                              "\tbswap %%edx"           \
                              : "+A"(__n));             \
  __n;})
#endif

The trouble is, the result ends up looking something like this:

        movl    0x04(%eax),%edx
        movl    (%eax),%eax
        xchgl   %edx,%eax
        bswap   %eax
        bswap   %edx
        movl    0x0c(%ebp),%esi
        movl    %eax,(%esi)
        movl    %edx,0x04(%esi)

…when obviously it would be better to drop the xchgl and simply store the two halves the other way round:

        movl    0x04(%eax),%edx
        movl    (%eax),%eax
        bswap   %eax
        bswap   %edx
        movl    0x0c(%ebp),%esi
        movl    %edx,(%esi)
        movl    %eax,0x04(%esi)

However, as far as I can see, A is the only available constraint letter for 64-bit values on x86.

(no subject)

Date: 2011-08-27 11:49 am (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
Can you get the registers to be allocated separately, along the lines of:
    uint32 ah = a >> 32;
    uint32 al = (uint32)a;
    BSWAP32(ah);
    BSWAP32(al);
    a = ((uint64)al << 32) | ah;
where BSWAP32 expands to an __asm__ statement encoding a single bswap instruction on an arbitrary register. With any luck the compiler will optimise the shifts-by-32 into 'just access the top register of the two allocated to this 64-bit value', and will then be able to register-allocate the bswaps however it turns out easiest.
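For anyone wanting to try this, here is a minimal sketch of the split-halves approach. The names bswap32_reg and bswap64_split are made up for illustration, and the non-x86 fallback is my addition so the snippet builds elsewhere. Note the cast to 64 bits before the shift: shifting a 32-bit value left by 32 is undefined.

```c
#include <stdint.h>

/* Hypothetical helper: byte-swap one 32-bit value in whichever
 * register the compiler chooses ("+r" = any general register). */
static inline uint32_t bswap32_reg(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("bswap %0" : "+r"(x));
#else
    /* Portable fallback (an assumption, not part of the original idea). */
    x = (x >> 24) | ((x >> 8) & 0x0000ff00u) |
        ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
    return x;
}

/* Hypothetical helper: swap each half separately, then exchange the halves. */
static inline uint64_t bswap64_split(uint64_t a)
{
    uint32_t ah = (uint32_t)(a >> 32);  /* top half */
    uint32_t al = (uint32_t)a;          /* bottom half */
    ah = bswap32_reg(ah);
    al = bswap32_reg(al);
    /* Widen before shifting; the old low half becomes the new high half. */
    return ((uint64_t)al << 32) | ah;
}
```

Whether the compiler really turns the shifts into plain register-pair accesses depends on the compiler version and optimisation level, so it is worth checking the generated assembly.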

(no subject)

Date: 2011-08-27 12:11 pm (UTC)
ext_8103: (Default)
From: [identity profile] ewx.livejournal.com
Oh yes, that’s much better. Ta. I’m still academically interested in the general case of 64-bit values and __asm__ but my immediate problem is solved.

(no subject)

Date: 2011-08-28 09:08 pm (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
I expect the theory is that you almost never need to specify a 64-bit quantity being passed in or out of an asm block in one go – in general you can use the above technique to break 64-bit things up and pass in the two halves separately, which is better anyway because then the compiler can allocate the two registers independently. The A specifier is intended for use in cases where the CPU operates on the whole 64-bit value in a single instruction, and when x86 does that (MUL, DIV, CMPXCHG8B) it always expects or returns the value in EDX:EAX, so you never want to allocate the value anywhere else.
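As an illustration of the kind of instruction A is meant for, here is a rough sketch of a widening multiply using MUL, which always leaves its 64-bit result in EDX:EAX. The function name mul32x32 is hypothetical. On 32-bit x86 the sketch uses "=A" for the register pair; on x86-64, where "A" does not mean the 32-bit pair, it names the two halves with "=a" and "=d" instead; elsewhere it falls back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32x32 -> 64-bit multiply. */
static inline uint64_t mul32x32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && defined(__i386__)
    uint64_t r;
    /* mull multiplies EAX by the operand and writes EDX:EAX,
     * which is exactly the pair that "A" describes here. */
    __asm__("mull %2" : "=A"(r) : "a"(a), "rm"(b) : "cc");
    return r;
#elif defined(__GNUC__) && defined(__x86_64__)
    uint32_t lo, hi;
    /* Same instruction, but with the two halves named separately. */
    __asm__("mull %3" : "=a"(lo), "=d"(hi) : "0"(a), "rm"(b) : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    /* Portable fallback (an assumption for non-x86 targets). */
    return (uint64_t)a * b;
#endif
}
```

In real code the plain C expression is usually enough, since the compiler emits the same instruction on its own; the asm is only here to show the constraint.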

(no subject)

Date: 2011-08-27 12:27 pm (UTC)
pm215: (Default)
From: [personal profile] pm215
glibc's implementation of bswap_64() for x86-32 (in bits/byteswap.h) achieves this by type-punning through a union:
#  define __bswap_64(x) \
     (__extension__                                                           \
      ({ union { __extension__ unsigned long long int __ll;                   \
                 unsigned int __l[2]; } __w, __r;                             \
         if (__builtin_constant_p (x))                                        \
           __r.__ll = __bswap_constant_64 (x);                                \
         else                                                                 \
           {                                                                  \
             __w.__ll = (x);                                                  \
             __r.__l[0] = __bswap_32 (__w.__l[1]);                            \
             __r.__l[1] = __bswap_32 (__w.__l[0]);                            \
           }                                                                  \
         __r.__ll; }))
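Stripped of the glibc-internal names, the same union idea might look roughly like the following. The names swap32_portable and bswap64_union are invented for this sketch, and the 32-bit swap is written in plain C rather than using glibc's __bswap_32. Reading a union member other than the one last written relies on the type-punning behaviour that GCC documents.

```c
#include <stdint.h>

/* Hypothetical stand-in for glibc's __bswap_32. */
static inline uint32_t swap32_portable(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical helper: view the 64-bit value as two 32-bit words,
 * swap each word, and store them in the opposite slots. */
static inline uint64_t bswap64_union(uint64_t x)
{
    union { uint64_t ll; uint32_t l[2]; } w, r;
    w.ll = x;
    r.l[0] = swap32_portable(w.l[1]);
    r.l[1] = swap32_portable(w.l[0]);
    return r.ll;
}
```

Because the words are exchanged by index rather than by "high" and "low", this gives the same result whichever way round the host stores the two halves.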

(no subject)

Date: 2011-08-27 12:37 pm (UTC)
gerald_duck: (quack)
From: [personal profile] gerald_duck
So the real answer is "#include <byteswap.h> then use bswap_64 because the library writers have already solved this problem"? (-8

(no subject)

Date: 2011-08-27 12:38 pm (UTC)
ext_8103: (Default)
From: [identity profile] ewx.livejournal.com
On Glibc platforms, yes, and I was already doing that…

(no subject)

Date: 2011-08-27 12:48 pm (UTC)
gerald_duck: (by Redderz)
From: [personal profile] gerald_duck
Ah. Brainfart. I see __GNUC__ and I assume glibc will be available.

Then again, given that that part of glibc is header-only with no object code, and given that it's LGPL, why not just use the glibc header?

(no subject)

Date: 2011-08-28 09:21 pm (UTC)
cjwatson: (Default)
From: [personal profile] cjwatson
With GCC 4.3 and up, you can use __builtin_bswap64 instead.
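A small sketch of how that might be wrapped, with a version check so older compilers still get something. The wrapper name bswap64_builtin and the shift-and-mask fallback are my own additions for illustration.

```c
#include <stdint.h>

/* Hypothetical wrapper: use the builtin where available. */
static inline uint64_t bswap64_builtin(uint64_t n)
{
#if defined(__clang__) || \
    (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
    return __builtin_bswap64(n);
#else
    /* Portable fallback: swap bytes, then 16-bit units, then halves. */
    n = ((n & 0x00ff00ff00ff00ffULL) << 8)  | ((n >> 8)  & 0x00ff00ff00ff00ffULL);
    n = ((n & 0x0000ffff0000ffffULL) << 16) | ((n >> 16) & 0x0000ffff0000ffffULL);
    return (n << 32) | (n >> 32);
#endif
}
```

The builtin leaves register allocation entirely to the compiler, which sidesteps the constraint-letter question altogether.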

(no subject)

Date: 2011-08-27 12:33 pm (UTC)
gerald_duck: (ascii)
From: [personal profile] gerald_duck
This is 32-bit x86 and 64-bit values live on the stack, yes?

I like Simon's idea, but might an alternative be to have __asm__ give you the address of __n, tell it you're going to trash %eax and %edx then do the stack manipulation yourself?

I would provide a worked example, but I'm pretty rusty on __asm__, especially in x86-land rather than ARM-land.

(no subject)

Date: 2011-08-27 05:00 pm (UTC)
pm215: (Default)
From: [personal profile] pm215
might an alternative be to have __asm__ give you the address of __n, tell it you're going to trash %eax and %edx then do the stack manipulation yourself?
This is typically a bad idea because it will force gcc to spill the operands to memory so you can reload them. (It also means you probably end up using a suboptimal separate address calculation and load instruction.) When the asm you're trying to emit is really just doing register operations it's much nicer to expose that to gcc's register allocator where you can...
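For comparison only, here is roughly what the address-based version would look like. The function name bswap64_via_memory is made up, and the sketch assumes a little-endian x86 target with a plain C fallback elsewhere. Taking the address of n and declaring a "memory" clobber is what forces the value out to the stack and back.

```c
#include <stdint.h>

/* Hypothetical helper showing the discouraged approach: the asm
 * loads, swaps and stores the two halves itself via a pointer. */
static inline uint64_t bswap64_via_memory(uint64_t n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile("movl (%0), %%eax\n\t"    /* low half  */
                     "movl 4(%0), %%edx\n\t"   /* high half */
                     "bswap %%eax\n\t"
                     "bswap %%edx\n\t"
                     "movl %%edx, (%0)\n\t"    /* swapped high -> low slot  */
                     "movl %%eax, 4(%0)"       /* swapped low  -> high slot */
                     :
                     : "r"(&n)
                     : "eax", "edx", "memory");
    return n;
#else
    /* Portable fallback (an assumption for non-x86 targets). */
    n = ((n & 0x00ff00ff00ff00ffULL) << 8)  | ((n >> 8)  & 0x00ff00ff00ff00ffULL);
    n = ((n & 0x0000ffff0000ffffULL) << 16) | ((n >> 16) & 0x0000ffff0000ffffULL);
    return (n << 32) | (n >> 32);
#endif
}
```

Even when the value was already sitting in a register pair, the compiler has to write it out first and read it back afterwards, and EAX and EDX are tied up regardless of what else was live.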

(no subject)

Date: 2011-08-28 01:45 am (UTC)
gerald_duck: (frontal)
From: [personal profile] gerald_duck
Well, yes. But in the example given there, it's bringing the operand in from memory and stashing the result to memory anyway.

I was assuming this was because the compiler couldn't cope with register tracking for 64-bit values?

(no subject)

Date: 2011-08-28 08:53 am (UTC)
ext_8103: (Default)
From: [identity profile] ewx.livejournal.com
The compiler is happy to keep 64-bit temporaries in pairs of 32-bit registers; the example just happens to be load from memory, swap, store to (different) memory.
