ewx | Is it possible to do any better?

#if __GNUC__ && __i386__
/* This is not entirely satisfactory: the xchgl would be unnecessary, if only
 * we had some way of communicating the detailed input and output assignments
 * of the registers to the compiler. */
#define BSWAP64(N)                                      \
({uint64_t __n = (N); __asm__("xchgl %%eax,%%edx\n"     \
                              "\tbswap %%eax\n"         \
                              "\tbswap %%edx"           \
                              : "+A"(__n));             \
  __n;})
#endif

The trouble is, the result ends up looking something like this:

        movl    0x04(%eax),%edx
        movl    (%eax),%eax
        xchgl   %edx,%eax
        bswap   %eax
        bswap   %edx
        movl    0x0c(%ebp),%esi
        movl    %eax,(%esi)
        movl    %edx,0x04(%esi)

…when obviously this would be better:

        movl    0x04(%eax),%edx
        movl    (%eax),%eax
        bswap   %eax
        bswap   %edx
        movl    0x0c(%ebp),%esi
        movl    %edx,(%esi)
        movl    %eax,0x04(%esi)

However, as far as I can see, A is the only available constraint letter for 64-bit values on x86.


From:

simont

Can you get the registers to be allocated separately, along the lines of

    uint32_t ah = (uint32_t)(a >> 32);
    uint32_t al = (uint32_t)a;
    BSWAP32(ah);
    BSWAP32(al);
    a = ((uint64_t)al << 32) | ah;

where BSWAP32 expands to an __asm__ statement encoding a single bswap instruction on an arbitrary register. With any luck the compiler will optimise the shifts-by-32 into 'just access the top register of the two allocated to this 64-bit value', and will then be able to register-allocate the bswaps however it turns out easiest.
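A minimal sketch of that idea as a complete function, in case it helps. The function name bswap64_split and the portable fallback branch are assumptions added for illustration, not part of the original suggestion. The "+r" constraint leaves the choice of register to the compiler. Note that the low half has to be widened to 64 bits before the shift, since shifting a 32-bit value by 32 is undefined.

```c
#include <stdint.h>

/* Byte-swap one 32-bit lvalue in place.  With GNU C on x86, "+r" lets the
 * compiler pick any general-purpose register for the operand. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
# define BSWAP32(V) __asm__("bswap %0" : "+r"(V))
#else
/* Portable fallback: an assumption added so the sketch builds elsewhere. */
# define BSWAP32(V)                                         \
    ((V) = ((V) >> 24) | (((V) >> 8) & 0xff00u) |           \
           (((V) << 8) & 0xff0000u) | ((V) << 24))
#endif

/* Hypothetical helper: swap each half, then exchange the halves. */
static inline uint64_t bswap64_split(uint64_t a)
{
    uint32_t ah = (uint32_t)(a >> 32);  /* top 32 bits */
    uint32_t al = (uint32_t)a;          /* bottom 32 bits */
    BSWAP32(ah);
    BSWAP32(al);
    /* Widen before shifting: the swapped low half becomes the new top. */
    return ((uint64_t)al << 32) | ah;
}
```

Calling bswap64_split(0x0102030405060708ULL) should give 0x0807060504030201ULL. Whether the compiler really turns the shifts into plain register accesses depends on the compiler version and optimisation level.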

From:

ewx.livejournal.com

Oh yes, that’s much better. Ta. I’m still academically interested in the general case of 64-bit values and __asm__ but my immediate problem is solved.

From:

simont

I expect the theory is that you almost never need to specify a 64-bit quantity being passed in or out of an asm block in one go – in general you can use the above technique to break 64-bit things up and pass in the two halves separately, which is better anyway because then the compiler can allocate the two registers independently. The A specifier is intended for use in cases where the CPU operates on the whole 64-bit value in a single instruction, and when x86 does that (MUL, DIV, CMPXCHG8B) it always expects or returns the value in EDX:EAX, so you never want to allocate the value anywhere else.
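To illustrate the kind of case A suits, here is a hedged sketch of a 32x32 to 64-bit multiply, where MUL always leaves its result in EDX:EAX. The function name mul32x32 is hypothetical. The asm path is only enabled for 32-bit x86, on the assumption that A does not describe the register pair for a 64-bit value on x86-64; everywhere else the sketch falls back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper: full 64-bit product of two 32-bit values. */
static inline uint64_t mul32x32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && defined(__i386__)
    uint64_t product;
    /* mull multiplies %eax by its operand and writes EDX:EAX, which is
     * exactly the pair that "=A" names for a 64-bit value on i386. */
    __asm__("mull %2"
            : "=A"(product)
            : "a"(a), "rm"(b)
            : "cc");
    return product;
#else
    /* Fallback for other targets: let the compiler do the widening multiply. */
    return (uint64_t)a * b;
#endif
}
```

In practice the plain C expression in the fallback branch usually compiles to the same single MUL on i386, so the asm is here only to show the constraint in use.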

From:

pm215

glibc's implementation of bswap_64() for x86-32 (in bits/byteswap.h) achieves this by type-punning through a union:

#  define __bswap_64(x) \
     (__extension__                                                           \
      ({ union { __extension__ unsigned long long int __ll;                   \
                 unsigned int __l[2]; } __w, __r;                             \
         if (__builtin_constant_p (x))                                        \
           __r.__ll = __bswap_constant_64 (x);                                \
         else                                                                 \
           {                                                                  \
             __w.__ll = (x);                                                  \
             __r.__l[0] = __bswap_32 (__w.__l[1]);                            \
             __r.__l[1] = __bswap_32 (__w.__l[0]);                            \
           }                                                                  \
         __r.__ll; }))

From:

gerald_duck

So the real answer is "#include <byteswap.h> then use bswap_64 because the library writers have already solved this problem"? (-8

From:

ewx.livejournal.com

On Glibc platforms, yes, and I was already doing that…

From:

gerald_duck

Ah. Brainfart. I see __GNUC__ and I assume glibc will be available.

Then again, given that this part of glibc is header-only with no object code, and given that it's LGPL, why not just use glibc's?

From:

cjwatson

With GCC 4.3 and up, you can use __builtin_bswap64 instead.
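A rough sketch of how that might be wired up with a fallback for older or non-GNU compilers. The macro name BSWAP64 mirrors the one in the post; the helper bswap64_portable and the Clang check are assumptions added here (Clang reports itself as an older GCC, so it needs its own test).

```c
#include <stdint.h>

#if defined(__clang__) || \
    (defined(__GNUC__) && \
     (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
/* The builtin leaves instruction selection and register allocation
 * entirely to the compiler. */
# define BSWAP64(N) __builtin_bswap64(N)
#else
/* Hypothetical portable fallback: swap adjacent bytes, then adjacent
 * 16-bit pairs, then the two 32-bit halves. */
static inline uint64_t bswap64_portable(uint64_t n)
{
    n = ((n & 0x00ff00ff00ff00ffULL) << 8) | ((n >> 8) & 0x00ff00ff00ff00ffULL);
    n = ((n & 0x0000ffff0000ffffULL) << 16) | ((n >> 16) & 0x0000ffff0000ffffULL);
    return (n << 32) | (n >> 32);
}
# define BSWAP64(N) bswap64_portable(N)
#endif
```

With either branch, BSWAP64(0x0102030405060708ULL) should evaluate to 0x0807060504030201ULL.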

From:

gerald_duck

This is 32-bit x86 and 64-bit values live on the stack, yes?

I like Simon's idea, but might an alternative be to have __asm__ give you the address of __n, tell it you're going to trash %eax and %edx then do the stack manipulation yourself?

I would provide a worked example, but I'm pretty rusty on __asm__, especially in x86-land rather than ARM-land.

From:

pm215

"might an alternative be to have __asm__ give you the address of __n, tell it you're going to trash %eax and %edx then do the stack manipulation yourself?"

This is typically a bad idea because it will force gcc to spill the operands to memory so you can reload them. (Also, it means you probably end up using a suboptimal separate address calculation and load instruction.) When the asm you're trying to emit is really just doing register operations, it's much nicer to expose that to gcc's register allocator where you can...

From:

gerald_duck

Well, yes. But in the example given there, it's bringing the operand in from memory and stashing the result to memory anyway.

I was assuming this was because the compiler couldn't cope with register tracking for 64-bit values?

From:

ewx.livejournal.com

The compiler is happy to keep 64-bit temporaries in pairs of 32-bit registers; the example just happens to be a load from memory, a swap, and a store to (different) memory.


Richard Kettlewell

https://www.greenend.org.uk/rjk/
