ewx | What I’ve been playing with

You're viewing

ewx's journal
Create a Dreamwidth Account Learn More

Reload page in style: site light

(Click on the image for a full-size version.)

I first encountered this stuff in my teens, via computer magazines, a friend with shared enthusiasms, and James Gleick’s book Chaos. My first attempt was in BASIC on a Sinclair QL, and very slow; it was nowhere near finished even the following morning. I had a lot of fun creating a faster version in 68K assembler using fixed point arithmetic (16 integer bits, rather more than required, and 32 fractional bits, as I recall).

Since Benoit Mandelbrot’s recent death there’s been a lot of references to it online, and that inspired me to revisit it this weekend. The visible results are above, and my program (still very much a work in progress!) can be retrieved using:

git clone http://www.greenend.org.uk/rjk/git/mand

Read the README. You will need a Linux or OS X system with the GTK+ libraries installed.

Flat | Top-Level Comments Only

From:

wellinghall.livejournal.com

Pretty! :-)

From:

aardvark179.livejournal.com

What I found interesting (having been inspired by BM's death to mess around with the same things) is how quickly you can reach the point where the resolution of floating point numbers becomes obvious. When I was last playing around with this stuff it took hours or days to generate pictures deep enough to exhibit those problems, now you can do it fairly quickly.

Looking around it seems that most people have concentrated more on interesting renderings recently rather than simple deep exploration, there's some nice examples of plotting the orbits of large numbers of random points (http://erleuchtet.org/2010/07/ridiculously-large-buddhabrot.html), and just how much that can be accelerated with a modern GPU.

From:

mobbsy.livejournal.com

Just glancing at the code (not having yet compiled it), you're calculating zx² and zy² twice per iteration, once for the loop test, once for the calculation. I suspect there's a measurable win by only doing that once per iteration.

...OK, I got bored and dumped the assembler, it saves an instruction overall, and you save another instruction in the main loop by reordering the test to have the iterations after the >4 test.

Depending on how the instructions pack in the pipeline, that could be 10% speedup.

Originally GCC 4.5.1 on a Core 2 produced:

L64:
movapd %xmm3, %xmm1
addsd %xmm4, %xmm4
subsd %xmm2, %xmm1
mulsd %xmm4, %xmm0
addsd %xmm6, %xmm1
addsd %xmm7, %xmm0
movapd %xmm1, %xmm3
movapd %xmm0, %xmm2
mulsd %xmm1, %xmm3
mulsd %xmm0, %xmm2
movapd %xmm3, %xmm4
movsd LC6-"L00000000002$pb"(%ebx), %xmm5
addsd %xmm2, %xmm4
ucomisd %xmm4, %xmm5
jbe L42
movapd %xmm1, %xmm4
L43:
addl $1, %eax
cmpl %eax, %edx
jg L64

After the tweaks above (and compiling with -march=native, which just changed the addl $1,%eax to an incl %eax), it produced:

L65:
cmpl %eax, %edx
jle L43
movapd %xmm2, %xmm1
L44:
movapd %xmm1, %xmm4
movapd %xmm0, %xmm3
mulsd %xmm1, %xmm4
mulsd %xmm0, %xmm3
movapd %xmm4, %xmm2
addsd %xmm1, %xmm1
subsd %xmm3, %xmm2
mulsd %xmm1, %xmm0
incl %eax
addsd %xmm4, %xmm3
addsd %xmm5, %xmm2
addsd %xmm6, %xmm0
ucomisd %xmm3, %xmm7
ja L65

From:

ewx.livejournal.com

I’ve rearranged as suggested (hopefuly, I’m not used to getting patches in assembler form only l-). It didn’t seem to make much difference to the .s on Apple’s GCC (which rejects -march=nativeincl anyway). The code was more visibly changed on Debian lenny’s GCC. I’d been hoping the compiler would spot it could do the squaring only once!

From:

mobbsy.livejournal.com

I suppose the C might have been helpful. :-)

Like several others, your post just caused me to remember playing around with Mandelbrot sets, and how it was a problem that really showed the benefits of inner loop optimisation. (I don't know enough SSE to know if the latter is the best solution).

From:

ewx.livejournal.com

Mmm. I was wondering if any of the vector operations could be used to speed things up (perhaps in association with a switch to fixed point, e.g. to address the precision issues the Aardark refers to above).

From:

mobbsy.livejournal.com

http://www.mikusite.de/pages/x86.htm seems to be somebody's obsessivly hand-optimized SSE2 assembler for Mandelbrot generation with loop unrolling, using packed dwords for multiple points per instruction and so on.

They claim 5.2 billion iterations (of z_n+1 = z_n² + c) per second on a modern 3.2GHz quad core CPU, which seems rather impressive.

From:

twigletzone.livejournal.com

Hah. I remember those. I used to have some awesome fractal posters in my bedroom.

From:

imc.livejournal.com

Heh. I started off in assembler on a Spectrum +3, running it overnight and then viewing the results. Can't honestly remember if it generated anything worth looking at.

Then there was: http://www.ioccc.org/years.html#1992_imc

We had monochrome sparcs back in those days. It needs big-endian architecture to output a valid .ras file, though the fix is not that difficult.

From:

dave holland (from livejournal.com)

Prettier than the last one I wrote :-)

http://www.biff.org.uk/dave/95.html

From:

simont

From my university days I particularly remember

fanf's striking black and white Mandelbrot plots (convenient for printing out at very high resolution on b&w laser printers), which coloured points outside the set not by the number of iterations they took to escape, but by the sign of their imaginary part (or possibly real part, I forget which) just after they did escape. The effect was that that of an endlessly subdividing filigree looking something like