[personal profile] ewx

Didn't make it to Karate this evening: too tired, too much rain.

On the other hand, I now have a reasonably detailed plan for how to add some new features to DisOrder: in particular the ability to pause tracks and to mostly eliminate the annoying gap between tracks. The changed design should also support other changes I want to make, namely completely eliminating the gap and a network play facility.

The network play one needs the most thought. If you have multiple players in one house - as I expect to - then you really want them as closely in sync with each other as possible, as otherwise things will sound horrible when you're in range of more than one. Even if they are too distant to hear at once, you don't really want players to drift apart by more than a small amount.

My idea so far is for the server to calculate a nominal target time for each frame to be played, derived from the start time, the sample rate and the number of samples transmitted so far. (Start time means the start of continuous play, not just the start of the most recent track.) This would be included with the sample data of the frame. The endpoint player then has an adjustment (not necessarily positive) which it adds to this to compute the time to play that frame according to its own clock.
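
As a concrete illustration of the timestamping, here's a minimal C sketch; the names and the 44.1kHz/nanosecond units are my own assumptions, not anything DisOrder actually does yet.

    #include <stdint.h>

    #define SAMPLE_RATE 44100u           /* assumed fixed CD-rate PCM */

    /* Nominal target time for the next frame: the wall-clock time at which
     * continuous play started, plus the duration of all samples already
     * transmitted since then.  Times are in nanoseconds. */
    static uint64_t target_time_ns(uint64_t play_start_ns, uint64_t samples_sent)
    {
        uint64_t secs = samples_sent / SAMPLE_RATE;
        uint64_t rem  = samples_sent % SAMPLE_RATE;
        return play_start_ns
             + secs * UINT64_C(1000000000)
             + rem * UINT64_C(1000000000) / SAMPLE_RATE;
    }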

The adjustment effectively incorporates the latency of the network, the latency of the local sound software and hardware, the skew between the different clocks involved, and the distance between nearby players. (You want every machine's clock to be basically right, but you have to expect a bit of variation between them, though hopefully only a small fraction of a second.) The idea is that you manually tune this value until things don't sound too bad where you can hear more than one player.

What happens when a frame is to be played “in the past”? That could be down to the user: it could just be dropped, if synchronization was more important than the occasional glitch due to network congestion; or things could just be played late, catching up when the user decided to. A further alternative would be to drop, say, every 100th sample until we'd caught up.
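
To make the client side concrete, here's a rough sketch (invented names again, not real DisOrder code) of applying the per-player adjustment and the simplest of those policies, dropping anything already in the past:

    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>

    /* Hand-tuned per-player adjustment: network and sound-card latency, clock
     * skew and the acoustic gap between nearby players all rolled into one.
     * May be negative. */
    static int64_t adjustment_ns;

    static uint64_t local_now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
    }

    /* Decide what to do with a frame carrying the server's target time.
     * Returns true and fills in *play_at_ns if the frame should be queued,
     * false if it is already in the past and should be dropped.  Playing
     * late instead would mean queueing it regardless and catching up on
     * demand. */
    static bool schedule_frame(uint64_t server_target_ns, uint64_t *play_at_ns)
    {
        uint64_t when = (uint64_t)((int64_t)server_target_ns + adjustment_ns);
        if (when < local_now_ns())
            return false;
        *play_at_ns = when;
        return true;
    }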

(The simpler alternative, of just having a tunable delay between receipt and play for each player, has the problem that once a player gets behind it'll stay there indefinitely.)

(no subject)

Date: 2005-07-27 10:27 pm (UTC)
From: [identity profile] pjc50.livejournal.com
Could you wire up the microphones of each player, then run an autocorrelation to determine how far out of sync with the other audio sources it had got?
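
(Strictly that would be a cross-correlation between the two captures rather than an autocorrelation, but the idea works. A brute-force sketch, assuming both captures are the same length and sample rate; an FFT would be the sensible approach for anything long:)

    #include <stddef.h>

    /* Brute-force cross-correlation peak search: returns the lag, in samples,
     * at which shifting capture b against capture a gives the best match.
     * a and b are the two microphone captures, length n, same sample rate;
     * max_lag bounds the search.  A positive result means b lags behind a. */
    static long best_lag(const float *a, const float *b, size_t n, long max_lag)
    {
        long best = -max_lag;
        double best_score = 0.0;
        int first = 1;
        for (long lag = -max_lag; lag <= max_lag; lag++) {
            double score = 0.0;
            for (size_t i = 0; i < n; i++) {
                long j = (long)i + lag;
                if (j >= 0 && (size_t)j < n)
                    score += (double)a[i] * (double)b[j];
            }
            if (first || score > best_score) {
                best_score = score;
                best = lag;
                first = 0;
            }
        }
        return best;
    }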

Otherwise I think you can assume that over house ethernet latency is in the milliseconds and identical to all endpoints, so all you have to do is ping an endpoint and divide its time by two, then impose that delay on local playback.

(no subject)

Date: 2005-07-27 10:58 pm (UTC)
From: [identity profile] ewx.livejournal.com
Ethernet ping times here are 0.1-0.3ms; over wireless it goes up to a bit over 2ms. I don't know if a couple of milliseconds are enough to hear the difference. The problem of staying out of sync once you get out is a bigger motivation for the more complex scheme.

(no subject)

Date: 2005-07-27 11:30 pm (UTC)
From: [identity profile] timeplease.livejournal.com
“The network play one needs the most thought.”

Indeed. Your biggest problem is that the sound cards in each machine will have very slightly different clock frequencies, so will naturally drift apart from each other. The job of the server software is essentially time base generation; the job of the client software is queue management such that the audio that is being played is not too far away from the time base (essentially clock regeneration, except through resampling rather than generating a real clock).

The fact that your time base is being broadcast over a network means that there will be some problems: the further away in network terms the client is, the more jitter it will perceive. With any luck, though, the long term average will be useful for tuning audio clocks.

Clients could also ignore the time base and just look at the received audio packets: how full is the queue? If it's above a threshold, increase the audio clock (or start dropping samples); if it's below, decrease the clock or start duplicating samples.
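
A sketch of that queue-watching policy, with made-up thresholds; real code would more likely nudge a resampling ratio than touch individual samples:

    #include <stddef.h>

    /* Hypothetical thresholds, in samples of queued audio at 44.1kHz. */
    #define QUEUE_HIGH (44100 / 2)       /* over ~0.5s queued: falling behind */
    #define QUEUE_LOW  (44100 / 8)       /* under ~0.125s queued: running ahead */

    enum queue_action { QUEUE_OK, QUEUE_DROP_SAMPLE, QUEUE_DUP_SAMPLE };

    /* Called periodically with the number of samples waiting to be played.
     * Dropping a sample lets playback catch up slightly; duplicating one
     * stretches it out. */
    static enum queue_action check_queue(size_t queued_samples)
    {
        if (queued_samples > QUEUE_HIGH)
            return QUEUE_DROP_SAMPLE;
        if (queued_samples < QUEUE_LOW)
            return QUEUE_DUP_SAMPLE;
        return QUEUE_OK;
    }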

Broadcast digital audio (digital radio / tv) has the nice property that the audio clock can be recovered from the broadcast signal using a simple PLL. Assuming your network packet frequency isn't too low (consider the number of samples encoded in a packet; and are you sending VBR compressed audio?) you may be able to do a similar trick.
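
Short of a real PLL, the long-term-average idea can be sketched like this: compare the cumulative sample count in received packets against elapsed local time to estimate the sender's effective sample rate as seen by the local clock (names and numbers invented):

    #include <stdint.h>

    #define NOMINAL_RATE 44100.0         /* assumed nominal sample rate */

    static uint64_t first_rx_ns;         /* local arrival time of first packet */
    static uint64_t samples_rx;          /* total samples received so far */
    static double rate_estimate = NOMINAL_RATE;

    /* Update the estimate on each received packet.  Over a long run the ratio
     * of samples received to local time elapsed converges on the sender's real
     * sample clock as measured by our clock, and short-term network jitter
     * averages out.  The client can then resample by rate_estimate divided by
     * the local card's actual rate. */
    static void on_packet(uint64_t rx_ns, uint32_t nsamples)
    {
        if (samples_rx == 0)
            first_rx_ns = rx_ns;
        samples_rx += nsamples;
        double elapsed = (double)(rx_ns - first_rx_ns) / 1e9;
        if (elapsed > 5.0)               /* give the average time to settle */
            rate_estimate = (double)samples_rx / elapsed;
    }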

(no subject)

Date: 2005-07-28 02:37 pm (UTC)
From: [identity profile] ewx.livejournal.com
My current model has the server doing the decoding (of ogg, mp3, and maybe other things) and transmitting more-or-less raw sample data to the clients.

(no subject)

Date: 2005-07-28 06:11 pm (UTC)
From: [identity profile] timeplease.livejournal.com
In that case, a few simple suggestions:

1) UDP, fixed number of samples per packet (roughly as sketched after this list)
2) Don't buffer in the server; let each client manage its own buffer
3) Perhaps you've found a use for IP broadcast / multicast!
4) In clients where you can't tweak the sound card clock, be prepared to resample (the nice way) or drop/duplicate samples when you notice the sound card FIFO getting full/empty.
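
Something like suggestions 1 and 3 might look like this; the packet layout, port and multicast group are all invented, and byte-order conversion of the samples is left out:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    #define SAMPLES_PER_PACKET 256           /* fixed, per suggestion 1 */
    #define CHANNELS 2

    /* Hypothetical wire format: a target timestamp plus raw interleaved
     * 16-bit PCM. */
    struct audio_packet {
        uint64_t target_ns;                            /* when to play */
        int16_t pcm[SAMPLES_PER_PACKET * CHANNELS];
    };

    /* Send one packet to a multicast group (suggestion 3) over UDP. */
    static ssize_t send_packet(int fd, const struct audio_packet *pkt)
    {
        struct sockaddr_in dst;
        memset(&dst, 0, sizeof dst);
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9003);                       /* made-up port */
        inet_pton(AF_INET, "239.255.1.1", &dst.sin_addr); /* made-up group */
        return sendto(fd, pkt, sizeof *pkt, 0,
                      (const struct sockaddr *)&dst, sizeof dst);
    }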
