[Dxspider-support] Joke or bug?

Dirk Koopman djk at tobit.co.uk
Mon Mar 7 22:16:02 GMT 2005


On Mon, 2005-03-07 at 11:22 -0800, Lee Sawkins wrote: 
> I have a Dx Spider cluster and an AR cluster running in the same
> computer.  Doing an SH/C in either cluster will give you incorrect data
> for the other cluster. 
> 
> Usually calls connecting are distributed correctly to the connected
> clusters, but not so with the disconnecting calls.  The connected user
> lists tend to gradually grow in size. 

SH/C will give incorrect data. Period.

It is actually an artifact of the protocol, loops, selective filtering
and, yes, bugs in software (maybe even mine).

I will try to explain this, but it is rather difficult to get one's head
around. 

The executive summary is this: 

The PC protocol for routing (PC16/17/19/21) is broken because nodes store
and then redistribute *their* (current) version of the routing tree
every time another node connects. Because there is no ordering or time
information in these protocol sentences, what you end up with is
worldwide Chinese whispers, with loops that never decay. 

Even on the internet, there is a significant time lag between an event
(a user connecting or disconnecting, ditto for nodes) and the
information about it (say, a node disconnection) getting from one end of
the cluster to the other.  

Because there are loops, these events intermingle and get mixed up.  It
is impossible to determine when (say) a PC17 was sent, or whether it
came before or after the PC16 that has also arrived for the same
user/node. Also, in the meantime a couple of nodes have reconnected;
they have not had the PC17 yet, so they issue the PC16s that they know
about, and *those* then get sent around the network.
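To make the ordering problem concrete, here is a toy model in Python (an illustration, not DXSpider code). It assumes PC16/PC17 traffic can be reduced to ("add"|"del", node, user) tuples carrying no timestamp or sequence number, which is exactly the information described above as missing. The node names and the N0CALL callsign are placeholders.

```python
# Toy model only: PC16/PC17 reduced to ("add" | "del", node, user) tuples
# carrying no time or ordering information (the property described above).

def apply_in_arrival_order(messages):
    """Apply add/delete messages in whatever order they happen to arrive."""
    users = set()
    for kind, node, user in messages:
        if kind == "add":        # treated like a PC16: user is on node
            users.add((node, user))
        elif kind == "del":      # treated like a PC17: user has left node
            users.discard((node, user))
    return users


# N0CALL connects to NODE-A, then disconnects.  NODE-B reconnects before it
# has seen the delete and replays its cached add, which arrives last.
arrivals = [
    ("add", "NODE-A", "N0CALL"),   # original connect
    ("del", "NODE-A", "N0CALL"),   # genuine disconnect
    ("add", "NODE-A", "N0CALL"),   # stale replay from NODE-B's cache
]
print(apply_in_arrival_order(arrivals))   # {('NODE-A', 'N0CALL')}: a ghost user

# Same three messages, different arrival order, different answer.
reordered = [arrivals[0], arrivals[2], arrivals[1]]
print(apply_in_arrival_order(reordered))  # set()
```

Nothing in the messages lets a receiver tell the stale replay from a fresh connection, so no receiver-side rule can choose correctly, and with loops both orders will occur somewhere in the network.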

The result seems to be that user and node lists have become completely
unreliable. I have made various attempts to shore this up and, frankly,
I have failed. 

As a result, after years of talking about it and several abortive
experiments, I am finally (slowly) undertaking the gradual rewrite that
DXSpider requires to sort this out, whilst retaining some compatibility
with the PC protocol. This means a new protocol in which routing
information is distributed on a different basis. 

That basis is that nodes are responsible for maintaining, and
periodically broadcasting, their own configuration. The huge node lists
sent on connection will disappear (at least on new protocol
connections). 
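The new basis can be sketched as follows. This is a hypothetical Python model, not the Aranea wire format: it assumes each node tags its periodic broadcast with a sequence number that only it increments, that each broadcast is a full snapshot of that node's own users, and that a receiver forgets an origin it has not heard from within `max_age` seconds. `Snapshot`, `RoutingTable`, `seq` and `max_age` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    origin: str        # the node this configuration describes (and that sent it)
    seq: int           # hypothetical counter, incremented only by the origin
    users: frozenset   # the origin's own directly connected users


@dataclass
class RoutingTable:
    max_age: float     # seconds an origin may stay silent before it is dropped
    latest: dict = field(default_factory=dict)  # origin -> (Snapshot, time seen)

    def receive(self, snap, now):
        """Store snap only if it is newer than the snapshot already held."""
        held = self.latest.get(snap.origin)
        if held is not None and snap.seq <= held[0].seq:
            return False   # duplicate or stale replay: ignored, clock not refreshed
        self.latest[snap.origin] = (snap, now)
        return True

    def expire(self, now):
        """Forget origins whose periodic broadcast has not been seen recently."""
        stale = [o for o, (_, seen) in self.latest.items() if now - seen > self.max_age]
        for origin in stale:
            del self.latest[origin]

    def users(self):
        """Flatten to a set of (node, user) pairs, as SH/C might display them."""
        return {(o, u) for o, (snap, _) in self.latest.items() for u in snap.users}


table = RoutingTable(max_age=600.0)
table.receive(Snapshot("NODE-A", 1, frozenset({"N0CALL"})), now=0.0)
table.receive(Snapshot("NODE-A", 2, frozenset()), now=60.0)             # N0CALL left
table.receive(Snapshot("NODE-A", 1, frozenset({"N0CALL"})), now=90.0)  # stale replay
print(table.users())   # set(): the replay cannot resurrect N0CALL
table.expire(now=1000.0)
print(table.latest)    # {}: NODE-A went silent and was dropped
```

Because each snapshot replaces the previous one wholesale, there is no add/delete pair to get out of order, and a replayed old broadcast is simply discarded. The sketch ignores practical details such as sequence-number wraparound or a node restarting its counter.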

In addition, DXSpider nodes, whilst retaining compatibility with existing
nodes, will act as proxies only for those nodes that are directly
connected. So only the user configurations of those directly connected
nodes will be shown by SH/C on nodes connected via the new DXSpider (or
Aranea) protocol. This is because the only thing you can trust from a PC
protocol node is what concerns it directly. And, depending on the
software, you can't even completely rely on that (because some software
"believes" things about its own connections that are sent to it from
outside).  
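The trust rule for a directly connected PC protocol node can be sketched as a simple filter. This is hypothetical Python, assuming incoming user announcements have already been parsed into (node, user) pairs; it does not model the PC16 line format, and it cannot address the further caveat that some software reports "facts" about its own connections that it learned from outside. `split_by_trust` is an illustrative name.

```python
def split_by_trust(neighbour, records):
    """Separate what a direct PC neighbour says about itself from hearsay.

    neighbour: name of the directly connected PC protocol node.
    records:   (node, user) pairs received from that neighbour.
    Returns (believed, hearsay). Only 'believed' would be proxied onward;
    'hearsay' concerns nodes further away and is, at most, shown locally.
    """
    believed, hearsay = [], []
    for node, user in records:
        if node == neighbour:
            believed.append((node, user))
        else:
            hearsay.append((node, user))
    return believed, hearsay


believed, hearsay = split_by_trust(
    "NODE-B",
    [("NODE-B", "N0CALL"), ("NODE-Z", "N0CALL-1")],
)
print(believed)  # [('NODE-B', 'N0CALL')]
print(hearsay)   # [('NODE-Z', 'N0CALL-1')]
```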

On those nodes that are proxying, the full PC protocol view will
continue to be seen, but with the caveat that the information shown will
be neither 'believed' nor propagated. Also, information from any node
that is marked as an Aranea node, or that is acting as proxy for
directly connected PC protocol nodes, will override anything that comes
in from remote, non-directly connected PC protocol nodes. 
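The override rule amounts to a precedence order between sources of information. A hypothetical Python sketch, assuming just two ranks: `Source`, `merge_view` and `to_propagate` are illustrative names, and giving ties to the later arrival is a simplification, not something the text specifies.

```python
from enum import IntEnum


class Source(IntEnum):
    REMOTE_PC = 0      # second-hand, from a non-directly connected PC node
    AUTHORITATIVE = 1  # from an Aranea node, or a proxy for its direct PC neighbour


def merge_view(entries):
    """Build node -> (users, source); a higher-ranked source always wins.

    entries: iterable of (node, users, source) in arrival order.
    Equal ranks: the later arrival replaces the earlier (a simplification).
    """
    view = {}
    for node, users, source in entries:
        held = view.get(node)
        if held is None or source >= held[1]:
            view[node] = (frozenset(users), source)
    return view


def to_propagate(view):
    """Only authoritative entries are passed on; remote PC data is display-only."""
    return {node: users for node, (users, src) in view.items()
            if src == Source.AUTHORITATIVE}


view = merge_view([
    ("NODE-B", {"N0CALL"}, Source.AUTHORITATIVE),
    ("NODE-B", {"N0CALL", "N0CALL-1"}, Source.REMOTE_PC),  # older hearsay: loses
    ("NODE-Z", {"N0CALL-2"}, Source.REMOTE_PC),            # shown, not forwarded
])
print(to_propagate(view))  # {'NODE-B': frozenset({'N0CALL'})}
```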

Because the routing information that is broadcast comes only from the
horse's mouth (the originating node or proxy), the configurations seem,
even on the limited experimentation so far, to converge on something
that approaches reality.  

If anybody is interested, there is a (now very slightly out of date)
paper available here: http://www.dxcluster.org/tech/protocol.html

It is actively being tested on GB7DJK-1, GB7TLH, WR3D and GB7MBC-1.

Dirk G1TLH



