[Dxspider-support] Cluster hangs

Dirk Koopman djk at tobit.co.uk
Sun Mar 6 14:38:48 GMT 2005


On Sat, 2005-03-05 at 16:29 -0700, Kelly Jones wrote:
> Hi guys,
> 
> I seem to have an issue somewhere.  For whatever reason, DX Spider just 
> seems to 'hang' for no apparent reason.  I noticed this during the ARRL CW 
> contest a couple of weeks ago and it appears to be doing it again during 
> the phone contest.

Firstly, can I say (for what it is worth): I have not seen this
behaviour. But see below...


> If I tail the .dat file in the /spider/data/debug/2005/xxx.dat, there is 
> no activity being written to this file when the cluster is 
> 'hung'.  However, once it 'lets go', all of the activity is written at once.
> 

I would be very interested in (say) the 50 lines directly above such a
"hang". It is important that it really is the data above the hang. The
first few lines afterwards might be good as well.

> If I run a 'top', there appears to be almost 0 cluster activity and the box 
> is running at 99% idle.  Currently running v1.51 build 58.323.
> 

This is a (possible) clue. Is it possible that someone is doing one of
the commands (such as sh/qrz, sh/425 etc) that go and do an http query
to an external http server? 

DX Spider does this in a rather crude way at the moment and it has been
known in the past, when the target webservers are overloaded or
otherwise not working right, that the node will hang waiting for some
kind of reply from the webserver. It should time out, but I am not
convinced that the timeout is completely reliable.
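
To illustrate the kind of guard I mean, here is a minimal sketch in
Python (not the Perl that DX Spider is written in, and not the code the
node actually runs). The function name fetch_or_give_up and the 5 second
default are made up for the example. It puts a timeout on every blocking
socket operation of an HTTP GET and returns nothing rather than waiting
forever:

```python
import http.client


def fetch_or_give_up(host, path, port=80, timeout=5.0):
    """Return the response body as bytes, or None if the server is
    unreachable, too slow, or replies with something unparseable.

    Hypothetical helper for illustration only.
    """
    # The timeout applies to each blocking socket operation (connect,
    # send, receive), not to the request as a whole.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.read()
    except (OSError, http.client.HTTPException):
        # Covers refused connections, socket timeouts and bad replies.
        return None
    finally:
        conn.close()
```

Note that a per-operation timeout like this does not bound the total
time: a server that dribbles out one byte every few seconds could still
hold the caller for a long while, which may be one way a timeout ends up
being less reliable than expected.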

Some time ago, I thought I had put in asynchronous connects (to start up
links to external nodes). Looking at this bit of the code for the first
time in quite a while, the asynchronous part seems to have disappeared. 

This too can cause problems, because the node can hang waiting for the
connect to complete. The connect will go into the operating system and
stay there for considerable periods of time (several minutes) if there
is absolutely no reply (of any kind) to the outgoing TCP SYN packets
(the 'connect' packets).
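
For anyone wondering what an asynchronous connect looks like, here is a
rough sketch, again in Python rather than Perl and not taken from the
DX Spider source. The name connect_with_deadline is invented for the
example, and it assumes a POSIX system and a numeric IP address (a
hostname lookup would itself block). The socket is made non-blocking, the
connect is started, and select() is used to wait a bounded time for the
result:

```python
import errno
import select
import socket


def connect_with_deadline(host, port, timeout):
    """Start a non-blocking TCP connect and wait at most `timeout`
    seconds for it. Return the connected socket, or None on failure.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)

    # connect_ex returns an error code instead of raising; a connect
    # that is still in progress reports EINPROGRESS.
    err = sock.connect_ex((host, port))
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        return None

    # The socket becomes writable once the connect has succeeded or
    # failed; SO_ERROR then tells the two cases apart.
    _, writable, _ = select.select([], [sock], [], timeout)
    if writable and sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
        return sock

    # Timed out (no reply to the SYN packets) or the connect failed.
    sock.close()
    return None
```

In a real node the socket would be handed to the main select loop
instead of being waited on inside the function, so that everything else
carries on while the connect is pending. The sketch only shows that the
wait can be bounded by the program rather than by the operating system.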

There was a spate of this several years ago and I thought I had fixed
it. Apparently this might not be true...

Dirk G1TLH 



