[Dxspider-support] node hangs

Mike McCarthy, W1NR lists at w1nr.net
Wed Nov 1 02:40:05 CET 2006


 
Hi Dirk,
I saw the aftermath of the hang twice over the weekend.  Both times I saw
the cluster come back to life with a large burst of spots.  There were
problems with my main feed, K1EA, which did seem to crash on more than one
occasion.  I can't say with any certainty that the hangs corresponded to
the times K1EA went down.

The message that mentioned the filled Send-Q in netstat jogs my memory.  I
saw this a long time ago.  It was close to 64K bytes pending.  Sorry I
didn't notice the hang in time to check netstat.

Could this be a Perl Net-Telnet bug?  All sockets should be set
non-blocking, so that any write that would block (for example, when the
buffers are full) returns an error instead.  I still feel this is the cause.
Net-Telnet may not be handling the condition properly.  Tracking it down may
be quite a chore.
It may be we only see it when there is a node or user connection failure and
sufficient traffic to fill all the buffers before the connection times out
and is disconnected.  Perhaps a network of several test nodes with simulated
traffic and node failures might be able to reproduce it.  I do have a spare
PC and IP address I could put on-line for any test like this.
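
As a rough illustration of the non-blocking behaviour described above, here
is a minimal Python sketch.  It is not DXSpider or Net-Telnet code (those are
Perl); the function name fill_until_would_block and the safety limit are
invented for the example.  It writes to a local peer that never reads and
shows that a non-blocking send reports an error once the buffers are full,
rather than stalling the caller.

```python
import socket


def fill_until_would_block(chunk_size=4096, limit=64 * 1024 * 1024):
    """Write to a peer that never reads.

    Returns (bytes_queued, blocked).  `blocked` is True when the kernel
    refused further data with a would-block error.  `limit` is only a
    safety stop for this illustration.
    """
    writer, reader = socket.socketpair()  # reader is deliberately never read
    try:
        writer.setblocking(False)
        chunk = b"x" * chunk_size
        queued = 0
        while queued < limit:
            try:
                # send() may accept fewer bytes than offered
                queued += writer.send(chunk)
            except BlockingIOError:
                # Buffers are full: a blocking socket would hang right here
                return queued, True
        return queued, False
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    queued, blocked = fill_until_would_block()
    print("queued", queued, "bytes; would block:", blocked)
```

How many bytes are accepted before the error depends on the operating system
and its socket buffer sizes, so the sketch reports the figure instead of
assuming one.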

Mike, W1NR

-----Original Message-----
From: dxspider-support-bounces at dxcluster.org
[mailto:dxspider-support-bounces at dxcluster.org] On Behalf Of Dirk Koopman
Sent: Tuesday, October 31, 2006 9:15 AM
To: The DXSpider Support list
Subject: [Dxspider-support] node hangs

I have had a couple of reports of node hangs during the contest (again).

Has anybody got any evidence, in the form of debug files and some
accompanying narrative, that they would care to share with me?

The current hypothesis is that it is caused by a link not accepting data
and the output buffer, on that link, filling up and wedging the master
select loop - around which the whole program revolves.
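
One way a select loop can avoid being wedged by a single stalled link is
sketched below in Python.  This is an assumption-laden illustration, not how
DXSpider is implemented: the names Link, pump_once and MAX_BACKLOG are
hypothetical, and the 64K cap is an arbitrary choice.  Each link has its own
output buffer, data is written only when select reports the socket writable,
and a link whose backlog exceeds the cap is dropped.

```python
import select
import socket

MAX_BACKLOG = 64 * 1024  # hypothetical per-link cap on unsent bytes


class Link:
    """One outgoing connection with its own bounded output buffer."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.outbuf = bytearray()
        self.dropped = False

    def queue(self, data):
        """Buffer data for sending; drop the link if the backlog is too big."""
        if self.dropped:
            return
        self.outbuf += data
        if len(self.outbuf) > MAX_BACKLOG:
            self.drop()

    def drop(self):
        self.dropped = True
        self.outbuf.clear()
        self.sock.close()


def pump_once(links, timeout=0.0):
    """One pass of the loop: send what each writable link will accept."""
    pending = [l for l in links if not l.dropped and l.outbuf]
    if not pending:
        return
    by_sock = {l.sock: l for l in pending}
    _, writable, _ = select.select([], list(by_sock), [], timeout)
    for sock in writable:
        link = by_sock[sock]
        try:
            sent = sock.send(link.outbuf)  # may be a partial write
        except BlockingIOError:
            continue  # not writable after all; try again next pass
        except OSError:
            link.drop()  # connection failed
            continue
        del link.outbuf[:sent]
```

Because no call in pump_once can block for longer than the select timeout, a
peer that stops reading only grows its own buffer until it is disconnected,
and the other links keep being serviced.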

Now, wedged output should not do this, but somehow it appears that it is,
and I need some input to try and understand what is going on.

What I am particularly looking for is a wedge that resets itself after a
period of time (some of the wedges seem to do this).

Dirk G1TLH
--
Dirk Koopman <djk at tobit.co.uk>


_______________________________________________
Dxspider-support mailing list
Dxspider-support at dxcluster.org
http://mailman.tobit.co.uk/mailman/listinfo/dxspider-support



