[Dxspider-support] Crash insight?

Mike McCarthy, W1NR lists at w1nr.net
Mon Oct 31 22:20:52 GMT 2005


Which log files should I be looking at and saving?

Mike, W1NR
 

-----Original Message-----
From: dxspider-support-bounces at dxcluster.org
[mailto:dxspider-support-bounces at dxcluster.org] On Behalf Of Dirk Koopman
Sent: Monday, October 31, 2005 11:03 AM
To: The DXSpider Support list
Subject: RE: [Dxspider-support] Crash insight?

On Sun, 2005-10-30 at 20:46 -0500, Mike McCarthy, W1NR wrote:
> At roughly 1100Z on Sunday morning, my cluster hung for a while.  My 
> node is running on SuSE Linux 9.2.  Some of the telnet connections and 
> all my node links went down and I could not connect via telnet.  I was 
> able to log in to the node via SSH.  Netstat showed that one IP 
> address was flooding the node with connection attempts to the spider 
> port.  At about 1120z, the cluster came back without rebooting.  I am 
> attempting to contact the user that flooded the telnet port to try and 
> find out what he was using for software to connect to the cluster.  It 
> appears that it went wacko.  All of the connects were in the LAST_ACK
> state.  Dirk, does this ring any bells?

No, or rather maybe? It occurs to me that there is some client software out
there (which I think is Belgian) which is deliberately designed to connect
to several nodes at once (up to 50 IIRC). The justification for this, I
believe, is that you connect all over the world and it thus gives you an
"edge" in the pileup if you get the spot delivered direct, rather than 7th
hand via the network. The fact that, these days, the difference is maybe 2
seconds seems not to make any impression on the author.

I seem to remember that we had a similar problem a year or so ago (not in
CQWW) and Angel (IIRC) traced it to a user of this client program. It
appears that it is (or was?) rather aggressive about reconnecting and
sometimes it just went wild, in effect DoSing the cluster node because it
was not detecting that it was, in fact, connecting ok and so kept sending
out one connection attempt after another.
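
For anyone wanting to check their own node for this pattern (many
connections from a single address, all in LAST_ACK), the tally can be done
on saved `netstat -tn` output. What follows is only a rough sketch in
Python, not part of DXSpider. It assumes the Linux net-tools column layout
(Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function
names and the threshold of 20 are made up for illustration.

```python
from collections import Counter


def count_by_remote(netstat_text, state="LAST_ACK"):
    """Count TCP connections per remote IP that are in the given state.

    Assumes the Linux `netstat -tn` layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the header lines and anything that is not a TCP connection row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote, conn_state = fields[4], fields[5]
        if conn_state != state:
            continue
        # Drop the port: everything before the last ":" is the address.
        ip = remote.rpartition(":")[0]
        counts[ip] += 1
    return counts


def suspects(netstat_text, state="LAST_ACK", threshold=20):
    """Return (ip, count) pairs at or above the threshold, busiest first.

    The default threshold is an arbitrary illustrative value.
    """
    counts = count_by_remote(netstat_text, state)
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]
```

Feed it the text of `netstat -tn` (captured to a file, or read from a pipe)
and print whatever `suspects()` returns. A single address with dozens of
entries in one state is the sort of thing described above.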

> I will try and get more information on the source of the problem and 
> post it.
> 

If anybody else has had trouble this last weekend with lockups, whether on
Windows or on Linux, please would you share your experiences on this list.

I would like to try to get a handle on it, preferably whilst people still
have the debug files around to look at (remember that they get cleaned out
on a rolling 10 day basis [as default]).
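
Since the debug files roll off after 10 days by default, one way to keep
them is to copy the recent ones somewhere outside the debug tree before
they expire. This is only a sketch, in Python. The function name is
hypothetical, and the source and destination directories are left as
arguments because the right paths depend on your installation.

```python
import shutil
import time
from pathlib import Path


def preserve_recent(src_dir, dest_dir, days=10, now=None):
    """Copy files under src_dir modified in the last `days` days to dest_dir.

    The relative directory layout is kept. dest_dir should live outside
    src_dir, otherwise the copies would be picked up on a later run.
    Returns the list of destination paths that were written.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    cutoff = (time.time() if now is None else now) - days * 86400
    copied = []
    for path in sorted(src.rglob("*")):
        # Ignore directories and anything older than the cutoff.
        if not path.is_file() or path.stat().st_mtime < cutoff:
            continue
        target = dest / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied.append(target)
    return copied
```

Run it once against the debug directory soon after an incident. The copies
are then safe from the rolling cleanup.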

Dirk G1TLH


_______________________________________________
Dxspider-support mailing list
Dxspider-support at dxcluster.org
http://mailman.tobit.co.uk/mailman/listinfo/dxspider-support



