[Dxspider-support] Crash insight?

Dirk Koopman djk at tobit.co.uk
Mon Oct 31 16:03:21 GMT 2005


On Sun, 2005-10-30 at 20:46 -0500, Mike McCarthy, W1NR wrote:
> At roughly 1100Z on Sunday morning, my cluster hung for a while.  My node is
> running on SuSE Linux 9.2.  Some of the telnet connections and all my node
> links went down and I could not connect via telnet.  I was able to log in to
> the node via SSH.  Netstat showed that one IP address was flooding the node
> with connection attempts to the spider port.  At about 1120z, the cluster
> came back without rebooting.  I am attempting to contact the user that
> flooded the telnet port to try and find out what he was using for software
> to connect to the cluster.  It appears that it went wacko.  All of the
> connects were in the LAST_ACK state.  Dirk, does this ring any bells?

No, or rather maybe? It occurs to me that there is some client software
out there (which I think is Belgian) which is deliberately designed to
connect to several nodes at once (up to 50 IIRC). The justification for
this, I believe, is that you connect all over the world and it thus
gives you an "edge" in the pileup if you get the spot delivered direct,
rather than 7th hand via the network. The fact that, these days, the
difference is maybe 2 seconds seems not to make any impression on the
author.

I seem to remember that we had a similar problem a year or so ago (not
in CQWW) and Angel (IIRC) traced it to a user of this client program. It
appears that it is (or was?) rather aggressive about reconnecting and
sometimes it just went wild, in effect DoSing the cluster node because
it was not detecting the fact that it was, in fact, connecting ok, and
so kept sending out one connection attempt after another.
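
For anyone wanting to check their own node for the same pattern, here is
a rough Python sketch (not part of DXSpider) that tallies connections per
remote address and TCP state from "netstat -tn" style output. The port
number 7300, the sample addresses and the function name
count_connections are assumptions for illustration only; substitute
whatever port your node actually listens on.

```python
from collections import Counter


def count_connections(netstat_output, local_port):
    """Count TCP connections per (remote address, state) for one local port."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the header line and anything that is not TCP
        local, remote, state = fields[3], fields[4], fields[5]
        # Keep only connections to the port we care about
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue
        remote_ip = remote.rsplit(":", 1)[0]
        counts[(remote_ip, state)] += 1
    return counts


# Made-up sample data (documentation-range addresses), NOT a real capture.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.1:7300          198.51.100.7:40001      LAST_ACK
tcp        0      0 192.0.2.1:7300          198.51.100.7:40002      LAST_ACK
tcp        0      0 192.0.2.1:7300          198.51.100.7:40003      LAST_ACK
tcp        0      0 192.0.2.1:7300          203.0.113.9:51515       ESTABLISHED
tcp        0      0 192.0.2.1:22            203.0.113.9:51600       ESTABLISHED
"""

if __name__ == "__main__":
    # Busiest (address, state) pairs first
    for (ip, state), n in count_connections(SAMPLE, 7300).most_common():
        print(f"{n:4d}  {ip:<15}  {state}")
```

On a live node the same function could be fed the real output of
"netstat -tn" instead of SAMPLE. A single address with a large count in
one state, such as LAST_ACK, would be the thing to look for.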

> I will try and get more information on the source of the problem and post
> it.
> 

If anybody else has had trouble this last weekend with lockups, whether
on Windows or on Linux, please would you share your experiences on this
list.

I would like to try to get a handle on it, preferably whilst people
still have the debug files around to look at (remember that, by default,
they get cleaned out on a rolling 10-day basis).

Dirk G1TLH



