[Dxspider-support] Crash insight?

Peter pc2a at pi4cc.nl
Mon Oct 31 16:17:51 GMT 2005


Hi

I have similar problems, but NOT during the contest. I have seen in the debug log a station
with a fast (> 20 sec) connect and disconnect cycle. But it looks like a local issue
and not a fast connect from the user side. When you look via the stats page you will find
that data in and out are at the same level, e.g.: http://www.pi4cc.nl/dxcluster/mrtg/msg.html
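
To check for the kind of flood Mike describes below (one IP address with many
connections stuck in LAST_ACK), something like the following rough sketch could be
run over saved `netstat -tn` output. It assumes the usual Linux column layout
(Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name,
the sample addresses and port 7300 are made up for illustration, so substitute the
port your own node listens on.

```python
from collections import Counter


def count_by_remote(netstat_text, local_port, state="LAST_ACK"):
    """Count connections per remote IP to local_port in the given TCP state.

    Assumes Linux `netstat -tn` rows:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, conn_state = fields[3], fields[4], fields[5]
        if conn_state != state:
            continue
        # The port is whatever follows the last ':' in the local address.
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


# Hypothetical sample data (documentation-range addresses, assumed port 7300).
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:7300         198.51.100.7:40112      LAST_ACK
tcp        0      0 192.0.2.10:7300         198.51.100.7:40113      LAST_ACK
tcp        0      0 192.0.2.10:7300         203.0.113.5:51000       ESTABLISHED
tcp        0      0 192.0.2.10:22           203.0.113.9:51812       ESTABLISHED
"""

if __name__ == "__main__":
    for ip, n in count_by_remote(SAMPLE, 7300).most_common():
        print(ip, n)
```

A single address with a large count would point to one runaway client rather than
general contest load, but that is only a hint and not proof.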

Peter
PC2A

Dirk Koopman wrote:

> On Sun, 2005-10-30 at 20:46 -0500, Mike McCarthy, W1NR wrote:
> > At roughly 1100Z on Sunday morning, my cluster hung for a while.  My node is
> > running on SuSE Linux 9.2.  Some of the telnet connections and all my node
> > links went down and I could not connect via telnet.  I was able to log in to
> > the node via SSH.  Netstat showed that one IP address was flooding the node
> > with connection attempts to the spider port.  At about 1120z, the cluster
> > came back without rebooting.  I am attempting to contact the user that
> > flooded the telnet port to try and find out what he was using for software
> > to connect to the cluster.  It appears that it went wacko.  All of the
> > connects were in the LAST_ACK state.  Dirk, does this ring any bells?
>
> No, or rather maybe? It occurs to me that there is some client software
> out there (which I think is Belgian) which is deliberately designed to
> connect to several nodes at once (up to 50 IIRC). The justification for
> this, I believe, is that you connect all over the world and it thus
> gives you an "edge" in the pileup if you get the spot delivered direct,
> rather than 7th hand via the network. The fact that, these days, the
> difference is maybe 2 seconds seems not to make any impression on the
> author.
>
> I seem to remember that we had a similar problem a year or so ago (not
> in CQWW) and Angel (IIRC) traced it to a user of this client program. It
> appears that it is (or was?) rather aggressive about reconnecting and
> sometimes it just went wild, in effect DoSing the cluster node because
> it was not detecting the fact that it was, in fact, connecting ok and
> kept sending out one connection attempt after another.
>
> > I will try and get more information on the source of the problem and post
> > it.
> >
>
> If anybody else has had trouble this last weekend with lockup, whether
> on windows or on linux: please would you share your experiences on this
> list.
>
> I would like to try to get a handle on it, preferably whilst people
> still have the debug files around to look at (remember that they get
> cleaned out on a rolling 10 day basis [as default]).
>
> Dirk G1TLH
>
> _______________________________________________
> Dxspider-support mailing list
> Dxspider-support at dxcluster.org
> http://mailman.tobit.co.uk/mailman/listinfo/dxspider-support




