[Dxspider-support] Cluster hangs

Jimmy Turner k5jtj at arrl.net
Sun Mar 6 00:25:32 GMT 2005


I had the same thing happen to our cluster, except that I would get a
message like "cannot fork, try again" every time I tried to issue a
command.  I checked the output of ps ax and found a lot of processes
called "hotplug" running, about 60-100 instances.  I also found some
alerts in my /var/log/secure.log files about invalid SSH logins.  I
took this as a sign that I should shut down SSH until I really need
it.  Since I disabled SSH on the box (it's local anyway) I have not
had any other lockups.
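
For anyone who wants to check for the same kind of process pile-up, here
is a minimal Python sketch that counts processes per command name from
"ps ax" style output.  The function names and the threshold of 50 are my
own assumptions for illustration, not part of DX Spider, and the parsing
assumes the usual five-column layout (PID TTY STAT TIME COMMAND).

```python
import os
from collections import Counter


def count_commands(ps_output: str) -> Counter:
    """Count processes per command name in `ps ax` style output."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)  # PID TTY STAT TIME COMMAND
        if len(fields) < 5:
            continue  # blank or malformed line
        # Keep only the program name: drop arguments and any leading path.
        name = os.path.basename(fields[4].split()[0])
        counts[name] += 1
    return counts


def suspicious(counts: Counter, threshold: int = 50) -> dict:
    """Return the commands running at least `threshold` instances.

    The default threshold is an arbitrary assumption; tune it per box.
    """
    return {name: n for name, n in counts.items() if n >= threshold}
```

The text to parse can be captured with
subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout, or
saved to a file beforehand.  Note that once the box is already answering
"cannot fork", starting ps itself may fail, so a check like this is more
useful run periodically before things lock up.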
I hope this helps,
73  de  K5JTJ
sysop K5PLD-2 Spider DXCluster.

On Sat, 2005-03-05 at 17:29, Kelly Jones wrote:
> Hi guys,
> 
> I seem to have an issue somewhere.  For whatever reason, DX Spider just 
> seems to 'hang' for no apparent reason.  I noticed this during the ARRL CW 
> contest a couple of weeks ago and it appears to be doing it again during 
> the phone contest.
> 
> If I try to enter any command or even a simple carriage return, I get no 
> response on the screen.  If I try to telnet in, I am connected, but never 
> greeted with the 'welcome' message.
> 
> If I let the cluster continue to run, eventually (anywhere from 10 to 30 
> minutes later) it will 'break loose' and spew the last x minutes of data to 
> the screen all at once.  Once this happens the cluster appears to hang once 
> again and we repeat the symptoms as above.
> 
> If I tail the .dat file in /spider/data/debug/2005/xxx.dat, there is 
> no activity being written to this file when the cluster is 
> 'hung'.  However, once it 'lets go', all of the activity is written at once.
> 
> The only way to 'reset' the cluster is to restart it.  Once a restart 
> happens, it's good for a while at which time it eventually hangs again.
> 
> If I run a 'top', there appears to be almost 0 cluster activity and the box 
> is running at 99% idle.  Currently running v1.51 build 58.323.
> 
> I never had this problem with the old cluster which was running code base 
> from about two years ago, but for whatever reason this new box isn't 
> performing very well.
> 
> Any ideas what could be causing my 'hang'?
> 
> 73
> Kelly - N0VD




