[Dxspider-support] Weird node issue that prevents user logins

Dirk Koopman G1TLH gb7tlh at dxcluster.org
Mon Oct 5 13:09:29 BST 2009


Brendan Minish wrote:
> Hello 
> 
> I am seeing a strange issue on EI7SDX that arises very occasionally
> causing it to not progress any user logins.
> 
>  from a user's perspective the login hangs at 
> 
> $ telnet ei7sdx.ath.cx 7300
> Trying 95.172.12.170...
> Connected to ei7sdx.ath.cx.
> Escape character is '^]'.
> 
> 
> when I log into the cluster it appears to be running but I see when I run ps aux 
> hundreds of processes that look like this 
> 
> sysop    27358  0.0  0.0      0     0 ?        Z    Oct01   0:00 [cp] <defunct>
> 
> they go back over several days but the issue only arose from a user's
> perspective yesterday 
> 
> the sysop console can't log in either. Restarting the cluster by killing
> its PID gets things going again, but all the defunct processes remain
> until after a reboot
> 
> the cluster is running on Centos 5.3 and is pretty much identical in
> most ways to ei7mre except that ei7sdx has a web client running. Am I
> seeing fallout from web client users not terminating sessions or
> something ? 
> 
> any ideas anyone ? 
> 

Are you using the C client at all?

In general, this sort of thing is caused by processes spawning other 
processes and then not "wait(2)"ing for them. What would concern me is 
that the program that is not being waited for is 'cp'.
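
To illustrate the mechanism, here is a minimal sketch (Linux only, since it 
reads /proc) of a parent that forks a child, leaves it unreaped so that it 
shows up in state 'Z' (<defunct>), and then reaps it with waitpid. The 
function names child_state and spawn_and_inspect are made up for this 
example; nothing here is taken from DXSpider.

```python
import os
import time


def child_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field follows the command name, which is wrapped in parentheses.
    return data.rsplit(")", 1)[1].split()[0]


def spawn_and_inspect():
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running any cleanup

    # Parent: until we call wait/waitpid, the exited child stays in the
    # process table as a zombie. Poll briefly until the kernel reports 'Z'.
    deadline = time.time() + 5
    state = child_state(pid)
    while state != "Z" and time.time() < deadline:
        time.sleep(0.01)
        state = child_state(pid)

    os.waitpid(pid, 0)  # reap the child; its process table entry is released
    still_there = os.path.exists(f"/proc/{pid}")
    return state, still_there


if __name__ == "__main__":
    before, still_there = spawn_and_inspect()
    print("state before waitpid:", before)
    print("entry still present after waitpid:", still_there)
```

A parent that never makes the waitpid call (or an equivalent SIGCHLD 
handler) accumulates one such entry per child for as long as the parent 
itself keeps running.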

A 'pstree' command may help you find out which program is spawning these 
jobs.
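
If pstree is not to hand, the same information can be pulled from /proc. 
The following is a rough sketch, assuming a Linux /proc layout, that groups 
defunct processes by parent PID so the spawning program stands out. The 
helper names parse_stat and zombies_by_parent are invented for this example.

```python
import os


def parse_stat(text):
    """Parse a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    pid_part, rest = text.split(" (", 1)
    # comm may itself contain ')' so split on the last ") ".
    comm, fields = rest.rsplit(") ", 1)
    parts = fields.split()
    return int(pid_part), comm, parts[0], int(parts[1])


def read_stat(pid):
    """Return the parsed stat tuple for pid, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_stat(f.read())
    except (OSError, ValueError, IndexError):
        return None


def zombies_by_parent():
    """Return {ppid: [(pid, comm), ...]} for every process in state 'Z'."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        info = read_stat(entry)
        if info is None:
            continue
        pid, comm, state, ppid = info
        if state == "Z":
            result.setdefault(ppid, []).append((pid, comm))
    return result


if __name__ == "__main__":
    for ppid, children in sorted(zombies_by_parent().items()):
        parent = read_stat(ppid)
        parent_name = parent[1] if parent else "?"
        print(f"parent {ppid} ({parent_name}): {len(children)} defunct")
        for pid, comm in children:
            print(f"    {pid} [{comm}]")
```

Whatever shows up as the parent of the defunct 'cp' entries is the program 
to look at.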

DXSpider does not spawn processes "out of the box", and certainly not 'cp' 
commands, although one can get the cron system to do it.

I would be getting a tad paranoid at this point and wondering whether 
someone has got in through that well known security hole known as 'apache'.

Dirk



