<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 05/06/15 06:25, Pascal Stevenhaagen
[PB1SAM] wrote:<br>
</div>
<blockquote cite="mid:557132E0.9010404@gmail.com" type="cite">
<div class="moz-cite-prefix">Most of the time the cluster stops
and starts correctly.<br>
Once in a while it goes wrong.<br>
You can also remove the file by yourself, like<br>
rm /spider/local/cluster.lck<br>
<br>
To make a batch, simply create a text file with a name you like,<br>
like /spider/spiderstart.sh<br>
Then add the following lines:<br>
<br>
#!/bin/bash<br>
rm /spider/local/cluster.lck<br>
/usr/bin/perl -w /spider/perl/cluster.pl<br>
<br>
<br>
Close and save the file, give it execute permissions.<br>
Now edit your inittab file, and find the line<br>
<code>
<pre>##Start DXSpider on bootup and respawn it should it crash
DX:3:respawn:/bin/su -c "/usr/bin/perl -w /spider/perl/cluster.pl" sysop >/dev/tty7
Change "/usr/bin/perl -w /spider/perl/cluster.pl" to
"/spider/spiderstart.sh"
</pre>
</code><br>
</div>
</blockquote>
<br>
I am getting rather concerned by this. The cluster.lck file is there
for a reason. If the node has crashed or otherwise stopped without
removing the .lck file, restarting the node will still work. The
reason is that the .lck file contains the process id of the
cluster.pl that created it. A new cluster.pl reads the .lck file and,
if the process id in that file no longer exists, it just
starts normally. You don't need to remove the .lck file first. The new
process will replace the old process id in that file with its own.
All sorts of support programs (e.g. create/update_sysop.pl) rely on
that lock file being there to prevent accidents/corruptions
occurring. <br>
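<br>
As a rough illustration of that logic (a sketch only, not the actual
code in cluster.pl), the shell function below classifies a lock file
as missing, stale or belonging to a live process. The name
<tt>lock_state</tt> is made up for this example, and it assumes the
file holds a single process id on its first line.<br>

```shell
# lock_state FILE
# Prints "missing", "stale" or "running PID" for a pid-style lock file.
# Hypothetical helper for illustration; cluster.pl's own check may differ.
lock_state() {
    lockfile=$1
    if [ ! -f "$lockfile" ]; then
        echo "missing"
        return 0
    fi
    # Assumption: the first line of the lock file is the process id.
    pid=$(head -n 1 "$lockfile" | tr -cd '0-9')
    if [ -z "$pid" ]; then
        echo "stale"
        return 0
    fi
    # kill -0 sends no signal, it only tests whether the process exists.
    # It also fails for a process owned by another user, so run this as
    # the user that owns the node (or as root).
    if kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        echo "stale"
    fi
}
```

For example, <tt>lock_state /spider/local/cluster.lck</tt> printing
"stale" would mean the old process has gone and a restart should just
work without touching the file.<br>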
<br>
So if another cluster.pl is started and it complains about there
being another process already running, then the chances are strong
that it is not lying. Removing the .lck file and then just starting
another cluster.pl will certainly corrupt things like the userfile -
but it will also fail anyway because it also won't be able to start
up listeners on ports like 27754 or 7300/8000 (depending on which
you use). <br>
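<br>
To see whether something is already listening on one of those ports,
a small filter over netstat output is enough. This is only a sketch:
<tt>port_in_use</tt> is a hypothetical name, and it assumes the usual
Linux netstat column layout (local address in column 4, state in
column 6).<br>

```shell
# port_in_use PORT
# Reads `netstat -tan` style lines on stdin and succeeds (exit 0) if any
# socket is in LISTEN state on PORT. Hypothetical helper for illustration.
port_in_use() {
    awk -v port="$1" '
        # Column 4 is the local address, column 6 the socket state.
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage would be something like
<tt>netstat -tan | port_in_use 7300 &amp;&amp; echo "7300 is already taken"</tt>.<br>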
<br>
So we need to get to the bottom of this. Please bear in mind that, just
now, there are 387 nodes in the DXSpider compatible network. All bar
about 40 of those are running DXSpider and have zero problems with
.lck files (none that have been reported, at least). <br>
<br>
I do not approve of "workarounds" like the one detailed above. If
they are a) necessary and b) actually work, then I want to know why
and how, so that the main line code can be changed to make the
workaround unnecessary. <br>
<br>
If cluster.pl complains that it is already running then it is a
simple matter to check:<br>
<br>
<tt>ps ax | grep cluster</tt><br>
<br>
will produce something like:<br>
<br>
<tt>10764 pts/53 S+ 0:00 grep --color=auto cluster</tt><tt><br>
</tt><tt>19194 pts/48 S+ 24:27 perl ./cluster.pl</tt><br>
<br>
That says that there is a node currently running. If you look at the
cluster.lck:<br>
<br>
<tt>cat local/cluster.lck</tt><tt><br>
<br>
</tt>gives:<tt><br>
<br>
</tt><tt>19194</tt><br>
<br>
You can see that it agrees with the "ps ax". This means that it is
running and if you can't get in then there is something else wrong.
You can investigate "hanging" in the first instance by looking at this:
<br>
<br>
<tt>netstat -tapn | grep -P '7300|27754'</tt><tt><br>
</tt><tt>(Not all processes could be identified, non-owned process
info</tt><tt><br>
</tt><tt> will not be shown, you would have to be root to see it
all.)</tt><tt><br>
</tt><tt>tcp 0 0 0.0.0.0:7300
0.0.0.0:* LISTEN 19194/perl </tt><tt><br>
</tt><tt>tcp 0 0 127.0.0.1:27754
0.0.0.0:* LISTEN 19194/perl </tt><tt><br>
</tt><tt>tcp 0 0 127.0.0.1:27754
127.0.0.1:40010 ESTABLISHED 19194/perl </tt><tt><br>
</tt><tt>tcp 0 0 127.0.0.1:40010
127.0.0.1:27754 ESTABLISHED 19264/perl </tt><tt><br>
</tt><br>
The two numbers to the right of the 'tcp' are the number of bytes in the
receive and transmit queues. On a quiet system they will be 0. Even on a
busy system:<br>
<br>
<tt>tcp 0 0 82.103.135.24:7300
0.0.0.0:* LISTEN </tt><tt><br>
</tt><tt>tcp 0 0 127.0.0.1:7300
0.0.0.0:* LISTEN </tt><tt><br>
</tt><tt>tcp 0 0 127.0.0.1:27754
0.0.0.0:* LISTEN </tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
213.138.110.143:42281 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
99.6.147.52:49377 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
83.162.186.242:36370 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:39326
195.171.43.144:7300 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
217.146.110.41:36623 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
93.142.192.186:51161 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
217.160.22.169:49550 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
85.220.185.162:49179 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
178.128.167.25:54418 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
164.126.146.32:49172 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
92.72.61.178:51181 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
83.252.226.34:59837 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
107.211.218.32:2416 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
89.140.118.183:55942 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
80.0.168.159:59331 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
89.168.61.217:1091 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
93.95.80.107:4308 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
2.230.223.137:49356 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 127.0.0.1:27754
127.0.0.1:50397 TIME_WAIT </tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
79.141.97.6:52724 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
68.100.98.221:65350 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
80.217.42.87:49317 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
99.82.248.159:1400 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
108.72.240.121:53807 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
92.13.176.38:49173 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
109.69.104.145:39727 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
58.162.248.200:63144 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
86.26.145.167:2664 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
83.4.1.15:5738 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
92.4.126.139:52101 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
86.167.103.21:56777 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
37.228.211.66:49286 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
79.106.20.5:62846 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
184.1.71.111:58854 FIN_WAIT2 </tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
108.85.7.188:64788 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
212.159.40.67:61489 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
202.154.141.28:49388 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
87.81.158.136:56937 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
78.70.174.109:64745 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
193.53.39.133:2791 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
91.3.231.33:49730 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
73.26.162.240:51183 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
78.25.123.189:17219 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
216.54.125.50:3243 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
87.114.78.109:63748 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
78.1.230.118:53454 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
84.106.116.54:3161 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
82.0.27.159:1177 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
204.235.44.74:58853 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
184.1.71.111:58852 TIME_WAIT </tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
68.47.234.14:51087 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
70.178.167.206:51860 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
68.100.96.156:64788 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
71.123.183.231:59821 ESTABLISHED</tt><tt><br>
</tt><tt>tcp 0 0 82.103.135.24:7300
188.221.68.33:49760 ESTABLISHED</tt><tt><br>
</tt><tt>tcp6 0 0 2a00:9080:1:5cf::1:7300
:::* LISTEN </tt><tt><br>
</tt><tt>tcp6 0 0 2a00:9080:1:5cf::1:7300
2001:41c8:51:457::60624 ESTABLISHED</tt><tt><br>
</tt><tt>tcp6 0 0 2a00:9080:1:5cf::1:7300
2a01:260:8033:1:c:27973 ESTABLISHED</tt><tt><br>
</tt><tt>tcp6 0 0 2a00:9080:1:5cf::1:7300
2a01:7e00::f03c:9:36535 ESTABLISHED</tt><tt><br>
</tt><br>
They will be (mostly) 0. If you have anything else then this needs
to be investigated.<br>
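<br>
With a long connection list it is easier to let awk pick out the
interesting lines. A minimal sketch, assuming the netstat layout shown
above (receive queue in column 2, transmit queue in column 3);
<tt>busy_queues</tt> is just a name chosen for this example.<br>

```shell
# busy_queues
# Reads netstat TCP lines on stdin and prints only those whose receive
# queue (column 2) or transmit queue (column 3) is non-zero.
# Header lines are skipped because column 1 does not start with "tcp".
busy_queues() {
    awk '$1 ~ /^tcp/ && ($2 + 0 > 0 || $3 + 0 > 0)'
}
```

Run it as <tt>netstat -tan | busy_queues</tt>. No output means every
queue is empty. Lines that keep showing up on repeated runs are the
ones worth a closer look.<br>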
<br>
Dirk G1TLH<br>
<br>
</body>
</html>