<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 05/06/15 06:25, Pascal Stevenhaagen
      [PB1SAM] wrote:<br>
    </div>
    <blockquote cite="mid:557132E0.9010404@gmail.com" type="cite">
      <div class="moz-cite-prefix">Most of the time the cluster stops
        and starts correctly.<br>
        Once in a while it goes wrong.<br>
        You can also remove the file by yourself, like<br>
        rm /spider/local/cluster.lck<br>
        <br>
        To make a batch, simply create a text file with a name you like,<br>
        like /spider/spiderstart.sh<br>
        Then add the following lines:<br>
        <br>
        #!/bin/bash<br>
        rm /spider/local/cluster.lck<br>
        /usr/bin/perl -w /spider/perl/cluster.pl<br>
        <br>
        <br>
        Close and save the file, give it execute permissions.<br>
        Now edit your inittab file, and find the line<br>
        <code>
          <pre>##Start DXSpider on bootup and respawn it should it crash
DX:3:respawn:/bin/su -c "/usr/bin/perl -w /spider/perl/cluster.pl" sysop >/dev/tty7


Change "/usr/bin/perl -w /spider/perl/cluster.pl" to
"/spider/spiderstart.sh"
</pre>
        </code><br>
      </div>
    </blockquote>
    <br>
    I am getting rather concerned by this. The cluster.lck file is there
    for a reason. If the node has crashed or otherwise stopped without
    removing the .lck file, then restarting the node will not fail. The
    reason for this is that the .lck file contains the process id of the
    cluster.pl that created it. The cluster.pl reads its .lck file and,
    if the process id in that file doesn't exist, then it will just
    start normally. You don't need to remove the .lck file first. The new
    process will replace that old process id in that file with its own.
    All sorts of support programs (e.g. create/update_sysop.pl) rely on
    that lock file being there to prevent accidents/corruptions
    occurring. <br>
    <br>
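    As a rough illustration of the check described above, here is a
    minimal shell sketch. It is not the code that cluster.pl actually
    uses: the function name lock_is_live is made up for this example,
    and it assumes that kill -0 is an acceptable way to test whether a
    process id exists (that only works for processes you are allowed to
    signal, so run it as the node's user or as root).<br>
    <pre>
```shell
#!/bin/sh
# Illustration only: lock_is_live is a hypothetical helper, not part of
# DXSpider. It returns 0 if the lock file names a process that still
# exists, and 1 if the file is missing, empty, malformed or stale.
lock_is_live() {
    lockfile="$1"
    [ -r "$lockfile" ] || return 1
    # The lock file is expected to hold a single process id.
    pid=$(head -n 1 "$lockfile" | tr -d ' \t\r')
    case "$pid" in
        ''|0|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only tests that the process exists.
    # Assumption: run as the node's user or root, otherwise a live
    # process owned by someone else is reported as missing.
    kill -0 "$pid" 2>/dev/null
}

# The default path is the one used in this thread; pass another as $1.
if lock_is_live "${1:-/spider/local/cluster.lck}"; then
    echo "lock is live: leave it alone and look at the running node"
else
    echo "no live lock: cluster.pl can be started normally"
fi
```
</pre>
    Note that the sketch never deletes the lock file. It only reports
    whether the process id in it is still alive.<br>
    <br>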
    So if another cluster.pl is started and it complains about there
    being another process already running, then the chances are strong
    that it is not lying. Removing the .lck file and then just starting
    another cluster.pl will certainly corrupt things like the userfile.
    It will fail anyway, because the second process won't be able to
    start up listeners on ports like 27754 or 7300/8000 (depending on
    which you use) while the first one still holds them. <br>
    <br>
    So we need to get to the bottom of this. Please bear in mind that, just
    now, there are 387 nodes in the DXSpider compatible network. All bar
    about 40 of those are running DXSpider and have zero problems with
    .lck files (that have been reported at least). <br>
    <br>
    I do not approve of "work arounds" like the one detailed above. If
    they are a) necessary and b) actually work, then I want to know why
    and how so that the main line code can be changed to make the work
    around unnecessary. <br>
    <br>
    If cluster.pl complains that it is already running then it is a
    simple matter to check:<br>
    <br>
    <tt>ps ax | grep cluster</tt><br>
    <br>
    will produce something like:<br>
    <br>
    <tt>10764 pts/53   S+     0:00 grep --color=auto cluster</tt><tt><br>
    </tt><tt>19194 pts/48   S+    24:27 perl ./cluster.pl</tt><br>
    <br>
    That says that there is a node currently running. If you look at the
    cluster.lck:<br>
    <br>
    <tt>cat local/cluster.lck</tt><tt><br>
      <br>
    </tt>gives:<tt><br>
      <br>
    </tt><tt>19194</tt><br>
    <br>
    You can see that it agrees with the "ps ax". This means that it is
    running and if you can't get in then there is something else wrong.
    You can investigate "hanging" in the first instance by looking at this:
    <br>
    <br>
    <tt>netstat -tapn | grep -P '7300|27754'</tt><tt><br>
    </tt><tt>(Not all processes could be identified, non-owned process
      info</tt><tt><br>
    </tt><tt> will not be shown, you would have to be root to see it
      all.)</tt><tt><br>
    </tt><tt>tcp        0      0 0.0.0.0:7300           
      0.0.0.0:*               LISTEN      19194/perl      </tt><tt><br>
    </tt><tt>tcp        0      0 127.0.0.1:27754        
      0.0.0.0:*               LISTEN      19194/perl      </tt><tt><br>
    </tt><tt>tcp        0      0 127.0.0.1:27754        
      127.0.0.1:40010         ESTABLISHED 19194/perl      </tt><tt><br>
    </tt><tt>tcp        0      0 127.0.0.1:40010        
      127.0.0.1:27754         ESTABLISHED 19264/perl      </tt><tt><br>
    </tt><br>
    The two numbers to the right of the 'tcp' are the number of bytes in
    the receive and send queues. On a quiet system they will be 0. Even on a
    busy system:<br>
    <br>
    <tt>tcp        0      0 82.103.135.24:7300     
      0.0.0.0:*               LISTEN     </tt><tt><br>
    </tt><tt>tcp        0      0 127.0.0.1:7300         
      0.0.0.0:*               LISTEN     </tt><tt><br>
    </tt><tt>tcp        0      0 127.0.0.1:27754        
      0.0.0.0:*               LISTEN     </tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      213.138.110.143:42281   ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      99.6.147.52:49377       ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      83.162.186.242:36370    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:39326    
      195.171.43.144:7300     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      217.146.110.41:36623    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      93.142.192.186:51161    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      217.160.22.169:49550    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      85.220.185.162:49179    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      178.128.167.25:54418    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      164.126.146.32:49172    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      92.72.61.178:51181      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      83.252.226.34:59837     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      107.211.218.32:2416     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      89.140.118.183:55942    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      80.0.168.159:59331      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      89.168.61.217:1091      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      93.95.80.107:4308       ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      2.230.223.137:49356     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 127.0.0.1:27754        
      127.0.0.1:50397         TIME_WAIT  </tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      79.141.97.6:52724       ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      68.100.98.221:65350     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      80.217.42.87:49317      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      99.82.248.159:1400      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      108.72.240.121:53807    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      92.13.176.38:49173      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      109.69.104.145:39727    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      58.162.248.200:63144    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      86.26.145.167:2664      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      83.4.1.15:5738          ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      92.4.126.139:52101      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      86.167.103.21:56777     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      37.228.211.66:49286     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      79.106.20.5:62846       ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      184.1.71.111:58854      FIN_WAIT2  </tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      108.85.7.188:64788      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      212.159.40.67:61489     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      202.154.141.28:49388    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      87.81.158.136:56937     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      78.70.174.109:64745     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      193.53.39.133:2791      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      91.3.231.33:49730       ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      73.26.162.240:51183     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      78.25.123.189:17219     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      216.54.125.50:3243      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      87.114.78.109:63748     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      78.1.230.118:53454      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      84.106.116.54:3161      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      82.0.27.159:1177        ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      204.235.44.74:58853     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      184.1.71.111:58852      TIME_WAIT  </tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      68.47.234.14:51087      ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      70.178.167.206:51860    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      68.100.96.156:64788     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      71.123.183.231:59821    ESTABLISHED</tt><tt><br>
    </tt><tt>tcp        0      0 82.103.135.24:7300     
      188.221.68.33:49760     ESTABLISHED</tt><tt><br>
    </tt><tt>tcp6       0      0 2a00:9080:1:5cf::1:7300
      :::*                    LISTEN     </tt><tt><br>
    </tt><tt>tcp6       0      0 2a00:9080:1:5cf::1:7300
      2001:41c8:51:457::60624 ESTABLISHED</tt><tt><br>
    </tt><tt>tcp6       0      0 2a00:9080:1:5cf::1:7300
      2a01:260:8033:1:c:27973 ESTABLISHED</tt><tt><br>
    </tt><tt>tcp6       0      0 2a00:9080:1:5cf::1:7300
      2a01:7e00::f03c:9:36535 ESTABLISHED</tt><tt><br>
    </tt><br>
    They will be (mostly) 0. If you have anything else then this needs
    to be investigated.<br>
    <br>
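    To pick out the lines that need investigating without reading the
    whole listing, something like the following shell sketch could be
    used. The function name nonzero_queues is invented for this example,
    and it assumes the column layout shown above, with the receive and
    send queues in the second and third columns.<br>
    <pre>
```shell
#!/bin/sh
# Illustration only: nonzero_queues is a hypothetical helper. It reads
# netstat-style output on stdin and prints every tcp/tcp6 line whose
# second or third column (the receive and send queues) is not 0.
# Header lines and anything not starting with "tcp" are ignored.
nonzero_queues() {
    awk '$1 ~ /^tcp/ { if ($2 + 0 != 0 || $3 + 0 != 0) print }'
}
```
</pre>
    With the function defined, <tt>netstat -tan | nonzero_queues</tt>
    prints nothing on a healthy node. Any line it does print is a
    connection with data stuck in a queue.<br>
    <br>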
    Dirk G1TLH<br>
    <br>
  </body>
</html>