[Dxspider-support] 1. Cluster started hanging several times a day. (David Spoelstra)

David Spoelstra davids at mediamachine.com
Thu Mar 7 16:32:05 CET 2019


Keith, Andy, and Bernard: My RasPi3B is running off a hard drive. I
switched it to a hard drive when I replaced the RasPi2B with the RasPi3
because of all the issues you mention.

Bernard: it's not the power supply, because the RasPi is still running fine
and I can do other things with it (albeit slowly, since the hung spider
process is taking so much CPU). Also, I have this RasPi and another
hooked to a GOOD 5V 10A supply, and the other RasPi isn't having any
problems. The spider process is essentially dead, sitting at
about 100% CPU. I just kill the process and immediately restart it, and
everything is fine for about another day.
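Bernard's power-supply theory can also be checked directly on the Pi: on Raspberry Pi OS, `vcgencmd get_throttled` prints a bitmask such as `throttled=0x50000`, where bit 0 means under-voltage is present now and bit 16 means under-voltage has occurred since boot. A minimal sketch that decodes that value, assuming the `throttled=0x...` output format; the function name `decode_throttled` is hypothetical and only the under-voltage and throttling bits are checked:

```shell
#!/bin/bash
# Hypothetical helper: decode the hex value printed by `vcgencmd get_throttled`
# (e.g. "throttled=0x50000"). Only under-voltage and throttling bits are reported.
decode_throttled() {
    local raw=${1#throttled=}          # strip the "throttled=" prefix if present
    local val=$(( raw ))               # bash arithmetic accepts 0x-prefixed hex
    (( val & 0x1 ))     && echo "under-voltage now"
    (( val & 0x4 ))     && echo "throttled now"
    (( val & 0x10000 )) && echo "under-voltage occurred since boot"
    (( val & 0x40000 )) && echo "throttling occurred since boot"
    (( val == 0 ))      && echo "no power or throttling events"
    return 0
}

# Usage on the Pi itself (vcgencmd ships with Raspberry Pi OS):
# decode_throttled "$(vcgencmd get_throttled)"
```

If the "since boot" lines appear on the Pi running the cluster but not on its neighbour, the supply or cabling is still worth a second look.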

I just need some more data to debug it. Are there flags I can turn on to get
more logs? Are there other places I should be looking? (I've already dug
through syslog.)
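One way to collect more data the next time it hangs, before killing the process, is to snapshot what the kernel reports about it. A minimal sketch, assuming Linux `/proc` and that the hung process can be found with `pgrep -f cluster.pl`; the function name `snapshot_proc` and the log file path in the usage line are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: append a snapshot of a process's state to a file,
# reading only /proc so no extra tools need to be installed.
snapshot_proc() {
    local pid=$1 out=$2
    [ -d "/proc/$pid" ] || { echo "no such process: $pid" >&2; return 1; }
    {
        echo "=== $(date) pid $pid ==="
        # Full command line (NUL-separated in /proc, so translate to spaces)
        tr '\0' ' ' < "/proc/$pid/cmdline"; echo
        # Scheduler state (R = running, S = sleeping, ...), memory and threads
        grep -E '^(State|VmRSS|Threads):' "/proc/$pid/status"
        # Kernel function the process is waiting in ("0" or empty if running)
        echo "wchan: $(cat "/proc/$pid/wchan" 2>/dev/null)"
        # Number of open file descriptors (sockets, log files, ...)
        echo "open fds: $(ls "/proc/$pid/fd" 2>/dev/null | wc -l)"
    } >> "$out"
}

# Usage (as root or the spider user), just before the kill:
# for p in $(pgrep -f cluster.pl); do snapshot_proc "$p" /var/tmp/spider-hang.log; done
```

Taking two or three snapshots a few seconds apart shows whether the process is spinning in user space (state R, no wchan) or stuck waiting on something, and whether its file-descriptor count keeps growing.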

On Thu, Mar 7, 2019 at 9:34 AM f6bvp via Dxspider-support <
dxspider-support at tobit.co.uk> wrote:

> Hi,
> In my experience, using a larger SD card (say 32 GB) also considerably
> improves performance, as seen when compiling Direwolf. I only use
> high-quality SD cards (Class 10, video grade) and have rarely had a failure.
> Apart from that, a daily failure could be caused by a failing DC power
> supply or insufficient current (2.5 A for an RPi 3B).
>
> 73 de Bernard f6bvp
>
>
> On 7 Mar 2019, at 15:04, Andy Cook G4PIQ via Dxspider-support <
> dxspider-support at tobit.co.uk> wrote:
>
> I also found running on an external USB connected hard disk delivered much
> better I/O performance than running on the SD card.
>
> Andy, G4PIQ
>
>
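Andy's SD-card versus USB-disk comparison can be reproduced with a crude synchronous write test run once on each device. A minimal sketch, assuming GNU or BusyBox `dd` with `conv=fsync`; the helper name `write_test`, the scratch file name and the mount points in the usage lines are hypothetical, and the result is only a rough indication, not a proper benchmark:

```shell
#!/bin/bash
# Hypothetical helper: write N MiB of zeros to a scratch file in DIR,
# forcing the data to disk (conv=fsync) before dd exits, then delete it.
# dd prints its byte count, elapsed time and throughput on stderr.
write_test() {
    local dir=$1 mib=${2:-64}
    local f="$dir/.ddtest.$$"
    dd if=/dev/zero of="$f" bs=1M count="$mib" conv=fsync 2>&1 | tail -n 1
    local rc=${PIPESTATUS[0]}          # exit status of dd, not of tail
    rm -f "$f"
    return "$rc"
}

# Usage: compare a directory on the SD card with one on a USB disk
# write_test /home/pi 64
# write_test /mnt/usb 64
```

Because small random writes are what wear and slow an SD card, a sequential test like this understates the difference, but it is usually enough to show which device is the bottleneck.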
> On 7 Mar 2019, at 13:44, gu6efb--- via Dxspider-support <
> dxspider-support at tobit.co.uk> wrote:
>
> Hi
>
>
>
> The most likely reason is that the memory card holding the OS has worn out,
> as it is only good for a limited number of write cycles. DXSpider puts
> a high read/write load on the card.
>
> I had the same issue here, with odd, inexplicable cluster failures, and it
> was all down to the card.
>
> I now use a hard disk connected to the Pi, which has solved my issues.
>
>
>
> 73 Keith GU6EFB
>
>
>
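To confirm which device the root filesystem is actually running from after such a move, `findmnt -n -o SOURCE /` (util-linux) prints the source of `/`. On a Raspberry Pi the SD card normally appears as `/dev/mmcblk0pN` and a USB-attached disk as `/dev/sdXN`. A minimal sketch; the function name `classify_root_dev` is hypothetical, and the device-name patterns are assumptions that may not cover every setup (a root device reported under a generic name, LVM or NFS falls through to "other"):

```shell
#!/bin/bash
# Hypothetical helper: classify a block device path by its kernel name.
classify_root_dev() {
    case "$1" in
        /dev/mmcblk*) echo "sd-card" ;;   # built-in SD/MMC slot
        /dev/sd*)     echo "usb-disk" ;;  # USB (or SATA) attached disk
        *)            echo "other" ;;     # anything else: check by hand
    esac
}

# Usage: classify_root_dev "$(findmnt -n -o SOURCE /)"
```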
> *From:* Dxspider-support [mailto:dxspider-support-bounces at tobit.co.uk
> <dxspider-support-bounces at tobit.co.uk>] *On Behalf Of *Andy Scott via
> Dxspider-support
> *Sent:* 07 March 2019 11:36
> *To:* dxspider-support at tobit.co.uk
> *Cc:* Andy Scott
> *Subject:* [Dxspider-support] 1. Cluster started hanging several times a
> day. (David Spoelstra)
>
>
>
> I cannot offer any conclusive feedback, but I have found that one symptom
> of failure on my Raspberry Pi is that the cluster service stops
> communicating with clients on the user port. I have set up a cron task that
> checks whether a login: prompt is received when telnetting into the user
> port (note that you cannot rely on a telnet timeout, as connecting to the
> port itself still works fine). If it times out waiting for the login
> prompt, I kill the dxcluster process and restart the service; the cron
> script runs one minute before the normal DXSpider cron job that reconnects
> to the upstream server.
>
>
>
> In doing so, I haven't needed to reboot the server, although this may
> still be needed from time to time if a different symptom presents.
>
>
>
> The cron script is as follows, in case it helps anyone (the /var/tmp
> folder should exist and be writable):
>
> #!/bin/bash
> #
> # First, capture the process ID(s) of the DXSpider server.
> # -f matches the full command line, so this also works when
> # cluster.pl was started as "perl cluster.pl".
> #
> PID=$(pgrep -f cluster.pl)
> #
> # Test that we get a login prompt, not just that the telnet port is open.
> # We send a login of "quit", which makes the cluster reject the login and
> # close the session; whatever the cluster sent back is saved to a
> # temporary file that we check afterwards.
> #
> echo "quit" | nc -w 2 [your server address] 7300 &> /var/tmp/dxtest
> [ -f /var/tmp/dxtest ] || exit    # exit if the file doesn't exist
> #
> # If the reply contains "login", all is working; otherwise restart.
> #
> if ! grep -qi "login" /var/tmp/dxtest
> then
>         # echo "The DXCluster has fallen over..."
>         # $PID is left unquoted so that several PIDs are all killed
>         [ -n "$PID" ] && kill -9 $PID
>         sleep 3
>         # echo 'reset by checkcluster.sh' | systemd-cat -t DXCluster -p warning
>         /etc/rc.d/rc.spider
>         sleep 1
>         logger -t DXCluster -p user.warning "DXCluster reset"
> fi
> exit
>
> # The End
>
>
>
>
>
> _______________________________________________
> Dxspider-support mailing list
> Dxspider-support at tobit.co.uk
> https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
>

