[Dxspider-support] Cluster failure

Dirk Koopman djk at tobit.co.uk
Sun May 17 12:43:00 CEST 2020


I have back ported a mod to the Mojo::IOLoop::Subprocess to use the 
serialisation method that I was using with the previous (now deprecated) 
method (..::ForkCall). I strong suspect that it should fix this problem. 
In the meantime I am successfully carrying on


Do an update and see whether it fixes your problem. This is a stepping 
to stone to a more stable solution to user file errors as well as errors 
of this sort, by eliminating Storage as a serialisation mechanism.

Please recreate your users file before restarting with this new version.

Remind me, what type of cpu are you running this on? I don't see these 
errors, but I am running Intel hardware 64bit ubuntu 18.04.

Dirk

On 17/05/2020 02:17, Joaquin . via Dxspider-support wrote:
> Hi,
> Yesterday I had 10 cluster restarts.
> I saw that when running a sh/dx/30 sometimes the cluster was rebooted.
> I have reviewed the debug file and observed that in most of them there 
> was a sh/dx nnn or sh/dx/nnn before the error message and subsequent 
> restart.
> On 3 occasions the machine reached 100% cpu and had to be restarted 
> for not responding.
> I have rebuilt user_asc, but never find error.
>
> HW 1 cpu, 2 GiB memory
> load average: 0.03, 0.01, 0.00
> Memory 7 %
> OS Ubuntu Ubuntu 18.04.4 LTS
>
> I do not know what I can do.
>
> Kin EA3CV
>
> ...
> 1589661271^(chan) <- I EA1FU sh/dx 150
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #announce#
> 1589661271^(chan) -> D EA1FU #announce#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #wcy#
> 1589661271^(chan) -> D EA1FU #wcy#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #wwv#
> 1589661271^(chan) -> D EA1FU #wwv#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #ready#
> 1589661271^(chan) -> D EA1FU #ready#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661281^Magic number checking on storable string failed at 
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661281^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661281^     eval {...} called at 
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
> 1589661281^     Storable::thaw("") called at 
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661281^     eval {...} called at 
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661281^ 
> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55a0f3a9d578)) 
> called at /usr/local/share/perl/5.26.1/Mojo/EventEm
> itter.pm <http://itter.pm> line 15
> 1589661281^ 
> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55a0f3a9d578), 
> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/S
> tream.pm <http://tream.pm> line 27
> ...
>
> 1589661549^(chan) <- I EA1FU sh/dx 150
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #announce#
> 1589661549^(chan) -> D EA1FU #announce#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #wcy#
> 1589661549^(chan) -> D EA1FU #wcy#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #wwv#
> 1589661549^(chan) -> D EA1FU #wwv#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #ready#
> 1589661549^(chan) -> D EA1FU #ready#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661562^Magic number checking on storable string failed at 
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661562^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661562^     eval {...} called at 
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
> 1589661562^     Storable::thaw("") called at 
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661562^     eval {...} called at 
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661562^ 
> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x555d8eb49fa0)) 
> called at /usr/local/share/perl/5.26.1/Mojo/EventEm
> itter.pm <http://itter.pm> line 15
> 1589661562^ 
> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x555d8eb49fa0), 
> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/S
> tream.pm <http://tream.pm> line 27
> ...
>
> 1589661762^(chan) <- I EA1FU sh/dx 150
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #announce#
> 1589661762^(chan) -> D EA1FU #announce#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #wcy#
> 1589661762^(chan) -> D EA1FU #wcy#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #wwv#
> 1589661762^(chan) -> D EA1FU #wwv#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #ready#
> 1589661762^(chan) -> D EA1FU #ready#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661785^(msg) queue msg (0)
> 1589661785^Magic number checking on storable string failed at 
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661785^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661785^     eval {...} called at 
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
> 1589661785^     Storable::thaw("") called at 
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661785^     eval {...} called at 
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661785^ 
> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55f98142d4f0)) 
> called at /usr/local/share/perl/5.26.1/Mojo/EventEm
> itter.pm <http://itter.pm> line 15
> 1589661785^ 
> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55f98142d4f0), 
> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/S
> tream.pm <http://tream.pm> line 27
> ...
>
> 1589673726^(chan) <- I EA7A SH/DX/120
> 1589673726^(chan) -> D EA7A EA3CV-2>
> 1589673728^(chan) <- I AE5E 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H94^
> 1589673728^(chan) -> D SV5FRI-1 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D WB3FFV-2 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D N6WS-6 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D VE9SC 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D N0OJ-7 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D F1OYP-6 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D IW9EDP-6 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D F4EYQ-1 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) <- I WB3FFV-2 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H92^
> 1589673728^(chan) <- I SV5FRI-1 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H94^
> 1589673728^(chan) <- I VE9SC 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H95^
> 1589673728^(chan) <- I IW9EDP-6 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) <- I F1OYP-6 
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H90^
> 1589673741^(msg) queue msg (0)
> 1589673995^Git: 1.57-226-g3b932bf2
> 1589673995^Git: V=1.57 S= B=226 g=3b932bf2
> 1589673995^DXSpider V1.57, build 226 (git: 3b932bf2[r]) started
> 1589673995^Copyright (c) 1998-2020 Dirk Koopman G1TLH
> 1589673995^loading prefixes ...
> ...
>
>
> El jue., 14 may. 2020 a las 23:08, Dirk Koopman via Dxspider-support 
> (<dxspider-support at tobit.co.uk <mailto:dxspider-support at tobit.co.uk>>) 
> escribió:
>
>     Stop the node.
>
>     Rebuild the user file "cd /spider/local_data; perl user_asc"
>
>     restart the node
>
>     I am getting really sick of Storable and DB_File. I am doing some
>     benchmarking at the moment for a viable replacement. I have a
>     Storable replacement, The DB_File replace could be an issue.
>
>     Part of the problem is I suspect that something that runs in a
>     spawned  command (like sh/dx, sh/log etc) is doing a $user->put (a
>     write) which will likely screw up DB_File. I have had a good look
>     for this, but it only seems to have started when I did the
>     abortive attempt to store ip addresses. Ir *may* be that there is
>     lingering corruption there from that attempt.
>
>     But the first thing anyone should do if this sort of thing happens
>     is always: rebuild the user file (as above) and delete the dupefile.
>
>     Dirk
>
>     On 14/05/2020 21:18, Joaquin . via Dxspider-support wrote:
>>     Hi,
>>
>>     I don't know what is happening but in the last two days I have
>>     had dxspider restarts since I went to build 226.
>>
>>     This is the last thing before restarting the cluster:
>>
>>     1589403382^Magic number checking on storable string failed at
>>     /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
>>     1589403382^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm
>>     line 426.
>>     1589403382^     eval {...} called at
>>     /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
>>     1589403382^     Storable::thaw("") called at
>>     /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>>     1589403382^     eval {...} called at
>>     /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>>     1589403382^
>>     Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/EventEmitter.pm line 15
>>     1589403382^
>>     Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8),
>>     "close") called at
>>     /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 27
>>     1589403382^
>>     Mojo::IOLoop::Stream::close(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 110
>>     1589403382^
>>     Mojo::IOLoop::Stream::_read(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 58
>>     1589403382^
>>     Mojo::IOLoop::Stream::__ANON__(Mojo::Reactor::EV=HASH(0x55bfd2c61c18))
>>     called at /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>>     1589403382^     eval {...} called at
>>     /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>>     1589403382^
>>     Mojo::Reactor::Poll::_try(Mojo::Reactor::EV=HASH(0x55bfd2c61c18),
>>     "I/O watcher", undef, 0) called at
>>     /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 5
>>     2
>>     1589403382^
>>     Mojo::Reactor::EV::__ANON__(EV::IO=SCALAR(0x55bfd46984c0), 3)
>>     called at /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>>     1589403382^     eval {...} called at
>>     /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>>     1589403382^
>>     Mojo::Reactor::EV::start(Mojo::Reactor::EV=HASH(0x55bfd2c61c18))
>>     called at /usr/local/share/perl/5.26.1/Mojo/IOLoop.pm line 134
>>     1589403382^     Mojo::IOLoop::start("Mojo::IOLoop") called at
>>     /spider/perl/cluster.pl <http://cluster.pl> line 781
>>     1589403382^     main::start_node() called at
>>     /spider/perl/cluster.pl <http://cluster.pl> line 798
>>     1589403382^Magic number checking on storable string failed at
>>     /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426,
>>     1589403382^Magic number checking on storable string failed at
>>     /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426, at
>>     /spider/perl/Log/Agent.pm line 21.
>>     1589403382^     Log::Agent::logcroak("Magic number checking on
>>     storable string failed at /usr/lib/x"...) called at
>>     /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 427
>>     1589403382^     Storable::thaw("") called at
>>     /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>>     1589403382^     eval {...} called at
>>     /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>>     1589403382^
>>     Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/EventEmitter.pm line 15
>>     1589403382^
>>     Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8),
>>     "close") called at
>>     /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 27
>>     1589403382^
>>     Mojo::IOLoop::Stream::close(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 110
>>     1589403382^
>>     Mojo::IOLoop::Stream::_read(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 58
>>     1589403382^
>>     Mojo::IOLoop::Stream::__ANON__(Mojo::Reactor::EV=HASH(0x55bfd2c61c18))
>>     called at /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>>     1589403382^     eval {...} called at
>>     /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>>     1589403382^
>>     Mojo::Reactor::Poll::_try(Mojo::Reactor::EV=HASH(0x55bfd2c61c18),
>>     "I/O watcher", "", 0) called at
>>     /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 52
>>     1589403382^
>>     Mojo::Reactor::EV::__ANON__(EV::IO=SCALAR(0x55bfd46984c0), 3)
>>     called at /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>>     1589403382^     eval {...} called at
>>     /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>>     1589403382^
>>     Mojo::Reactor::EV::start(Mojo::Reactor::EV=HASH(0x55bfd2c61c18))
>>     called at /usr/local/share/perl/5.26.1/Mojo/IOLoop.pm line 134
>>     1589403382^     Mojo::IOLoop::start("Mojo::IOLoop") called at
>>     /spider/perl/cluster.pl <http://cluster.pl> line 781
>>     1589403382^     main::start_node() called at
>>     /spider/perl/cluster.pl <http://cluster.pl> line 798
>>     1589403382^(chan) DXChannel EC5A destroyed (16)
>>     1589403382^(chan) DXChannel EA4SE destroyed (15)
>>     1589403382^(chan) DXChannel EA3CV-1 destroyed (14)
>>     1589403382^(chan) DXChannel EA1FU destroyed (13)
>>     1589403382^     (in cleanup)    (in cleanup)  at
>>     /spider/perl/Msg.pm line 566 during global destruction.
>>
>>
>>
>>     1589484851^Magic number checking on storable string failed at
>>     /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426,
>>     <GEN1666> line 110,
>>     1589484851^Magic number checking on storable string failed at
>>     /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426,
>>     <GEN1666> line 110, at /spider/perl/Log/Agent.pm li
>>     ne 21, <GEN1666> line 110.
>>     1589484851^     Log::Agent::logcroak("Magic number checking on
>>     storable string failed at /usr/lib/x"...) called at
>>     /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 427
>>     1589484851^     Storable::thaw("") called at
>>     /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>>     1589484851^     eval {...} called at
>>     /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>>     1589484851^
>>     Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/EventEmitter.pm line 15
>>     1589484851^
>>     Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8),
>>     "close") called at
>>     /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 27
>>     1589484851^
>>     Mojo::IOLoop::Stream::close(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 110
>>     1589484851^
>>     Mojo::IOLoop::Stream::_read(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 58
>>     1589484851^
>>     Mojo::IOLoop::Stream::__ANON__(Mojo::Reactor::EV=HASH(0x55d2e6c8d1d8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>>     1589484851^     eval {...} called at
>>     /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>>     1589484851^
>>     Mojo::Reactor::Poll::_try(Mojo::Reactor::EV=HASH(0x55d2e6c8d1d8),
>>     "I/O watcher", 1, 0) called at
>>     /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 52
>>     1589484851^
>>     Mojo::Reactor::EV::__ANON__(EV::IO=SCALAR(0x55d2e930bcf0), 3)
>>     called at /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>>     1589484851^     eval {...} called at
>>     /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>>     1589484851^
>>     Mojo::Reactor::EV::start(Mojo::Reactor::EV=HASH(0x55d2e6c8d1d8))
>>     called at /usr/local/share/perl/5.26.1/Mojo/IOLoop.pm line 134
>>     1589484851^     Mojo::IOLoop::start("Mojo::IOLoop") called at
>>     /spider/perl/cluster.pl <http://cluster.pl> line 781
>>     1589484851^     main::start_node() called at
>>     /spider/perl/cluster.pl <http://cluster.pl> line 798
>>
>>     The version is:DXSpider V1.57, build 226 (git: 3b932bf2[r])
>>     Any ideas?
>>
>>     73 de Kin
>>
>>
>>     _______________________________________________
>>     Dxspider-support mailing list
>>     Dxspider-support at tobit.co.uk  <mailto:Dxspider-support at tobit.co.uk>
>>     https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
>
>     _______________________________________________
>     Dxspider-support mailing list
>     Dxspider-support at tobit.co.uk <mailto:Dxspider-support at tobit.co.uk>
>     https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
>
>
> _______________________________________________
> Dxspider-support mailing list
> Dxspider-support at tobit.co.uk
> https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.tobit.co.uk/pipermail/dxspider-support/attachments/20200517/0bbea1b6/attachment-0001.htm>


More information about the Dxspider-support mailing list