[Dxspider-support] Cluster failure
Dirk Koopman
djk at tobit.co.uk
Sun May 17 12:43:00 CEST 2020
I have back ported a mod to the Mojo::IOLoop::Subprocess to use the
serialisation method that I was using with the previous (now deprecated)
method (..::ForkCall). I strong suspect that it should fix this problem.
In the meantime I am successfully carrying on
Do an update and see whether it fixes your problem. This is a stepping
to stone to a more stable solution to user file errors as well as errors
of this sort, by eliminating Storage as a serialisation mechanism.
Please recreate your users file before restarting with this new version.
Remind me, what type of cpu are you running this on? I don't see these
errors, but I am running Intel hardware 64bit ubuntu 18.04.
Dirk
On 17/05/2020 02:17, Joaquin . via Dxspider-support wrote:
> Hi,
> Yesterday I had 10 cluster restarts.
> I saw that when running a sh/dx/30 sometimes the cluster was rebooted.
> I have reviewed the debug file and observed that in most of them there
> was a sh/dx nnn or sh/dx/nnn before the error message and subsequent
> restart.
> On 3 occasions the machine reached 100% cpu and had to be restarted
> for not responding.
> I have rebuilt user_asc, but never find error.
>
> HW 1 cpu, 2 GiB memory
> load average: 0.03, 0.01, 0.00
> Memory 7 %
> OS Ubuntu Ubuntu 18.04.4 LTS
>
> I do not know what I can do.
>
> Kin EA3CV
>
> ...
> 1589661271^(chan) <- I EA1FU sh/dx 150
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #announce#
> 1589661271^(chan) -> D EA1FU #announce#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #wcy#
> 1589661271^(chan) -> D EA1FU #wcy#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #wwv#
> 1589661271^(chan) -> D EA1FU #wwv#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #ready#
> 1589661271^(chan) -> D EA1FU #ready#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661281^Magic number checking on storable string failed at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661281^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661281^ eval {...} called at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
> 1589661281^ Storable::thaw("") called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661281^ eval {...} called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661281^
> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55a0f3a9d578))
> called at /usr/local/share/perl/5.26.1/Mojo/EventEm
> itter.pm <http://itter.pm> line 15
> 1589661281^
> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55a0f3a9d578),
> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/S
> tream.pm <http://tream.pm> line 27
> ...
>
> 1589661549^(chan) <- I EA1FU sh/dx 150
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #announce#
> 1589661549^(chan) -> D EA1FU #announce#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #wcy#
> 1589661549^(chan) -> D EA1FU #wcy#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #wwv#
> 1589661549^(chan) -> D EA1FU #wwv#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #ready#
> 1589661549^(chan) -> D EA1FU #ready#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661562^Magic number checking on storable string failed at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661562^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661562^ eval {...} called at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
> 1589661562^ Storable::thaw("") called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661562^ eval {...} called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661562^
> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x555d8eb49fa0))
> called at /usr/local/share/perl/5.26.1/Mojo/EventEm
> itter.pm <http://itter.pm> line 15
> 1589661562^
> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x555d8eb49fa0),
> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/S
> tream.pm <http://tream.pm> line 27
> ...
>
> 1589661762^(chan) <- I EA1FU sh/dx 150
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #announce#
> 1589661762^(chan) -> D EA1FU #announce#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #wcy#
> 1589661762^(chan) -> D EA1FU #wcy#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #wwv#
> 1589661762^(chan) -> D EA1FU #wwv#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #ready#
> 1589661762^(chan) -> D EA1FU #ready#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661785^(msg) queue msg (0)
> 1589661785^Magic number checking on storable string failed at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661785^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661785^ eval {...} called at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
> 1589661785^ Storable::thaw("") called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661785^ eval {...} called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661785^
> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55f98142d4f0))
> called at /usr/local/share/perl/5.26.1/Mojo/EventEm
> itter.pm <http://itter.pm> line 15
> 1589661785^
> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55f98142d4f0),
> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/S
> tream.pm <http://tream.pm> line 27
> ...
>
> 1589673726^(chan) <- I EA7A SH/DX/120
> 1589673726^(chan) -> D EA7A EA3CV-2>
> 1589673728^(chan) <- I AE5E
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H94^
> 1589673728^(chan) -> D SV5FRI-1
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D WB3FFV-2
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D N6WS-6
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D VE9SC
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D N0OJ-7
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D F1OYP-6
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D IW9EDP-6
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D F4EYQ-1
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) <- I WB3FFV-2
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H92^
> 1589673728^(chan) <- I SV5FRI-1
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H94^
> 1589673728^(chan) <- I VE9SC
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H95^
> 1589673728^(chan) <- I IW9EDP-6
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) <- I F1OYP-6
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H90^
> 1589673741^(msg) queue msg (0)
> 1589673995^Git: 1.57-226-g3b932bf2
> 1589673995^Git: V=1.57 S= B=226 g=3b932bf2
> 1589673995^DXSpider V1.57, build 226 (git: 3b932bf2[r]) started
> 1589673995^Copyright (c) 1998-2020 Dirk Koopman G1TLH
> 1589673995^loading prefixes ...
> ...
>
>
> El jue., 14 may. 2020 a las 23:08, Dirk Koopman via Dxspider-support
> (<dxspider-support at tobit.co.uk <mailto:dxspider-support at tobit.co.uk>>)
> escribió:
>
> Stop the node.
>
> Rebuild the user file "cd /spider/local_data; perl user_asc"
>
> restart the node
>
> I am getting really sick of Storable and DB_File. I am doing some
> benchmarking at the moment for a viable replacement. I have a
> Storable replacement, The DB_File replace could be an issue.
>
> Part of the problem is I suspect that something that runs in a
> spawned command (like sh/dx, sh/log etc) is doing a $user->put (a
> write) which will likely screw up DB_File. I have had a good look
> for this, but it only seems to have started when I did the
> abortive attempt to store ip addresses. Ir *may* be that there is
> lingering corruption there from that attempt.
>
> But the first thing anyone should do if this sort of thing happens
> is always: rebuild the user file (as above) and delete the dupefile.
>
> Dirk
>
> On 14/05/2020 21:18, Joaquin . via Dxspider-support wrote:
>> Hi,
>>
>> I don't know what is happening but in the last two days I have
>> had dxspider restarts since I went to build 226.
>>
>> This is the last thing before restarting the cluster:
>>
>> 1589403382^Magic number checking on storable string failed at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
>> 1589403382^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm
>> line 426.
>> 1589403382^ eval {...} called at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
>> 1589403382^ Storable::thaw("") called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589403382^ eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589403382^
>> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/EventEmitter.pm line 15
>> 1589403382^
>> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8),
>> "close") called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 27
>> 1589403382^
>> Mojo::IOLoop::Stream::close(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 110
>> 1589403382^
>> Mojo::IOLoop::Stream::_read(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 58
>> 1589403382^
>> Mojo::IOLoop::Stream::__ANON__(Mojo::Reactor::EV=HASH(0x55bfd2c61c18))
>> called at /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589403382^ eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589403382^
>> Mojo::Reactor::Poll::_try(Mojo::Reactor::EV=HASH(0x55bfd2c61c18),
>> "I/O watcher", undef, 0) called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 5
>> 2
>> 1589403382^
>> Mojo::Reactor::EV::__ANON__(EV::IO=SCALAR(0x55bfd46984c0), 3)
>> called at /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589403382^ eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589403382^
>> Mojo::Reactor::EV::start(Mojo::Reactor::EV=HASH(0x55bfd2c61c18))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop.pm line 134
>> 1589403382^ Mojo::IOLoop::start("Mojo::IOLoop") called at
>> /spider/perl/cluster.pl <http://cluster.pl> line 781
>> 1589403382^ main::start_node() called at
>> /spider/perl/cluster.pl <http://cluster.pl> line 798
>> 1589403382^Magic number checking on storable string failed at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426,
>> 1589403382^Magic number checking on storable string failed at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426, at
>> /spider/perl/Log/Agent.pm line 21.
>> 1589403382^ Log::Agent::logcroak("Magic number checking on
>> storable string failed at /usr/lib/x"...) called at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 427
>> 1589403382^ Storable::thaw("") called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589403382^ eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589403382^
>> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/EventEmitter.pm line 15
>> 1589403382^
>> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8),
>> "close") called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 27
>> 1589403382^
>> Mojo::IOLoop::Stream::close(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 110
>> 1589403382^
>> Mojo::IOLoop::Stream::_read(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 58
>> 1589403382^
>> Mojo::IOLoop::Stream::__ANON__(Mojo::Reactor::EV=HASH(0x55bfd2c61c18))
>> called at /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589403382^ eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589403382^
>> Mojo::Reactor::Poll::_try(Mojo::Reactor::EV=HASH(0x55bfd2c61c18),
>> "I/O watcher", "", 0) called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 52
>> 1589403382^
>> Mojo::Reactor::EV::__ANON__(EV::IO=SCALAR(0x55bfd46984c0), 3)
>> called at /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589403382^ eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589403382^
>> Mojo::Reactor::EV::start(Mojo::Reactor::EV=HASH(0x55bfd2c61c18))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop.pm line 134
>> 1589403382^ Mojo::IOLoop::start("Mojo::IOLoop") called at
>> /spider/perl/cluster.pl <http://cluster.pl> line 781
>> 1589403382^ main::start_node() called at
>> /spider/perl/cluster.pl <http://cluster.pl> line 798
>> 1589403382^(chan) DXChannel EC5A destroyed (16)
>> 1589403382^(chan) DXChannel EA4SE destroyed (15)
>> 1589403382^(chan) DXChannel EA3CV-1 destroyed (14)
>> 1589403382^(chan) DXChannel EA1FU destroyed (13)
>> 1589403382^ (in cleanup) (in cleanup) at
>> /spider/perl/Msg.pm line 566 during global destruction.
>>
>>
>>
>> 1589484851^Magic number checking on storable string failed at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426,
>> <GEN1666> line 110,
>> 1589484851^Magic number checking on storable string failed at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426,
>> <GEN1666> line 110, at /spider/perl/Log/Agent.pm li
>> ne 21, <GEN1666> line 110.
>> 1589484851^ Log::Agent::logcroak("Magic number checking on
>> storable string failed at /usr/lib/x"...) called at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 427
>> 1589484851^ Storable::thaw("") called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589484851^ eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589484851^
>> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/EventEmitter.pm line 15
>> 1589484851^
>> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8),
>> "close") called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 27
>> 1589484851^
>> Mojo::IOLoop::Stream::close(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 110
>> 1589484851^
>> Mojo::IOLoop::Stream::_read(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 58
>> 1589484851^
>> Mojo::IOLoop::Stream::__ANON__(Mojo::Reactor::EV=HASH(0x55d2e6c8d1d8))
>> called at /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589484851^ eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589484851^
>> Mojo::Reactor::Poll::_try(Mojo::Reactor::EV=HASH(0x55d2e6c8d1d8),
>> "I/O watcher", 1, 0) called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 52
>> 1589484851^
>> Mojo::Reactor::EV::__ANON__(EV::IO=SCALAR(0x55d2e930bcf0), 3)
>> called at /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589484851^ eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589484851^
>> Mojo::Reactor::EV::start(Mojo::Reactor::EV=HASH(0x55d2e6c8d1d8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop.pm line 134
>> 1589484851^ Mojo::IOLoop::start("Mojo::IOLoop") called at
>> /spider/perl/cluster.pl <http://cluster.pl> line 781
>> 1589484851^ main::start_node() called at
>> /spider/perl/cluster.pl <http://cluster.pl> line 798
>>
>> The version is:DXSpider V1.57, build 226 (git: 3b932bf2[r])
>> Any ideas?
>>
>> 73 de Kin
>>
>>
>> _______________________________________________
>> Dxspider-support mailing list
>> Dxspider-support at tobit.co.uk <mailto:Dxspider-support at tobit.co.uk>
>> https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
>
> _______________________________________________
> Dxspider-support mailing list
> Dxspider-support at tobit.co.uk <mailto:Dxspider-support at tobit.co.uk>
> https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
>
>
> _______________________________________________
> Dxspider-support mailing list
> Dxspider-support at tobit.co.uk
> https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.tobit.co.uk/pipermail/dxspider-support/attachments/20200517/0bbea1b6/attachment-0001.htm>
More information about the Dxspider-support
mailing list