[Dxspider-support] Cluster failure

Joaquin . joaquin at cronux.net
Sun May 17 14:43:16 CEST 2020


If it doesn't fail or reboot, I will do it tonight.
At the moment I have updated a copy of the same version and configuration
in a local VM, with real traffic and users with different filters.

The machine of the problems is a Microsoft Azuce VM, and according to them
the CPU is: Intel® Haswell 2.4 GHz E5-2673 v3 processors or better.
Specifically "Standard B1ms (1 vcpus, 2 GiB memory)"

I attach a photo of how the system load was yesterday. I got to not respond
twice and had to do a hard reset.

I will report with the results.

Thanks Dirk.

Kin

El dom., 17 may. 2020 a las 12:50, Dirk Koopman via Dxspider-support (<
dxspider-support at tobit.co.uk>) escribió:

> I have back ported a mod to the Mojo::IOLoop::Subprocess to use the
> serialisation method that I was using with the previous (now deprecated)
> method (..::ForkCall). I strong suspect that it should fix this problem. In
> the meantime I am successfully carrying on
>
>
> Do an update and see whether it fixes your problem. This is a stepping to
> stone to a more stable solution to user file errors as well as errors of
> this sort, by eliminating Storage as a serialisation mechanism.
>
> Please recreate your users file before restarting with this new version.
>
> Remind me, what type of cpu are you running this on? I don't see these
> errors, but I am running Intel hardware 64bit ubuntu 18.04.
>
> Dirk
>
> On 17/05/2020 02:17, Joaquin . via Dxspider-support wrote:
>
> Hi,
> Yesterday I had 10 cluster restarts.
> I saw that when running a sh/dx/30 sometimes the cluster was rebooted.
> I have reviewed the debug file and observed that in most of them there was
> a sh/dx nnn or sh/dx/nnn before the error message and subsequent restart.
> On 3 occasions the machine reached 100% cpu and had to be restarted for
> not responding.
> I have rebuilt user_asc, but never find error.
>
> HW 1 cpu, 2 GiB memory
> load average: 0.03, 0.01, 0.00
> Memory 7 %
> OS Ubuntu Ubuntu 18.04.4 LTS
>
> I do not know what I can do.
>
> Kin EA3CV
>
> ...
> 1589661271^(chan) <- I EA1FU sh/dx 150
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #announce#
> 1589661271^(chan) -> D EA1FU #announce#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #wcy#
> 1589661271^(chan) -> D EA1FU #wcy#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #wwv#
> 1589661271^(chan) -> D EA1FU #wwv#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661271^(chan) <- I EA1FU echo #ready#
> 1589661271^(chan) -> D EA1FU #ready#
> 1589661271^(chan) -> D EA1FU EA3CV-2>
> 1589661281^Magic number checking on storable string failed at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661281^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661281^     eval {...} called at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
> 1589661281^     Storable::thaw("") called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661281^     eval {...} called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661281^
> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55a0f3a9d578))
> called at /usr/local/share/perl/5.26.1/Mojo/EventEm
> itter.pm line 15
> 1589661281^
> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55a0f3a9d578),
> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/S
> tream.pm line 27
> ...
>
> 1589661549^(chan) <- I EA1FU sh/dx 150
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #announce#
> 1589661549^(chan) -> D EA1FU #announce#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #wcy#
> 1589661549^(chan) -> D EA1FU #wcy#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #wwv#
> 1589661549^(chan) -> D EA1FU #wwv#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661549^(chan) <- I EA1FU echo #ready#
> 1589661549^(chan) -> D EA1FU #ready#
> 1589661549^(chan) -> D EA1FU EA3CV-2>
> 1589661562^Magic number checking on storable string failed at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661562^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661562^     eval {...} called at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
> 1589661562^     Storable::thaw("") called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661562^     eval {...} called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661562^
> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x555d8eb49fa0))
> called at /usr/local/share/perl/5.26.1/Mojo/EventEm
> itter.pm line 15
> 1589661562^
> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x555d8eb49fa0),
> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/S
> tream.pm line 27
> ...
>
> 1589661762^(chan) <- I EA1FU sh/dx 150
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #announce#
> 1589661762^(chan) -> D EA1FU #announce#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #wcy#
> 1589661762^(chan) -> D EA1FU #wcy#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #wwv#
> 1589661762^(chan) -> D EA1FU #wwv#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661762^(chan) <- I EA1FU echo #ready#
> 1589661762^(chan) -> D EA1FU #ready#
> 1589661762^(chan) -> D EA1FU EA3CV-2>
> 1589661785^(msg) queue msg (0)
> 1589661785^Magic number checking on storable string failed at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661785^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
> 1589661785^     eval {...} called at
> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
> 1589661785^     Storable::thaw("") called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661785^     eval {...} called at
> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
> 1589661785^
> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55f98142d4f0))
> called at /usr/local/share/perl/5.26.1/Mojo/EventEm
> itter.pm line 15
> 1589661785^
> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55f98142d4f0),
> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/S
> tream.pm line 27
> ...
>
> 1589673726^(chan) <- I EA7A SH/DX/120
> 1589673726^(chan) -> D EA7A EA3CV-2>
> 1589673728^(chan) <- I AE5E
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H94^
> 1589673728^(chan) -> D SV5FRI-1
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D WB3FFV-2
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D N6WS-6
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D VE9SC
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D N0OJ-7
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D F1OYP-6
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D IW9EDP-6
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) -> D F4EYQ-1
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) <- I WB3FFV-2
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H92^
> 1589673728^(chan) <- I SV5FRI-1
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H94^
> 1589673728^(chan) <- I VE9SC
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H95^
> 1589673728^(chan) <- I IW9EDP-6
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H93^
> 1589673728^(chan) <- I F1OYP-6
> PC92^F5LEN-7^128^D^^1RA4NHY^1HB9GFJ^1SQ7OTA^H90^
> 1589673741^(msg) queue msg (0)
> 1589673995^Git: 1.57-226-g3b932bf2
> 1589673995^Git: V=1.57 S= B=226 g=3b932bf2
> 1589673995^DXSpider V1.57, build 226 (git: 3b932bf2[r]) started
> 1589673995^Copyright (c) 1998-2020 Dirk Koopman G1TLH
> 1589673995^loading prefixes ...
> ...
>
>
> El jue., 14 may. 2020 a las 23:08, Dirk Koopman via Dxspider-support (<
> dxspider-support at tobit.co.uk>) escribió:
>
>> Stop the node.
>>
>> Rebuild the user file "cd /spider/local_data; perl user_asc"
>>
>> restart the node
>>
>> I am getting really sick of Storable and DB_File. I am doing some
>> benchmarking at the moment for a viable replacement. I have a Storable
>> replacement, The DB_File replace could be an issue.
>>
>> Part of the problem is I suspect that something that runs in a spawned
>> command (like sh/dx, sh/log etc) is doing a $user->put (a write) which will
>> likely screw up DB_File. I have had a good look for this, but it only seems
>> to have started when I did the abortive attempt to store ip addresses. Ir
>> *may* be that there is lingering corruption there from that attempt.
>>
>> But the first thing anyone should do if this sort of thing happens is
>> always: rebuild the user file (as above) and delete the dupefile.
>>
>> Dirk
>>
>> On 14/05/2020 21:18, Joaquin . via Dxspider-support wrote:
>>
>> Hi,
>>
>> I don't know what is happening but in the last two days I have had
>> dxspider restarts since I went to build 226.
>>
>> This is the last thing before restarting the cluster:
>>
>> 1589403382^Magic number checking on storable string failed at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
>> 1589403382^ at /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426.
>> 1589403382^     eval {...} called at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426
>> 1589403382^     Storable::thaw("") called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589403382^     eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589403382^
>> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/EventEmitter.pm line 15
>> 1589403382^
>> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8),
>> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line
>> 27
>> 1589403382^
>> Mojo::IOLoop::Stream::close(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 110
>> 1589403382^
>> Mojo::IOLoop::Stream::_read(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 58
>> 1589403382^
>> Mojo::IOLoop::Stream::__ANON__(Mojo::Reactor::EV=HASH(0x55bfd2c61c18))
>> called at /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589403382^     eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589403382^
>> Mojo::Reactor::Poll::_try(Mojo::Reactor::EV=HASH(0x55bfd2c61c18), "I/O
>> watcher", undef, 0) called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 5
>> 2
>> 1589403382^
>> Mojo::Reactor::EV::__ANON__(EV::IO=SCALAR(0x55bfd46984c0), 3) called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589403382^     eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589403382^
>> Mojo::Reactor::EV::start(Mojo::Reactor::EV=HASH(0x55bfd2c61c18)) called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop.pm line 134
>> 1589403382^     Mojo::IOLoop::start("Mojo::IOLoop") called at
>> /spider/perl/cluster.pl line 781
>> 1589403382^     main::start_node() called at /spider/perl/cluster.pl
>> line 798
>> 1589403382^Magic number checking on storable string failed at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426,
>> 1589403382^Magic number checking on storable string failed at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426, at
>> /spider/perl/Log/Agent.pm line 21.
>> 1589403382^     Log::Agent::logcroak("Magic number checking on storable
>> string failed at /usr/lib/x"...) called at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 427
>> 1589403382^     Storable::thaw("") called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589403382^     eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589403382^
>> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/EventEmitter.pm line 15
>> 1589403382^
>> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8),
>> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line
>> 27
>> 1589403382^
>> Mojo::IOLoop::Stream::close(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 110
>> 1589403382^
>> Mojo::IOLoop::Stream::_read(Mojo::IOLoop::Stream=HASH(0x55bfd54011b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 58
>> 1589403382^
>> Mojo::IOLoop::Stream::__ANON__(Mojo::Reactor::EV=HASH(0x55bfd2c61c18))
>> called at /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589403382^     eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589403382^
>> Mojo::Reactor::Poll::_try(Mojo::Reactor::EV=HASH(0x55bfd2c61c18), "I/O
>> watcher", "", 0) called at /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm
>> line 52
>> 1589403382^
>> Mojo::Reactor::EV::__ANON__(EV::IO=SCALAR(0x55bfd46984c0), 3) called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589403382^     eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589403382^
>> Mojo::Reactor::EV::start(Mojo::Reactor::EV=HASH(0x55bfd2c61c18)) called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop.pm line 134
>> 1589403382^     Mojo::IOLoop::start("Mojo::IOLoop") called at
>> /spider/perl/cluster.pl line 781
>> 1589403382^     main::start_node() called at /spider/perl/cluster.pl
>> line 798
>> 1589403382^(chan) DXChannel EC5A destroyed (16)
>> 1589403382^(chan) DXChannel EA4SE destroyed (15)
>> 1589403382^(chan) DXChannel EA3CV-1 destroyed (14)
>> 1589403382^(chan) DXChannel EA1FU destroyed (13)
>> 1589403382^     (in cleanup)    (in cleanup)  at /spider/perl/Msg.pm line
>> 566 during global destruction.
>>
>>
>>
>> 1589484851^Magic number checking on storable string failed at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426, <GEN1666> line
>> 110,
>> 1589484851^Magic number checking on storable string failed at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 426, <GEN1666> line
>> 110, at /spider/perl/Log/Agent.pm li
>> ne 21, <GEN1666> line 110.
>> 1589484851^     Log::Agent::logcroak("Magic number checking on storable
>> string failed at /usr/lib/x"...) called at
>> /usr/lib/x86_64-linux-gnu/perl/5.26/Storable.pm line 427
>> 1589484851^     Storable::thaw("") called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589484851^     eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop/Subprocess.pm line 68
>> 1589484851^
>> Mojo::IOLoop::Subprocess::__ANON__(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/EventEmitter.pm line 15
>> 1589484851^
>> Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8),
>> "close") called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line
>> 27
>> 1589484851^
>> Mojo::IOLoop::Stream::close(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 110
>> 1589484851^
>> Mojo::IOLoop::Stream::_read(Mojo::IOLoop::Stream=HASH(0x55d2e94c95b8))
>> called at /usr/local/share/perl/5.26.1/Mojo/IOLoop/Stream.pm line 58
>> 1589484851^
>> Mojo::IOLoop::Stream::__ANON__(Mojo::Reactor::EV=HASH(0x55d2e6c8d1d8))
>> called at /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589484851^     eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/Poll.pm line 145
>> 1589484851^
>> Mojo::Reactor::Poll::_try(Mojo::Reactor::EV=HASH(0x55d2e6c8d1d8), "I/O
>> watcher", 1, 0) called at /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm
>> line 52
>> 1589484851^
>> Mojo::Reactor::EV::__ANON__(EV::IO=SCALAR(0x55d2e930bcf0), 3) called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589484851^     eval {...} called at
>> /usr/local/share/perl/5.26.1/Mojo/Reactor/EV.pm line 30
>> 1589484851^
>> Mojo::Reactor::EV::start(Mojo::Reactor::EV=HASH(0x55d2e6c8d1d8)) called at
>> /usr/local/share/perl/5.26.1/Mojo/IOLoop.pm line 134
>> 1589484851^     Mojo::IOLoop::start("Mojo::IOLoop") called at
>> /spider/perl/cluster.pl line 781
>> 1589484851^     main::start_node() called at /spider/perl/cluster.pl
>> line 798
>>
>> The version is:DXSpider V1.57, build 226 (git: 3b932bf2[r])
>> Any ideas?
>>
>> 73 de Kin
>>
>>
>> _______________________________________________
>> Dxspider-support mailing listDxspider-support at tobit.co.ukhttps://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
>>
>>
>> _______________________________________________
>> Dxspider-support mailing list
>> Dxspider-support at tobit.co.uk
>> https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
>>
>
> _______________________________________________
> Dxspider-support mailing listDxspider-support at tobit.co.ukhttps://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
>
>
> _______________________________________________
> Dxspider-support mailing list
> Dxspider-support at tobit.co.uk
> https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.tobit.co.uk/pipermail/dxspider-support/attachments/20200517/fd707eea/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Bloqueo.JPG
Type: image/jpeg
Size: 64212 bytes
Desc: not available
URL: <https://mailman.tobit.co.uk/pipermail/dxspider-support/attachments/20200517/fd707eea/attachment-0001.jpe>


More information about the Dxspider-support mailing list