[Dxspider-support] Node crashed at 18:04 (and now it’s happened again)

Keith, G6NHU g6nhu at me.com
Sat Apr 19 16:26:54 BST 2025


This has just happened again to me, this time on the DO droplet that’s running the node.

I’ve added ulimit -S -n 65536 to .bashrc in both sysop and root logins as per here: https://askubuntu.com/questions/1492277/on-ubuntu-22-04-editing-limits-conf-to-increase-number-of-file-descriptors-does

Hopefully that’ll fix it.
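
One way to confirm the raised limit actually reached the running node is to read it back from /proc and compare it with the number of descriptors in use. The following is a minimal sketch, assuming a Linux host; the function names are illustrative, and the PID would have to come from something like `pgrep -f cluster.pl` (an assumption about how the node shows up in the process list). Bear in mind that a ulimit set in .bashrc only applies to processes started from an interactive bash shell, so a node launched by systemd or cron would not inherit it.

```shell
#!/bin/sh
# Sketch: report the open-file limit and current descriptor count for a process.
# Relies on the Linux /proc filesystem; function names are illustrative only.

# Print the soft "Max open files" limit of the given PID.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Print how many file descriptors the given PID currently has open.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Hypothetical usage: pass the PID of the running cluster.pl as the first
# argument. Defaults to this shell's own PID so the script runs anywhere.
pid="${1:-$$}"
echo "pid $pid: $(open_fd_count "$pid") descriptors open, soft limit $(soft_nofile_limit "$pid")"
```

If the reported soft limit is still 1024 for the node's PID, the new setting has not taken effect for that process. Reading another user's /proc entries may need root.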

73 Keith.

1745074921^###
1745074921^### RINGBUFFER END 501 debug lines written
1745074921^###
1745074921^(trace) can't open /spider/local_data/wcy/param Too many open files
1745074921^(trace) Stack (2): WCY::DXDebug::confess in /spider/perl/WCY.pm line: 79
1745074921^(trace) Stack (3): WCY::WCY::store in /spider/perl/WCY.pm line: 123
1745074921^(trace) Stack (4): DXProt::WCY::update in /spider/perl/DXProtHandle.pm line: 1775
1745074921^(trace) Stack (5): DXProt::DXProt::handle_73 in /spider/perl/DXProt.pm line: 466
1745074921^(trace) Stack (6): DXChannel::DXProt::normal in /spider/perl/DXChannel.pm line: 746
1745074921^(trace) Stack (7): DXChannel::DXChannel::process_one in /spider/perl/DXChannel.pm line: 239
1745074921^(trace) Stack (8): main::DXChannel::rec in /spider/perl/cluster.pl line: 424
1745074921^(trace) Stack (9): ExtMsg::main::__ANON__ in /spider/perl/ExtMsg.pm line: 120
1745074921^(trace) Stack (10): Msg::ExtMsg::dequeue in /spider/perl/Msg.pm line: 500
1745074921^(trace) Stack (11): ExtMsg::Msg::_rcv in /spider/perl/ExtMsg.pm line: 83
1745074921^(trace) Stack (12): Msg::ExtMsg::_rcv in /spider/perl/Msg.pm line: 511
1745074921^(*) DXSpider Ceasing
1745074921^(*) DXQSL finished
1745074921^(*) RBN:WRITE_CACHE size: 357.289KB time to write: 31 mS
1745074921^(*) DXDupe finishing
1745074921^(*) DXUser finished
1745074921^(cluster) DXSpider v1.57 build 615 (git: mojo/9f7fb47f[r]) using perl v5.38.2 on linux ended
1745074921^(*) bye bye everyone - bye bye

On 23 Mar 2025 at 23:08 +0000, djk via Dxspider-support <dxspider-support at tobit.co.uk>, wrote:
> There is a standard limit of 1024 files open at once per process. You can change this in a shell with 'ulimit -n 2048' (for example). There is also a way of changing it system wide in systemd (<spit>) but you'll have to research that yourself or start the node in a shell script like:
> #!/bin/sh
> ulimit -n 2048
> /spider/perl/cluster.pl
> Personally, 900+ users on a 4GB RPi x is going some, especially considering the power required to run some Windows cluster software (and then still not keeping up).
> What does your 'top' say when you are running it at this sort of usage?
> Dirk G1TLH
> On 23/03/2025 18:54, Keith, G6NHU via Dxspider-support wrote:
> > I suppose this really is for Dirk.
> >
> > This has never happened before - I came into the shack with a freshly poured shackbeer and noticed my ssh session had closed so I logged back in and saw my uptime was just 21 minutes.
> >
> > Checking the debug log (attached as a .zip), this is what happened in the same timestamp with the actual error that caused the crash being at the end.
> >
> > My cluster is running on a Pi5 with 4 GB of RAM and an external Samsung SSD. I don’t know the exact number of connected users but when I logged back on, there were 938, so I’d imagine the number prior to the crash was around the same. The node had been up for about a month.
> >
> > “Too many open files” ?
> >
> > 73 Keith
> >
> > 1742753087^###
> > 1742753087^### RINGBUFFER END 501 debug lines written
> > 1742753087^###
> > 1742753087^(trace) writing /spider/local_data/rbn_cache Too many open files
> > 1742753087^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
> > 1742753087^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
> > 1742753087^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 892
> > 1742753087^(*) DXSpider Ceasing
> > 1742753087^(*) DXQSL finished
> > 1742753087^(*) RBN:WRITE_CACHE size: 423.804KB time to write: 26 mS
> > 1742753087^(*) DXDupe finishing
> > 1742753087^(*) DXUser finished
> > 1742753087^(cluster) DXSpider v1.57 build 568 (git: mojo/0920a333[r]) using perl v5.36.0 on linux ended
> > 1742753087^(*) bye bye everyone - bye bye
> > 1742753087^###
> > 1742753087^### RINGBUFFER START at line 0 (zero base)
> > 1742753087^###
> >
> > Then it repeats
> >
> > 1742753087^RING: 18:04:47^(trace) writing /spider/local_data/rbn_cache Too many open files
> > 1742753087^RING: 18:04:47^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
> > 1742753087^RING: 18:04:47^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
> > 1742753087^RING: 18:04:47^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 892
> > 1742753087^RING: 18:04:47^(*) DXSpider Ceasing
> > 1742753087^RING: 18:04:47^(*) DXQSL finished
> > 1742753087^RING: 18:04:47^(*) RBN:WRITE_CACHE size: 423.804KB time to write: 26 mS
> > 1742753087^RING: 18:04:47^(*) DXDupe finishing
> > 1742753087^RING: 18:04:47^(*) DXUser finished
> > 1742753087^RING: 18:04:47^(cluster) DXSpider v1.57 build 568 (git: mojo/0920a333[r]) using perl v5.36.0 on linux ended
> > 1742753087^RING: 18:04:47^(*) bye bye everyone - bye bye
> > 1742753087^###
> > 1742753087^### RINGBUFFER END 501 debug lines written
> > 1742753087^###
> >
> > _______________________________________________
> > Dxspider-support mailing list
> > Dxspider-support at tobit.co.uk
> > https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support