[Dxspider-support] Node crash again - too many open files

Keith, G6NHU g6nhu at me.com
Sun Apr 20 22:59:04 BST 2025


Thanks Dave for the very quick direct email.

I’ve added 'LimitNOFILE=infinity:infinity' to /usr/lib/systemd/system/spider.service (the directive name is case-sensitive), run systemctl daemon-reload and restarted the service.
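
To check whether a new limit has actually reached a running process, rather than only the login shell, the limit and the current descriptor count can be read from /proc. This is a minimal sketch, assuming Linux with /proc mounted; fd_report is a hypothetical helper name and not part of DXSpider, and reading another user's /proc/<pid>/fd normally needs root or the same user.

```shell
#!/bin/sh
# Print the soft "Max open files" limit and the number of currently open
# descriptors for a given PID (Linux /proc only).
fd_report() {
    pid="$1"
    # Fields in /proc/<pid>/limits: "Max open files <soft> <hard> files"
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    # One entry per open descriptor
    used=$(ls "/proc/$pid/fd" | wc -l)
    echo "pid=$pid soft_limit=$soft open_fds=$used"
}

# Example: inspect the current shell.
fd_report "$$"
```

For the node itself, the PID to pass would be the service's main PID, for example the value printed by 'systemctl show -p MainPID --value spider.service'. If soft_limit still reads 1024 after the restart, the setting has not taken effect.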

We’ll see if that helps.

73 Keith.
On 20 Apr 2025 at 22:41 +0100, Keith, G6NHU via Dxspider-support <dxspider-support at tobit.co.uk>, wrote:
> Second day in a row, a crash and restart as per below.
>
> Dirk, I know you suggested increasing ulimit in the shell which I thought I’d done yesterday (see my email from yesterday afternoon) but it appears not.
>
> I assume you’ve got around this with WA9PIE-2 so I’d really appreciate some help please.  I don’t want the node rebooting regularly like this, especially not at weekends but I don’t know what to do next.   The node runs as a service so how can I launch it from a shell script?  I am still relatively inexperienced with Linux so I just don’t know what to do here.
>
> Thanks,
>
> 73 Keith.
>
>
> 1745177833^RING: 19:37:13^(trace) writing /spider/local_data/rbn_cache Too many open files
> 1745177833^RING: 19:37:13^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
> 1745177833^RING: 19:37:13^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
> 1745177833^RING: 19:37:13^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 902
> 1745177833^RING: 19:37:13^(*) DXSpider Ceasing
> 1745177833^RING: 19:37:13^(*) DXQSL finished
> 1745177833^RING: 19:37:13^(*) RBN:WRITE_CACHE size: 377.687KB time to write: 34 mS
> 1745177833^RING: 19:37:13^(*) DXDupe finishing
> 1745177833^RING: 19:37:13^(*) DXUser finished
> 1745177833^RING: 19:37:13^(cluster) DXSpider v1.57 build 615 (git: mojo/9f7fb47f[r]) using perl v5.38.2 on linux ended
> 1745177833^RING: 19:37:13^(*) bye bye everyone - bye bye
> 1745177833^###
> 1745177833^### RINGBUFFER END 501 debug lines written
> On 19 Apr 2025 at 16:27 +0100, Keith, G6NHU via Dxspider-support <dxspider-support at tobit.co.uk>, wrote:
> This has just happened again to me, this time on the DO droplet that’s running the node.
>
> I’ve added ulimit -S -n 65536 to .bashrc in both sysop and root logins as per here: https://askubuntu.com/questions/1492277/on-ubuntu-22-04-editing-limits-conf-to-increase-number-of-file-descriptors-does
>
> Hopefully that’ll fix it.
>
> 73 Keith.
>
> 1745074921^###
> 1745074921^### RINGBUFFER END 501 debug lines written
> 1745074921^###
> 1745074921^(trace) can't open /spider/local_data/wcy/param Too many open files
> 1745074921^(trace) Stack (2): WCY::DXDebug::confess in /spider/perl/WCY.pm line: 79
> 1745074921^(trace) Stack (3): WCY::WCY::store in /spider/perl/WCY.pm line: 123
> 1745074921^(trace) Stack (4): DXProt::WCY::update in /spider/perl/DXProtHandle.pm line: 1775
> 1745074921^(trace) Stack (5): DXProt::DXProt::handle_73 in /spider/perl/DXProt.pm line: 466
> 1745074921^(trace) Stack (6): DXChannel::DXProt::normal in /spider/perl/DXChannel.pm line: 746
> 1745074921^(trace) Stack (7): DXChannel::DXChannel::process_one in /spider/perl/DXChannel.pm line: 239
> 1745074921^(trace) Stack (8): main::DXChannel::rec in /spider/perl/cluster.pl line: 424
> 1745074921^(trace) Stack (9): ExtMsg::main::__ANON__ in /spider/perl/ExtMsg.pm line: 120
> 1745074921^(trace) Stack (10): Msg::ExtMsg::dequeue in /spider/perl/Msg.pm line: 500
> 1745074921^(trace) Stack (11): ExtMsg::Msg::_rcv in /spider/perl/ExtMsg.pm line: 83
> 1745074921^(trace) Stack (12): Msg::ExtMsg::_rcv in /spider/perl/Msg.pm line: 511
> 1745074921^(*) DXSpider Ceasing
> 1745074921^(*) DXQSL finished
> 1745074921^(*) RBN:WRITE_CACHE size: 357.289KB time to write: 31 mS
> 1745074921^(*) DXDupe finishing
> 1745074921^(*) DXUser finished
> 1745074921^(cluster) DXSpider v1.57 build 615 (git: mojo/9f7fb47f[r]) using perl v5.38.2 on linux ended
> 1745074921^(*) bye bye everyone - bye bye
> On 23 Mar 2025 at 23:08 +0000, djk via Dxspider-support <dxspider-support at tobit.co.uk>, wrote:
> There is a standard limit of 1024 files open at once per process. You can change this in a shell with 'ulimit -n 2048' (for example). There is also a way of changing it system wide in systemd (<spit>) but you'll have to research that yourself or start the node in a shell script like:
> #!/bin/sh
> ulimit -n 2048
> /spider/perl/cluster.pl
> Personally, 900+ users on a 4GB RPi x is going some, especially considering the power required to run some Windows cluster software (and then still not keeping up).
> What does your 'top' say when you are running it at this sort of usage?
> Dirk G1TLH
> On 23/03/2025 18:54, Keith, G6NHU via Dxspider-support wrote:
> I suppose this really is for Dirk.
>
> This has never happened before - I came into the shack with a freshly poured shackbeer and noticed my ssh session had closed so I logged back in and saw my uptime was just 21 minutes.
>
> Checking the debug log (attached as a .zip), this is what happened in the same timestamp with the actual error that caused the crash being at the end.
>
> My cluster is running on a Pi5 with 4GB RAM and an external Samsung SSD. I don’t know the exact number of connected users but when I logged back on, there were 938, so I’d imagine the number prior to the crash was around the same. The node had been up for about a month.
>
> “Too many open files” ?
>
> 73 Keith
>
> 1742753087^###
> 1742753087^### RINGBUFFER END 501 debug lines written
> 1742753087^###
> 1742753087^(trace) writing /spider/local_data/rbn_cache Too many open files
> 1742753087^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
> 1742753087^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
> 1742753087^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 892
> 1742753087^(*) DXSpider Ceasing
> 1742753087^(*) DXQSL finished
> 1742753087^(*) RBN:WRITE_CACHE size: 423.804KB time to write: 26 mS
> 1742753087^(*) DXDupe finishing
> 1742753087^(*) DXUser finished
> 1742753087^(cluster) DXSpider v1.57 build 568 (git: mojo/0920a333[r]) using perl v5.36.0 on linux ended
> 1742753087^(*) bye bye everyone - bye bye
> 1742753087^###
> 1742753087^### RINGBUFFER START at line 0 (zero base)
> 1742753087^###
>
> Then it repeats
>
> 1742753087^RING: 18:04:47^(trace) writing /spider/local_data/rbn_cache Too many open files
> 1742753087^RING: 18:04:47^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
> 1742753087^RING: 18:04:47^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
> 1742753087^RING: 18:04:47^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 892
> 1742753087^RING: 18:04:47^(*) DXSpider Ceasing
> 1742753087^RING: 18:04:47^(*) DXQSL finished
> 1742753087^RING: 18:04:47^(*) RBN:WRITE_CACHE size: 423.804KB time to write: 26 mS
> 1742753087^RING: 18:04:47^(*) DXDupe finishing
> 1742753087^RING: 18:04:47^(*) DXUser finished
> 1742753087^RING: 18:04:47^(cluster) DXSpider v1.57 build 568 (git: mojo/0920a333[r]) using perl v5.36.0 on linux ended
> 1742753087^RING: 18:04:47^(*) bye bye everyone - bye bye
> 1742753087^###
> 1742753087^### RINGBUFFER END 501 debug lines written
> 1742753087^###
>
> _______________________________________________
> Dxspider-support mailing list
> Dxspider-support at tobit.co.uk
> https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support