[Dxspider-support] Node crash again - too many open files

Keith, G6NHU g6nhu at me.com
Sun Apr 20 23:28:05 BST 2025


Sorry for the list spam, but that didn’t resolve it. I was able to check by doing:

grep '^Max open files' /proc/[spider-service PID]/limits

Which showed the limit was still 1024 files.
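
For anyone repeating this check, the PID can be looked up from systemd instead of by hand. This is a minimal sketch, assuming the unit is called spider.service as in the quoted message further down. nofile_limits is just an illustrative helper name, not part of DXSpider.

```shell
#!/bin/sh
# Hypothetical helper: print the soft and hard "Max open files" values
# from a /proc/<pid>/limits file (fields 4 and 5 of that line).
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Example use (assumes a systemd unit named spider.service):
#   pid=$(systemctl show -p MainPID --value spider.service)
#   nofile_limits "/proc/$pid/limits"
```

The first number printed is the soft limit, which is the one the running process actually hits.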

I’ve now edited /etc/systemd/system.conf, uncommented the DefaultLimitNOFILE=1024:524288 line and increased the values to 8192:4194304, then restarted, and I can now see the limit has increased to 8192 files.
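
An alternative that confines the change to the one service is a systemd drop-in override. The sketch below has not been tested on this node and assumes the unit is named spider.service. write_nofile_dropin and the file name limits.conf are made-up names for illustration. Note that systemd key names are case-sensitive and the directive is spelled LimitNOFILE.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that sets LimitNOFILE for one unit.
#   $1 = drop-in directory, e.g. /etc/systemd/system/spider.service.d
#   $2 = limit as soft:hard, e.g. 8192:524288
write_nofile_dropin() {
    mkdir -p "$1" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$2" > "$1/limits.conf"
}

# Example use as root (the unit name is an assumption):
#   write_nofile_dropin /etc/systemd/system/spider.service.d 8192:524288
#   systemctl daemon-reload
#   systemctl restart spider.service
```

A drop-in under /etc/systemd/system is kept separate from the packaged unit file, so it should survive the unit file being replaced on upgrade.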

I’m now pretty sure this should have resolved it.
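
To see how close the node gets to the limit, the number of descriptors it currently holds can be counted and compared with the 8192 figure. This is a rough sketch that assumes a Linux /proc filesystem. open_fd_count is an illustrative name only.

```shell
#!/bin/sh
# Hypothetical helper: count the file descriptors a process has open.
# Run it as the process owner or root so /proc/<pid>/fd is readable.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l | tr -d ' '
}

# Example use (assumes a systemd unit named spider.service):
#   open_fd_count "$(systemctl show -p MainPID --value spider.service)"
```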

Posting this as a fix in case anyone gets the same problem.

73 Keith.

On 20 Apr 2025 at 22:59 +0100, Keith, G6NHU via Dxspider-support <dxspider-support at tobit.co.uk>, wrote:
> Thanks Dave for the very quick direct email.
>
> I’ve added 'LimitNOFile=infinity:infinity' to /usr/lib/systemd/system/spider.service, done a systemctl daemon-reload and restarted the service.
>
> We’ll see if that helps.
>
> 73 Keith.
> On 20 Apr 2025 at 22:41 +0100, Keith, G6NHU via Dxspider-support <dxspider-support at tobit.co.uk>, wrote:
> > Second day in a row, a crash and restart as per below.
> >
> > Dirk, I know you suggested increasing ulimit in the shell which I thought I’d done yesterday (see my email from yesterday afternoon) but it appears not.
> >
> > I assume you’ve got around this with WA9PIE-2 so I’d really appreciate some help please. I don’t want the node rebooting regularly like this, especially not at weekends, but I don’t know what to do next. The node runs as a service so how can I launch it from a shell script? I am still relatively inexperienced with Linux so I just don’t know what to do here.
> >
> > Thanks,
> >
> > 73 Keith.
> >
> >
> > 1745177833^RING: 19:37:13^(trace) writing /spider/local_data/rbn_cache Too many open files
> > 1745177833^RING: 19:37:13^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
> > 1745177833^RING: 19:37:13^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
> > 1745177833^RING: 19:37:13^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 902
> > 1745177833^RING: 19:37:13^(*) DXSpider Ceasing
> > 1745177833^RING: 19:37:13^(*) DXQSL finished
> > 1745177833^RING: 19:37:13^(*) RBN:WRITE_CACHE size: 377.687KB time to write: 34 mS
> > 1745177833^RING: 19:37:13^(*) DXDupe finishing
> > 1745177833^RING: 19:37:13^(*) DXUser finished
> > 1745177833^RING: 19:37:13^(cluster) DXSpider v1.57 build 615 (git: mojo/9f7fb47f[r]) using perl v5.38.2 on linux ended
> > 1745177833^RING: 19:37:13^(*) bye bye everyone - bye bye
> > 1745177833^###
> > 1745177833^### RINGBUFFER END 501 debug lines written
> > On 19 Apr 2025 at 16:27 +0100, Keith, G6NHU via Dxspider-support <dxspider-support at tobit.co.uk>, wrote:
> > This has just happened again to me, this time on the DO droplet that’s running the node.
> >
> > I’ve added ulimit -S -n 65536 to .bashrc in both sysop and root logins as per here: https://askubuntu.com/questions/1492277/on-ubuntu-22-04-editing-limits-conf-to-increase-number-of-file-descriptors-does
> >
> > Hopefully that’ll fix it.
> >
> > 73 Keith.
> >
> > 1745074921^###
> > 1745074921^### RINGBUFFER END 501 debug lines written
> > 1745074921^###
> > 1745074921^(trace) can't open /spider/local_data/wcy/param Too many open files
> > 1745074921^(trace) Stack (2): WCY::DXDebug::confess in /spider/perl/WCY.pm line: 79
> > 1745074921^(trace) Stack (3): WCY::WCY::store in /spider/perl/WCY.pm line: 123
> > 1745074921^(trace) Stack (4): DXProt::WCY::update in /spider/perl/DXProtHandle.pm line: 1775
> > 1745074921^(trace) Stack (5): DXProt::DXProt::handle_73 in /spider/perl/DXProt.pm line: 466
> > 1745074921^(trace) Stack (6): DXChannel::DXProt::normal in /spider/perl/DXChannel.pm line: 746
> > 1745074921^(trace) Stack (7): DXChannel::DXChannel::process_one in /spider/perl/DXChannel.pm line: 239
> > 1745074921^(trace) Stack (8): main::DXChannel::rec in /spider/perl/cluster.pl line: 424
> > 1745074921^(trace) Stack (9): ExtMsg::main::__ANON__ in /spider/perl/ExtMsg.pm line: 120
> > 1745074921^(trace) Stack (10): Msg::ExtMsg::dequeue in /spider/perl/Msg.pm line: 500
> > 1745074921^(trace) Stack (11): ExtMsg::Msg::_rcv in /spider/perl/ExtMsg.pm line: 83
> > 1745074921^(trace) Stack (12): Msg::ExtMsg::_rcv in /spider/perl/Msg.pm line: 511
> > 1745074921^(*) DXSpider Ceasing
> > 1745074921^(*) DXQSL finished
> > 1745074921^(*) RBN:WRITE_CACHE size: 357.289KB time to write: 31 mS
> > 1745074921^(*) DXDupe finishing
> > 1745074921^(*) DXUser finished
> > 1745074921^(cluster) DXSpider v1.57 build 615 (git: mojo/9f7fb47f[r]) using perl v5.38.2 on linux ended
> > 1745074921^(*) bye bye everyone - bye bye
> > On 23 Mar 2025 at 23:08 +0000, djk via Dxspider-support <dxspider-support at tobit.co.uk>, wrote:
> > There is a standard limit of 1024 files open at once per process. You can change this in a shell with 'ulimit -n 2048' (for example). There is also a way of changing it system-wide in systemd (<spit>), but you'll have to research that yourself or start the node in a shell script like:
> > #!/bin/sh
> > ulimit -n 2048
> > /spider/perl/cluster.pl
> > Personally, 900+ users on a 4GB RPi x is going some, especially considering the power required to run some Windows cluster software (and then still not keeping up).
> > What does your 'top' say when you are running it at this sort of usage?
> > Dirk G1TLH
> > On 23/03/2025 18:54, Keith, G6NHU via Dxspider-support wrote:
> > I suppose this really is for Dirk.
> >
> > This has never happened before - I came into the shack with a freshly poured shackbeer and noticed my ssh session had closed so I logged back in and saw my uptime was just 21 minutes.
> >
> > Checking the debug log (attached as a .zip), this is what happened in the same timestamp with the actual error that caused the crash being at the end.
> >
> > My cluster is running on a Pi5 with 4GB RAM and an external Samsung SSD. I don’t know the exact number of connected users but when I logged back on, there were 938, so I’d imagine the number prior to the crash was around the same. The node had been up for about a month.
> >
> > “Too many open files” ?
> >
> > 73 Keith
> >
> > 1742753087^###
> > 1742753087^### RINGBUFFER END 501 debug lines written
> > 1742753087^###
> > 1742753087^(trace) writing /spider/local_data/rbn_cache Too many open files
> > 1742753087^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
> > 1742753087^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
> > 1742753087^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 892
> > 1742753087^(*) DXSpider Ceasing
> > 1742753087^(*) DXQSL finished
> > 1742753087^(*) RBN:WRITE_CACHE size: 423.804KB time to write: 26 mS
> > 1742753087^(*) DXDupe finishing
> > 1742753087^(*) DXUser finished
> > 1742753087^(cluster) DXSpider v1.57 build 568 (git: mojo/0920a333[r]) using perl v5.36.0 on linux ended
> > 1742753087^(*) bye bye everyone - bye bye
> > 1742753087^###
> > 1742753087^### RINGBUFFER START at line 0 (zero base)
> > 1742753087^###
> >
> > Then it repeats
> >
> > 1742753087^RING: 18:04:47^(trace) writing /spider/local_data/rbn_cache Too many open files
> > 1742753087^RING: 18:04:47^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
> > 1742753087^RING: 18:04:47^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
> > 1742753087^RING: 18:04:47^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 892
> > 1742753087^RING: 18:04:47^(*) DXSpider Ceasing
> > 1742753087^RING: 18:04:47^(*) DXQSL finished
> > 1742753087^RING: 18:04:47^(*) RBN:WRITE_CACHE size: 423.804KB time to write: 26 mS
> > 1742753087^RING: 18:04:47^(*) DXDupe finishing
> > 1742753087^RING: 18:04:47^(*) DXUser finished
> > 1742753087^RING: 18:04:47^(cluster) DXSpider v1.57 build 568 (git: mojo/0920a333[r]) using perl v5.36.0 on linux ended
> > 1742753087^RING: 18:04:47^(*) bye bye everyone - bye bye
> > 1742753087^###
> > 1742753087^### RINGBUFFER END 501 debug lines written
> > 1742753087^###
> >
> > _______________________________________________
> > Dxspider-support mailing list
> > Dxspider-support at tobit.co.uk
> > https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support