[Dxspider-support] Node crash again - too many open files
Keith, G6NHU
g6nhu at me.com
Sun Apr 20 22:41:30 BST 2025
Second day in a row: another crash and restart, log below.
Dirk, I know you suggested increasing the ulimit in the shell, which I thought I’d done yesterday (see my email from yesterday afternoon), but it appears not.
I assume you’ve got around this with WA9PIE-2, so I’d really appreciate some help, please. I don’t want the node rebooting regularly like this, especially not at weekends, but I don’t know what to do next. The node runs as a service, so how can I launch it from a shell script? I’m still relatively inexperienced with Linux, so I just don’t know what to do here.
Thanks,
73 Keith.
1745177833^RING: 19:37:13^(trace) writing /spider/local_data/rbn_cache Too many open files
1745177833^RING: 19:37:13^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
1745177833^RING: 19:37:13^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
1745177833^RING: 19:37:13^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 902
1745177833^RING: 19:37:13^(*) DXSpider Ceasing
1745177833^RING: 19:37:13^(*) DXQSL finished
1745177833^RING: 19:37:13^(*) RBN:WRITE_CACHE size: 377.687KB time to write: 34 mS
1745177833^RING: 19:37:13^(*) DXDupe finishing
1745177833^RING: 19:37:13^(*) DXUser finished
1745177833^RING: 19:37:13^(cluster) DXSpider v1.57 build 615 (git: mojo/9f7fb47f[r]) using perl v5.38.2 on linux ended
1745177833^RING: 19:37:13^(*) bye bye everyone - bye bye
1745177833^###
1745177833^### RINGBUFFER END 501 debug lines written
On 19 Apr 2025 at 16:27 +0100, Keith, G6NHU via Dxspider-support <dxspider-support at tobit.co.uk>, wrote:
This has just happened to me again, this time on the DigitalOcean droplet that’s running the node.
I’ve added ulimit -S -n 65536 to .bashrc for both the sysop and root logins, as described here: https://askubuntu.com/questions/1492277/on-ubuntu-22-04-editing-limits-conf-to-increase-number-of-file-descriptors-does
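A likely reason the .bashrc change made no difference: .bashrc is read by interactive bash sessions, and a node started by systemd as a service never reads it, so the limit set there does not reach cluster.pl. One way to check is to read the limits the running process actually has from /proc. This is a minimal sketch assuming Linux's /proc filesystem; the helper name nofile_limits is hypothetical, and the PID would come from something like `pgrep -f cluster.pl`.

```shell
#!/bin/sh
# Print the soft and hard open-file limits of a running process.
# Assumes Linux /proc; nofile_limits is a hypothetical helper name.
nofile_limits() {
    pid=$1
    # /proc/PID/limits line looks like: "Max open files  <soft>  <hard>  files"
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "/proc/$pid/limits"
}

# Example: check the node process (adjust the pattern to your setup)
# nofile_limits "$(pgrep -f cluster.pl | head -n 1)"
```

If this still reports soft=1024 for the node after editing .bashrc, the new limit is not being applied to the service.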
Hopefully that’ll fix it.
73 Keith.
1745074921^###
1745074921^### RINGBUFFER END 501 debug lines written
1745074921^###
1745074921^(trace) can't open /spider/local_data/wcy/param Too many open files
1745074921^(trace) Stack (2): WCY::DXDebug::confess in /spider/perl/WCY.pm line: 79
1745074921^(trace) Stack (3): WCY::WCY::store in /spider/perl/WCY.pm line: 123
1745074921^(trace) Stack (4): DXProt::WCY::update in /spider/perl/DXProtHandle.pm line: 1775
1745074921^(trace) Stack (5): DXProt::DXProt::handle_73 in /spider/perl/DXProt.pm line: 466
1745074921^(trace) Stack (6): DXChannel::DXProt::normal in /spider/perl/DXChannel.pm line: 746
1745074921^(trace) Stack (7): DXChannel::DXChannel::process_one in /spider/perl/DXChannel.pm line: 239
1745074921^(trace) Stack (8): main::DXChannel::rec in /spider/perl/cluster.pl line: 424
1745074921^(trace) Stack (9): ExtMsg::main::__ANON__ in /spider/perl/ExtMsg.pm line: 120
1745074921^(trace) Stack (10): Msg::ExtMsg::dequeue in /spider/perl/Msg.pm line: 500
1745074921^(trace) Stack (11): ExtMsg::Msg::_rcv in /spider/perl/ExtMsg.pm line: 83
1745074921^(trace) Stack (12): Msg::ExtMsg::_rcv in /spider/perl/Msg.pm line: 511
1745074921^(*) DXSpider Ceasing
1745074921^(*) DXQSL finished
1745074921^(*) RBN:WRITE_CACHE size: 357.289KB time to write: 31 mS
1745074921^(*) DXDupe finishing
1745074921^(*) DXUser finished
1745074921^(cluster) DXSpider v1.57 build 615 (git: mojo/9f7fb47f[r]) using perl v5.38.2 on linux ended
1745074921^(*) bye bye everyone - bye bye
On 23 Mar 2025 at 23:08 +0000, djk via Dxspider-support <dxspider-support at tobit.co.uk>, wrote:
There is a standard limit of 1024 open files per process. You can change this in a shell with 'ulimit -n 2048' (for example). There is also a way of changing it system wide in systemd (<spit>), but you'll have to research that yourself, or start the node from a shell script like:
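#!/bin/sh
# Raise this shell's open-file (descriptor) limit; child processes inherit it
ulimit -n 2048
# exec replaces the shell with the node, which keeps the raised limit
exec /spider/perl/cluster.pl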
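For a node that runs as a systemd service, a per-service way to do the same thing is the LimitNOFILE= directive in the unit's [Service] section, usually added as a drop-in file rather than by editing the unit file itself. The sketch below only writes such a drop-in; the function name write_nofile_dropin, the file name nofile.conf and the unit name dxspider.service in the usage note are hypothetical, so substitute the real unit name shown by `systemctl list-units`.

```shell
#!/bin/sh
# Write a systemd drop-in that raises RLIMIT_NOFILE for one service.
# $1 = drop-in directory (normally /etc/systemd/system/<unit>.service.d)
# $2 = new limit; a single value sets both the soft and the hard limit
# write_nofile_dropin and nofile.conf are hypothetical names.
write_nofile_dropin() {
    dir=$1
    limit=$2
    mkdir -p "$dir" || return 1
    cat > "$dir/nofile.conf" <<EOF
[Service]
LimitNOFILE=$limit
EOF
}
```

Run as root, for example `write_nofile_dropin /etc/systemd/system/dxspider.service.d 8192`, then `systemctl daemon-reload` followed by `systemctl restart dxspider.service` (again, a hypothetical unit name). `systemctl edit` on the unit opens an editor on an equivalent override file if you would rather do it by hand.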
Personally, 900+ users on a 4GB RPi x is going some, especially considering the power required to run some Windows cluster software (which then still doesn't keep up).
What does your 'top' say when you are running it at this sort of usage?
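Alongside top, it may help to watch how many descriptors the node actually holds compared with its soft limit, since top does not show that figure. A minimal sketch assuming Linux's /proc; fd_usage is a hypothetical helper name, and listing /proc/PID/fd for another user's process has to be done as that user or as root.

```shell
#!/bin/sh
# Report open file descriptors for a process against its soft limit.
# Assumes Linux /proc; fd_usage is a hypothetical helper name.
fd_usage() {
    pid=$1
    # Each entry in /proc/PID/fd is one open descriptor (file, socket, pipe)
    used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "pid=$pid used=$used soft_limit=$soft"
}

# Example: sample the node once a minute
# while true; do fd_usage "$(pgrep -f cluster.pl | head -n 1)"; sleep 60; done
```

A "used" figure that climbs steadily while the user count stays flat would point to descriptors not being released, rather than simply too many connections.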
Dirk G1TLH
On 23/03/2025 18:54, Keith, G6NHU via Dxspider-support wrote:
I suppose this really is for Dirk.
This has never happened before. I came into the shack with a freshly poured shackbeer, noticed my ssh session had closed, logged back in and saw my uptime was just 21 minutes.
Here is what the debug log (attached as a .zip) shows at that timestamp; the error that actually caused the crash is at the end.
My cluster is running on a Pi5 with 4GB RAM and an external Samsung SSD. I don’t know the exact number of connected users before the crash, but when I logged back on there were 938, so I’d imagine the number beforehand was about the same. The node had been up for about a month.
“Too many open files”?
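A rough tally suggests why around 938 users sits close to the edge: assuming each connected user holds one TCP socket in the node process, that alone uses most of a default soft limit of 1024, before counting node links, the RBN feeds, log files and data files such as rbn_cache that the node opens while running. The sketch below splits a process's descriptors into sockets and everything else, to see how much of the limit connections account for. It assumes Linux's /proc and the readlink utility; fd_breakdown is a hypothetical helper name.

```shell
#!/bin/sh
# Split a process's open descriptors into sockets and everything else.
# Assumes Linux /proc and readlink; fd_breakdown is a hypothetical name.
fd_breakdown() {
    pid=$1
    sockets=0
    other=0
    for fd in /proc/"$pid"/fd/*; do
        # Socket descriptors are symlinks whose target is "socket:[inode]"
        case "$(readlink "$fd")" in
            socket:*) sockets=$((sockets + 1)) ;;
            *) other=$((other + 1)) ;;
        esac
    done
    echo "sockets=$sockets other=$other"
}
```

For example, `fd_breakdown "$(pgrep -f cluster.pl | head -n 1)"`, run as the node's user or root, while the user count is high.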
73 Keith
1742753087^###
1742753087^### RINGBUFFER END 501 debug lines written
1742753087^###
1742753087^(trace) writing /spider/local_data/rbn_cache Too many open files
1742753087^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
1742753087^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
1742753087^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 892
1742753087^(*) DXSpider Ceasing
1742753087^(*) DXQSL finished
1742753087^(*) RBN:WRITE_CACHE size: 423.804KB time to write: 26 mS
1742753087^(*) DXDupe finishing
1742753087^(*) DXUser finished
1742753087^(cluster) DXSpider v1.57 build 568 (git: mojo/0920a333[r]) using perl v5.36.0 on linux ended
1742753087^(*) bye bye everyone - bye bye
1742753087^###
1742753087^### RINGBUFFER START at line 0 (zero base)
1742753087^###
Then it repeats:
1742753087^RING: 18:04:47^(trace) writing /spider/local_data/rbn_cache Too many open files
1742753087^RING: 18:04:47^(trace) Stack (2): RBN::DXDebug::confess in /spider/perl/RBN.pm line: 926
1742753087^RING: 18:04:47^(trace) Stack (3): RBN::RBN::write_cache in /spider/perl/RBN.pm line: 871
1742753087^RING: 18:04:47^(trace) Stack (4): main::RBN::per_minute in /spider/perl/cluster.pl line: 892
1742753087^RING: 18:04:47^(*) DXSpider Ceasing
1742753087^RING: 18:04:47^(*) DXQSL finished
1742753087^RING: 18:04:47^(*) RBN:WRITE_CACHE size: 423.804KB time to write: 26 mS
1742753087^RING: 18:04:47^(*) DXDupe finishing
1742753087^RING: 18:04:47^(*) DXUser finished
1742753087^RING: 18:04:47^(cluster) DXSpider v1.57 build 568 (git: mojo/0920a333[r]) using perl v5.36.0 on linux ended
1742753087^RING: 18:04:47^(*) bye bye everyone - bye bye
1742753087^###
1742753087^### RINGBUFFER END 501 debug lines written
1742753087^###
_______________________________________________
Dxspider-support mailing list
Dxspider-support at tobit.co.uk
https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support