[Dxspider-support] Node crash again - too many open files

Dirk Koopman djk at tobit.co.uk
Mon Apr 21 12:52:06 BST 2025


Try ulimit -Hn (the hard limit). On my machine (64GB RAM, 64-bit Debian 12 
bookworm) that gives: 1048576 (file handles).
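
As a rough sketch of how to see both limits next to what a process is actually using: the script below prints the soft and hard limits of the current shell, then the "Max open files" line and the number of open descriptors for one process. NODE_PID is a hypothetical variable name used only here; it defaults to the shell itself, and finding the node's PID with something like pgrep -f cluster.pl is an assumption about how the node appears in the process list. It relies on /proc, so it is Linux only.

```shell
#!/bin/sh
# Sketch: compare the open-file limits with what a process is really using.
# NODE_PID is a made-up name for this example; it defaults to this shell.
PID="${NODE_PID:-$$}"

soft=$(ulimit -Sn)   # soft limit: the one a process actually runs into
hard=$(ulimit -Hn)   # hard limit: ceiling the soft limit may be raised to
echo "this shell: soft=$soft hard=$hard"

# Limits the kernel is applying to the chosen process
grep 'Max open files' "/proc/$PID/limits"

# Number of file descriptors that process has open right now
open=$(ls "/proc/$PID/fd" | wc -l)
echo "pid $PID has $open open files"
```

Looking at another user's process this way normally needs sudo. If the count for the node sits close to the soft figure, the limit is what is being hit.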

If that is the case (or some similar large number) I am guessing that 
you are using a distro that uses systemd, in which case the official way 
of dealing with this is to:

sudo vi /etc/security/limits.d/numproc.conf

and in that file, write something like:

#<domain>    <type> <item>    <value>
#
   sysop       soft     nofile     2048

Save that file, then restart the whole machine.

Then open a shell and check with ulimit -Sn that the soft limit has changed.
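
One caveat, since the node runs as a service: the files in /etc/security/limits.d are applied by pam_limits to login sessions, so a unit started directly by systemd may not pick them up. The usual systemd route is a drop-in that sets LimitNOFILE for the unit. The following is only a sketch under assumptions: the unit name dxspider.service is hypothetical (use whatever the service is really called), and the script writes to a scratch directory by default so the result can be inspected before anything under /etc is touched.

```shell
#!/bin/sh
# Sketch: create a systemd drop-in raising the open-file limit for one unit.
# UNIT is a hypothetical name; substitute the real service name.
UNIT="${UNIT:-dxspider.service}"
# Scratch location by default; set DROPIN_ROOT=/etc/systemd/system and run
# as root to install it for real.
DROPIN_ROOT="${DROPIN_ROOT:-${TMPDIR:-/tmp}/systemd-dropin-preview}"

dir="$DROPIN_ROOT/$UNIT.d"
mkdir -p "$dir"

# LimitNOFILE sets the open-file limit for processes started by the unit
cat > "$dir/override.conf" <<'EOF'
[Service]
LimitNOFILE=2048
EOF
echo "wrote $dir/override.conf"

# After installing under /etc/systemd/system:
#   sudo systemctl daemon-reload
#   sudo systemctl restart "$UNIT"
#   systemctl show "$UNIT" -p LimitNOFILE
```

The value 2048 simply mirrors the example above; the last commented command shows what limit systemd applies to the unit after the restart.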

Dirk G1TLH

On 20/04/2025 22:41, Keith, G6NHU wrote:
> Second day in a row, a crash and restart as per below.
>
> Dirk, I know you suggested increasing ulimit in the shell which I 
> thought I’d done yesterday (see my email from yesterday afternoon) but 
> it appears not.
>
> I assume you’ve got around this with WA9PIE-2 so I’d really appreciate 
> some help please.  I don’t want the node rebooting regularly like 
> this, especially not at weekends but I don’t know what to do next.  
>  The node runs as a service so how can I launch it from a shell 
> script?  I am still relatively inexperienced with Linux so I just 
> don’t know what to do here.
>
> Thanks,
>
> 73 Keith.
>
>
> 1745177833^RING: 19:37:13^(trace) writing 
> /spider/local_data/rbn_cache Too many open files
> 1745177833^RING: 19:37:13^(trace) Stack (2): RBN::DXDebug::confess 
> in /spider/perl/RBN.pm line: 926
> 1745177833^RING: 19:37:13^(trace) Stack (3): RBN::RBN::write_cache 
> in /spider/perl/RBN.pm line: 871
> 1745177833^RING: 19:37:13^(trace) Stack (4): main::RBN::per_minute 
> in /spider/perl/cluster.pl line: 902
> 1745177833^RING: 19:37:13^(*) DXSpider Ceasing
> 1745177833^RING: 19:37:13^(*) DXQSL finished
> 1745177833^RING: 19:37:13^(*) RBN:WRITE_CACHE size: 377.687KB time 
> to write: 34 mS
> 1745177833^RING: 19:37:13^(*) DXDupe finishing
> 1745177833^RING: 19:37:13^(*) DXUser finished
> 1745177833^RING: 19:37:13^(cluster) DXSpider v1.57 build 615 (git: 
> mojo/9f7fb47f[r]) using perl v5.38.2 on linux ended
> 1745177833^RING: 19:37:13^(*) bye bye everyone - bye bye
> 1745177833^###
> 1745177833^### RINGBUFFER END 501 debug lines written
>
> On 19 Apr 2025 at 16:27 +0100, Keith, G6NHU via Dxspider-support 
> <dxspider-support at tobit.co.uk>, wrote:
>> This has just happened again to me, this time on the DO droplet 
>> that’s running the node.
>>
>> I’ve added ulimit -S -n 65536 to .bashrc in both sysop and root 
>> logins as per here: 
>> https://askubuntu.com/questions/1492277/on-ubuntu-22-04-editing-limits-conf-to-increase-number-of-file-descriptors-does
>>
>> Hopefully that’ll fix it.
>>
>> 73 Keith.
>>
>> 1745074921^###
>> 1745074921^### RINGBUFFER END 501 debug lines written
>> 1745074921^###
>> 1745074921^(trace) can't open /spider/local_data/wcy/param Too many 
>> open files
>> 1745074921^(trace) Stack (2): WCY::DXDebug::confess in 
>> /spider/perl/WCY.pm line: 79
>> 1745074921^(trace) Stack (3): WCY::WCY::store in /spider/perl/WCY.pm 
>> line: 123
>> 1745074921^(trace) Stack (4): DXProt::WCY::update in 
>> /spider/perl/DXProtHandle.pm line: 1775
>> 1745074921^(trace) Stack (5): DXProt::DXProt::handle_73 in 
>> /spider/perl/DXProt.pm line: 466
>> 1745074921^(trace) Stack (6): DXChannel::DXProt::normal in 
>> /spider/perl/DXChannel.pm line: 746
>> 1745074921^(trace) Stack (7): DXChannel::DXChannel::process_one in 
>> /spider/perl/DXChannel.pm line: 239
>> 1745074921^(trace) Stack (8): main::DXChannel::rec in 
>> /spider/perl/cluster.pl line: 424
>> 1745074921^(trace) Stack (9): ExtMsg::main::__ANON__ in 
>> /spider/perl/ExtMsg.pm line: 120
>> 1745074921^(trace) Stack (10): Msg::ExtMsg::dequeue in 
>> /spider/perl/Msg.pm line: 500
>> 1745074921^(trace) Stack (11): ExtMsg::Msg::_rcv in 
>> /spider/perl/ExtMsg.pm line: 83
>> 1745074921^(trace) Stack (12): Msg::ExtMsg::_rcv in 
>> /spider/perl/Msg.pm line: 511
>> 1745074921^(*) DXSpider Ceasing
>> 1745074921^(*) DXQSL finished
>> 1745074921^(*) RBN:WRITE_CACHE size: 357.289KB time to write: 31 mS
>> 1745074921^(*) DXDupe finishing
>> 1745074921^(*) DXUser finished
>> 1745074921^(cluster) DXSpider v1.57 build 615 (git: mojo/9f7fb47f[r]) 
>> using perl v5.38.2 on linux ended
>> 1745074921^(*) bye bye everyone - bye bye
>>
>> On 23 Mar 2025 at 23:08 +0000, djk via Dxspider-support 
>> <dxspider-support at tobit.co.uk>, wrote:
>>>
>>> There is a standard limit of 1024 files open at once per process. You 
>>> can change this in a shell with 'ulimit -n 2048' (for example). 
>>> There is also a way of changing it system wide in systemd (<spit>) but 
>>> you'll have to research that yourself or start the node in a shell 
>>> script like:
>>>
>>> #!/bin/sh
>>> # raise the open-file limit for this shell, then start the node from it
>>> ulimit -n 2048
>>> /spider/perl/cluster.pl
>>>
>>> Personally, 900+ users on a 4GB RPi x is going some, especially 
>>> considering power required to run some windows cluster software (and 
>>> then still not keeping).
>>>
>>> What does your 'top' say when you are running it at this sort of usage?
>>>
>>> Dirk G1TLH
>>>
>>> On 23/03/2025 18:54, Keith, G6NHU via Dxspider-support wrote:
>>>> I suppose this really is for Dirk.
>>>>
>>>> This has never happened before - I came into the shack with a 
>>>> freshly poured shackbeer and noticed my ssh session had closed so I 
>>>> logged back in and saw my uptime was just 21 minutes.
>>>>
>>>> Checking the debug log (attached as a .zip), this is what happened 
>>>> in the same timestamp with the actual error that caused the crash 
>>>> being at the end.
>>>>
>>>> My cluster is running on a Pi5 with 4GB RAM and an external Samsung 
>>>> SSD.   I don’t know the exact number of connected users but when I 
>>>> logged back on, there were 938 so I’d imagine the number prior to 
>>>> the crash was around the same.   The node had been up for about a 
>>>> month.
>>>>
>>>> “Too many open files” ?
>>>>
>>>> 73 Keith
>>>>
>>>> 1742753087^###
>>>> 1742753087^### RINGBUFFER END 501 debug lines written
>>>> 1742753087^###
>>>> 1742753087^(trace) writing /spider/local_data/rbn_cache Too many 
>>>> open files
>>>> 1742753087^(trace) Stack (2): RBN::DXDebug::confess in 
>>>> /spider/perl/RBN.pm line: 926
>>>> 1742753087^(trace) Stack (3): RBN::RBN::write_cache in 
>>>> /spider/perl/RBN.pm line: 871
>>>> 1742753087^(trace) Stack (4): main::RBN::per_minute in 
>>>> /spider/perl/cluster.pl line: 892
>>>> 1742753087^(*) DXSpider Ceasing
>>>> 1742753087^(*) DXQSL finished
>>>> 1742753087^(*) RBN:WRITE_CACHE size: 423.804KB time to write: 26 mS
>>>> 1742753087^(*) DXDupe finishing
>>>> 1742753087^(*) DXUser finished
>>>> 1742753087^(cluster) DXSpider v1.57 build 568 (git: 
>>>> mojo/0920a333[r]) using perl v5.36.0 on linux ended
>>>> 1742753087^(*) bye bye everyone - bye bye
>>>> 1742753087^###
>>>> 1742753087^### RINGBUFFER START at line 0 (zero base)
>>>> 1742753087^###
>>>>
>>>> Then it repeats
>>>>
>>>> 1742753087^RING: 18:04:47^(trace) writing 
>>>> /spider/local_data/rbn_cache Too many open files
>>>> 1742753087^RING: 18:04:47^(trace) Stack (2): RBN::DXDebug::confess 
>>>> in /spider/perl/RBN.pm line: 926
>>>> 1742753087^RING: 18:04:47^(trace) Stack (3): RBN::RBN::write_cache 
>>>> in /spider/perl/RBN.pm line: 871
>>>> 1742753087^RING: 18:04:47^(trace) Stack (4): main::RBN::per_minute 
>>>> in /spider/perl/cluster.pl line: 892
>>>> 1742753087^RING: 18:04:47^(*) DXSpider Ceasing
>>>> 1742753087^RING: 18:04:47^(*) DXQSL finished
>>>> 1742753087^RING: 18:04:47^(*) RBN:WRITE_CACHE size: 423.804KB time 
>>>> to write: 26 mS
>>>> 1742753087^RING: 18:04:47^(*) DXDupe finishing
>>>> 1742753087^RING: 18:04:47^(*) DXUser finished
>>>> 1742753087^RING: 18:04:47^(cluster) DXSpider v1.57 build 568 (git: 
>>>> mojo/0920a333[r]) using perl v5.36.0 on linux ended
>>>> 1742753087^RING: 18:04:47^(*) bye bye everyone - bye bye
>>>> 1742753087^###
>>>> 1742753087^### RINGBUFFER END 501 debug lines written
>>>> 1742753087^###
>>>>
>>>> _______________________________________________
>>>> Dxspider-support mailing list
>>>> Dxspider-support at tobit.co.uk
>>>> https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support