[Dxspider-support] Latest source of utter frustration

Dirk Koopman djk at tobit.co.uk
Tue Apr 14 20:43:04 CEST 2020


Here is a case for using the mojo branch. AFAIK it has no programmed-in 
1024 file limit, and it won't use normal select (unless you forget to 
install EV), so FD_SETSIZE isn't relevant. You might well still have to 
increase the maximum number of files available to a process. On my 
system 'ulimit -n 2048' works, but it depends on the hard limits set by 
the operating system. In Mike's case I think the hard limit is 4096. 
You'll need to modify the startup for cluster.pl to do the ulimit -n. A 
standard place is in /user/default/dxspider, or you could start the node 
with 'ulimit -n 2048; /spider/perl/cluster.pl > /dev/null 2>&1'. YMMV.
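
As a rough sketch of the startup change described above (not the actual 
DXSpider start script): the wrapper below caps the requested limit at the 
hard limit, raises the soft limit, and reports both. The function name 
capped_limit and the variable WANTED are made up for illustration; the 
2048 value and the cluster.pl path are the ones quoted above. The line 
that actually starts the node is left commented out.

```shell
#!/bin/bash
# Hypothetical wrapper: set the soft open-file limit before starting the node.

# Print the requested limit, or the hard limit if the request exceeds it.
capped_limit() {
    local wanted=$1 hard
    hard=$(ulimit -Hn)
    if [ "$hard" = "unlimited" ] || [ "$wanted" -le "$hard" ]; then
        echo "$wanted"
    else
        echo "$hard"
    fi
}

WANTED=2048   # value from the text; adjust to the expected number of users

# Set the soft limit for this shell and everything it starts.
ulimit -Sn "$(capped_limit "$WANTED")"
echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"

# Uncomment to actually start the node (path as given in the text):
# exec /spider/perl/cluster.pl > /dev/null 2>&1
```

If the reported soft limit is lower than the value asked for, the hard 
limit is the bottleneck and has to be raised first, which an unprivileged 
user cannot do from inside the script.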

Mike is the *only* person I know that seems to be able to run a real 
300+ user node on the "standard" master branch. Everybody else 
approaching that (that I know about) is on the mojo branch. If one is 
serious about running "super nodes" then the mojo branch is the way to 
go because it uses the Mojolicious framework + EV + epoll(2) under the hood.
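
A quick way to check that EV and Mojolicious are actually installed for 
the perl that runs the node might look like this. The helper name 
check_module is hypothetical; it only tests whether perl can load the 
named module and does not touch DXSpider itself.

```shell
#!/bin/bash
# Report whether the perl on PATH can load a given module.
check_module() {
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "$1: installed"
    else
        echo "$1: MISSING"
    fi
}

check_module EV
check_module Mojolicious
```

Run it as the same user that starts cluster.pl, since a different user 
may pick up a different perl or module path.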

Dirk G1TLH

On 14/04/2020 17:10, Joaquin . via Dxspider-support wrote:
> Hi,
>
> I have set up 3 VMs:
> The first with version 1.55 0.204.
>             With 4 GB of RAM 1 CPU
>             Ubuntu 18.04.3 LTS
> The second with version 1.57 148.
>              With 4 GB of RAM 1 CPU
>              Debian GNU/Linux 10
> The third was the host from which the telnet requests were made.
>              With 4 GB of RAM 1 CPU
>              Debian GNU/Linux 10
>
> With the sysctl command I have increased the values of the net.core
> and net.ipv4 variables on all 3 machines to give the OS more capacity.
>
> None of the tests has exceeded 1008 established client sockets.
>
> root at ea3cv_cluster:~# lsof -i tcp
> ...
> ...
> perl      8872           sysop 1020u  IPv4 190250      0t0  TCP ea3cv_cluster:7300->192.168.1.34:49984 (ESTABLISHED)
> perl      8872           sysop 1021u  IPv4 190251      0t0  TCP ea3cv_cluster:7300->192.168.1.34:49982 (ESTABLISHED)
> perl      8872           sysop 1022u  IPv4 190252      0t0  TCP ea3cv_cluster:7300->192.168.1.34:49980 (ESTABLISHED)
> perl      8872           sysop *1023u*  IPv4 190253    0t0  TCP ea3cv_cluster:7300->192.168.1.34:49978 (ESTABLISHED)
>
> root at ea3cv_cluster:~# ss -s
> Total: 2511 (kernel 4418)
> TCP:   1017 (estab *1008*, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 0
>
> Transport Total     IP        IPv6
> *         4418      -         -
> RAW       0         0         0
> UDP       3         3         0
> TCP       1017      1017      0
> INET      1020      1020      0
> FRAG      0         0         0
>
> sysop at ea3cv-cluster3:~/spider/kin$ ulimit -n
> *5000*
>
>
> I noticed a higher speed when going from 2 GB to 4 GB of RAM, but 
> nothing more.
> The swap memory is not occupied.
>
> sysop at ea3cv-cluster3:~/spider/kin$ uptime
>  11:32:29 up  2:04,  1 user,  load average: 1,11, 0,50, 0,18
>
>   Users: *1008* <--- 1007 external sockets + 1 console.pl
>   Uptime: 0 00:58
>   Load:   14.5 %       <--- Only cluster.pl & console.pl
>   Mem:    73.7 %
>
> sysop at ea3cv-cluster3:~/spider/kin$ free
>               total        used        free      shared  buff/cache   available
> Mem:        4015844     1448060      158784       20520     2409000     2271352
> Swap:       2097148       *268*     2246880
>
>
> When the socket limit was reached, if requests kept coming to the server,
> console.pl would crash and all sockets would be lost.
>
> I have found a possible cause for this limitation: *FD_SETSIZE*
>
> FD_SETSIZE is normally 1024, so file descriptors over 1024 are not 
> supported in general.
> You can fiddle with the FD_SETSIZE include sizes as you have done but 
> making changes
> like this might impact other programs too which aim to be POSIX compliant.
> ...
> /usr/include/x86_64-linux-gnu/bits/typesizes.h:#define __FD_SETSIZE 1024
> /usr/include/linux/posix_types.h:#define __FD_SETSIZE 1024
> ...
>
> https://access.redhat.com/solutions/488623
>
> After many tests I can confirm that more than 1024 sockets cannot be 
> established with this configuration.
> That is where the tests stand so far; let's see what the others say.
>
> Kin, EA3CV
>
> On Tue, 14 Apr 2020 at 11:47, Dirk Koopman via Dxspider-support 
> (<dxspider-support at tobit.co.uk>) wrote:
>
>     Firstly
>
>     SIGXCPU
>     The SIGXCPU signal is sent to a process when it has used up the
>     CPU for a duration that exceeds a certain predetermined
>     user-settable value.[16] The arrival of a SIGXCPU signal provides
>     the receiving process a chance to quickly save any intermediate
>     results and to exit gracefully, before it is terminated by the
>     operating system using the SIGKILL signal.
>
>     [16] refers to the information on getrlimit and setrlimit. Reading
>     the manual says:
>
>     RLIMIT_CPU
>     This is the maximum amount of CPU time, in seconds, used by a
>     process. If this limit is exceeded, SIGXCPU shall be generated for
>     the process. If the process is catching or ignoring SIGXCPU, or
>     all threads belonging to that process are blocking SIGXCPU, the
>     behavior is unspecified.
>
>     This parameter can be altered. But I am wondering whether you are
>     hitting the limits of the vCPU that is set in the machine running
>     this VM?
>
>     Also there is a default limit on the number of sockets a process
>     may have of 1024. Now, in the old days (and on Windows boxes with
>     default limits of 64 (!)), it would just refuse to accept new
>     ones. It (now) *may* be that this causes this signal. In any case
>     you should increase the maximum no of sockets the process will
>     take. Do a "ulimit -n" and you see the default is still 1024. You
>     will need to increase the limit.
>
>     You should probably also consider this StackOverflow answer:
>     https://stackoverflow.com/questions/410616/increasing-the-maximum-number-of-tcp-ip-connections-in-linux
>
>     Dirk
>
>
>     On 14/04/2020 01:29, Michael Carper, Ph.D. via Dxspider-support wrote:
>>     As of a bit more than a week ago, at about the same time each day
>>     (with max users connected), the node abruptly restarts. This is
>>     totally frustrating.
>>
>>     image.png
>>     I've checked all the crontabs... no events scheduled for that time.
>>
>>     I look in the linux logfile and see this...
>>
>>     # cat messages
>>     Apr 12 15:21:58 wa9pie-2b init: dxspider main process (28416)
>>     terminated with status 24
>>     Apr 12 15:21:58 wa9pie-2b init: dxspider main process ended,
>>     respawning
>>     Apr 13 16:14:45 wa9pie-2b init: dxspider main process (18493)
>>     terminated with status 24
>>     Apr 13 16:14:45 wa9pie-2b init: dxspider main process ended,
>>     respawning
>>
>>     What gives??
>>
>>     From what I can see, "status 24" is "stop issued from terminal."
>>
>>     Frustrated.
>>
>>     Mike, WA9PIE, VK4EIE
>>
>>     _______________________________________________
>>     Dxspider-support mailing list
>>     Dxspider-support at tobit.co.uk
>>     https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 42439 bytes
Desc: not available
URL: <https://mailman.tobit.co.uk/pipermail/dxspider-support/attachments/20200414/05bb170f/attachment-0001.png>