[Dxspider-support] New mojo version

Sat Sep 21 16:45:36 BST 2024

Can I see a TCPDUMP of the full sequence from user login to user disconnect?

The command is tcpdump host x.x.x.x -An

x.x.x.x is the ip address of a user with the issue.
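
Once a capture like this exists, the question in this thread (which side closes the connection first) can be checked mechanically. Below is a minimal sketch, assuming the default one-line-per-packet text output of tcpdump with numeric IPv4 addresses; the function name first_closer and the sample lines are illustrative only, not part of DXSpider or tcpdump.

```python
import re

# Header line of a tcpdump text record for IPv4/TCP, for example:
# 15:15:53.503505 IP 192.168.1.130.7373 > 192.168.1.111.52076: Flags [F.], ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]*)\]"
)


def first_closer(lines, cluster_ip):
    """Return (side, flag, timestamp) for the first FIN or RST seen, else None."""
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # payload text printed by -A, or a non-TCP record
        flags = m.group("flags")
        for letter, name in (("R", "RST"), ("F", "FIN")):
            if letter in flags:
                side = "cluster" if m.group("src") == cluster_ip else "client"
                return side, name, m.group("ts")
    return None


# Illustrative input: one flow in which the cluster (192.168.1.130) closes first.
SAMPLE = [
    "15:15:53.503505 IP 192.168.1.130.7373 > 192.168.1.111.52076: "
    "Flags [F.], seq 4172, ack 8, win 227, length 0",
    "15:15:53.504215 IP 192.168.1.111.52076 > 192.168.1.130.7373: "
    "Flags [F.], seq 8, ack 4173, win 501, length 0",
    "15:15:53.504340 IP 192.168.1.130.7373 > 192.168.1.111.52076: "
    "Flags [.], ack 9, win 227, length 0",
]

if __name__ == "__main__":
    print(first_closer(SAMPLE, "192.168.1.130"))
```

For the sample flow this reports ('cluster', 'FIN', '15:15:53.503505'). The input should be the capture of a single user session, since the function stops at the first FIN or RST it finds.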

Andrea


On Sat 21 Sep 2024 at 16:54, Kin EA3CV <ea3cv at cronux.net> wrote:

> Andrea, the RSTs correspond to HamClock users that are failing. That was
> the first thing I checked, timestamp and IP.
> As Keith said, it crashed after the update, but when the working build was
> restored, it continued to crash. I don't think the problem is in dxspider
> if we have gone back to the backup that worked and it no longer works.
> The dxspider traces do not show anything abnormal except disconnections.
> It is necessary to reproduce the behavior with a client under observation.
>
> Kin
>
>
> ------------------------------
> *From:* IZ2LSC <iz2lsc.andrea at gmail.com>
> *Sent:* Saturday, September 21, 2024 4:41:34 PM
> *To:* Keith Maton <g6nhu at me.com>
> *CC:* Kin EA3CV <ea3cv at cronux.net>; The DXSpider Support list <
> dxspider-support at tobit.co.uk>
> *Subject:* Re: [Dxspider-support] New mojo version
>
> Keith,
> from what was shared in this thread I can see the reset is received by the
> dxspider, so someone else has generated it.
> Before drawing any conclusion, I want to be sure that the tcpdump that
> was shared is really about a user that is having the problem.
> This is why I asked for the tcpdump for a user IP along with the
> dxspider debug log for the same user, captured at the same time.
>
> My hamclock is able to connect to your cluster without any issue and I
> tried several disconnect/connect sequences.
>
> Andrea
>
>
>
>
>
>
>
> On Sat 21 Sep 2024 at 16:30, Keith Maton <g6nhu at me.com> wrote:
>
>> So what’s the current feeling, is the disconnect coming from HamClock or
>> the DXSpider?
>>
>> I don’t think we can send attachments to this list so here’s a link
>> <https://g6nhu.co.uk/users-week.png> to the mrtg users graph.
>>
>> You’ll see it stops at 20:30z on Wednesday.  That’s because it all went
>> wrong when I did the update on Thursday afternoon and then a couple of
>> hours later I restored the 536 backup that was taken the previous evening.
>> The gap is from the time of the backup to when I restored.
>>
>> I’ve gone back to exactly how it was before the update.   I talk to the
>> HamClock dev daily and there are multiple different versions of HamClock
>> all unable to connect.
>>
>> I simply don’t know where to go from here, especially as I built a new
>> node on a different pi this morning and the same thing happens.
>>
>> 73 Keith.
>>
>>
>>
>> On 21 Sep 2024, at 14:58, Kin EA3CV <ea3cv at cronux.net> wrote:
>>
>> Yes, there is clearly something HamClock doesn't like. I haven't looked
>> at a HamClock user that works but the ones that fail don't terminate the
>> socket with FIN.
>> I had thought about setting up a client in a container, but if you try
>> it, please let us know.
>>
>> Kin
>>
>>
>>
>> ------------------------------
>> *From:* IZ2LSC <iz2lsc.andrea at gmail.com>
>> *Sent:* Saturday, September 21, 2024 3:23:04 PM
>> *To:* Kin <ea3cv at cronux.net>
>> *CC:* The DXSpider Support list <dxspider-support at tobit.co.uk>; Keith
>> Maton <g6nhu at me.com>
>> *Subject:* Re: [Dxspider-support] New mojo version
>>
>> Kin,
>> the netstat looks fine, I can see 87 sessions established.
>> But from the TCP dump you attached I see a lot of RSTs (resets) coming
>> from the client side, not from the cluster.
>>
>> Just to give you an example, this is what happens when the cluster is
>> disconnecting a user (192.168.1.130 is the cluster):
>>
>> 15:15:53.503505 IP 192.168.1.130.7373 > 192.168.1.111.52076: Flags [F.],
>> seq 4172, ack 8, win 227, options [nop,nop,TS val 2856414384 ecr
>> 3254272443], length 0
>> 15:15:53.504215 IP 192.168.1.111.52076 > 192.168.1.130.7373: Flags [F.],
>> seq 8, ack 4173, win 501, options [nop,nop,TS val 3254273942 ecr
>> 2856414384], length 0
>> 15:15:53.504340 IP 192.168.1.130.7373 > 192.168.1.111.52076: Flags [.],
>> ack 9, win 227, options [nop,nop,TS val 2856414385 ecr 3254273942], length 0
>>
>> So the cluster is the first to send the FIN.
>>
>> Can you try to follow a specific flow, correlating the IP address you see
>> in the debug log of dxspider with the ip address you find on the tcpdump?
>> I mean the sessions from the beginning to end.
>>
>> In the meantime I'll set up a hamclock and test it with your cluster.
>>
>>
>> Andrea
>>
>>
>>
>>
>>
>> On Sat 21 Sep 2024 at 14:43, Kin <ea3cv at cronux.net> wrote:
>>
>>> I think it is clear that the client is being logged out:
>>>
>>>
>>>
>>> 7160  33.786132 216.189.132.128 → 192.168.1.8  TCP 72 56774 → 7300
>>> [RST, ACK] Seq=10 Ack=8 Win=32128 Len=0 TSval=1580625328 TSecr=201034149
>>>
>>> 7168  33.896513 216.189.132.128 → 192.168.1.8  TCP 66 56774 → 7300
>>> [RST] Seq=1 Win=0 Len=0
>>>
>>> 7169  33.896698 216.189.132.128 → 192.168.1.8  TCP 66 56774 → 7300
>>> [RST] Seq=7 Win=0 Len=0
>>>
>>> 7170  33.906293 216.189.132.128 → 192.168.1.8  TCP 66 56774 → 7300
>>> [RST] Seq=10 Win=0 Len=0
>>>
>>> 7178  34.340220 171.100.240.62 → 192.168.1.8  TCP 66 63285 → 7300 [RST,
>>> ACK] Seq=1 Ack=1 Win=0 Len=0
>>>
>>> 8448  39.243542 209.193.104.69 → 192.168.1.8  TCP 66 40996 → 7300 [RST]
>>> Seq=1 Win=0 Len=0
>>>
>>> 9532  50.372818 72.14.148.41 → 192.168.1.8  TCP 72 64276 → 7300 [RST,
>>> ACK] Seq=2 Ack=2 Win=251 Len=0 TSval=3068700830 TSecr=1953125033
>>>
>>> 19600  91.809075 74.132.91.47 → 192.168.1.8  TCP 72 36452 → 7300 [RST,
>>> ACK] Seq=13 Ack=8 Win=64256 Len=0 TSval=2598464619 TSecr=4079102387
>>>
>>> 19749  91.934857 74.132.91.47 → 192.168.1.8  TCP 66 36452 → 7300 [RST]
>>> Seq=10 Win=0 Len=0
>>>
>>> 19750  91.937074 74.132.91.47 → 192.168.1.8  TCP 66 36452 → 7300 [RST]
>>> Seq=13 Win=0 Len=0
>>>
>>> 21439  97.245904 67.190.210.166 → 192.168.1.8  TCP 66 57758 → 7300
>>> [RST] Seq=1 Win=0 Len=0
>>>
>>> 23946 104.730291 86.150.197.182 → 192.168.1.8  TCP 66 49319 → 7300
>>> [RST, ACK] Seq=1 Ack=9 Win=0 Len=0
>>>
>>> 23947 104.730291 86.150.197.182 → 192.168.1.8  TCP 66 49316 → 7300
>>> [RST, ACK] Seq=1 Ack=2 Win=0 Len=0
>>>
>>> 24595 106.702456 74.132.91.47 → 192.168.1.8  TCP 72 58432 → 7300 [RST,
>>> ACK] Seq=13 Ack=8 Win=64256 Len=0 TSval=2598479515 TSecr=4079117277
>>>
>>> 24614 106.848106 74.132.91.47 → 192.168.1.8  TCP 66 58432 → 7300 [RST]
>>> Seq=10 Win=0 Len=0
>>>
>>> 24618 106.919363 74.132.91.47 → 192.168.1.8  TCP 66 58432 → 7300 [RST]
>>> Seq=13 Win=0 Len=0
>>>
>>> 25499 114.246740 67.190.210.166 → 192.168.1.8  TCP 66 41444 → 7300
>>> [RST] Seq=1 Win=0 Len=0
>>>
>>> 26057 118.535648 72.14.148.41 → 192.168.1.8  TCP 72 22727 → 7300 [RST,
>>> ACK] Seq=1 Ack=9 Win=64256 Len=0 TSval=3068768993 TSecr=1953190036
>>>
>>> 27133 121.696803 74.132.91.47 → 192.168.1.8  TCP 72 33884 → 7300 [RST,
>>> ACK] Seq=13 Ack=8 Win=64256 Len=0 TSval=2598494508 TSecr=4079132270
>>>
>>> 27149 121.815184 74.132.91.47 → 192.168.1.8  TCP 66 33884 → 7300 [RST]
>>> Seq=10 Win=0 Len=0
>>>
>>> 27150 121.815249 74.132.91.47 → 192.168.1.8  TCP 66 33884 → 7300 [RST]
>>> Seq=10 Win=0 Len=0
>>>
>>> 27151 121.815250 74.132.91.47 → 192.168.1.8  TCP 66 33884 → 7300 [RST]
>>> Seq=13 Win=0 Len=0
>>>
>>> 29651 131.245565 67.190.210.166 → 192.168.1.8  TCP 66 56690 → 7300
>>> [RST] Seq=1 Win=0 Len=0
>>>
>>> 29689 132.322988 171.100.240.62 → 192.168.1.8  TCP 66 63313 → 7300 [RST,
>>> ACK] Seq=1 Ack=9 Win=0 Len=0
>>>
>>> 29690 132.323664 171.100.240.62 → 192.168.1.8  TCP 66 63298 → 7300
>>> [RST, ACK] Seq=1 Ack=2 Win=0 Len=0
>>>
>>> 30075 136.719069 74.132.91.47 → 192.168.1.8  TCP 72 51106 → 7300 [RST,
>>> ACK] Seq=13 Ack=8 Win=64256 Len=0 TSval=2598509531 TSecr=4079147277
>>>
>>> 30094 136.842612 74.132.91.47 → 192.168.1.8  TCP 66 51106 → 7300 [RST]
>>> Seq=10 Win=0 Len=0
>>>
>>> 30512 139.246966 74.132.91.47 → 192.168.1.8  TCP 66 46146 → 7300 [RST]
>>> Seq=1 Win=0 Len=0
>>>
>>> 30730 141.916039 72.181.212.51 → 192.168.1.8  TCP 72 52404 → 7300 [RST,
>>> ACK] Seq=10 Ack=8 Win=32128 Len=0 TSval=10881996 TSecr=4118248549
>>>
>>> 32092 148.245539 67.190.210.166 → 192.168.1.8  TCP 66 60236 → 7300
>>> [RST] Seq=1 Win=0 Len=0
>>>
>>> 33326 151.728538 74.132.91.47 → 192.168.1.8  TCP 72 47306 → 7300 [RST,
>>> ACK] Seq=13 Ack=8 Win=64256 Len=0 TSval=2598524532 TSecr=4079162306
>>>
>>> 33340 151.867383 74.132.91.47 → 192.168.1.8  TCP 66 47306 → 7300 [RST]
>>> Seq=1 Win=0 Len=0
>>>
>>> 33341 151.867471 74.132.91.47 → 192.168.1.8  TCP 66 47306 → 7300 [RST]
>>> Seq=10 Win=0 Len=0
>>>
>>> 33342 151.868904 74.132.91.47 → 192.168.1.8  TCP 66 47306 → 7300 [RST]
>>> Seq=13 Win=0 Len=0
>>>
>>> 34141 156.245366 67.190.210.166 → 192.168.1.8  TCP 66 59968 → 7300
>>> [RST] Seq=1 Win=0 Len=0
>>>
>>> 36145 166.704908 74.132.91.47 → 192.168.1.8  TCP 72 55558 → 7300 [RST,
>>> ACK] Seq=13 Ack=8 Win=64256 Len=0 TSval=2598539512 TSecr=4079177284
>>>
>>> 36150 166.844112 74.132.91.47 → 192.168.1.8  TCP 66 55558 → 7300 [RST]
>>> Seq=1 Win=0 Len=0
>>>
>>> 36151 166.844112 74.132.91.47 → 192.168.1.8  TCP 66 55558 → 7300 [RST]
>>> Seq=13 Win=0 Len=0
>>>
>>> 37488 173.246799 67.190.210.166 → 192.168.1.8  TCP 66 55428 → 7300
>>> [RST] Seq=1 Win=0 Len=0
>>>
>>> 37877 176.782641 72.14.148.41 → 192.168.1.8  TCP 72 47454 → 7300 [RST,
>>> ACK] Seq=1 Ack=9 Win=64256 Len=0 TSval=3068827240 TSecr=1953255335
>>>
>>> 38468 182.245044 212.251.236.77 → 192.168.1.8  TCP 66 25610 → 7300
>>> [RST] Seq=1 Win=0 Len=0
>>>
>>> 40367 190.261692 67.190.210.166 → 192.168.1.8  TCP 66 41508 → 7300
>>> [RST] Seq=1 Win=0 Len=0
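
A listing like the one above is easier to read once the resets are grouped per client. The following is a rough sketch, assuming each packet is on a single line in the tshark-style summary format shown (frame number, time, source → destination, TCP, length, ports, flags); the name rst_summary is made up for illustration.

```python
import re
from collections import defaultdict

# One summary row per packet, for example:
# 7160  33.786132 216.189.132.128 → 192.168.1.8  TCP 72 56774 → 7300 [RST, ACK] Seq=10 ...
ROW_RE = re.compile(
    r"^\s*\d+\s+[\d.]+\s+(?P<src>\S+)\s+\u2192\s+\S+\s+TCP\s+\d+\s+"
    r"(?P<sport>\d+)\s+\u2192\s+\d+\s+\[(?P<flags>[^\]]+)\]"
)


def rst_summary(rows):
    """Map source IP -> (number of RST packets, number of distinct source ports)."""
    packets = defaultdict(int)
    ports = defaultdict(set)
    for row in rows:
        m = ROW_RE.match(row)
        if m is None or "RST" not in m.group("flags"):
            continue  # not a summary row, or not a reset
        packets[m.group("src")] += 1
        ports[m.group("src")].add(m.group("sport"))
    return {ip: (packets[ip], len(ports[ip])) for ip in packets}


# Illustrative input: the first three resets from the listing, unwrapped.
SAMPLE = [
    "7160  33.786132 216.189.132.128 → 192.168.1.8  TCP 72 56774 → 7300 "
    "[RST, ACK] Seq=10 Ack=8 Win=32128 Len=0 TSval=1580625328 TSecr=201034149",
    "7168  33.896513 216.189.132.128 → 192.168.1.8  TCP 66 56774 → 7300 "
    "[RST] Seq=1 Win=0 Len=0",
    "7178  34.340220 171.100.240.62 → 192.168.1.8  TCP 66 63285 → 7300 "
    "[RST, ACK] Seq=1 Ack=1 Win=0 Len=0",
]

if __name__ == "__main__":
    for ip, (count, flows) in rst_summary(SAMPLE).items():
        print(ip, count, flows)
```

Several resets from the same source port belong to one connection, so the second number is the better estimate of how many sessions each client reset.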
>>>
>>>
>>>
>>> Kin EA3CV
>>>
>>>
>>>
>>>
>>>
>>> *From:* IZ2LSC <iz2lsc.andrea at gmail.com>
>>> *Sent:* Saturday, 21 September 2024 13:10
>>> *To:* The DXSpider Support list <dxspider-support at tobit.co.uk>
>>> *CC:* Kin <ea3cv at cronux.net>; Keith Maton <g6nhu at me.com>
>>> *Subject:* Re: [Dxspider-support] New mojo version
>>>
>>>
>>>
>>> Hi,
>>>
>>> Has anything changed on the router that is doing the port forwarding?
>>>
>>> Maybe there is DDoS protection on it that is kicking in.
>>>
>>>
>>>
>>> Are we sure that the disconnect is coming from dxspider and not from the
>>> router?
>>>
>>>
>>>
>>> I think we have to take a tcpdump to look at the TCP flow to understand
>>> where the TCP RST or FIN is coming from.
>>>
>>>
>>>
>>> If you need help taking the tcpdump we can set up a call with screen
>>> sharing and I can guide you.
>>>
>>>
>>>
>>> 73
>>>
>>> Andrea, iz2lsc
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sat 21 Sep 2024 at 13:01, Kin via Dxspider-support <
>>> dxspider-support at tobit.co.uk> wrote:
>>>
>>> Hi,
>>>
>>> I have been trying to help Keith with his problem, and after analysing
>>> everything I can think of, I can't see the reason for the disconnection
>>> with the traces we have.
>>>
>>> This is basically what is happening to him:
>>>
>>> 1726911492^(connect) ExtMsg accept 165:192.168.1.208 from 68.117.200.55:58828
>>> 1726911492^(connect) ExtMsg connect 165: login:
>>> 1726911492^(connect) connect 165: timeout set to 60
>>> 1726911492^(connect) connect 165: AE5DW
>>> 1726911492^(state) AE5DW channel func  state 0 -> prompt
>>> 1726911492^(DXCommand) AE5DW connected from 68.117.200.55 cols 80
>>> 1726911492^(progress) CMD: 'unset/beep ' by AE5DW ip: 68.117.200.55 0mS
>>> 1726911492^(progress) CMD: 'show/cluster ' by AE5DW ip: 68.117.200.55 0mS
>>> 1726911492^(DXCommand) AE5DW disconnected
>>>
>>> But with the rest of the users it is not failing.
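
Sessions like the one above, which connect and disconnect within the same second, can be picked out of a debug log automatically. This is only a sketch: it assumes the `epoch^(category) message` line shape and the "connected from" / "disconnected" wording seen in this excerpt, and the name short_sessions and the 2-second threshold are arbitrary choices for illustration.

```python
import re

# Line shapes taken from the excerpt in this thread only:
# 1726911492^(DXCommand) AE5DW connected from 68.117.200.55 cols 80
# 1726911492^(DXCommand) AE5DW disconnected
CONNECTED_RE = re.compile(r"^(\d+)\^\(DXCommand\) (\S+) connected from (\S+)")
DISCONNECTED_RE = re.compile(r"^(\d+)\^\(DXCommand\) (\S+) disconnected")


def short_sessions(lines, max_seconds=2):
    """Yield (callsign, ip, duration) for sessions that ended within max_seconds."""
    open_sessions = {}  # callsign -> (connect time, ip)
    for line in lines:
        m = CONNECTED_RE.match(line)
        if m:
            open_sessions[m.group(2)] = (int(m.group(1)), m.group(3))
            continue
        m = DISCONNECTED_RE.match(line)
        if m and m.group(2) in open_sessions:
            start, ip = open_sessions.pop(m.group(2))
            duration = int(m.group(1)) - start
            if duration <= max_seconds:
                yield m.group(2), ip, duration


# Illustrative input: the relevant lines of the excerpt, one log entry per line.
SAMPLE = [
    "1726911492^(connect) ExtMsg accept 165:192.168.1.208 from 68.117.200.55:58828",
    "1726911492^(DXCommand) AE5DW connected from 68.117.200.55 cols 80",
    "1726911492^(progress) CMD: 'show/cluster ' by AE5DW ip: 68.117.200.55 0mS",
    "1726911492^(DXCommand) AE5DW disconnected",
]

if __name__ == "__main__":
    for callsign, ip, duration in short_sessions(SAMPLE):
        print(callsign, ip, duration)
```

The IP addresses it reports are the ones worth following in a packet capture taken over the same period.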
>>>
>>> Kin EA3CV
>>>
>>>
>>> -----Original Message-----
>>> From: Dxspider-support <dxspider-support-bounces at tobit.co.uk> On behalf of
>>> Keith Maton via Dxspider-support
>>> Sent: Saturday, 21 September 2024 12:30
>>> To: The DXSpider Support list <dxspider-support at tobit.co.uk>
>>> CC: Keith Maton <g6nhu at me.com>
>>> Subject: Re: [Dxspider-support] New mojo version
>>>
>>> This morning I took a fresh Pi, a new SSD and built a new node from
>>> scratch.  I copied over the user file and imported it.  I also copied the
>>> spots directory so no history would be lost and the filters directory so
>>> my users would still have their filters.
>>>
>>> I also copied my startup file, my connect scripts and my crontab.
>>>
>>> I hashed out pretty much everything in the crontab.  I started the node,
>>> disconnected some links from the old one and manually started them on the
>>> new one to confirm I could connect and get data in.
>>>
>>> Then I stopped the old node and changed the port forwarding in my router
>>> to the new one.
>>>
>>> It’s no different. I’m still getting exactly the same thing.  Some (but
>>> not all) HamClocks are connecting and then immediately being disconnected
>>> before they can send any commands.  I’m 99.9% sure the disconnect is
>>> coming from the dxspider and not the HamClock because HamClock tracks
>>> whether the disconnect is coming from local or remote.
>>>
>>> There’s no pattern to this, it doesn’t seem to be HamClock version
>>> specific as I sent a sample to the developer who checked and saw multiple
>>> different versions.
>>>
>>> The HamClock connects.
>>> I see the connection in the debug log and then immediately, after two
>>> commands are forced by the node (unset/beep and show/cluster), the node
>>> disconnects.
>>> This repeats ten times, then the HamClock stops connecting for one hour
>>> because it has reached its hard limit of ten disconnects/hour.  It only
>>> counts remote disconnections towards this limit.
>>>
>>> But the crazy and unexplained thing is that when I reverted back to build
>>> 536 by restoring a backup, the same thing is still happening.  Nothing
>>> has changed on my network as the connections are still making it to the
>>> node.
>>>
>>> I’m really lost here.  I feel bad because there are well over 200 people
>>> who won’t have been able to connect since Thursday afternoon.  They’ve
>>> probably gone over to other nodes, which is fine but it doesn’t resolve
>>> the problem I’ve got here and what could happen to me could happen to
>>> anyone.   I’ve gone out of my way recently to push my node as the best for
>>> HamClocks (because I know a lot of sysops weren’t happy with it) and now
>>> it’s utterly rubbish for them.
>>>
>>> I owe it to my users to try and resolve this but at the moment, I feel as
>>> though after eight years of running a node (which I appreciate is a lot
>>> less than many), I just want to switch the damn thing off.  I’m not going
>>> to, because I don’t like things to beat me but it’s very, very
>>> frustrating.
>>>
>>> 73 Keith
>>>
>>>
>>>
>>>
>>> > On 21 Sep 2024, at 04:25, Rene Olsen via Dxspider-support
>>> > <dxspider-support at tobit.co.uk> wrote:
>>> >
>>> > Hi.
>>> >
>>> > Still waiting for a reply as to why G6NHU-2 lost like 75% of his
>>> > users before I do anything with the new version.
>>> >
>>> > So, will at least wait until next week. Like W1NR, I never update just
>>> > before or during a weekend.
>>> >
>>> > Vy 73 de René / OZ1LQH
>>> >
>>> > On 20 Sep 2024 at 17:44, Kin via Dxspider-support wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> The new build is working very well for me.
>>> >> Only 60 out of 318 dxspider have been updated.
>>> >> Cheer up, it's been in testing for a while and it's stable.
>>> >>
>>> >> 73 de Kin EA3CV
>>> >>
>>> >>
>>> >> From: Dxspider-support <dxspider-support-bounces at tobit.co.uk> On
>>> >> behalf of Dirk Koopman via Dxspider-support
>>> >> Sent: Thursday, 19 September 2024 15:24
>>> >> To: Dxspider-Support <dxspider-support at dxcluster.org>
>>> >> CC: Dirk Koopman <djk at tobit.co.uk>
>>> >> Subject: [Dxspider-support] New mojo version
>>> >>
>>> >> There is a new mojo version which has been under test by a few brave
>>> >> sysops and they have determined that it is stable. Please look at the
>>> >> Changes file for the list of issues dealt with.
>>> >>
>>> >> One of the issues that has become apparent is the random lock status
>>> >> (historically) granted to new nodes that appear on the network. For
>>> >> some reason they are defaulting to "unlocked". I don't understand why
>>> >> this has suddenly become a problem AGAIN, but it does seem to affect
>>> >> longer running nodes more than newer ones.
>>> >>
>>> >> This release is an attempt to fix this. It will lock all nodes that are
>>> >> not specifically unlocked via explicit unset/lock or set/spider type
>>> >> commands. Unfortunately, previous attempts to deal with this may have
>>> >> got this all confused and it *MAY* (and I stress this) mean that a
>>> >> (very) few of your older node partners *MIGHT* get locked out. If this
>>> >> happens then simply unset/lock or set/spider any of these nodes
>>> >> manually.
>>> >>
>>> >> There is new spot deduping code which seems to reduce the number of
>>> >> dupes, but I have not been able to reproduce the problem, so I cannot
>>> >> verify this any further than making sure that nodes that issue multiple
>>> >> dupe spots with the same sequence number don't cause dupes.
>>> >>
>>> >> 73 Dirk G1TLH
>>> >>
>>> >
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > Dxspider-support mailing list
>>> > Dxspider-support at tobit.co.uk
>>> > https://mailman.tobit.co.uk/mailman/listinfo/dxspider-support
>>>
>>>
>>>
>>
>>
>