#1 2018-03-04 00:39:46

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Inception / London

Backtogeek, could there be some connectivity issue in London? There seem to be intermittent connection timeouts every couple of hours since Saturday; so far they last between about one and three minutes.

Last edited by user467268 (2018-03-04 00:45:47)

Offline

#2 2018-03-04 02:01:04

layfon
Trusted Member
Registered: 2017-03-31
Posts: 36

Re: Inception / London

London is on Clouvider's network, but I didn't see any report on LET.
Can you run a traceroute when the connection times out?

Last edited by layfon (2018-03-04 02:01:51)

Offline

#3 2018-03-04 09:30:00

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

These are short and irregular outages. A trace might be a bit tricky as I only notice them retroactively.

It has behaved for the past nine hours (the last hiccup was at midnight GMT), though.
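
Maybe I could script something to catch one automatically - say a small watchdog on another box that pings the VM every minute and saves a timestamped traceroute as soon as the pings start failing. A rough sketch of what I have in mind (the target address and log path are placeholders):

  #!/bin/sh
  # crude watchdog: ping the VM once a minute; when it stops answering,
  # append a timestamped traceroute to a log for later inspection
  TARGET=203.0.113.10          # placeholder for the VM's public IP
  LOG=/root/outage-traces.log
  while true; do
      if ! ping -c 3 -W 2 "$TARGET" > /dev/null 2>&1; then
          echo "=== $(date -u) connectivity to $TARGET lost ===" >> "$LOG"
          traceroute -n "$TARGET" >> "$LOG" 2>&1
      fi
      sleep 60
  done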

Offline

#4 2018-03-04 11:28:13

layfon
Trusted Member
Registered: 2017-03-31
Posts: 36

Re: Inception / London

Try mtr with a reasonable interval under screen or tmux. When there's a hiccup, one hop's packet loss should increase more than the others'.
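
Something along these lines, for example (a sketch only - the 10 second interval, the cycle count and the session name are arbitrary, and <vm-ip> is a placeholder for your VPS address):

  # run mtr detached inside tmux; -n skips DNS lookups, -i is the seconds
  # between probes, -c the number of probes, and report mode writes per-hop
  # loss/latency once all cycles are done (8640 x 10s is roughly 24 hours)
  tmux new-session -d -s mtr-watch \
      "mtr -n -i 10 -c 8640 --report --report-wide <vm-ip> > mtr-$(date +%F).txt"

Report mode only prints at the end, so if you would rather watch it live, a plain interactive mtr in a tmux window works just as well.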

Offline

#5 2018-03-04 16:20:47

WSS
Trusted Member
Registered: 2016-12-22
Posts: 394

Re: Inception / London

Also, not sure how London is set up, but if your IPv4 NAT is fine and IPv6 is barfing, that's because of Hurricane somewhere. big_smile



Offline

#6 2018-03-05 03:52:04

layfon
Trusted Member
Registered: 2017-03-31
Posts: 36

Re: Inception / London

Just checked London LES and the IPv6 is in Clouvider's range.
I guess only locations without IPv6 (like ioflood) use an HE tunnel.

Offline

#7 2018-03-07 18:43:51

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

The timeouts are still happening; by now I seem to have caught one in action...

 5  be100-2.ldn-1-a9.uk.eu (213.251.130.121)  4.570 ms  4.745 ms  4.696 ms
 6  linx-lon2.thn2.peering.clouvider.net (195.66.239.14)  3.875 ms  3.895 ms *
 7  * * *
 8  h185-145-200-9.reverse.clouvider.net (185.145.200.9)  4.094 ms  4.045 ms  4.162 ms
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * h185-145-200-9.reverse.clouvider.net (185.145.200.9)  2582.173 ms !H  2582.113 ms !H

It seems the issue appears at the last hop. I can't tell, though, whether it takes 2.5 seconds to reach .9, which then reports an "unreachable" (!H), or whether .9 is within normal parameters and the 2.5 seconds are simply a timeout to the next (destination) host.

No IPv6 involved, only v4

Last edited by user467268 (2018-03-08 06:36:00)

Offline

#8 2018-03-08 06:38:36

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

Another one from 3:51am GMT (three hours ago)

 5  be100-2.ldn-1-a9.uk.eu (213.251.130.121)  4.478 ms  4.634 ms  4.801 ms
 6  linx-lon2.thn2.peering.clouvider.net (195.66.239.14)  3.985 ms  3.896 ms *
 7  * * *
 8  h185-145-200-9.reverse.clouvider.net (185.145.200.9)  4.083 ms  4.028 ms  4.225 ms
 9  h185-145-200-9.reverse.clouvider.net (185.145.200.9)  135.744 ms !H  135.727 ms !H  135.508 ms !H

Most recently these connectivity "drops" have been one-offs and seem to happen mainly at night (GMT).

Offline

#9 2018-03-08 19:10:14

mikho
Low End Mod
From: Hell and gore == Sweden
Registered: 2013-03-02
Posts: 1,581
Website

Re: Inception / London

Is it every day or once in a week type of thing?

Offline

#10 2018-03-08 20:08:44

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

mikho wrote:

Is it every day or once in a week type of thing?

Every day. It started about a week ago (plus/minus), initially with brief connection drops of four to ten minutes. In the past few days that has (mostly) gone down to intermittent one-off drops (not more than a minute), mainly at night (GMT), though there were two more this afternoon (GMT).

Is there maybe anything related to such issues visible on the machine itself?

Offline

#11 2018-03-08 20:58:21

Backtogeek
Low End Boss
From: ~/
Registered: 2013-02-13
Posts: 3,821
Website

Re: Inception / London

I have seen a few brief load spikes, but no network drops. I will keep an eye on it; reports of the times as they happen, accurate to within +/-5 minutes, are useful.

Does it impact v6 as well, or just v4?


https://upto32.com retro gaming and nostalgia forum that does not take itself too seriously smile

Offline

#12 2018-03-09 17:46:10

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

I am not using IPv6, so I am afraid I can't tell.

As for times, the last ones were at Mar 8 11:51pm, Mar 9 4:46am, and Mar 9 12:17pm - all times GMT.

Edit: Another one just now at 6:49pm
Edit 2: One more at 12:09am GMT (March 10)
Edit 3: Also between 10:56 and 10:57 am
Edit 4: Number four today at 12:02pm

Last edited by user467268 (2018-03-10 14:36:52)

Offline

#13 2018-03-10 14:41:23

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

There is one thing I just noticed... it appears the VM is sometimes (though it doesn't always seem to be the case) "frozen" at the time when the connectivity drops. That would certainly explain why the machine is not reachable. Anything in your logs in that regard?
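
In the meantime I could log the load on the VM itself and see whether the spikes line up with the drop times. Something simple like this, running under screen/tmux, should be enough (the log path is arbitrary):

  # append a timestamped load-average sample (straight from /proc/loadavg) every 30 seconds
  while true; do
      echo "$(date -u '+%F %T') $(cat /proc/loadavg)" >> /root/loadavg.log
      sleep 30
  done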

Offline

#14 2018-03-10 15:29:35

Backtogeek
Low End Boss
From: ~/
Registered: 2013-02-13
Posts: 3,821
Website

Re: Inception / London

Yep, I caught one of them. I think it's isolated to this machine, and it's going to be a nightmare to diagnose, so instead I will just move it to one of the new bits of pure NVMe kit going into production within the next week. If you can live with these brief drops for a little while, I will get this node on super speed.


https://upto32.com retro gaming and nostalgia forum that does not take itself too seriously smile

Offline

#15 2018-03-10 15:39:39

Bstrobl
Trusted Member
Registered: 2016-06-24
Posts: 57

Re: Inception / London

Backtogeek wrote:

Yep, I caught one of them. I think it's isolated to this machine, and it's going to be a nightmare to diagnose, so instead I will just move it to one of the new bits of pure NVMe kit going into production within the next week. If you can live with these brief drops for a little while, I will get this node on super speed.

Ohh NVMe, fancy big_smile

Offline

#16 2018-03-10 17:14:23

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

I am just glad that I am not imagining things. big_smile Thanks for the confirmation.

If this can be fixed by next week that is more than fine with me.

I guess you don't need any more reports of that sort now, right? Just now there was another drop, and this time it seemed to last a bit longer. Also, the machine was actually reachable and TCP connects worked alright, but TCP communication did not, probably because there was a load of up to 5.8. That started at 5:00pm GMT and lasted until 5:10pm GMT.

Last edited by user467268 (2018-03-10 17:15:10)

Offline

#17 2018-03-10 21:12:50

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

Yep, it does seem to be load related. There was another drop at 8:13pm (or rather 8:11pm on the machine - it appears to run two minutes slow), and there was another load spike around that time (far from 5 this time, however).

Offline

#18 2018-03-12 15:12:10

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

user467268 wrote:

Yep, it does seem to be load related.

I am confident enough at this point to confirm that now. Load goes up, connection goes down.

Last edited by user467268 (2018-03-12 15:12:30)

Offline

#19 2018-03-14 00:03:22

WSS
Trusted Member
Registered: 2016-12-22
Posts: 394

Re: Inception / London

user467268 wrote:
user467268 wrote:

Yep, it does seem to be load related.

I am confident enough at this point to confirm that now. Load goes up, connection goes down.

Story of my dating life.



Offline

#20 2018-03-16 17:10:25

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

Backtogeek wrote:

I will just move it to one of the new bits of pure NVMe kit going into production within the next week

Any news?

Offline

#21 2018-03-17 09:52:50

Backtogeek
Low End Boss
From: ~/
Registered: 2013-02-13
Posts: 3,821
Website

Re: Inception / London

Sorry, I have been out of action for the last week; I have been seriously unwell and working on minimum effort only.


https://upto32.com retro gaming and nostalgia forum that does not take itself too seriously smile

Offline

#22 2018-03-26 15:03:34

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

The issue is unfortunately still present.

Someone or something is regularly making the load spike, and at that point the connectivity goes down.

Offline

#23 2018-03-26 15:04:57

Backtogeek
Low End Boss
From: ~/
Registered: 2013-02-13
Posts: 3,821
Website

Re: Inception / London

Yep, it should be significantly less frequent right now. I have RAM running an extensive burn-in test at the moment; when it is complete I will be scheduling the maintenance and moving the UK node.

Keep in mind this is a matter of seconds per day; it really should not be impacting anyone significantly.


https://upto32.com retro gaming and nostalgia forum that does not take itself too seriously smile

Offline

#24 2018-03-26 17:56:47

user467268
Trusted Member
Registered: 2018-02-25
Posts: 40

Re: Inception / London

So it definitely is a hardware issue, not a misbehaving user? Any idea what could be the reason for such behaviour? You seem to suspect the memory.

Do you have any date in mind for the service migration?

Thanks

Offline

#25 2018-03-26 18:56:47

Backtogeek
Low End Boss
From: ~/
Registered: 2013-02-13
Posts: 3,821
Website

Re: Inception / London

Kind of; it's a bit more complicated than that. It's more to do with the LES node getting low priority (it runs on a big nested KVM): it's literally seconds, but the impact on the LES node is high wait on disk and CPU time.
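
If you want to see it from inside the guest, the "st" (steal) and "wa" (iowait) columns in vmstat are the usual giveaway when the parent is busy; a quick spot check would be something like:

  # five one-second samples; "st" is CPU time stolen by the hypervisor,
  # "wa" is time spent waiting on disk I/O
  vmstat 1 5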

I have a new chassis ready, a dual E5 with 256GB RAM and SSD cache, and the LES node will be moved to that. I had initially considered moving it to an E3 NVMe node, but disk space is at a premium on those and I could not justify it. Then I was going to move it to another E5 node, but its SSD cache is running at risk while waiting for a maintenance window, so I did not want to complicate that.

Hopefully mid next week.


https://upto32.com retro gaming and nostalgia forum that does not take itself to seriously smile

Offline
