Majority of channels go offline and never come back… until reboot

Hi everyone. I’m running a node with 200+ channels, but it loses connections to other nodes every other day, sometimes even twice a day. Around 30 channels go offline first, then it keeps losing channels gradually until roughly 170 are offline. Interestingly, around 30 channels (I’m not sure whether they’re random) stay online even after 5-6 hours.

Restarting from the Umbrel dashboard or a sudo reboot brings everything back online, but channels start going offline again within a day or so. I tried reflashing with different SD cards, but that didn’t help.

  • HW: Raspberry Pi 4 (8 GB RAM), 1 TB SSD
  • Internet: Ethernet, 1 Gbps fiber

Things that might be related:

  • I changed the Raspberry Pi case, but vcgencmd get_throttled returns 0x0, and the issue existed with the prior case as well, so throttling is probably not the cause (see the throttle-bit reference after this list).
  • channel.db size: 6.8 GB. It used to be 20 GB before compacting and closing & reopening some channels with a large footprint, but it’s still huge.
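
For reference, here is how the vcgencmd get_throttled value decodes, per the Raspberry Pi firmware documentation; 0x0 means no under-voltage or throttling has been seen since boot.

    # Query the firmware throttle state
    vcgencmd get_throttled
    # Meanings of the bits in the returned hex value:
    #   0x1      under-voltage detected right now
    #   0x2      ARM frequency capped right now
    #   0x4      currently throttled
    #   0x8      soft temperature limit active
    #   0x10000  under-voltage has occurred since boot
    #   0x20000  frequency capping has occurred since boot
    #   0x40000  throttling has occurred since boot
    #   0x80000  soft temperature limit has occurred since boot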

What the debug log says
I’m not familiar with this type of stuff at all, but the three parts below seem unhealthy… especially Tor.
This was generated when I noticed the majority of channels had gone offline.
The full debug file is here.

Bitcoin Core logs
-----------------

Attaching to bitcoin
bitcoin              | 2022-02-26T14:05:12Z Socks5() connect to 177.22.126.217:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:07:00Z Socks5() connect to 74.73.32.84:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:08:39Z Socks5() connect to 220.132.135.54:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:09:52Z Socks5() connect to 2001:470:28:446:216:3eff:fe85:b0fc:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:10:55Z UpdateTip: new best=000000000000000000060b049896f05ed5d4f4610926bc51c448115282b30ba6 height=725014 version=0x3fffe004 log2_work=93.370450 tx=713528519 date='2022-02-26T14:10:51Z' progress=1.000000 cache=96.8MiB(719419txo)
bitcoin              | 2022-02-26T14:13:15Z Socks5() connect to 107.23.31.200:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:14:38Z Socks5() connect to 157.90.21.130:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:16:53Z Socks5() connect to 1.159.3.83:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:19:47Z Socks5() connect to 2a01:c22:cc6e:de00:49c8:9138:b886:ee20:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:20:08Z Socks5() connect to 189.249.51.187:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:21:09Z Socks5() connect to 122.116.90.171:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:21:30Z Socks5() connect to 70.161.168.227:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:22:12Z Socks5() connect to 2800:bf0:149:107d:f8df:8d7d:801b:e25e:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:23:50Z Socks5() connect to 88.88.87.132:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:24:19Z Socks5() connect to 185.157.161.44:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:24:40Z Socks5() connect to 2a0b:f4c1:2::240:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:25:59Z Socks5() connect to 80.216.111.138:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T14:26:41Z Socks5() connect to 138.121.61.33:8333 failed: InterruptibleRecv() timeout or other failure

LND logs
--------

Attaching to lnd
lnd                  |  ExtraOpaqueData: (lnwire.ExtraOpaqueData) {
lnd                  |  }
lnd                  | })
lnd                  | )@1
lnd                  | 2022-02-26 14:26:04.616 [INF] DISC: Removing GossipSyncer for peer=03e8b9a977fa3ae7acce74c25986c7240a921222e349729737df832a1b5ceb49df
lnd                  | 2022-02-26 14:26:04.616 [INF] HSWC: Removing channel link with ChannelID(5316e6b6a80a377b8dd274e7fde7741b958f209cf8a2e8cd7e696703d533d61e)
lnd                  | 2022-02-26 14:26:04.616 [INF] HSWC: ChannelLink(1ed633d50367697ecde8a2f89c208f951b74e7fde774d28d7b370aa8b6e61653:0): stopping
lnd                  | 2022-02-26 14:26:04.617 [INF] HSWC: ChannelLink(1ed633d50367697ecde8a2f89c208f951b74e7fde774d28d7b370aa8b6e61653:0): exited
lnd                  | 2022-02-26 14:26:06.723 [WRN] CRTR: Channel 796758902180937729 has zero cltv delta
lnd                  | 2022-02-26 14:26:06.879 [WRN] CRTR: Channel 604843545532956672 has zero cltv delta
lnd                  | 2022-02-26 14:26:07.345 [WRN] CRTR: Channel 796758902180937729 has zero cltv delta
lnd                  | 2022-02-26 14:26:07.955 [WRN] CRTR: Channel 604843545532956672 has zero cltv delta
lnd                  | 2022-02-26 14:26:09.262 [WRN] CRTR: Channel 796758902180937729 has zero cltv delta
lnd                  | 2022-02-26 14:26:09.593 [WRN] CRTR: Channel 604843545532956672 has zero cltv delta
lnd                  | 2022-02-26 14:26:20.342 [INF] CRTR: Processed channels=0 updates=108 nodes=9 in last 1m0.002267327s
lnd                  | 2022-02-26 14:26:28.945 [INF] PEER: unable to read message from 036da0201ba1f16a2089d1e17a8fe236114adef33bfff59073771723c60b08b4ac@3.139.108.17:9735: read tcp 10.21.21.9:51334->10.21.21.11:9050: read: connection reset by peer
lnd                  | 2022-02-26 14:26:28.956 [INF] PEER: disconnecting 036da0201ba1f16a2089d1e17a8fe236114adef33bfff59073771723c60b08b4ac@3.139.108.17:9735, reason: read handler closed
lnd                  | 2022-02-26 14:26:28.945 [INF] PEER: unable to read message from 020024ccdf545c7e6f240fad85a5d70fccc40cc8e2060996a3596300ca93288889@10.21.21.11:37020: read tcp 10.21.21.9:9735->10.21.21.11:37020: read: connection reset by peer
lnd                  | 2022-02-26 14:26:28.959 [INF] NTFN: Cancelling epoch notification, epoch_id=1374
lnd                  | 2022-02-26 14:26:28.960 [INF] PEER: disconnecting 020024ccdf545c7e6f240fad85a5d70fccc40cc8e2060996a3596300ca93288889@10.21.21.11:37020, reason: read handler closed
lnd                  | 2022-02-26 14:26:28.960 [INF] NTFN: Cancelling epoch notification, epoch_id=797
lnd                  | 2022-02-26 14:26:29.163 [INF] DISC: Removing GossipSyncer for peer=036da0201ba1f16a2089d1e17a8fe236114adef33bfff59073771723c60b08b4ac
lnd                  | 2022-02-26 14:26:29.163 [INF] DISC: Removing GossipSyncer for peer=020024ccdf545c7e6f240fad85a5d70fccc40cc8e2060996a3596300ca93288889
lnd                  | 2022-02-26 14:26:29.163 [INF] HSWC: Removing channel link with ChannelID(2446062ee316204a0e21c9b7dab44542fcc074bc35ca31978ffce9417d69abc1)
lnd                  | 2022-02-26 14:26:29.164 [INF] HSWC: ChannelLink(c1ab697d41e9fc8f9731ca35bc74c0fc4245b4dab7c9210e4a2016e32e064624:0): stopping
lnd                  | 2022-02-26 14:26:29.164 [INF] HSWC: ChannelLink(c1ab697d41e9fc8f9731ca35bc74c0fc4245b4dab7c9210e4a2016e32e064624:0): exited
lnd                  | 2022-02-26 14:26:29.164 [INF] HSWC: Removing channel link with ChannelID(972f4ddb02615a32de75bbee49f65eb778462176aeefb0292ba1d1ac1090b81a)
lnd                  | 2022-02-26 14:26:29.166 [INF] HSWC: ChannelLink(1bb89010acd1a12b29b0efae76214678b75ef649eebb75de325a6102db4d2f97:1): stopping
lnd                  | 2022-02-26 14:26:29.166 [INF] HSWC: ChannelLink(1bb89010acd1a12b29b0efae76214678b75ef649eebb75de325a6102db4d2f97:1): exited
lnd                  | 2022-02-26 14:26:37.985 [INF] NANN: Announcing channel(ff56d71d179abdfed9e5c5cfd92b1ddca1f43394b792939b0a41633af1f697c6:1) disabled [detected]

Tor logs
--------

Attaching to tor, umbrel_tor_server_1
tor                  | Feb 26 14:26:56.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up.
tor                  | Feb 26 14:26:57.000 [warn] Rejecting SOCKS request for anonymous connection to private address [scrubbed]. [51 similar message(s) suppressed in last 360 seconds]
tor                  | Feb 26 14:26:57.000 [notice] Tried for 127 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for rendezvous desc)
tor                  | Feb 26 14:26:57.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)
tor                  | Feb 26 14:26:57.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for rendezvous desc)
tor                  | Feb 26 14:26:57.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)
tor                  | Feb 26 14:26:57.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:1234. Giving up. (waiting for circuit)
tor                  | Feb 26 14:26:57.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for rendezvous desc)
tor                  | Feb 26 14:26:57.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)
tor                  | Feb 26 14:26:57.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)
tor_server_1         | Feb 26 10:29:33.000 [notice] Heartbeat: Tor's uptime is 18:00 hours, with 71 circuits open. I've sent 46.89 MB and received 22.36 MB. I've received 0 connections on IPv4 and 0 on IPv6. I've made 20 connections with IPv4 and 0 with IPv6.
tor_server_1         | Feb 26 10:29:33.000 [notice] While bootstrapping, fetched this many bytes: 651549 (consensus network-status fetch); 14113 (authority cert fetch); 5409444 (microdescriptor fetch)
tor_server_1         | Feb 26 10:29:33.000 [notice] While not bootstrapping, fetched this many bytes: 524106 (consensus network-status fetch); 324581 (microdescriptor fetch)
tor_server_1         | Feb 26 10:29:34.000 [notice] Heartbeat: Tor's uptime is 18:00 hours, with 71 circuits open. I've sent 63.14 MB and received 25.22 MB. I've received 0 connections on IPv4 and 0 on IPv6. I've made 193 connections with IPv4 and 0 with IPv6.
tor_server_1         | Feb 26 10:29:34.000 [notice] While bootstrapping, fetched this many bytes: 651535 (consensus network-status fetch); 14099 (authority cert fetch); 5409444 (microdescriptor fetch)
tor_server_1         | Feb 26 10:29:34.000 [notice] While not bootstrapping, fetched this many bytes: 495338 (consensus network-status fetch); 320051 (microdescriptor fetch)
tor_server_1         | Feb 26 11:20:01.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 371 buildtimes.
tor_server_1         | Feb 26 13:56:15.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 479 buildtimes.

Also, the debug file from an hour after rebooting is here. At this point, almost all channels are back online, but it looks like those three parts are still in trouble even after rebooting…

Bitcoin Core logs
-----------------

Attaching to bitcoin
bitcoin              | 2022-02-26T15:31:15Z Socks5() connect to 180.125.254.72:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:31:35Z Socks5() connect to 2.94.94.170:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:32:18Z Socks5() connect to 80.109.75.19:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:32:39Z Socks5() connect to 2406:da18:9f1:f301:fdf2:9501:d4dd:956e:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:33:00Z Socks5() connect to 82.135.83.60:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:33:20Z Socks5() connect to 88.155.17.100:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:33:28Z UpdateTip: new best=000000000000000000032bac05cac441d570e2660e6dba425a1c0bef52d2ba72 height=725022 version=0x2000e000 log2_work=93.370558 tx=713543663 date='2022-02-26T15:33:20Z' progress=1.000000 cache=8.9MiB(64477txo)
bitcoin              | 2022-02-26T15:34:38Z Socks5() connect to 109.252.163.76:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:34:56Z UpdateTip: new best=000000000000000000080adef3e527745faa9819e0906ca2c1874cc70d759292 height=725023 version=0x2d39a000 log2_work=93.370571 tx=713543809 date='2022-02-26T15:33:46Z' progress=1.000000 cache=9.0MiB(64814txo)
bitcoin              | 2022-02-26T15:34:58Z Socks5() connect to 89.36.78.139:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:35:19Z Socks5() connect to 95.165.155.163:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:36:59Z Socks5() connect to 35.181.12.230:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:37:20Z Socks5() connect to 94.26.190.7:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:37:40Z Socks5() connect to 156.146.51.70:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:38:01Z Socks5() connect to 206.196.145.143:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:38:22Z Socks5() connect to 157.230.89.219:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:38:43Z Socks5() connect to 2603:9000:9102:a671:695a:4ea5:aadf:86fe:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:38:50Z UpdateTip: new best=00000000000000000006f40243b53181039147bcda9c420a04917546985ad590 height=725024 version=0x2fffe000 log2_work=93.370585 tx=713544427 date='2022-02-26T15:37:03Z' progress=1.000000 cache=9.2MiB(66897txo)
bitcoin              | 2022-02-26T15:39:24Z Socks5() connect to 185.202.220.25:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:39:45Z Socks5() connect to 84.155.43.231:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:40:05Z Socks5() connect to 73.126.137.218:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:42:03Z Socks5() connect to 2804:7f1:e783:8102:811b:1c0a:a16c:b61d:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:42:44Z Socks5() connect to 86.106.90.103:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:43:05Z Socks5() connect to 2406:da1a:5b2:ea00:30bb:d819:3fa0:929:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:43:25Z Socks5() connect to 221.219.98.169:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:43:46Z Socks5() connect to 79.132.229.208:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T15:43:54Z UpdateTip: new best=00000000000000000003f66998794c2802a66ae9ccef27af68587dc77a1ce496 height=725025 version=0x37ffe000 log2_work=93.370598 tx=713545658 date='2022-02-26T15:43:30Z' progress=1.000000 cache=9.8MiB(71140txo)

LND logs
--------

Attaching to lnd
lnd                  | 2022-02-26 15:46:30.780 [INF] DISC: Broadcasting 201 new announcements in 17 sub batches
lnd                  | 2022-02-26 15:46:33.543 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:35.943 [INF] CRTR: Processed channels=1 updates=130 nodes=17 in last 1m0.005935972s
lnd                  | 2022-02-26 15:46:36.090 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.107 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.163 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.179 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.180 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.199 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.226 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.257 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.515 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.518 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.530 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.541 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.551 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.622 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.634 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.983 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:36.984 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:37.096 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:37.127 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:37.128 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:37.164 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:41.698 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:41.854 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:41.888 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:41.922 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:42.648 [ERR] RPCS: [/lnrpc.Lightning/GetNodeInfo]: rpc error: code = NotFound desc = unable to find node
lnd                  | 2022-02-26 15:46:43.978 [INF] NANN: Announcing channel(235f4241ad13e7ad86075b90fea4f1fa6da3c1559d75ef968d827f4271a3e7b9:2) disabled [detected]

Tor logs
--------

Attaching to umbrel_tor_server_1, tor
tor_server_1         | Feb 26 14:35:21.000 [notice] Guard MorbidHenry ($83C81FF619BE11C46F0E98B27B437154779ED6CC) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 102/151. Use counts are 60/60. 102 circuits completed, 0 were unusable, 0 collapsed, and 2 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 14:35:21.000 [warn] Guard MorbidHenry ($83C81FF619BE11C46F0E98B27B437154779ED6CC) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 103/207. Use counts are 60/60. 103 circuits completed, 0 were unusable, 0 collapsed, and 2 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 14:35:21.000 [warn] Guard t4cc0reTor1 ($5C8B811887778DCF705F3D39F19E40A21889451F) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 102/205. Use counts are 60/60. 102 circuits completed, 0 were unusable, 0 collapsed, and 2 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 14:35:22.000 [warn] Guard t4cc0reTor1 ($5C8B811887778DCF705F3D39F19E40A21889451F) is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 99/331. Use counts are 60/60. 99 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 14:35:23.000 [warn] Guard MorbidHenry ($83C81FF619BE11C46F0E98B27B437154779ED6CC) is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 98/327. Use counts are 60/60. 98 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 14:35:23.000 [warn] Guard who4USicebeer20 ($DC8493CDEB4FC52A7AAA8B6D6D58FAF461D3819D) is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 102/341. Use counts are 63/63. 102 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 14:35:24.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 104 buildtimes.
tor_server_1         | Feb 26 14:35:28.000 [notice] Guard MDMLU1 ($B4BAEE803B6EB75750D6584A24FB37BE53F4E75D) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 103/151. Use counts are 60/60. 103 circuits completed, 0 were unusable, 0 collapsed, and 2 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 14:35:28.000 [warn] Guard MDMLU1 ($B4BAEE803B6EB75750D6584A24FB37BE53F4E75D) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 103/207. Use counts are 60/60. 103 circuits completed, 0 were unusable, 0 collapsed, and 2 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 14:35:30.000 [warn] Guard MDMLU1 ($B4BAEE803B6EB75750D6584A24FB37BE53F4E75D) is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 99/331. Use counts are 60/60. 99 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
tor                  | Feb 26 15:46:45.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up.
tor                  | Feb 26 15:46:45.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for rendezvous desc)
tor                  | Feb 26 15:46:45.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for rendezvous desc)
tor                  | Feb 26 15:46:45.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for rendezvous desc)
tor                  | Feb 26 15:46:45.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for rendezvous desc)
tor                  | Feb 26 15:46:45.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for rendezvous desc)
tor                  | Feb 26 15:46:45.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for rendezvous desc)
tor                  | Feb 26 15:46:49.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)
tor                  | Feb 26 15:46:49.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)
tor                  | Feb 26 15:46:49.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for rendezvous desc)

Could anyone help me figure out what’s wrong? Thanks in advance.

Try hybrid mode using this guide.

Watch for a few days and note which peers are still offline. Try adding the peer manually; if it is not responding, close the channel, that node is dead.
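
For the manual reconnect and close, something like the following from the node’s command line should work; the pubkey, host, and channel point below are placeholders, not real values.

    # Try to reconnect to a peer manually (node URI = pubkey@host:port)
    lncli connect 03aabb...ddee@peerhost.example.com:9735

    # If the peer never responds and the channel has been dead for weeks,
    # consider force-closing it (goes on-chain, so mind the fees)
    lncli closechannel --force <funding_txid> <output_index>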

Thanks for the comment.
I’ll try hybrid mode soon, but can you think of any potential reasons for this?

I’ve already identified 8 dead peers, but the other 190+ are obviously alive. When I tried to manually add an offline peer (it’s actually my node that looks offline), the error Dial proxy failed: socks connect tcp 10.21.21.11:9050->XXX.XXX.XXX.XX:9735: unknown error general socks server failure occurs.
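
Since that error comes back from the Tor SOCKS proxy, it may be worth testing the proxy on its own; a quick sketch, using the 10.21.21.11:9050 address from the error above and the Tor Project’s check endpoint.

    # Fetch our apparent exit IP through the same SOCKS5 proxy lnd uses;
    # a timeout or failure here means Tor itself is struggling, not lnd
    curl --socks5-hostname 10.21.21.11:9050 https://check.torproject.org/api/ip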

So how do you want to add peers if your node is offline?

I mean, my node is online and has some online peers, but it probably looks offline to those “offline” peers while the issue is happening, because of Tor or something. I just meant to say the issue seems to be on my side, except for the 8 dead peers.

A simple check that your node is “alive” online and ready to respond:
Go to https://ping.eu/port-chk/
Click on the IP at the top (that is your IP from your router).
Check whether it responds on port 9735. If it says open, your node is alive and ready to receive incoming LN connections.
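
The equivalent check from a terminal, if you prefer; YOUR_PUBLIC_IP is a placeholder, and it should be run from a machine outside your LAN.

    # Probe TCP port 9735; "succeeded" (or "open") means the router
    # is forwarding the port to the node
    nc -vz YOUR_PUBLIC_IP 9735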

Thanks for the instructions. The port was closed somehow, so I opened and forwarded it to my node, then rebooted.
But the debug log keeps saying the same thing…

Bitcoin Core logs
-----------------

Attaching to bitcoin
bitcoin              | 2022-02-26T20:37:09Z New outbound peer connected: version: 70016, blocks=725047, peer=16 (outbound-full-relay)
bitcoin              | 2022-02-26T20:37:13Z New outbound peer connected: version: 70016, blocks=725047, peer=17 (block-relay-only)
bitcoin              | 2022-02-26T20:37:16Z New outbound peer connected: version: 70016, blocks=725047, peer=18 (block-relay-only)
bitcoin              | 2022-02-26T20:37:52Z UpdateTip: new best=0000000000000000000076476ffc3a097d2139a6274a26a98c08e9f7c884af0b height=725048 version=0x2000e000 log2_work=93.370910 tx=713598455 date='2022-02-26T20:37:33Z' progress=1.000000 cache=5.7MiB(42633txo)
bitcoin              | 2022-02-26T20:37:52Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
bitcoin              | 2022-02-26T20:38:26Z Socks5() connect to 18.162.133.34:8333 failed: general failure
bitcoin              | 2022-02-26T20:38:31Z Socks5() connect to 18.162.133.34:8333 failed: general failure
bitcoin              | 2022-02-26T20:43:40Z UpdateTip: new best=0000000000000000000610a44205b14c3b6239de23036057ce5b80bb16ab0ada height=725049 version=0x20400000 log2_work=93.370923 tx=713600546 date='2022-02-26T20:42:46Z' progress=1.000000 cache=6.6MiB(49326txo)
bitcoin              | 2022-02-26T20:45:30Z Socks5() connect to 54.37.194.43:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:47:00Z Socks5() connect to 216.86.93.72:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:47:22Z Socks5() connect to 188.72.203.144:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:48:29Z Socks5() connect to 64.33.171.130:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:49:03Z UpdateTip: new best=0000000000000000000222ea98a144e7dab0da6148236e6e4e19609611aa7b7b height=725050 version=0x27ffe000 log2_work=93.370937 tx=713602351 date='2022-02-26T20:48:31Z' progress=1.000000 cache=7.3MiB(55155txo)
bitcoin              | 2022-02-26T20:49:04Z Socks5() connect to 154.6.24.94:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:50:15Z Socks5() connect to 194.14.246.8:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:53:02Z UpdateTip: new best=00000000000000000009460df78eb438d4f2dcc435fc52415cd9b4067822122b height=725051 version=0x20600004 log2_work=93.370950 tx=713603985 date='2022-02-26T20:52:31Z' progress=1.000000 cache=7.7MiB(58849txo)
bitcoin              | 2022-02-26T20:55:53Z Socks5() connect to 178.112.81.138:8333 failed: general failure
bitcoin              | 2022-02-26T21:13:34Z UpdateTip: new best=000000000000000000099ac1af00057472da35b55aaaf3ca33ec3f948436c5a2 height=725052 version=0x26586004 log2_work=93.370964 tx=713606337 date='2022-02-26T21:13:23Z' progress=1.000000 cache=10.4MiB(76161txo)
bitcoin              | 2022-02-26T21:14:18Z UpdateTip: new best=00000000000000000000d20339779bee14ad1e05d6de5d4814ef1098ca7995f2 height=725053 version=0x2000e000 log2_work=93.370977 tx=713607662 date='2022-02-26T21:13:53Z' progress=1.000000 cache=10.7MiB(79189txo)
bitcoin              | 2022-02-26T21:18:58Z New outbound peer connected: version: 70016, blocks=725053, peer=23 (block-relay-only)
bitcoin              | 2022-02-26T21:21:57Z UpdateTip: new best=0000000000000000000945398a20809660a815ebf77e8f0f91a3abb919390f3b height=725054 version=0x27ffe000 log2_work=93.370991 tx=713609384 date='2022-02-26T21:21:37Z' progress=1.000000 cache=11.5MiB(85351txo)
bitcoin              | 2022-02-26T21:23:06Z UpdateTip: new best=00000000000000000004862060f857a003c0ebd27bb8ef273f5bb7e0650eaac1 height=725055 version=0x20a00004 log2_work=93.371004 tx=713609692 date='2022-02-26T21:21:56Z' progress=1.000000 cache=11.6MiB(86267txo)
bitcoin              | 2022-02-26T21:25:38Z Socks5() connect to 2600:1700:1851:6fe0::2b:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T21:25:58Z Socks5() connect to 173.216.31.172:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T21:26:19Z Socks5() connect to 188.255.85.37:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T21:26:43Z Socks5() connect to 31.48.89.237:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T21:32:56Z Socks5() connect to 87.122.9.150:8333 failed: general failure

LND logs
--------

Attaching to lnd
lnd                  |  ChainHash: (chainhash.Hash) (len=32 cap=32) 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f,
lnd                  |  ShortChannelID: (lnwire.ShortChannelID) 633958:1809:0,
lnd                  |  Timestamp: (uint32) 1645878489,
lnd                  |  MessageFlags: (lnwire.ChanUpdateMsgFlags) 00000001,
lnd                  |  ChannelFlags: (lnwire.ChanUpdateChanFlags) 00000000,
lnd                  |  TimeLockDelta: (uint16) 40,
lnd                  |  HtlcMinimumMsat: (lnwire.MilliSatoshi) 1000 mSAT,
lnd                  |  BaseFee: (uint32) 1000,
lnd                  |  FeeRate: (uint32) 100,
lnd                  |  HtlcMaximumMsat: (lnwire.MilliSatoshi) 9900000000 mSAT,
lnd                  |  ExtraOpaqueData: (lnwire.ExtraOpaqueData) {
lnd                  |  }
lnd                  | })
lnd                  | )@6
lnd                  | 2022-02-26 21:33:57.472 [INF] SRVR: Established connection to: 033878501f9a4ce97dba9a6bba4e540eca46cb129a322eb98ea1749ed18ab67735@86.127.240.127:9735
lnd                  | 2022-02-26 21:33:57.472 [INF] SRVR: Finalizing connection to 033878501f9a4ce97dba9a6bba4e540eca46cb129a322eb98ea1749ed18ab67735@86.127.240.127:9735, inbound=false
lnd                  | 2022-02-26 21:33:57.511 [INF] PEER: NodeKey(02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844) loading ChannelPoint(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0)
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: Removing channel link with ChannelID(dfff8400e980972521ca342dbb03c25e090c6b9897fee8eb40900411d458adb8)
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: ChannelLink(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0): starting
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: Trimming open circuits for chan_id=722806:2422:0, start_htlc_id=1429
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: Adding live link chan_id=dfff8400e980972521ca342dbb03c25e090c6b9897fee8eb40900411d458adb8, short_chan_id=722806:2422:0
lnd                  | 2022-02-26 21:33:57.512 [INF] NTFN: New block epoch subscription
lnd                  | 2022-02-26 21:33:57.512 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0)
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: ChannelLink(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0): HTLC manager started, bandwidth=4276891969 mSAT
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: ChannelLink(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0): attempting to re-synchronize
lnd                  | 2022-02-26 21:33:57.514 [INF] PEER: Negotiated chan series queries with 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844
lnd                  | 2022-02-26 21:33:57.514 [INF] DISC: Creating new GossipSyncer for peer=02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844
lnd                  | 2022-02-26 21:33:57.520 [INF] HSWC: ChannelLink(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0): received re-establishment message from remote side
lnd                  | 2022-02-26 21:33:57.780 [WRN] CRTR: Channel 714084423817756672 has zero cltv delta
lnd                  | 2022-02-26 21:33:58.238 [INF] PEER: disconnecting 033878501f9a4ce97dba9a6bba4e540eca46cb129a322eb98ea1749ed18ab67735@86.127.240.127:9735, reason: unable to start peer: unable to read init msg: EOF

Tor logs
--------

Attaching to umbrel_tor_server_1, tor
tor                  | Feb 26 21:33:53.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor                  | Feb 26 21:33:53.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor                  | Feb 26 21:33:53.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor                  | Feb 26 21:33:55.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor                  | Feb 26 21:33:56.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor                  | Feb 26 21:34:02.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor_server_1         | Feb 26 20:35:01.000 [notice] Guard Piratenpartei10 ($166850D169CC7956E77525A1A9228BC4563CFC8B) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 101/151. Use counts are 61/61. 101 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:01.000 [warn] Guard Piratenpartei10 ($166850D169CC7956E77525A1A9228BC4563CFC8B) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 101/203. Use counts are 61/61. 101 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:04.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 102 buildtimes.
tor_server_1         | Feb 26 20:35:09.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 102 buildtimes.
tor_server_1         | Feb 26 20:35:10.000 [notice] Guard gbt2USicebeer06b ($D75510F5C9F356554AA47B3FB2283DA479B47574) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 100/151. Use counts are 60/60. 100 circuits completed, 0 were unusable, 0 collapsed, and 18 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:10.000 [warn] Guard gbt2USicebeer06b ($D75510F5C9F356554AA47B3FB2283DA479B47574) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 100/201. Use counts are 60/60. 100 circuits completed, 0 were unusable, 0 collapsed, and 18 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:11.000 [notice] Guard veha ($7EDDD17E812AD07C3F0C48D5B3999BA6CB55CC2C) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 101/151. Use counts are 59/59. 101 circuits completed, 0 were unusable, 0 collapsed, and 3 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:11.000 [warn] Guard gbt2USicebeer06b ($D75510F5C9F356554AA47B3FB2283DA479B47574) is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 99/331. Use counts are 60/60. 99 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:11.000 [warn] Guard veha ($7EDDD17E812AD07C3F0C48D5B3999BA6CB55CC2C) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 101/203. Use counts are 59/59. 101 circuits completed, 0 were unusable, 0 collapsed, and 3 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:46.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 118 buildtimes.

Can you add a screenshot of htop when the node is in the disconnected state (before reboot)?

I had a memory leak in a process that made my Umbrel/lnd unresponsive.
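
If you want to catch a leak in the act, something like this logs container resource usage over time; the container names are assumptions based on the log prefixes above, so check docker ps for the real ones.

    # Sample memory/CPU of the node containers every minute; a steadily
    # climbing lnd memory figure points at a leak or an oversized channel.db
    while true; do
      date >> node-stats.log
      docker stats --no-stream lnd bitcoin tor >> node-stats.log
      sleep 60
    done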

Currently, 85 channels out of 200 have disconnected. The debug log says the same thing.
I know the number of disconnected channels increases over time and they never come back, so I will reboot now…

Here’s a screenshot of htop before rebooting (CPU% and MEM% panels).

LND looks very bad… 11 GB of virtual memory. The CPU load of 4.85 looks bad too.

Do you know the size of your LND DB? Have you ever compacted it?

Right now, channel.db is 7.6 GB. It used to be 20 GB+ at its peak, mostly because dozens of channels have a large footprint. Closing and reopening the top 5 largest-footprint channels helped me reduce it by 12 GB, so I’m going to continue that work.
Do you think the size is somehow related to the disconnect/Tor issue?
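
For reference, this is roughly how automatic compaction is enabled in lnd.conf; a minimal sketch using the documented db.bolt flags (LND v0.12+), and the exact file location depends on the Umbrel version.

    [bolt]
    # Compact channel.db on the next restart; needs free disk space
    # roughly equal to the current channel.db size while it runs
    db.bolt.auto-compact=true
    # Compact even if the DB was compacted recently (default is 168h)
    db.bolt.auto-compact-min-age=0h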

If the majority of your peers are using that damn charge-lnd script, you’re kinda fucked.
As I explained here, that script is affecting lots of nodes.

What about rebalance-lnd? Does it conflict with LNDg if both are installed on the same node?

I think LNDg is enough, and better. But use it selectively, not intensively.

It’s been a week since I migrated to more powerful hardware. I see some random disconnections, but the core issue hasn’t happened since then.

Well, it starts happening once channel.db is around 14 GB. I decided to go hybrid with the help of this article, and it works pretty well so far.
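
In case it helps someone, the relevant lnd.conf pieces for hybrid mode look roughly like this; a sketch based on that article, with YOUR_PUBLIC_IP as a placeholder.

    [Application Options]
    # Listen for clearnet connections in addition to Tor
    listen=0.0.0.0:9735
    externalip=YOUR_PUBLIC_IP:9735

    [tor]
    tor.active=true
    tor.v3=true
    # Dial clearnet peers directly instead of through the SOCKS proxy
    tor.skip-proxy-for-clearnet-targets=true
    # Stream isolation must be disabled when skipping the proxy
    tor.streamisolation=false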

Read this

I think a number of things influence it.

  • channel.db > 10 GB is too much for an RPi to handle. The 8 GB of RAM can cope, but it hits bottlenecks too, which affects connectivity.
    You addressed that with hardware improvements and the channel compacting Darthcoin referenced.
  • Tor starts acting up with connectivity issues when under heavy load, so 180 channels all via Tor is also a lot for a Pi to handle.
    You addressed this as well by going hybrid.

So things should improve overall, I’d say.

I had the same problem. I uninstalled all the additional apps, so only LND and RTL are running now. Everything works again.

Thanks Darth. In my case, just compacting with a reboot doesn’t help much to reduce the size of the DB. To achieve that, I have to close and reopen some of the largest-footprint channels and then reboot. Applying the process to the most active channels gets me a 1 GB+ reduction, but within 1-2 weeks the size is back to the previous level and keeps growing… it’s an endless process. I can’t wait for LND v0.15.