Slow, now inaccessible node

Until a week or so ago, my node had been running perfectly on my Raspberry Pi 4 since I set it up about 10 months ago. Then it suddenly stopped syncing new blocks: it would reach a certain block, cycle back 4 blocks, and get stuck endlessly in this loop. Eventually I tried to reset and restore my node. This worked OK but was very slow: after a couple of days, synchronization was only at 11% and getting slower. (When I first set up the node, complete synchronization took only about 1 week.) I decided the issue must be an SSD error, so I bought the hardware recommended on the Umbrel site and started a fresh restore yesterday. Again, all seemed to be running OK, but again, slowly. This morning (12 hours after the fresh restore), sync was not yet even at 1%. And now, all of a sudden, I can’t access the node at all (either from my laptop or over SSH). I don’t believe it can be a throttling issue, as the last few times I checked the temperature it was around 40ºC. Any suggestions?

Update: after a few failed attempts, I managed to do a fresh restore and can at least now access my node. Uptime is 2 hours now and it’s still stuck on 0% and currently states: “0 of 103,998 blocks”. Not feeling too hopeful but am going to wait it out and see if anything happens…

Hey, sorry to hear about your issues, especially given that things were working fine for a while. Since it’s at least running, are you able to export the logs so we can take a look? (Settings > System > Troubleshoot > Start)

“Error: Failed to fetch debug data.”

Furthermore, in trying that, I’ve just noticed that some UI elements are no longer visible: my username, logout button, TOR onion address…

Hm, are you able to SSH into the machine?

Also, when you did your fresh restore, was it a straight reimage of your SD card?

Yes, I am able to SSH in.

For the restore, I basically started from scratch – reflashed the SD and with a clean SSD.

Can you try running ~/umbrel/scripts/debug and pasting the output into pastebin?

Should help us parse out what’s happening here.

Thanks for helping. Here are the logs…

It sort of feels like your Pi might not be connecting to the outside world given all the connection errors I’m seeing…can you just try a “ping google.com” in the terminal and show me what you see?

PING google.com (142.250.200.14) 56(84) bytes of data.

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=5 ttl=114 time=20.9 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=9 ttl=114 time=21.8 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=26 ttl=114 time=20.4 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=30 ttl=114 time=19.3 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=32 ttl=114 time=19.2 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=42 ttl=114 time=18.8 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=47 ttl=114 time=20.9 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=48 ttl=114 time=18.5 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=53 ttl=114 time=18.7 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=72 ttl=114 time=19.4 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=74 ttl=114 time=22.5 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=79 ttl=114 time=20.6 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=89 ttl=114 time=19.0 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=95 ttl=114 time=19.2 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=104 ttl=114 time=19.0 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=106 ttl=114 time=20.3 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=116 ttl=114 time=19.1 ms

64 bytes from lhr48s29-in-f14.1e100.net (142.250.200.14): icmp_seq=124 ttl=114 time=19.9 ms

^C

--- google.com ping statistics ---

126 packets transmitted, 18 received, 85.7143% packet loss, time 573ms

rtt min/avg/max/mdev = 18.521/19.853/22.509/1.090 ms
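For anyone reading along, the summary line above is worth a second look: only 18 of 126 pings got a reply. Here is a minimal Python sketch that recomputes the loss percentage from that line. The function name `parse_ping_summary` is made up for illustration, and the regex assumes the Linux iputils `ping` summary format shown in this thread.

```python
import re

def parse_ping_summary(line: str) -> dict:
    """Extract sent/received counts from a Linux ping summary line."""
    match = re.search(r"(\d+) packets transmitted, (\d+) received", line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    # Loss is the share of requests that never got a reply.
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    return {"sent": sent, "received": received, "loss_pct": loss_pct}

# Summary line copied from the ping output in this thread.
summary = "126 packets transmitted, 18 received, 85.7143% packet loss, time 573ms"
stats = parse_ping_summary(summary)
print(f"{stats['received']}/{stats['sent']} replies, {stats['loss_pct']:.4f}% loss")
# prints: 18/126 replies, 85.7143% loss
```

Loss that high with normal round-trip times on the replies that do arrive suggests the link itself is dropping packets rather than simply being slow.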

Another random question: are you using a fan on your Pi? (Surprisingly, it can cause issues.)

Interesting. I was using a heat sink, which I have just removed and now blocks are syncing at a pretty good speed. I wasn’t using the heat sink originally but it didn’t occur to me that this might be the issue. Why would that be? And are there any recommendations for reducing high temperatures?

Hm, that’s actually not what I expected to happen. Basically, the issue I was thinking of with the fan is that Raspberry Pis can be VERY temperamental when it comes to drawing power, so any additional fans or even the wrong kind of SSD enclosure can create issues. A heat sink causing this is new to me, though (and I wouldn’t suggest leaving it off), so lemme ping @lukechilds and see if he has any thoughts.
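One way to check for the power problems mentioned above is `vcgencmd get_throttled`, which on Raspberry Pi OS reports under-voltage and throttling as a hex bitmask. Below is a rough Python sketch for decoding that value. It assumes `vcgencmd` is available on your install (not confirmed in this thread for Umbrel), the helper name `decode_throttled` is made up, and the sample value is hypothetical, not taken from the logs here.

```python
# Bit positions as documented for `vcgencmd get_throttled` on Raspberry Pi OS.
THROTTLE_FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active now",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}

def decode_throttled(output: str) -> list:
    """Decode a line such as 'throttled=0x50005' into readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]

# Hypothetical sample value, NOT from this thread's logs.
sample = "throttled=0x50005"
for flag in decode_throttled(sample):
    print(flag)
```

A result of `throttled=0x0` decodes to an empty list, meaning no under-voltage or throttling has been recorded since the last boot.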

I just went through a months-long RPi4 node troubleshoot and reboot (after running Umbrel and Lightning successfully for 6 months). For me it was sudden, random hardware issues. @jonsyu is right: the Pis are VERY temperamental. At first I tried to be a good scientist and isolate variables, but eventually what worked was some combination of changing the ethernet cable, using a different USB port for the external drive, and changing out the external drive itself, then reflashing, redownloading, etc.

My life lesson: Pis are great little machines for learning, but unless you 99% know what you’re doing, don’t rely on one, especially with a ₿ stash that is meaningful to you. It can lead to some uncomfortable moments. For a “reliable node”, look into more powerful options: cluster SBCs, dedicated laptop, Mac mini, etc.

Godspeed.

pinging @louneskmt as well

Could you send a link to your heatsink?

This is the heat sink I was using (albeit without the fans): https://thepihut.com/products/dual-fan-heatsink-case-for-raspberry-pi-4

Good news: my node is now 100% synchronized, the wallet balance has been restored, and everything appears to be operating normally. For the time being, I’m running the Pi 4 without a heat sink, and the temperature is averaging in the high 60s ºC. I’ll wait to see what alternative solutions are recommended for reducing the temperature. Thanks to @louneskmt for looking into this and to @jonsyu for the support.
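For anyone who wants to keep an eye on temperature while running without a heat sink, here is a small Python sketch that reads the SoC temperature from the standard Linux thermal sysfs file. The 80 ºC figure is the point where the Pi 4 starts throttling its ARM cores according to the Raspberry Pi documentation. The 10 ºC warning margin and the function names are arbitrary choices for this sketch, not anything from Umbrel.

```python
from pathlib import Path

# Standard Linux thermal sysfs file; it reports millidegrees Celsius.
THERMAL_FILE = Path("/sys/class/thermal/thermal_zone0/temp")
THROTTLE_C = 80.0     # Pi 4 begins throttling the ARM cores around here
WARN_MARGIN_C = 10.0  # arbitrary margin chosen for this sketch

def millidegrees_to_c(raw: str) -> float:
    """Convert sysfs text such as '67234' to degrees Celsius."""
    return int(raw.strip()) / 1000.0

def classify(temp_c: float) -> str:
    """Label a temperature relative to the throttling threshold."""
    if temp_c >= THROTTLE_C:
        return "throttling likely"
    if temp_c >= THROTTLE_C - WARN_MARGIN_C:
        return "close to the limit"
    return "ok"

def read_temp_c(path: Path = THERMAL_FILE):
    """Return the SoC temperature, or None if the file is unavailable."""
    try:
        return millidegrees_to_c(path.read_text())
    except OSError:
        return None

temp = read_temp_c()
if temp is None:
    print("no thermal sensor file found")
else:
    print(f"{temp:.1f} C: {classify(temp)}")
```

With these thresholds, a reading in the high 60s is classed as "ok", though it leaves less headroom than the roughly 40ºC reported at the start of the thread.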