I’ve been running my node since April with no issue. Hardware is:
- Raspberry Pi 4, 4 GB RAM
- FLIRC case → added recently
- 1 TB SSD Verbatim Vi550 S3
- Official power supply
But it has been down for a few weeks now. The web interface stays stuck on the “starting umbrel” page. Sometimes it displays the Red Umbrella of Death with a “system service failed” message. The LED on the SSD keeps blinking without interruption. I’m not totally sure, but the problem seems to have appeared after repeated power outages in December due to snowfall.
When SSH’ing into the Pi, htop shows a load average over 20.
I tried running the debug command to get the logs but nothing happens, the logs never show up. I also tried following the steps listed here: Red Umbrella of Death after Power Outage
and commands to stop umbrel or docker don’t seem to respond either.
The command to update the OS version does not work either. Even an ls command run in the /umbrel folder hangs.
I plugged the SSD into an Ubuntu machine and ran e2fsck; it told me the filesystem was clean.
I reflashed the SD card with 0.4.8 and 0.4.9; the outcome is the same.
Any idea of what I could do?
Could you run
docker ps and paste the dump here. Are there docker container services running at all?
And just to verify,
~/umbrel/scripts/debug doesn’t do anything?
dmesg to check whether your SSD has a mounting issue - or
df -hal | grep G | head -10 to check whether the SSD is full
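Since plain commands seem to hang on this node, here is a minimal sketch of how the checks above could be wrapped so they give up instead of blocking the SSH session. It assumes bash and GNU coreutils `timeout` are available on the node; the function names `try_cmd` and `blocked_procs` are made up for illustration. Note that a process stuck in uninterruptible disk I/O cannot be killed, so the wrapper may still block for something like `ls` on the SSD mount.

```shell
#!/usr/bin/env bash
# Run a command, but give up after a number of seconds instead of hanging.
# Usage: try_cmd SECONDS COMMAND [ARGS...]
try_cmd() {
  local secs="$1"
  shift
  local rc=0
  timeout "$secs" "$@" || rc=$?
  # GNU timeout exits with status 124 when the time limit was hit.
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${secs}s: $*" >&2
  fi
  return "$rc"
}

# List processes in uninterruptible sleep (state D). These are usually
# waiting on disk I/O, and many of them can explain a high load average.
blocked_procs() {
  ps -e -o state= -o pid= -o comm= | awk '$1 ~ /^D/'
}
```

For example, `try_cmd 30 docker ps` returns after at most 30 seconds if the Docker daemon answers slowly or not at all, and `blocked_procs` shows which processes are stuck waiting on the disk.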
Thanks @Hakuna for your help.
df -hal | grep G | head -10 gives the output below
/dev/root 59G 3.1G 53G 6% /
devtmpfs 1.7G 0 1.7G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 18M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/root 59G 3.1G 53G 6% /status-server
/dev/sda1 938G 488G 403G 55% /mnt/data
/dev/sda1 938G 488G 403G 55% /home/umbrel/umbrel
/dev/sda1 938G 488G 403G 55% /var/lib/docker
/dev/sda1 938G 488G 403G 55% /swap
dmesg gives several warnings/errors related to SSD device and timeouts for processes
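To narrow a long dmesg dump down to the disk-related lines, a filter along these lines might help. It is only a sketch: the function name `ssd_kernel_errors` is hypothetical, and the patterns are a guess at typical kernel messages for I/O errors, hung tasks and USB resets, not an exhaustive list.

```shell
#!/usr/bin/env bash
# Read kernel log lines on stdin and keep those that commonly point at
# a failing disk or a flaky USB link. Patterns are illustrative only.
ssd_kernel_errors() {
  grep -iE 'I/O error|blocked for more than|uas_|usb [0-9.-]+: reset|\[sd[a-z]+\]' || true
}
```

Usage would be `dmesg | ssd_kernel_errors` (with `sudo dmesg` if the kernel log is restricted).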
htop shows docker related processes:
docker ps and waiting 10 mins or so
Really tricky, maybe others here can jump in.
Assuming it’s either
- something with the SSD. Can you try connecting it to your laptop / desktop and check whether it’s working fine? Under Linux, you can run fsck (in read-only mode) or e2fsck to check for bad sectors or file system errors
- something with the heat? Is the node particularly hot since you started using the new case? Maybe that’s causing your node to malfunction. You could check the temperature and whether the node is throttled with
vcgencmd measure_temp && vcgencmd get_throttled
- lastly, perhaps something with the USB connection. Do you have a chance to change cables, the USB connector?
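The value printed by `vcgencmd get_throttled` is a hex bitmask, which is not very readable on its own. Below is a small sketch that decodes it, using the bit meanings from the Raspberry Pi documentation as I understand them; the function name `decode_throttled` is made up for this example.

```shell
#!/usr/bin/env bash
# Decode the bitmask printed by `vcgencmd get_throttled`.
# Accepts either "throttled=0x50005" or a bare "0x50005".
decode_throttled() {
  local raw="${1#throttled=}"
  local value=$(( raw ))
  if [ "$value" -eq 0 ]; then
    echo "no throttling or under-voltage reported"
    return 0
  fi
  # Bits 0-3 describe the current state, bits 16-19 what happened since boot.
  local bits=(0 1 2 3 16 17 18 19)
  local names=(
    "Under-voltage detected"
    "Arm frequency capped"
    "Currently throttled"
    "Soft temperature limit active"
    "Under-voltage has occurred"
    "Arm frequency capping has occurred"
    "Throttling has occurred"
    "Soft temperature limit has occurred"
  )
  local i
  for i in "${!bits[@]}"; do
    if (( value & (1 << bits[i]) )); then
      echo "${names[i]}"
    fi
  done
  return 0
}
```

On the node this could be called as `decode_throttled "$(vcgencmd get_throttled)"`. Under-voltage flags would point at the power supply or cable rather than at heat.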
Sorry, I’m poking in the dark a bit here too
Yes, I’m also poking into the dark
- I did try to connect the SSD to a laptop. No problem encountered. Ran
fsck both told me it is clean.
Okay, I think we can somewhat assume the hardware is fine. Let’s bring out the big guns and point at the software (Umbrel OS).
Could you first ensure that
- LND isn’t running (I’d assume so, since docker ps isn’t running, and I can’t see any docker services on htop with the user umbrel)
- backup your channel.backup under
~/umbrel/backup and all your files / your lnd channel states under
- backup your lnd.conf under
- have your seed words
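For the backup steps above, one way to copy a whole directory off the node is a timestamped tar archive. This is only a sketch: `backup_dir` is a hypothetical helper, the source and destination paths are whatever applies on your node, and it should only be run once LND is stopped. Reading from the SSD may of course hang here too if the drive is the problem.

```shell
#!/usr/bin/env bash
# Archive a directory into a timestamped .tar.gz inside DEST_DIR and
# print the path of the archive that was written.
# Usage: backup_dir SRC_DIR DEST_DIR
backup_dir() {
  local src="$1" dest="$2"
  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  local name stamp archive
  name="$(basename "$src")"
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$dest/${name}-${stamp}.tar.gz"
  # -C keeps the paths inside the archive relative to the parent of SRC_DIR.
  tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
  echo "$archive"
}
```

For example, `backup_dir ~/umbrel/lnd /some/other/disk` (the destination is a placeholder) would write an archive of the lnd folder to a location outside the SSD.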
Once this is all done, let’s try to kill and reinstate the Docker system. I think this is where it’s hanging
Source ==> Umbrel Troubleshooting Guide
Some Docker components fail to start
I can’t access umbrel.local on browser or ip address. Did ssh and ran debug script. First suspect line is:
stat /var/lib/docker/overlay2/....... no such file or directory
How to fix this issue:
- Just in case, re-flash the microSD card with the latest version of Umbrel OS (exactly the steps you followed when first installing your node, using the instructions from getumbrel.com).
- If that still doesn’t help, run this command (after connecting to your node over SSH):
sudo systemctl stop umbrel-startup.service && docker system prune --force --all && sudo systemctl start umbrel-startup.service
Restart your node
Optionally, another command to clear the Docker containers is:
sudo docker kill $(sudo docker ps -aq) && sudo docker rm $(sudo docker ps -aq)
then restart your node
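One caveat with the one-liner above: if no containers exist, `docker kill` is called without arguments and fails, so the `docker rm` part never runs. A slightly more defensive sketch is below. The function name `remove_all_containers` and the `DOCKER` variable are my own additions for illustration, not part of the Umbrel guide.

```shell
#!/usr/bin/env bash
# Command used to talk to Docker. Override with DOCKER="docker" when
# already running as root.
DOCKER="${DOCKER:-sudo docker}"

# Kill running containers (if any), then remove all containers (if any).
remove_all_containers() {
  local running all
  running="$($DOCKER ps -q)"
  if [ -n "$running" ]; then
    # Word splitting is intentional: one argument per container ID.
    $DOCKER kill $running
  fi
  all="$($DOCKER ps -aq)"
  if [ -n "$all" ]; then
    $DOCKER rm $all
  else
    echo "no containers to remove"
  fi
}
```

Calling `remove_all_containers` does the same job as the command above but skips each step when there is nothing to act on.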
Let’s see how that works
Hello @Hakuna and thank you again for your much appreciated help.
~/umbrel/lnd/data/graph/mainnet/ are empty, maybe because I already reflashed the os a few times before? as for
~/umbrel/lnd I don’t remember, but I shall confirm that.
~/umbrel/backup doesn’t exist.
~/umbrel/lnd/data/graph/mainnet/ does have some files
lnd.conf exists in
sudo systemctl stop umbrel-startup.service && docker system prune --force --all && sudo systemctl start umbrel-startup.service and related commands involving
docker don’t work, just as
I tried running those commands with the SSD unplugged. They work, and about 1 GB of data was cleared. However, after shutting down the node, plugging the SSD back in and restarting the node, the result is the same.
Any idea? Is there a way to wipe the SSD without losing the sync and having to restore with the seed?
Hi, thanks for your help @Hakuna.
I finally tried to format the SSD, but I couldn’t complete it; it seems it was dead for good. I had to start over with a brand new SSD.
If there were a Kubernetes version of Umbrel, it would probably be more resilient (and self-healing), similar to how https://aerokube.com/moon/latest/#install-kubernetes does it.