Optimal Hardware Upgrades for Faster Umbrel and LND Performance?

I’m considering a hardware upgrade for my Umbrel setup, as my current hardware feels somewhat slow, especially when processing Lightning transactions and starting LND after a reboot (even though I’m using a quad-core Intel Core i7-3720QM, 16GB of DDR3 memory, and a 2TB Samsung SATA SSD). I’m hoping an upgrade will improve this.

However, I’m curious to know which component I should prioritize. Which part is the most critical for a smooth experience with UmbrelOS, and especially with LND? Is it the RAM, the CPU, or the SSD? Which component is worth investing more in?

I’m currently considering a setup with an AMD Ryzen 7 3700X, 32GB of DDR4 memory, and the same 2TB SSD. What are your thoughts on this setup? Or would it be worth going further and investing in DDR5 RAM? I’m also wondering how much an upgrade to an NVMe SSD would help (not sure about the migration process though).

Thanks in advance for any advice and insights.

First off, I’d say your current HW is more than sufficient on the CPU and RAM side. The highest-performing nodes get along fine with hardware equivalent to yours, even with hundreds of channels.

The bottleneck for LND, particularly at reboot, is channel.db handling. It’s a bbolt (boltdb) database, which

  • has to be loaded into RAM on every restart
  • can’t be compacted on the fly (like Postgres can), so a restart that includes compacting takes time

A faster CPU and RAM won’t accelerate this process, which you can observe with htop or other system-monitoring software during a reboot. The likely bottleneck instead is the I/O ops of your SSD.

So an uber-fast NVMe will likely help, since its throughput is better than a SATA SSD’s. And LND does roughly 10–50x more reads than writes in normal operation, which an NVMe handles faster, too.
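
If you want to confirm the drive really is the bottleneck before buying anything, a rough way to check (assuming a Debian-based system with the sysstat and fio packages, which you may need to install first) is:

sudo apt install sysstat fio
iostat -xz 5    # watch %util and await on the SSD while LND is starting
fio --name=randread --filename=/path/on/your/ssd/fio-test --rw=randread --bs=4k --size=1G --runtime=60 --time_based --ioengine=libaio --direct=1    # 4k random-read benchmark; put the scratch file on the drive you want to measure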

But with bbolt you will always have a longer restart, and you need to plan for longer downtime when compacting. Mitigation: restart as rarely as possible. I compact roughly once per quarter, and don’t restart much more often than that.
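
For completeness: recent LND versions can also compact for you at startup via lnd.conf (a sketch, assuming your LND has the bolt auto-compact options and you manage lnd.conf yourself; on Umbrel the config may be handled by the app):

[bolt]
# only compact at startup if the last compaction is older than ~90 days (matches the quarterly cadence above)
db.bolt.auto-compact=true
db.bolt.auto-compact-min-age=2160h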

To speed up a restart without compacting, try the following in another terminal window while LND is restarting:

cat channel.db > /dev/null
cat sphinxreplay.db > /dev/null

([INF] LTND: Database(s) now open (time_to_open=5m3.790720607s))
What cat does is load channel.db into RAM (the OS page cache), which is still a lot faster than even an NVMe.
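
If you’re not sure where those files live (it differs between Umbrel and bare-metal installs), something like this finds and prewarms them; the paths below are placeholders, use whatever find prints:

sudo find / -name channel.db -o -name sphinxreplay.db 2>/dev/null    # locate the bolt databases
time cat /path/to/channel.db > /dev/null      # replace with the path find printed
time cat /path/to/sphinxreplay.db > /dev/null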

As for the migration, that’s something which needs careful planning. But the gist is: stop everything, disable LND auto-restart => clone the SSD to the NVMe offline => come back online and double-check everything is in place before starting LND.

Hope this helps.

Thank you for your answer.

OK, so I am buying a new M.2 NVMe SSD.
I will also try that trick of loading channel.db and sphinxreplay.db into RAM.

Do you maybe know of a good guide for the migration process to the new SSD? I really don’t want to mess this up; a force-close with penalty is the last thing I’d wish for.

I am also considering running umbrelOS on a Proxmox ZFS mirror. Is that a good idea?
(Proxmox running on a small SATA SSD + the umbrel VM running on a 2x2TB NVMe ZFS mirror)

Yeah, I understand this is a goosebumps kind of process. I did mine after meticulous preparation, though with probably much more change involved, since I was moving from SSD + small NVMe to SSD + large NVMe with LVM & RAID.

FWIW my current setup looks like this:

  • NVMe and SSD:

    • /boot: 1GB, type “primary”, mount point /boot (to be replicated)
    • /: 100GB, type “primary”, mount point /
    • /home: 120GB, type “primary”, mount point /home
    • /data/lnd: 120GB, type “primary”, mount point /data/lnd (to be replicated)
  • SSD only:

    • /data/bitcoin: 1.5TB, type “primary”, no mount point (not replicated)
  • SSD only (for swap):

    • /swap: 20GB, type “primary”, no mount point (not replicated)

I did all this in no rush, because I was able to move my main node to other hardware until everything was set and ready.

If you just want to swap the SSD for the NVMe, I’d probably do the following:

  • disable LND systemd => sudo systemctl disable lnd.service
  • shut down
  • keep the SSD in there, add the NVMe
  • boot with a clonezilla USB boot image
  • clone the SSD to the NVMe
  • if the target is larger, either resize while cloning, or afterwards with gparted
  • shut down after successful completion, take the SSD out, boot
  • check that everything starts properly (bitcoind), that the size is right (lsblk and df -h), and only then start LND, either manually or via sudo systemctl restart lnd.service, while watching its logs in a second terminal (see the command sketch after this list)
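
As a rough sketch, those last checks could look like this (the systemd unit names are assumptions, since Umbrel wraps things in Docker; adjust to how you actually run bitcoind and LND):

lsblk -f    # confirm the NVMe partitions and filesystems look right
df -h       # confirm the resized filesystem shows the expected space
sudo systemctl status bitcoind    # or check bitcoind health however your setup exposes it
sudo systemctl restart lnd.service
sudo journalctl -fu lnd.service    # follow LND’s logs in a second terminal while it opens channel.db
sudo systemctl enable lnd.service    # re-enable auto-start once everything looks good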

That’s the best I can provide as a “guide”, but as far as I know, plenty of folks have followed the clonezilla approach.

Hope this helps

I’m not familiar with Proxmox or ZFS, but I know a few people running this way. ZFS is not for the faint-hearted; probably only go down this road if you know what you’re doing. At least two node runners I know toasted their nodes due to troubleshooting mishaps with ZFS.

Ouch, this hurts. Well, I think I’ll need to think a bit more about the optimal configuration.

Does this apply only to the ZFS mirror, or to any RAID 1 configuration overall?
(my motherboard allows me to create RAID 1 in BIOS)

Or perhaps create a software Ext4 RAID 1 setup?

I can only speak from my experience: my board allows RAID 1 too, but I went for LVM + mdadm software RAID, since it’s more flexible to manage. If you care, I can share my setup notes via DM.
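
For a rough idea of what the mdadm part looks like (device names are placeholders for your two NVMe partitions; treat this as a sketch rather than my exact notes):

sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p1 /dev/nvme1n1p1    # build the RAID 1 mirror
cat /proc/mdstat    # watch the initial sync
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf    # persist the array config
sudo update-initramfs -u    # so the array assembles at boot (Debian/Ubuntu)
# then put LVM (or a plain ext4 filesystem) on top of /dev/md0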

That would be great, thank you!