Umbrel Home Bitcoin Node stuck syncing, now stuck on starting

After I failed to migrate (How to migrate from a Raspberry Pi Umbrel to Umbrel Home - #38 by jgmontoya), I decided to just set up my Umbrel Home as a brand-new node. During IBD I noticed it would start syncing normally, then stop and start dropping peers. It would then stay stuck at whatever percentage it had managed to sync (low single digits).

I followed Factory Reset (Umbrel Home) - #2 by denny to reset my Umbrel Home. This took me back to version 0.5.x; I updated to 1.2.2 and then tried to install the Bitcoin Core app again.

Now the app is stuck on "Starting".

I’ve tried restarting and I’ve also tried removing the app and installing it again.

Bitcoin app logs:
umbrel_bitcoin_2024-08-04_20-50.log (100.7 KB)

Can you post the logs? Check to see whether it needs reindexing.

Logs for the Bitcoin app are already in the original post.

I tried uploading the Umbrel logs but had some issues because of the file size, so here they are (I had to add the .txt extension so the forum would let me upload it, but otherwise it’s exactly the compressed file I downloaded from my Umbrel; just remove the .txt and you’ll be able to decompress it):
umbrel-1722901380592.log.gz.txt (2.4 MB)

I didn’t see what I had in mind. Maybe someone else can share some insight. Sorry.


“ERROR: ReadBlockFromDisk: Deserialize or I/O error - ReadCompactSize(): size too large: iostream error at FlatFilePos”

That’s corrupted block data. You should delete it and start IBD from scratch. That said, if you already have synced blockchain data on another (Umbrel) device, why not copy it from there? While you’re at it, you can also copy the Lightning/LND directory (or perhaps channel.db alone, which might be a more foolproof way, but with a few extra steps). Without some very basic Linux skills you could struggle, though.
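If you want to check a downloaded Bitcoin app log for this specific error yourself, a minimal sketch is to count the lines that contain it. This assumes the log is plain text (like umbrel_bitcoin_2024-08-04_20-50.log in the first post) and only matches the exact wording quoted above; the function name count_block_read_errors is made up for illustration.

```shell
# Minimal sketch (hypothetical helper): count lines in a Bitcoin app log that
# contain the block-read error quoted above. Only this exact wording is
# matched; other corruption messages would not be counted.
# Usage: count_block_read_errors umbrel_bitcoin_2024-08-04_20-50.log
count_block_read_errors() {
  # -F matches the text literally, -c prints the number of matching lines.
  # "${1:--}" reads the named file, or standard input when none is given.
  # grep exits with status 1 when nothing matches (it still prints 0),
  # so "|| true" keeps a zero count from being treated as a failure.
  grep -c -F 'ReadBlockFromDisk: Deserialize or I/O error' "${1:--}" || true
}
```

The compressed Umbrel log can be fed in the same way, e.g. gzip -dc umbrel-1722901380592.log.gz | count_block_read_errors. A non-zero count only confirms the error is present; it doesn’t tell you whether the disk or the filesystem is to blame.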

Yes, I’ve tried starting from scratch several times. For some reason the data keeps getting corrupted after a while. I don’t really care about copying the LN data over; I’m fine starting from scratch there (and I’ve already closed all of my old channels, so it doesn’t matter anyway).

I hope this isn’t a hardware issue; my Umbrel Home is basically new :grimacing:

Hmm, then I can’t think of any explanation other than a failing disk… If you’re statistically unlucky enough, I guess it can happen with new disks too…

Btw, if I understand correctly, Umbrel Home switched from the time-tested two-disk arrangement (OS + data) to a single disk with extra partitions… Issues like yours are, imo, examples of why that was not a good design decision. It makes things more complicated and adds unnecessary risk… Umbrel things… :roll_eyes: You can try running fsck, but I’m not sure you can even do that properly on the main disk without booting Linux from a different disk (i.e. a USB stick)… Maybe you can, but I’m not planning to buy an Umbrel Home just to try…


Hey @jgmontoya

Just had a look at your logs, and there are tell-tale signs, as @babik suggested, that something might be up with the filesystem or disk that’s preventing the Bitcoin Node from working properly.

With that in mind, could you try the following steps and let me know what the results are?

  1. Let’s try to rule out a hardware issue by checking the SMART status of the drive. In your umbrelOS terminal, you can first install the utilities by running:

    sudo apt install smartmontools
    

    Then to perform a quick SMART test you can run:

    sudo smartctl -t short /dev/<disk>
    

    Where <disk> is the main drive on your Umbrel Home. I’m not sure what it’s called, but it could be sda, nvme0n1, or mmcblk0, or something else entirely (check the output of sudo lsblk in that case).

    Then, once the test has had time to finish, you can review the results with the following command:

    sudo smartctl -a /dev/<disk>
    

    Optionally, you may also want to run a longer test with sudo smartctl -t long /dev/<disk> and check again afterwards, as it may find problems that a quick test misses.

  2. Next, we can check for individual sectors/cells on the drive that may have gone “bad” and no longer work. (This can happen even on new drives, and it’s usually a very small number, but if the system doesn’t detect and isolate them, they can cause I/O errors.)

    Run the following to check for that:

    sudo badblocks -b 4096 -c 1024 -s /dev/<disk>
    
  3. If the above steps don’t turn up anything and SMART says everything is OK, it could simply be a corrupted filesystem. You can check for that with fsck in read-only mode by running:

    sudo fsck -f -n /dev/<target>
    

    Where <target> is sda1, nvme0n1p1, mmcblk0p1, or the equivalent main system partition on your Umbrel Home.

    If fsck detects a problem, you may need to make a live Linux USB and boot into it to repair the filesystem, as you can’t do that while umbrelOS is running. You can use SystemRescue (https://www.system-rescue.org/) for this: write it to a USB stick with Rufus or balenaEtcher, boot your Umbrel Home from it, and then run fsck there without the -n switch, like so:

    sudo fsck -f /dev/<target>
    

    This should repair any filesystem errors that might have arisen from power losses or write errors.
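To make the SMART output from step 1 easier to read, here’s a rough sketch that pulls out a few NVMe health fields and flags anything that looks off. The field names are the ones smartctl 7.3 prints for NVMe drives (as in the results posted later in this thread); the rules (overall result not PASSED, a non-zero Critical Warning, Available Spare at or below its threshold, any Media and Data Integrity Errors) are a reasonable reading rather than an official pass/fail test, and nvme_health_summary is a hypothetical helper name.

```shell
# Minimal sketch (hypothetical helper): flag problems in the NVMe health
# fields of "smartctl -a" output. Field names follow smartctl 7.3 NVMe output;
# the rules below are a rough reading, not an official pass/fail test.
# Usage: sudo smartctl -a /dev/<disk> | nvme_health_summary
nvme_health_summary() {
  awk -F': *' '
    # Strip trailing spaces or carriage returns (this also re-splits fields).
    { sub(/[ \t\r]+$/, "") }

    # Remove thousands separators and % signs so values compare as numbers.
    function num(s) { gsub(/[,%]/, "", s); return s + 0 }

    $1 == "SMART overall-health self-assessment test result" { seen["health"] = 1; health = $2 }
    $1 == "Critical Warning"                { seen["crit"] = 1;  crit = $2 }
    $1 == "Available Spare"                 { seen["spare"] = 1; spare = num($2) }
    $1 == "Available Spare Threshold"       { seen["thr"] = 1;   thr = num($2) }
    $1 == "Media and Data Integrity Errors" { seen["media"] = 1; media = num($2) }

    END {
      # A missing field is reported instead of being treated as healthy.
      n = split("health crit spare thr media", keys, " ")
      for (i = 1; i <= n; i++)
        if (!(keys[i] in seen)) { print "WARN: field " keys[i] " not found"; bad++ }
      if (("health" in seen) && health != "PASSED") { print "WARN: overall result is " health; bad++ }
      if (("crit" in seen) && crit != "0x00") { print "WARN: critical warning " crit; bad++ }
      if (("spare" in seen) && ("thr" in seen) && spare <= thr) {
        print "WARN: available spare " spare "% is at or below the " thr "% threshold"; bad++
      }
      if (("media" in seen) && media > 0) { print "WARN: " media " media/data integrity errors"; bad++ }
      if (bad == 0) print "OK: no warnings in the checked fields"
      # Exit status 1 when any warning was printed, 0 otherwise.
      exit (bad > 0)
    }'
}
```

Because it only parses the text it’s given, it works the same on a saved copy of the smartctl output, and the exit status can be used in a script.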

If you’re still experiencing these issues after all these steps, update us here and let me or @lukechilds know, and we’ll get it sorted for you ASAP. :people_hugging:


Hey! First of all, thanks a lot for your response and the detailed instructions. Sorry for the delay in responding; I wasn’t able to get to it sooner.

  1. From sudo lsblk I can see that the disk name is nvme0n1.

sudo smartctl -t short /dev/nvme0n1 returned:

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-23-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

NVMe device successfully opened

Use 'smartctl -a' (or '-x') to print SMART (and more) information

Then sudo smartctl -a /dev/nvme0n1:

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-23-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       PCIe SSD
Serial Number:                      DE510741042500003431
Firmware Version:                   EHFM70.2
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 892a2037fc
Local Time is:                      Wed Aug 28 22:46:55 2024 UTC
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     83 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x08):        No_ID_Reuse

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.00W       -        -    0  0  0  0        0       0
 1 +     3.00W       -        -    1  1  1  1        0       0
 2 +     1.90W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    3  3  3  3     5000    2500
 4 -   0.0050W       -        -    4  4  4  4     8000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         1
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        49 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    0%
Data Units Read:                    741,135 [379 GB]
Data Units Written:                 7,555,896 [3.86 TB]
Host Read Commands:                 33,528,304
Host Write Commands:                53,151,464
Controller Busy Time:               184
Power Cycles:                       18
Power On Hours:                     469
Unsafe Shutdowns:                   8
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    12
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               49 Celsius
Thermal Temp. 1 Transition Count:   942
Thermal Temp. 2 Transition Count:   736
Thermal Temp. 1 Total Time:         12326
Thermal Temp. 2 Total Time:         1138

Error Information (NVMe Log 0x01, 16 of 255 entries)
No Errors Logged

Since it apparently didn’t find any issues, I ran sudo smartctl -t long /dev/nvme0n1:

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-23-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

NVMe device successfully opened

Use 'smartctl -a' (or '-x') to print SMART (and more) information

Then sudo smartctl -a /dev/nvme0n1 again:

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-23-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       PCIe SSD
Serial Number:                      DE510741042500003431
Firmware Version:                   EHFM70.2
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 892a2037fc
Local Time is:                      Wed Aug 28 22:52:27 2024 UTC
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     83 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x08):        No_ID_Reuse

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.00W       -        -    0  0  0  0        0       0
 1 +     3.00W       -        -    1  1  1  1        0       0
 2 +     1.90W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    3  3  3  3     5000    2500
 4 -   0.0050W       -        -    4  4  4  4     8000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         1
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        49 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    0%
Data Units Read:                    741,135 [379 GB]
Data Units Written:                 7,556,087 [3.86 TB]
Host Read Commands:                 33,528,308
Host Write Commands:                53,159,202
Controller Busy Time:               184
Power Cycles:                       18
Power On Hours:                     470
Unsafe Shutdowns:                   8
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    12
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               49 Celsius
Thermal Temp. 1 Transition Count:   942
Thermal Temp. 2 Transition Count:   736
Thermal Temp. 1 Total Time:         12326
Thermal Temp. 2 Total Time:         1138

Error Information (NVMe Log 0x01, 16 of 255 entries)
No Errors Logged

  2. sudo badblocks -b 4096 -c 1024 -s /dev/nvme0n1:

Checking for bad blocks (read-only test): done

  3. I tried the following:

sudo fsck -d -n /dev/nvme0n1p1:

fsck from util-linux 2.38.1

sudo fsck -d -n /dev/nvme0n1p2:

fsck from util-linux 2.38.1
e2fsck 1.47.0 (5-Feb-2023)
Warning!  /dev/nvme0n1p2 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/nvme0n1p2: clean, 94147/643376 files, 1030997/2576384 blocks

sudo fsck -d -n /dev/nvme0n1p3:

fsck from util-linux 2.38.1
e2fsck 1.47.0 (5-Feb-2023)
/dev/nvme0n1p3: clean, 94026/642112 files, 1029376/2576384 blocks

sudo fsck -d -n /dev/nvme0n1p4:

fsck from util-linux 2.38.1
e2fsck 1.47.0 (5-Feb-2023)
Warning!  /dev/nvme0n1p4 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/nvme0n1p4: clean, 199029/494895104 files, 71850818/1979574596 blocks

I’m not really sure how to interpret these results. Should I try the live Linux USB + fsck -f?
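For reading the fsck lines: the “clean, X/Y files, A/B blocks” part of each e2fsck line is a usage summary (inodes in use out of the total, then blocks in use out of the total), not an error count. Also note that e2fsck prints this one-line summary and stops, without running its full check passes, when a filesystem is marked clean and -f is not given. A small sketch that turns those summary lines into percentages might look like this (e2fsck_summary_usage is a hypothetical name):

```shell
# Minimal sketch (hypothetical helper): turn e2fsck summary lines such as
#   /dev/nvme0n1p4: clean, 199029/494895104 files, 71850818/1979574596 blocks
# into inode and block usage percentages. Other lines are ignored.
# Usage: sudo fsck -n /dev/<target> 2>&1 | e2fsck_summary_usage
e2fsck_summary_usage() {
  awk '
    # Fields: $1 device, $2 "clean,", $3 used/total inodes, $4 "files,",
    # $5 used/total blocks, $6 "blocks".
    $2 == "clean," && $4 == "files," && $6 == "blocks" {
      dev = $1; sub(/:$/, "", dev)
      split($3, f, "/"); split($5, b, "/")
      printf "%s: %.1f%% of inodes used, %.1f%% of blocks used\n", dev, 100 * f[1] / f[2], 100 * b[1] / b[2]
    }' "${1:--}"
}
```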

Hey @jgmontoya

Apologies for the delay in getting back to you. Thanks for providing all of this. You did a great job!

It actually looks like everything’s okay with your drive, so there shouldn’t be any need to run further filesystem checks.

Could you open a support ticket via our live chat or send us a message at support@umbrel.com describing this problem (feel free to include a link to this thread), and also provide the serial number of your Umbrel Home?
