
How to Replace a Failed ZFS Pool Drive in Proxmox

Last night I restarted my Proxmox server while updating it. After five minutes I still couldn’t connect to it, so I hooked up a monitor and saw that one of the hard drives wasn’t being detected. I identified the failed drive, replaced it, and powered the server back up.

When Proxmox started up, I opened a shell on the server from the web interface and ran the following command.

zpool status

It displayed the following output.

pool: storage
state: ONLINE
scan: scrub repaired 0B in 0 days 01:04:34 with 0 errors on Sun Aug 8 01:28:36 2021

config:

NAME                                        STATE     READ WRITE CKSUM
storage                                     ONLINE       0     0     0
  mirror-0                                  ONLINE       0     0     0
    ata-WDC_********-******_WD-**********   ONLINE       0     0     0
    ata-WDC_********-******_WD-**********   ONLINE       0     0     0

errors: No known data errors

pool: os
state: DEGRADED
status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.

action: Online the device using 'zpool online' or replace the device with 'zpool replace'.

scan: scrub repaired 0B in 0 days 01:04:34 with 0 errors on Sun Aug 8 01:28:36 2021

config:

NAME                                 STATE     READ WRITE CKSUM
os                                   DEGRADED     0     0     0
  mirror-0                           DEGRADED     0     0     0
    475234859233716278394            REMOVED      0     0     0
    ata-SSD_**********_WD-*******    ONLINE       0     0     0

errors: No known data errors
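On a server with several pools, the one drive that needs attention is easy to miss in the full output. Here is a minimal sketch of a filter that prints only the problem devices, assuming the usual layout of the config section (device name in the first column, state in the second). The name failed_devices is made up for this post and is not part of ZFS.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints
# the names of devices whose state is REMOVED, FAULTED, UNAVAIL or OFFLINE.
failed_devices() {
  awk '
    $1 == "config:" { in_config = 1; next }   # device table starts here
    $1 == "errors:" { in_config = 0 }         # device table ends here
    in_config && NF >= 5 && $2 ~ /^(REMOVED|FAULTED|UNAVAIL|OFFLINE)$/ { print $1 }
  '
}

# Usage on the server: zpool status | failed_devices
```

It is also worth knowing that zpool status -x limits the output to pools that have a problem.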

To replace the drive, we’ll need the device path of the new drive. Go back to the Proxmox admin dashboard, click on the server in the left side menu, then click on “Disks”. You’ll see a list of the disks plugged into your Proxmox server. Find the new hard drive you added and copy its value from the “Device” column. It should be something like /dev/sdb.
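The device path can also be looked up from the shell. Names like the ata-WDC_... entries in the output above typically come from /dev/disk/by-id, a directory of symlinks that point at the /dev/sdX device nodes. Here is a small sketch that prints each link together with the device it resolves to. The name list_disk_ids is made up for this post, and it assumes readlink -f is available (it is part of GNU coreutils).

```shell
# Hypothetical helper: print every symlink in a by-id style directory
# together with the device node it points to.
list_disk_ids() {
  dir="${1:-/dev/disk/by-id}"   # default to the real udev directory
  for link in "$dir"/*; do
    [ -L "$link" ] || continue  # skip anything that is not a symlink
    printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
  done
}

# Usage on the server, showing only the links for the whole disk sdb:
#   list_disk_ids | grep 'sdb$'
```

Checking the model and serial number in the link name against the label on the new drive is a quick way to confirm that /dev/sdb really is the disk you just installed.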

Once you have the device path of the new hard drive, run the following command to swap it into the pool in place of the old one.

zpool replace -f (pool name) (Old HD ID) (New HD ID)

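Fill in the pool name, the identifier of the old drive as shown by zpool status, and the device path of the new drive. The -f flag forces ZFS to use the new device even if it appears to be in use, so double-check the device path first. With the pool and drive IDs from this blog post, the command looks like this.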

zpool replace -f os 475234859233716278394 /dev/sdb
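Since a mistyped device path could point the replacement at the wrong disk, it can help to print the exact command and read it over before running it. Here is a minimal sketch, with replace_cmd as a made-up helper name. It only prints the command and never touches the pool.

```shell
# Hypothetical helper: print the zpool replace command for review
# instead of running it. Fails if any of the three values is missing.
replace_cmd() {
  if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: replace_cmd <pool> <old-device> <new-device>" >&2
    return 1
  fi
  printf 'zpool replace -f %s %s %s\n' "$1" "$2" "$3"
}

replace_cmd os 475234859233716278394 /dev/sdb
# prints: zpool replace -f os 475234859233716278394 /dev/sdb
```

Once the printed line looks right, copy it into the shell to run it.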

This starts the process of replacing the bad drive with the new one, which ZFS calls resilvering. How long it takes depends mostly on how much data needs to be synced to the new drive. To check on its progress, run the following command.

zpool status

While the resilver is running, the scan line shows how much data has already been synced and an estimate of the time remaining. Once it has finished, the output will look similar to the output below.

pool: storage
state: ONLINE
scan: scrub repaired 0B in 0 days 01:04:34 with 0 errors on Sun Aug 8 01:28:36 2021

config:

NAME                                        STATE     READ WRITE CKSUM
storage                                     ONLINE       0     0     0
  mirror-0                                  ONLINE       0     0     0
    ata-WDC_********-******_WD-**********   ONLINE       0     0     0
    ata-WDC_********-******_WD-**********   ONLINE       0     0     0

errors: No known data errors

pool: os
state: ONLINE
scan: resilvered 107.9G in 0 days 00:35:04 with 0 errors on Wed Sep 5 18:05:03 2021

config:

NAME                                 STATE     READ WRITE CKSUM
os                                   ONLINE       0     0     0
  mirror-0                           ONLINE       0     0     0
    sdb                              ONLINE       0     0     0
    ata-SSD_**********_WD-*******    ONLINE       0     0     0

errors: No known data errors

After that, just let the server sync. When it’s done, the pool will show a state of ONLINE, and all of its drives should show ONLINE too.
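If you would rather not keep re-running the status command by hand, the check can be scripted. Here is a rough sketch, assuming the scan line contains the phrase "resilver in progress" while the sync is still running, which is how OpenZFS words it. The name resilver_running is made up for this post.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and succeeds
# (exit code 0) only while a resilver is still running.
resilver_running() {
  grep -q 'resilver in progress'
}

# Usage on the server, polling the pool "os" once a minute:
#   while zpool status os | resilver_running; do sleep 60; done
#   echo "resilver finished"
```

If your OpenZFS version has it, zpool wait -t resilver os does much the same job by blocking until the resilver is done.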
