Oracle Solaris 11 (11/11) – Replacing a Bad Drive (ZFS)

This is just a quick post to show the process I took to replace a degraded drive in my pool.

First, let's have a look at the pool showing the degraded drive.

damox@Starburst:~$ zpool status tank
[...]
        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          raidz2-0   ONLINE       0     0     0
            c6t12d0  ONLINE       0     0     0
            c6t11d0  ONLINE       0     0     0
            c6t13d0  ONLINE       0     0     0
            c6t14d0  ONLINE       0     0     0
            c4t25d0  ONLINE       0     0     0
            c4t26d0  ONLINE       0     0     0
            c4t27d0  ONLINE       0     0     0
            c4t28d0  ONLINE       0     0     0
            c3t17d0  ONLINE       0     0     0
            c3t18d0  ONLINE       0     0     0
          raidz2-1   DEGRADED     0     0     0
            c3t19d0  ONLINE       0     0     0
            c3t20d0  ONLINE       0     0     0
            c6t35d0  ONLINE       0     0     0
            c6t36d0  ONLINE       0     0     0
            c6t37d0  DEGRADED     0     0     0 too many errors
            c6t38d0  ONLINE       0     0     0
            c4t29d0  ONLINE       0     0     0
            c4t30d0  ONLINE       0     0     0
            c4t31d0  ONLINE       0     0     0
            c4t32d0  ONLINE       0     0     0
        cache
          c3t33d0    ONLINE       0     0     0
[...]

We identify the problem drive (in this case one that has accumulated too many errors and failed to resilver) and set it to OFFLINE so that no further read/write attempts are made against it.

In this case c6t37d0 is degraded and will be set to OFFLINE.

damox@Starburst:~$ zpool offline tank c6t37d0
damox@Starburst:~$ zpool status tank
[...]
        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          raidz2-0   ONLINE       0     0     0
            c6t12d0  ONLINE       0     0     0
            c6t11d0  ONLINE       0     0     0
            c6t13d0  ONLINE       0     0     0
            c6t14d0  ONLINE       0     0     0
            c4t25d0  ONLINE       0     0     0
            c4t26d0  ONLINE       0     0     0
            c4t27d0  ONLINE       0     0     0
            c4t28d0  ONLINE       0     0     0
            c3t17d0  ONLINE       0     0     0
            c3t18d0  ONLINE       0     0     0
          raidz2-1   DEGRADED     0     0     0
            c3t19d0  ONLINE       0     0     0
            c3t20d0  ONLINE       0     0     0
            c6t35d0  ONLINE       0     0     0
            c6t36d0  ONLINE       0     0     0
            c6t37d0  OFFLINE      0     0     0
            c6t38d0  ONLINE       0     0     0
            c4t29d0  ONLINE       0     0     0
            c4t30d0  ONLINE       0     0     0
            c4t31d0  ONLINE       0     0     0
            c4t32d0  ONLINE       0     0     0
        cache
          c3t33d0    ONLINE       0     0     0
[...]
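The identification step above can be scripted. A minimal sketch (the `list_unhealthy` helper is my own, not part of ZFS), assuming `zpool status` keeps the column layout shown above, with the device name in the first field and its state in the second:

```shell
# Minimal sketch: print every device in a zpool-status listing whose state
# is not ONLINE. Assumes the layout shown above: device name in field 1
# (cXtYdZ style), state in field 2.
list_unhealthy() {
  awk '$1 ~ /^c[0-9]/ && $2 != "ONLINE" { print $1, $2 }'
}

# Usage on a live system: zpool status tank | list_unhealthy
```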

Next I turned off the server and physically replaced the drive (after some pain physically locating it. Note to self: label your drives/ports next time, chump).

Using the ‘format’ command we get a list of available drives; knowing which drive is which, I can identify the new drive that will replace the previously degraded one.

In this case c6t45d0 is the new drive to replace c6t37d0 (degraded).

root@Starburst:~# format
[...]
AVAILABLE DISK SELECTIONS:
       0. c3t0d0 <ATA-WDC WD1500HLFS-0-4V02 cyl 18238 alt 2 hd 255 sec 63>
          /pci@0,0/pci10de,815@2/pci1014,394@0/sd@0,0
       [...]
      20. c6t38d0 <ATA-WDC WD20EARX-00P-AB51-1.82TB>
          /pci@0,0/pci10de,377@18/pci1014,394@0/sd@26,0
      21. c6t45d0 <ATA-WDC WD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252>
          /pci@0,0/pci10de,377@18/pci1014,394@0/sd@2d,0
[...]

Using the replace command (zpool replace [-f] pool old_device [new_device]) we set c6t45d0 to replace c6t37d0.
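Before running the real command, it can be worth guarding against replacing the wrong disk. A hedged sketch (the `safe_replace` helper and its state check are my own invention, not part of ZFS; it assumes the device name and state are the first two fields of the matching `zpool status` line):

```shell
# Hypothetical helper: only run `zpool replace` if the old device is
# currently OFFLINE, so a typo can't target a healthy disk.
safe_replace() {
  pool=$1; old=$2; new=$3
  # Assumption: device name is field 1, state is field 2 in `zpool status`.
  state=$(zpool status "$pool" | awk -v d="$old" '$1 == d { print $2 }')
  if [ "$state" != "OFFLINE" ]; then
    echo "refusing: $old is '$state', not OFFLINE" >&2
    return 1
  fi
  zpool replace "$pool" "$old" "$new"
}

# Usage: safe_replace tank c6t37d0 c6t45d0
```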

Then we check the status of the pool to see the effect.

root@Starburst:~# zpool replace tank c6t37d0 c6t45d0
root@Starburst:~# zpool status tank
[...]
        NAME             STATE     READ WRITE CKSUM
        tank             DEGRADED     0     0     0
          raidz2-0       ONLINE       0     0     0
            c6t12d0      ONLINE       0     0     0
            c6t11d0      ONLINE       0     0     0
            c6t13d0      ONLINE       0     0     0
            c6t14d0      ONLINE       0     0     0
            c4t25d0      ONLINE       0     0     0
            c4t26d0      ONLINE       0     0     0
            c4t27d0      ONLINE       0     0     0
            c4t28d0      ONLINE       0     0     0
            c3t17d0      ONLINE       0     0     0
            c3t18d0      ONLINE       0     0     0
          raidz2-1       DEGRADED     0     0     0
            c3t19d0      ONLINE       0     0     0
            c3t20d0      ONLINE       0     0     0
            c6t35d0      ONLINE       0     0     0
            c6t36d0      ONLINE       0     0     0
            replacing-4  DEGRADED     0     0     0
              c6t37d0    OFFLINE      0     0     0
              c6t45d0    ONLINE       0     0     0  (resilvering)
            c6t38d0      ONLINE       0     0     0
            c4t29d0      ONLINE       0     0     0
            c4t30d0      ONLINE       0     0     0
            c4t31d0      ONLINE       0     0     0
            c4t32d0      ONLINE       0     0     0
        cache
          c3t33d0        ONLINE       0     0     0
[...]
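The resilver runs in the background, so a small poll loop can tell you when it is done. A sketch (the `wait_for_resilver` helper is hypothetical), assuming the status output contains the phrase "resilver in progress" while it runs; the exact wording may vary between ZFS releases:

```shell
# Hedged sketch: block until the resilver finishes by polling `zpool status`.
# Assumes the scan line reads "resilver in progress" while running.
wait_for_resilver() {
  pool=$1; interval=${2:-300}   # default: check every 5 minutes
  while zpool status "$pool" | grep -q 'resilver in progress'; do
    sleep "$interval"
  done
  echo "resilver on $pool finished"
}

# Usage: wait_for_resilver tank
```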

After the resilvering process is complete (took me roughly 12 hours) we check the status again to see the effect.

root@Starburst:~# zpool status tank
[...]
        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          raidz2-0   ONLINE       0     0     0
            c6t12d0  ONLINE       0     0     0
            c6t11d0  ONLINE       0     0     0
            c6t13d0  ONLINE       0     0     0
            c6t14d0  ONLINE       0     0     0
            c4t25d0  ONLINE       0     0     0
            c4t26d0  ONLINE       0     0     0
            c4t27d0  ONLINE       0     0     0
            c4t28d0  ONLINE       0     0     0
            c3t17d0  ONLINE       0     0     0
            c3t18d0  ONLINE       0     0     0
          raidz2-1   ONLINE       0     0     0
            c3t19d0  ONLINE       0     0     0
            c3t20d0  ONLINE       0     0     0
            c6t35d0  ONLINE       0     0     0
            c6t36d0  ONLINE       0     0     0
            c6t45d0  ONLINE       0     0     0
            c6t38d0  ONLINE       0     0     0
            c4t29d0  ONLINE       0     0     0
            c4t30d0  ONLINE       0     0     0
            c4t31d0  ONLINE       0     0     0
            c4t32d0  ONLINE       0     0     0
        cache
          c3t33d0    ONLINE       0     0     0
[...]

You could scrub the pool now to check for errors (which you should be doing on a schedule anyway): “If you have consumer-quality drives, consider a weekly scrubbing schedule. If you have datacenter-quality drives, consider a monthly scrubbing schedule.” — ZFS Best Practices Guide
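A sketch of kicking off a scrub and checking its result afterwards (the `run_scrub` wrapper is just for illustration; the scrub itself runs in the background and `zpool status` reports progress, then what was repaired once it finishes):

```shell
# Illustrative wrapper: start a scrub, then show the scan/scrub line from
# the status output.
run_scrub() {
  pool=$1
  zpool scrub "$pool"
  zpool status "$pool" | grep -i 'scrub'
}

# Usage: run_scrub tank
#
# A weekly scrub could be scheduled from root's crontab, e.g. (hypothetical
# schedule, Sundays at 03:00):
#   0 3 * * 0 /usr/sbin/zpool scrub tank
```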
