ZFS Part 2: Disk Failure
Before I trust ZFS, I need to be sure I can replace a disk when it dies. As a first experiment, using the setup described here, I removed the primary disk.
So, power down and remove the primary disk (ad4). Note that if you’re doing this on the ProLiant system I mentioned, you really should put a drive mount back in the empty bay (it is needed for cooling). Luckily I have a spare system, so I just borrowed one from that.
Reboot. Comes up fine on the secondary disk without further intervention.
$ zpool status
  pool: scratch
 state: ONLINE
 scrub: none requested
config:

    NAME            STATE     READ WRITE CKSUM
    scratch         ONLINE       0     0     0
      gpt/scratch8  ONLINE       0     0     0

errors: No known data errors

  pool: system
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

    NAME             STATE     READ WRITE CKSUM
    system           DEGRADED     0     0     0
      mirror         DEGRADED     0     0     0
        gpt/system8  ONLINE       0     0     0
        gpt/system4  UNAVAIL      0     0     0  cannot open

errors: No known data errors
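The DEGRADED state is easy to spot by eye in output like this, but it can also be picked out mechanically. Here is a minimal sketch, assuming the `pool:` / `state:` layout shown above; `unhealthy_pools` is a name I made up for illustration, not a ZFS tool. It reads status text on stdin, so it can be tried on saved output as well as on a live system.

```sh
#!/bin/sh
# unhealthy_pools (hypothetical helper): read `zpool status` output on stdin
# and print "<pool> <state>" for every pool whose state is not ONLINE.
unhealthy_pools() {
    awk '
        # Remember the most recent pool name.
        $1 == "pool:"  { pool = $2 }
        # Report the pool if its state line says anything but ONLINE.
        $1 == "state:" && $2 != "ONLINE" { print pool, $2 }
    '
}

# Usage on a live system (assumes zpool is on the PATH):
#   zpool status | unhealthy_pools
```

Fed the output above, this should print only the system pool with its DEGRADED state. For a quick manual check, `zpool status -x` limits the report to pools that have problems.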
Note that the system pool is now degraded. How would we have known if we hadn’t checked? Well, it turns out we missed something in the previous setup.
We should have put
daily_status_zfs_enable="YES"
daily_status_gmirror_enable="YES"
in /etc/periodic.conf. Then in the daily mail we’d see:
Checking status of zfs pools:
  pool: system
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

    NAME             STATE     READ WRITE CKSUM
    system           DEGRADED     0     0     0
      mirror         DEGRADED     0     0     0
        gpt/system8  ONLINE       0     0     0
        gpt/system4  UNAVAIL      0     0     0  cannot open

errors: No known data errors

Checking status of gmirror(8) devices:
       Name    Status  Components
mirror/swap  DEGRADED  gpt/swap8
So remember, boys and girls, read your daily mails!
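If you want to add the two periodic.conf settings from a script rather than by hand, something like the following sketch would do it without duplicating lines on a second run. `ensure_line` is a hypothetical helper of my own, not something FreeBSD ships. It only appends a line when that exact line is not already in the file, so it will not notice an existing entry with a different value.

```sh
#!/bin/sh
# ensure_line FILE LINE (hypothetical helper): append LINE to FILE
# unless FILE already contains exactly that line.
ensure_line() {
    file=$1
    line=$2
    # -q: no output, -x: match the whole line, -F: fixed string, not a regex.
    # A missing file makes grep fail, so the line is appended and the file created.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root), with the two settings from this post:
#   ensure_line /etc/periodic.conf 'daily_status_zfs_enable="YES"'
#   ensure_line /etc/periodic.conf 'daily_status_gmirror_enable="YES"'
```

Running it twice leaves a single copy of each line.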
So far, so good. One disk failed, the system came back up without intervention, and would have alerted us in daily mails had we configured it correctly (of course it now is). So what happens if we put the disk back in? Since we’ve modified the other disk in the meantime, we’d hope that would get reconciled. Let’s see…
Power down, put the missing disk back in, and reboot.
Now we see
$ zpool status
  pool: scratch
 state: ONLINE
 scrub: none requested
config:

    NAME            STATE     READ WRITE CKSUM
    scratch         ONLINE       0     0     0
      gpt/scratch8  ONLINE       0     0     0

errors: No known data errors

  pool: system
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Sat Mar 26 10:48:56 2011
config:

    NAME             STATE     READ WRITE CKSUM
    system           ONLINE       0     0     0
      mirror         ONLINE       0     0     0
        gpt/system8  ONLINE       0     0     0
        gpt/system4  ONLINE       0     0     0  345K resilvered

errors: No known data errors
$ gmirror status
       Name    Status  Components
mirror/swap  COMPLETE  gpt/swap4
                       gpt/swap8
and there we are, back to where we started. But suppose the disk had really failed, then what? See the next exciting installment!