Veritas Disk Replacement

How to Replace a Faulty Disk in VxVM

In this post we will discuss one of the most common VxVM tasks on a UNIX server: replacing a faulty disk.
Every Unix system administrator will at some point have to remove or replace a faulty VxVM disk. Here we walk through the procedure using the menu-driven “vxdiskadm” command; the same steps can also be done directly from the command line, which is often preferred.

Assume that disk group “karridg” has one faulty disk, karridg04 (c3t9d0s2), which needs to be replaced. Let us draw up a high-level plan before doing the replacement.

HIGH LEVEL PLAN:
Step1: Take a backup of the system and disk configuration (cfg2html or explorer output is recommended).
Step2: Remove the disk at the VxVM level using the vxdiskadm utility.
Step3: Unconfigure the disk at the OS level using the cfgadm command.
Step4: Request the physical replacement of the faulty disk.
Step5: Configure the new disk at the OS level using the cfgadm and devfsadm commands.
Step6: Replace the disk at the VxVM level using the vxdiskadm utility.
Step7: Start the volume and mount it.
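
For Step1, in addition to cfg2html or explorer, it can help to keep the raw VxVM listings in a file for later comparison. The helper below is only a sketch: the function name save_output and the backup directory are hypothetical, and the VxVM commands appear only in the usage comments.

```shell
# Run a command and save its combined output to <dir>/<label>.out.
# save_output is a hypothetical helper, not part of VxVM.
save_output() {
    dir=$1
    label=$2
    shift 2
    mkdir -p "$dir" || return 1
    "$@" > "$dir/$label.out" 2>&1
}

# Usage on a VxVM host (not executed here):
#   BACKUP_DIR=/var/tmp/vxvm_backup      # hypothetical location
#   save_output "$BACKUP_DIR" vxdisk_list vxdisk list
#   save_output "$BACKUP_DIR" vxprint_ht  vxprint -ht
#   save_output "$BACKUP_DIR" vxdg_list   vxdg list
```

Keeping one file per command makes it easy to diff the state before and after the replacement.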

Let us start the activity after taking a valid configuration backup. The output below confirms that karridg04 (c3t9d0s2) is in failed status.

root@karri # vxdisk list
 DEVICE TYPE DISK GROUP STATUS
 c1t0d0s2 sliced rootdisk rootdg online
 c1t1d0s2 sliced rootmirr rootdg online
 c3t8d0s2 sliced karridg03 karridg online
 c3t10d0s2 sliced - - online
 c3t11d0s2 sliced karridg01 karridg online
 c3t12d0s2 sliced karridg02 karridg online
 - - karridg04 karridg failed was:c3t9d0s2
 root@karri #
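
On a server with many disks, the failed entry can be picked out of the listing automatically. This is a minimal sketch that assumes the column layout shown above (DEVICE TYPE DISK GROUP STATUS); failed_disks is a hypothetical helper name.

```shell
# Print "<disk name> <disk group> <last device>" for failed or removed disks.
# Reads `vxdisk list` output on stdin and relies on the column layout above.
failed_disks() {
    awk '$1 == "-" && ($5 == "failed" || $5 == "removed") {
        sub(/^was:/, "", $6)   # strip the "was:" prefix from the old device
        print $3, $4, $6
    }'
}

# Usage on a live system (not run here):
#   vxdisk list | failed_disks
```

With the listing above as input, the function would report karridg04 in karridg, last seen on c3t9d0s2.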
Remove the failed disk from VxVM using the vxdiskadm command.
 root@karri # vxdiskadm

Volume Manager Support Operations
 Menu: VolumeManager/Disk

1 Add or initialize one or more disks
 2 Encapsulate one or more disks
 3 Remove a disk
 4 Remove a disk for replacement
 5 Replace a failed or removed disk
 6 Mirror volumes on a disk
 7 Move volumes from a disk
 8 Enable access to (import) a disk group
 9 Remove access to (deport) a disk group
 10 Enable (online) a disk device
 11 Disable (offline) a disk device
 12 Mark a disk as a spare for a disk group
 13 Turn off the spare flag on a disk
 14 Unrelocate subdisks back to a disk
 15 Exclude a disk from hot-relocation use
 16 Make a disk available for hot-relocation use
 17 Prevent multipathing/Suppress devices from VxVM's view
 18 Allow multipathing/Unsuppress devices from VxVM's view
 19 List currently suppressed/non-multipathed devices
 20 Change the disk naming scheme
 21 Get the newly connected/zoned disks in VxVM view
 list List disk information
 ? Display help about menu
 ?? Display help about the menuing system
 q Exit from menus

Select an operation to perform: 4

Remove a disk for replacement
 Menu: VolumeManager/Disk/RemoveForReplace

Use this menu operation to remove a physical disk from a disk
 group, while retaining the disk name. This changes the state
 for the disk name to a "removed" disk. If there are any
 initialized disks that are not part of a disk group, you will be
 given the option of using one of these disks as a replacement.

Enter disk name [<disk>,list,q,?] list

Disk group: rootdg

DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE

dm rootdisk c1t0d0s2 sliced 10175 143339136 -
 dm rootmirr c1t1d0s2 sliced 10175 143339136 -

Disk group: karridg

DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE

dm karridg01 c3t11d0s2 sliced 9919 143328960 -
 dm karridg02 c3t12d0s2 sliced 9919 143328960 -
 dm karridg03 c3t8d0s2 sliced 9919 143328960 -
 dm karridg04 - - - - NODEVICE

Enter disk name [<disk>,list,q,?] karridg04

The following volumes will be disabled as a result of this
 operation:

karrivol

These volumes will require restoration from backup.

Are you sure you want do do this? [y,n,q,?] (default: n) y

The requested operation is to remove disk karridg04 from disk group
 karridg. The disk name will be kept, along with any volumes using
 the disk, allowing replacement of the disk.

Select "Replace a failed or removed disk" from the main menu
 when you wish to replace the disk.

Continue with operation? [y,n,q,?] (default: y) y

Removal of disk karridg04 completed successfully.

Remove another disk? [y,n,q,?] (default: n) n

Volume Manager Support Operations
 Menu: VolumeManager/Disk

1 Add or initialize one or more disks
 2 Encapsulate one or more disks
 3 Remove a disk
 4 Remove a disk for replacement
 5 Replace a failed or removed disk
 6 Mirror volumes on a disk
 7 Move volumes from a disk
 8 Enable access to (import) a disk group
 9 Remove access to (deport) a disk group
 10 Enable (online) a disk device
 11 Disable (offline) a disk device
 12 Mark a disk as a spare for a disk group
 13 Turn off the spare flag on a disk
 14 Unrelocate subdisks back to a disk
 15 Exclude a disk from hot-relocation use
 16 Make a disk available for hot-relocation use
 17 Prevent multipathing/Suppress devices from VxVM's view
 18 Allow multipathing/Unsuppress devices from VxVM's view
 19 List currently suppressed/non-multipathed devices
 20 Change the disk naming scheme
 21 Get the newly connected/zoned disks in VxVM view
 list List disk information
 ? Display help about menu
 ?? Display help about the menuing system
 q Exit from menus

Select an operation to perform: q

Goodbye.
Now the failed disk shows the status "removed".
 root@karri # vxdisk list
 DEVICE TYPE DISK GROUP STATUS
 c1t0d0s2 sliced rootdisk rootdg online
 c1t1d0s2 sliced rootmirr rootdg online
 c3t8d0s2 sliced karridg03 karridg online
 c3t9d0s2 sliced - - error
 c3t10d0s2 sliced - - online
 c3t11d0s2 sliced karridg01 karridg online
 c3t12d0s2 sliced karridg02 karridg online
 - - karridg04 karridg removed was:c3t9d0s2
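
For readers who prefer the command line, the same "remove for replacement" step is commonly done with vxdg rmdisk and the -k option, which keeps the disk media name. The sketch below only prints the command by default; run and remove_for_replacement are hypothetical helper names, and the exact behaviour should be checked against the vxdg manual page for your VxVM release.

```shell
# Echo the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remove a disk from its disk group but keep (-k) the disk media record,
# so the name can be reused for the replacement disk.
remove_for_replacement() {
    dg=$1   # disk group, e.g. karridg
    dm=$2   # disk media name, e.g. karridg04
    run vxdg -g "$dg" -k rmdisk "$dm"
}

remove_for_replacement karridg karridg04
```

The dry-run default is a deliberate choice: it lets you review the exact command before running it on a production host.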
Once the disk has been removed at the VxVM level, unconfigure the faulty disk at the OS level using cfgadm -c unconfigure.
 root@karri # cfgadm -c unconfigure c3::dsk/c3t9d0
 root@karri #
Once that is done, physically replace the faulty disk and then configure the new disk at the OS level.
 root@karri # cfgadm -c configure c3::dsk/c3t9d0
 root@karri #
 root@karri # devfsadm -c disk
 root@karri # echo|format|grep -i c3t9d0
 3. c3t9d0 SUN72G cyl 14087 alt 2 hd 24 sec 424
 root@karri #
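
The attachment point passed to cfgadm above follows the pattern controller::dsk/device. The small helper below derives it from a device name; ap_id is a hypothetical function name, and the pattern is assumed only from the example in this post, so confirm the real attachment point with cfgadm -al before using it.

```shell
# Build a cfgadm attachment point such as c3::dsk/c3t9d0 from a device
# name like c3t9d0 or c3t9d0s2. Assumes the cXtYdZ[sN] naming used above.
ap_id() {
    dev=${1%s[0-9]}      # drop a trailing slice suffix such as s2
    ctrl=${dev%%t*}      # controller part, e.g. c3
    echo "${ctrl}::dsk/${dev}"
}

ap_id c3t9d0s2
```

For the disk in this post it yields c3::dsk/c3t9d0, the same value used in the cfgadm commands above.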
Now that the disk is available at the OS level, we have to bring it under VxVM control.
 root@karri # vxdctl enable
 root@karri # vxdiskadm

Volume Manager Support Operations
 Menu: VolumeManager/Disk

1 Add or initialize one or more disks
 2 Encapsulate one or more disks
 3 Remove a disk
 4 Remove a disk for replacement
 5 Replace a failed or removed disk
 6 Mirror volumes on a disk
 7 Move volumes from a disk
 8 Enable access to (import) a disk group
 9 Remove access to (deport) a disk group
 10 Enable (online) a disk device
 11 Disable (offline) a disk device
 12 Mark a disk as a spare for a disk group
 13 Turn off the spare flag on a disk
 14 Unrelocate subdisks back to a disk
 15 Exclude a disk from hot-relocation use
 16 Make a disk available for hot-relocation use
 17 Prevent multipathing/Suppress devices from VxVM's view
 18 Allow multipathing/Unsuppress devices from VxVM's view
 19 List currently suppressed/non-multipathed devices
 20 Change the disk naming scheme
 21 Get the newly connected/zoned disks in VxVM view
 list List disk information
 ? Display help about menu
 ?? Display help about the menuing system
 q Exit from menus

Select an operation to perform: 5

Replace a failed or removed disk
 Menu: VolumeManager/Disk/ReplaceDisk

Use this menu operation to specify a replacement disk for a disk
 that you removed with the "Remove a disk for replacement" menu
 operation, or that failed during use. You will be prompted for
 a disk name to replace and a disk device to use as a replacement.
 You can choose an uninitialized disk, in which case the disk will
 be initialized, or you can choose a disk that you have already
 initialized using the Add or initialize a disk menu operation.

Select a removed or failed disk [<disk>,list,q,?] list

Disk group: rootdg

DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
 Disk group: karridg

DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE

dm karridg04 - - - - REMOVED
Select a removed or failed disk [<disk>,list,q,?] karridg04

Select disk device to initialize [<address>,list,q,?] list

DEVICE DISK GROUP STATUS
 c1t0d0 rootdisk rootdg online
 c1t1d0 rootmirr rootdg online
 c3t8d0 karridg03 karridg online
 c3t9d0 - - error
 c3t10d0 - - online
 c3t11d0 karridg01 karridg online
 c3t12d0 karridg02 karridg online

Select disk device to initialize [<address>,list,q,?] c3t9d0

The following disk device has a valid VTOC, but does not appear to have
 been initialized for the Volume Manager. If there is data on the disk
 that should NOT be destroyed you should encapsulate the existing disk
 partitions as volumes instead of adding the disk as a new disk.
 Output format: [Device_Name]

c3t9d0

Encapsulate this device? [y,n,q,?] (default: y) n

c3t9d0

Instead of encapsulating, initialize? [y,n,q,?] (default: n) y

The requested operation is to initialize disk device c3t9d0 and
 to then use that device to replace the removed or failed disk
 karridg04 in disk group karridg.

Continue with operation? [y,n,q,?] (default: y)

Use a default private region length for the disk?
 [y,n,q,?] (default: y)

Replacement of disk karridg04 in group karridg with disk device
 c3t9d0 completed successfully.

Replace another disk? [y,n,q,?] (default: n)
Check the status:
 root@karri # vxdisk list
 DEVICE TYPE DISK GROUP STATUS
 c1t0d0s2 sliced rootdisk rootdg online
 c1t1d0s2 sliced rootmirr rootdg online
 c3t8d0s2 sliced karridg03 karridg online
 c3t9d0s2 sliced karridg04 karridg online
 c3t10d0s2 sliced - - online
 c3t11d0s2 sliced karridg01 karridg online
 c3t12d0s2 sliced karridg02 karridg online
 root@karri #
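
The menu-driven replacement above roughly corresponds to three commands: rescan devices, initialize the new disk, and add it back under the old disk media name. This is a sketch that prints the commands by default; run and replace_disk are hypothetical helper names, and whether the access name needs the s2 suffix depends on the VxVM release, so it is passed in as a separate argument here.

```shell
# Echo the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Command-line counterpart of "Replace a failed or removed disk".
replace_disk() {
    dg=$1       # disk group, e.g. karridg
    dm=$2       # disk media name being replaced, e.g. karridg04
    addr=$3     # disk address for vxdisksetup, e.g. c3t9d0
    access=$4   # access name as shown by `vxdisk list`, e.g. c3t9d0s2
    run vxdctl enable || return 1
    run /etc/vx/bin/vxdisksetup -i "$addr" || return 1
    run vxdg -g "$dg" -k adddisk "$dm=$access"
}

replace_disk karridg karridg04 c3t9d0 c3t9d0s2
```

As with vxdiskadm, initializing the disk destroys whatever was on it, so double-check the device name before setting DRY_RUN=0.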
We have successfully replaced the faulty disk; however, we still have to check the volume status. In the output below, "karrivol" is in DISABLED state.
 root@karri # vxprint -hvtg karridg
 V NAME RVG KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
 PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
 SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
 SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
 DC NAME PARENTVOL LOGVOL
 SP NAME SNAPVOL DCO

v karrivol - DISABLED ACTIVE 573313024 SELECT - fsgen
 pl karri-01 karrivol DISABLED RECOVER 573315840 CONCAT - RW
 sd karridg03-01 karri-01 karridg03 0 143328960 0 c3t8d0 ENA
 sd karridg04-01 karri-01 karridg04 0 143328960 143328960 c3t9d0 ENA
 sd karridg01-01 karri-01 karridg01 0 143328960 286657920 c3t11d0 ENA
 sd karridg02-01 karri-01 karridg02 0 143328960 429986880 c3t12d0 ENA
 root@karri #
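
When a disk group holds many volumes, the ones that are not enabled can be filtered out of the vxprint output. This is a minimal sketch assuming the -hvt record layout shown above, where volume records start with a lowercase "v"; disabled_volumes is a hypothetical helper name.

```shell
# Print "<volume> <kstate> <state>" for every volume whose kernel state
# is not ENABLED. Reads `vxprint -hvt` output on stdin.
disabled_volumes() {
    awk '$1 == "v" && $4 != "ENABLED" { print $2, $4, $5 }'
}

# Usage on a live system (not run here):
#   vxprint -hvtg karridg | disabled_volumes
```

The header line starts with an uppercase "V", so it is skipped by the exact match on "v".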
I tried the steps below to bring the volume into active status.
 root@karri # vxrecover -s karrivol
 root@karri # vxprint -hvtg karridg
 V NAME RVG KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
 PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
 SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
 SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
 DC NAME PARENTVOL LOGVOL
 SP NAME SNAPVOL DCO

v karrivol - DISABLED ACTIVE 573313024 SELECT - fsgen
 pl karri-01 karrivol DISABLED RECOVER 573315840 CONCAT - RW
 sd karridg03-01 karri-01 karridg03 0 143328960 0 c3t8d0 ENA
 sd karridg04-01 karri-01 karridg04 0 143328960 143328960 c3t9d0 ENA
 sd karridg01-01 karri-01 karridg01 0 143328960 286657920 c3t11d0 ENA
 sd karridg02-01 karri-01 karridg02 0 143328960 429986880 c3t12d0 ENA
 root@karri # vxtask list
 TASKID PTID TYPE/STATE PCT PROGRESS
 root@karri # vxvol -g karridg startall
 vxvm:vxvol: ERROR: Volume karrivol has no CLEAN or non-volatile ACTIVE plexes
 root@karri #
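vxrecover could not start the volume because karrivol consists of a single concatenated plex (karri-01) with no mirror, so there was no healthy plex to resynchronize from. I then followed the steps below to force the plex state to CLEAN and start the volume. This only makes the volume startable; as vxdiskadm warned during the removal, the data that resided on the replaced disk has to be restored from backup.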
 root@karri # vxmend -g karridg fix stale karri-01
 root@karri # vxmend -g karridg fix clean karri-01
 root@karri # vxvol -g karridg start karrivol
 root@karri #
 root@karri # vxprint -hvtg karridg
 V NAME RVG KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
 PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
 SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
 SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
 DC NAME PARENTVOL LOGVOL
 SP NAME SNAPVOL DCO

v karrivol - ENABLED ACTIVE 573313024 SELECT - fsgen
 pl karri-01 karrivol ENABLED ACTIVE 573315840 CONCAT - RW
 sd karridg03-01 karri-01 karridg03 0 143328960 0 c3t8d0 ENA
 sd karridg04-01 karri-01 karridg04 0 143328960 143328960 c3t9d0 ENA
 sd karridg01-01 karri-01 karridg01 0 143328960 286657920 c3t11d0 ENA
 sd karridg02-01 karri-01 karridg02 0 143328960 429986880 c3t12d0 ENA
 root@karri #
Then I tried to mount the volume, but I got the errors below.
 root@karri # mount /karri
 mount: /dev/vx/dsk/karridg/karrivol is already mounted, /karri is busy,
 or the allowable number of mount points has been exceeded
 root@karri # mount -v|grep -i /karri
 root@karri #
Then I applied a workaround in order to mount the volume: I moved the old mount point aside and recreated it.
 root@karri # mv /karri /karri_old
 root@karri # mkdir /karri
 root@karri # mount /karri
 root@karri # df -k|grep -i karri
 Filesystem kbytes used avail capacity Mounted on
 /dev/vx/dsk/karridg/karrivol
 282176390 29435764 249918863 11% /karri
 root@karri #