Responding to a Failed Logical Disk
When everRun software detects a missing or broken logical disk, it displays a failed logical disk alert on the DASHBOARD page of the everRun Availability Console. (For examples of alerts, see Logical Disk Management.) You can also view the alert on the ALERT HISTORY page. The everRun Availability Console continues to display the alert until you respond to the problem using one of the following methods, as appropriate for your situation:
- If a physical disk has been pulled, reinsert the appropriate physical disk. In this case, the physical machine restores the disk and you may need to use RAID controller software to complete the logical disk restoration.
- If a logical disk is broken or missing, you can attempt to use RAID controller software to recover it. If you are able to use RAID controller software to restore the logical disk to service, the everRun software will detect the restored logical disk and start using its data.
- If a logical disk is broken or missing, and you cannot recover the logical disk using RAID controller software (for example, a failed physical disk needs to be replaced), click the Repair button in the masthead to complete the repair. After clicking the Repair button, the everRun software:
- Dismisses the alert.
- Evacuates all failed logical disks.
- Removes all failed logical disks from their storage groups.
- Attempts to repair any volumes that had been using the failed logical disks.
Caution:
- Clicking the Repair button removes all data on failed logical disks.
- If you attempt to recover a missing or failed logical disk with the Repair button in the masthead of the everRun Availability Console, the system may be slow to repair the disk. Although the system successfully removes the failed logical disk from its storage group, it may be slow to migrate the data from the failed disk to other disks in the storage group. The Alerts page may continue to report that the logical disk is not present, that volumes have failed, and that storage is not fault tolerant. Also, the Volumes page may continue to show volumes in the broken state. If this condition persists, contact your authorized Stratus service representative for assistance.
- Repairing storage causes virtual machines (VMs) that are using failed logical disks to become simplex until repair is complete.
- Systems configured for UEFI will only boot from the logical disk that the everRun software was originally installed on.
- In some legacy BIOS configurations, if you need to repair a logical disk that is the boot disk, you may need to reconfigure the RAID controller to boot from one of the remaining logical disks. Any logical disk that is not affected by the failed disk is able to boot the server. The everRun software mirrors the boot files for each node, in order to maximize overall availability. However, some systems may be able to boot from only the predefined boot logical disk in the RAID controller, and may be unable to boot from an alternate logical disk, if the predefined boot logical disk is present but not bootable. After the node has recovered and the logical disk with the replacement drive has been brought up to date, you should restore the boot device to the original value in the RAID controller.
To repair a failed logical disk
- Click the Repair button that appears in the masthead of the everRun Availability Console.
- Click Yes in the Confirm message box if you want to continue with the repair.
After you click the Repair button, the everRun software attempts to repair all broken volumes by migrating data to other logical disks. When other logical disks have enough space for the data, the everRun software can successfully complete the repair. When other logical disks do not have enough space for the data, the everRun software generates the alert Not enough space for repair. In this case, you need to add more storage to the storage group by creating new logical disks or by deleting some existing volumes.
When enough space for the data exists, the everRun software automatically re-mirrors broken volumes.
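The space check described above can be illustrated with a short sketch. This is not the everRun implementation: the `LogicalDisk` class, the function names, and the rule of comparing the combined free space of the healthy disks against the amount of data to migrate are all assumptions made for illustration. The actual software may apply additional placement constraints.

```python
from dataclasses import dataclass


@dataclass
class LogicalDisk:
    # Hypothetical model of a logical disk in a storage group.
    name: str
    free_gb: float
    failed: bool = False


def can_repair(disks, data_to_migrate_gb):
    """Return True if the healthy disks have enough combined free space.

    Assumption: free space is pooled across all healthy logical disks.
    """
    free_gb = sum(d.free_gb for d in disks if not d.failed)
    return free_gb >= data_to_migrate_gb


def repair_outcome(disks, data_to_migrate_gb):
    """Describe which of the two documented outcomes would occur."""
    if can_repair(disks, data_to_migrate_gb):
        return "re-mirror broken volumes"
    return "alert: Not enough space for repair"


# Example storage group: one failed disk and two healthy disks.
group = [
    LogicalDisk("ld0", free_gb=100, failed=True),
    LogicalDisk("ld1", free_gb=40),
    LogicalDisk("ld2", free_gb=30),
]
print(repair_outcome(group, 60))
print(repair_outcome(group, 80))
```

In this sketch, free space on the failed disk is ignored, so adding a new logical disk or deleting volumes on the healthy disks is what changes the outcome, which matches the remedy described above.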
After the repair is complete, use RAID controller software to remove the failed logical disk and to create a new logical disk. everRun software automatically recognizes the new logical disk and brings it into service if the disk does not contain data. If the disk contains data, the DASHBOARD displays the message Logical Disk – n on PM noden is foreign and should be activated or removed. To activate the logical disk, see Activating a New Logical Disk.