Recovering a Failed Physical Machine
Recover a physical machine (PM), or node, when it cannot boot or if it fails to become a PM in the everRun system. In some cases, the everRun Availability Console displays the state of a failed PM as Unreachable (Syncing/Evacuating).
To recover a PM, you must reinstall the everRun release that the PM has been running. Recovering a failed PM, though, is different from installing the software for the first time. The recovery preserves all data, but it re-creates the /boot and root file systems, reinstalls the everRun system software, and attempts to connect to the existing system. (If you need to replace the PM hardware instead of recovering the system software, see Replacing Physical Machines, Motherboards, NICs, or RAID Controllers.)
To reinstall the system software, you can allow the system to automatically boot the replacement node from a temporary Preboot Execution Environment (PXE) server on the primary PM. As long as each PM contains a full copy of the most recently installed software kit (as displayed on the Upgrade Kits page of the everRun Availability Console), either PM can initiate the recovery of its partner PM with PXE boot installation. If needed, you can also manually boot the replacement node from DVD/USB installation media.
Use one of the following procedures, depending on whether you want to install from PXE or from DVD/USB media.
Caution: The recovery procedure deletes any software installed in the host operating system of the PM and all PM configuration information entered before the recovery. After you complete this procedure, you must manually reinstall all of your host-level software and reconfigure the PM to match your original settings.
Prerequisites:
- Determine which PM you need to recover.
- Check that a monitor and keyboard are connected to the PM.
- Check that Ethernet cables are connected from the PM you are recovering to the network or, if the two everRun system PMs are in close proximity, directly to the other PM. The Ethernet cable should connect from the first embedded port on the PM you are recovering or from an option (that is, add-on or expansion) port if the PM does not have an embedded port.
- If you want to use DVD or USB media to install the system software on the replacement PM, obtain installation software for the release that the PM has been running by using one of the following methods:
- Create a bootable USB medium on the Upgrade Kits page, as described in Creating a USB Medium with System Software.
- Download an install ISO from your authorized Stratus service representative.
- Extract an install ISO into the current working directory from the most recently installed upgrade kit by executing a command similar to the following (x.x.x.x is the release number and nnn is the build number):
tar -xzvf everRun_upgrade-x.x.x.x-nnn.kit *.iso
If you download or extract an install ISO, save it or burn it to a DVD or USB medium. See Obtaining everRun Software.
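The extraction command above passes *.iso unquoted, so the shell may expand it against ISO files that already exist in the working directory, and recent GNU tar versions do not treat member names as patterns during extraction unless --wildcards is given. The following is a minimal sketch of a more defensive variant, assuming GNU tar and a gzip-compressed kit (as the -z option above implies). The function name extract_install_iso is hypothetical and is not part of the everRun software.

```shell
# Hypothetical helper (not an everRun command): extract every *.iso member
# of an upgrade kit into the current working directory.
# Assumes GNU tar and a gzip-compressed tar archive.
extract_install_iso() {
    kit="$1"
    if [ ! -f "$kit" ]; then
        echo "kit not found: $kit" >&2
        return 1
    fi
    # Quote the pattern so the shell does not expand it against local files;
    # --wildcards tells GNU tar to match it against archive member names.
    tar -xzvf "$kit" --wildcards '*.iso'
}
```

For example, extract_install_iso everRun_upgrade-x.x.x.x-nnn.kit, where x.x.x.x and nnn stand for the release and build numbers as above.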
To recover a PM (with PXE boot installation)
Use the following procedure to recover a PM by using PXE boot installation to reinstall the system software from the software kit on the primary PM.
- In the everRun Availability Console, click Physical Machines in the left-hand navigation panel.
- Select the appropriate PM (node0 or node1) and then click Work On, which changes the PM’s Overall State to Maintenance Mode and the Activity state to running (in Maintenance).
- After the PM displays running (in Maintenance), click Recover.
- When prompted to select the type of repair, click PXE PM Recover - Preserve Data.
Caution: It is important to select PXE PM Recover - Preserve Data; otherwise, the installation process may delete data on the target PM.
- Click Continue to begin the recovery process. The system reboots the target PM in preparation for the system software reinstallation.
- As the PM reboots, enter the firmware (BIOS or UEFI) setup utility, and enable PXE boot (boot from network) for the priv0 NIC.
The recovery process continues with no user interaction, as follows:
- The target PM begins to boot from a PXE server that temporarily runs on the primary node.
- The target PM automatically starts the system software installation, which runs from a copy of the installation kit on the primary node.
- The installation process reinstalls the system software, while preserving all data.
You can monitor the progress of the software installation at the physical console of the target PM.
- When the software installation is complete, the target PM reboots from the newly installed system software.
- As the target PM boots, you can view its activity on the Physical Machines page of the everRun Availability Console. The Activity column displays the PM as (in Maintenance) after the recovery is complete.
- If applicable, manually reinstall applications and any other host-level software, and reconfigure the PM to match your original settings.
- When you are ready to bring the target PM online, click Finalize to exit maintenance mode. Verify that both PMs return to the running state and that the PMs finish synchronizing.
Note: When the target PM exits maintenance mode, the system automatically disables the PXE server on the primary node that was used for the recovery process.
To recover a PM (with DVD/USB installation)
Use the following procedure to recover a PM by reinstalling the system software from a DVD or USB medium.
- In the everRun Availability Console, click Physical Machines in the left-hand navigation panel.
- Select the appropriate PM (node0 or node1) and then click Work On, which changes the PM’s Overall State to Maintenance Mode and the Activity state to running (in Maintenance).
- After the PM displays running (in Maintenance), click Recover.
- When prompted to select the type of repair, click DVD/USB PM Recover - Preserve Data.
Caution: It is important to select DVD/USB PM Recover - Preserve Data; otherwise, the installation process may delete data on the target PM.
- Click Continue to begin the recovery process. The system shuts down the target PM in preparation for the system software reinstallation.
- Insert the bootable media or mount the ISO image on the target PM, and then manually power on the PM.
- As the target PM powers on, enter the firmware (BIOS or UEFI) setup utility and set the Optical Drive or USB media as the first boot device.
- Monitor the installation process at the physical console of the target PM.
- At the Welcome screen, use the arrow keys to select the country keyboard map for the installation.
- At the Install or Recovery screen, select Recover PM, Join system: Preserving data and press Enter.
Caution: It is important to select Recover PM, Join system: Preserving data; otherwise, the installation process may delete data on the target PM.
- The Select interface for private Physical Machine connection screen sets the physical interface to use for the private network. To use the first embedded port, use the arrow keys to select em1 (if it is not already selected), and then press F12 to save your selection and go to the next screen.
Notes:
- If you are not sure which port to use, use the arrow keys to select one of the ports, and click the Identify button. The LED on the selected port then flashes for 30 seconds, allowing you to identify it. Because the LED may also flash due to activity on that network, Stratus recommends that you leave the cable disconnected during the identification process. Reconnect the cable immediately after identification is complete.
- If the system contains no embedded ports, select the first option interface instead.
- The Select interface for managing the system (ibiz0) screen sets the physical interface to use for the management network. To use the second embedded port, use the arrow keys to select em2 (if it is not already selected), and then press F12 to save your selection and go to the next screen.
Note: If the system contains only one embedded port, select the first option interface. If the system contains no embedded ports, select the second option interface.
- The Select the method to configure ibiz0 screen sets the management network for node1 as either a dynamic or static IP configuration. Typically, you set this as a static IP configuration, so use the arrow keys to select Manual configuration (Static Address) and press F12 to save your selection and go to the next screen. However, to set this as a dynamic IP configuration, select Automatic configuration via DHCP and press F12 to save your selection and go to the next screen.
- If you selected Manual configuration (Static Address) in the previous step, the Configure em2 screen appears. Enter the following information and press F12.
- IPv4 address
- Netmask
- Default gateway address
- Domain name server address
See your network administrator for this information.
Note: If you enter invalid information, the screen redisplays until you enter valid information.
- At this point, the software installation continues without additional prompts.
- When the software installation is complete, the target PM reboots from the newly installed system software.
- As the target PM boots, you can view its activity on the Physical Machines page of the everRun Availability Console. The Activity column displays the PM as (in Maintenance) after the recovery is complete.
- If applicable, manually reinstall applications and any other host-level software, and reconfigure the PM to match your original settings.
- When you are ready to bring the target PM online, click Finalize to exit maintenance mode. Verify that both PMs return to the running state and that the PMs finish synchronizing.
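If you plan to choose Manual configuration (Static Address) during a DVD/USB recovery, it can help to sanity-check the address values on another machine before you reach the Configure em2 screen, because the screen redisplays until the values are valid. The following is a rough sketch only: the function names is_ipv4, ip_to_int, and same_subnet are hypothetical, the checks are not the installer's own validation rules, and the values must still come from your network administrator.

```shell
# Hypothetical pre-check helpers (not part of everRun).

# is_ipv4 ADDRESS: succeed if ADDRESS is four dot-separated decimal
# numbers in the range 0-255, with no leading zeros.
is_ipv4() {
    case "$1" in
        *[!0-9.]* | *..* | .* | *. | "") return 1 ;;
    esac
    old_ifs=$IFS; IFS=.; set -- $1; IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        case "$octet" in
            0?*) return 1 ;;   # reject leading zeros such as 010
        esac
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# ip_to_int ADDRESS: print a validated dotted-quad address as an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.; set -- $1; IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet ADDRESS NETMASK GATEWAY: succeed if the gateway lies in the
# same subnet as the address under the given netmask.
same_subnet() {
    is_ipv4 "$1" && is_ipv4 "$2" && is_ipv4 "$3" || return 1
    addr=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    gw=$(ip_to_int "$3")
    [ $(( addr & mask )) -eq $(( gw & mask )) ]
}
```

For example, same_subnet 192.168.1.10 255.255.255.0 192.168.1.1 succeeds, while a gateway of 192.168.2.1 with the same address and netmask fails the check. The addresses here are placeholders, not values from an everRun system.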