High Availability Using the IBM ServeRAID Adapter

IBM recommends the following precautions to help achieve high availability of the RAID subsystem:

Define a Hot Spare

Defining a hot spare (HSP) drive minimizes the length of time a server operates with degraded performance after a drive becomes defunct (DDD). A hot spare also makes the 'inconsistent' drive easy to recognize if multiple drives become defunct, so recovery requires much less technical expertise. The section below explains this advantage in greater detail:

Hot Spare Advantages 

When a drive becomes defunct (DDD), data is no longer written to it, but data continues to be written to the other drives in the array. The DDD drive therefore becomes 'inconsistent' with the rest of the drives in the array. When multiple drives appear DDD, the first and most critical task is identifying the 'inconsistent' drive correctly. The 'inconsistent' drive must be the last drive replaced, because it requires rebuilding (and, if truly defective, may need physical replacement). If the 'inconsistent' drive is software replaced (see Software Replace vs. Physical Replace) first when multiple drives are DDD, its stale data will be used to rebuild another drive. This eventually corrupts the other drives (and data) on the system.
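
The corruption described above can be illustrated with a minimal sketch, assuming a RAID-5 array with XOR parity. The section does not name a RAID level, so this is an illustration rather than a description of the ServeRAID firmware; the drive names d0, d1, d2 and the byte values are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: int) -> int:
    """XOR one block from each drive; RAID-5 parity is the XOR of the data blocks."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


# Hypothetical single RAID-5 stripe: three data drives (d0..d2) plus parity (p).
# Real RAID-5 rotates parity across drives; one stripe is enough to show the effect.
stripe = {"d0": 0x11, "d1": 0x22, "d2": 0x33}
stripe["p"] = xor_blocks(stripe["d0"], stripe["d1"], stripe["d2"])

# d0 goes DDD. Its platters keep the old value, but the array stays up and
# later writes update only the surviving drives and the parity.
stale_d0 = stripe["d0"]   # what is still physically on the DDD drive
new_d0 = 0x44             # host writes a new d0 value (stored only via parity)
stripe["d1"] = 0x55       # host also writes new data to d1
stripe["p"] = xor_blocks(new_d0, stripe["d1"], stripe["d2"])

# Later d1 also goes DDD and the OS stops, so nothing else is written.
# d1 is still consistent; d0 is the 'inconsistent' drive.

# Correct order: set d1 back to ONL, then rebuild the inconsistent d0 last.
rebuilt_d0 = xor_blocks(stripe["d1"], stripe["d2"], stripe["p"])

# Wrong order: bring the stale d0 back first, then rebuild d1 from it.
rebuilt_d1 = xor_blocks(stale_d0, stripe["d2"], stripe["p"])

print(f"rebuilt d0 = {rebuilt_d0:#04x}, expected {new_d0:#04x}")
print(f"rebuilt d1 = {rebuilt_d1:#04x}, expected {stripe['d1']:#04x}")
```

Rebuilding in the correct order recovers 0x44 for d0, while rebuilding d1 from the stale d0 produces 0x00 instead of 0x55: the rebuilt drive silently receives wrong data.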

However, when an HSP is defined, you are protected from rebuilding another drive from an 'inconsistent' drive because of the way the RAID adapter marks drive states. As soon as the HSP takes over for the DDD drive, the RAID adapter marks the DDD drive as a defunct hot spare (DHS) in its configuration. If you perform a software replace or physical replace of this DHS drive, the RAID adapter starts the drive and changes its state from DHS to HSP. The RAID adapter does not allow this drive to be brought back to online (ONL) status.

When the HSP takes over for the DDD drive, the adapter rebuilds the DDD drive's data onto the HSP. While it is being rebuilt, the HSP drive appears in the rebuild (RBL) state. The state changes from RBL to ONL once the rebuild is complete and the drive has fully taken the place of the former DDD drive, which is now marked DHS.
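
The state rules in the two paragraphs above can be summarized as a small state machine. The following is a minimal Python sketch that encodes only the transitions stated in this section; it is not the adapter's firmware logic. The class name ToyArray and drive labels such as "1:0" (channel:SCSI ID) are hypothetical.

```python
from enum import Enum


class State(Enum):
    ONL = "online"
    DDD = "defunct"
    HSP = "hot spare"
    DHS = "defunct hot spare"
    RBL = "rebuilding"


class ToyArray:
    """Toy model of the drive-state rules described in the text, not adapter firmware."""

    def __init__(self, members, hot_spares=()):
        self.state = {d: State.ONL for d in members}
        self.state.update({d: State.HSP for d in hot_spares})

    def fail(self, drive):
        spare = next((d for d, s in self.state.items() if s is State.HSP), None)
        if spare is None:
            # No hot spare: the drive stays DDD until someone intervenes.
            self.state[drive] = State.DDD
        else:
            # The adapter marks the failed drive DHS as soon as the spare
            # takes over; this model records only that end result.
            self.state[drive] = State.DHS
            self.state[spare] = State.RBL

    def rebuild_complete(self, drive):
        if self.state[drive] is not State.RBL:
            raise ValueError(f"{drive} is not rebuilding")
        self.state[drive] = State.ONL

    def replace(self, drive):
        """Software or physical replace of a DHS drive returns it to HSP, never ONL."""
        if self.state[drive] is not State.DHS:
            raise ValueError("this sketch only models replacing DHS drives")
        self.state[drive] = State.HSP


raid = ToyArray(["1:0", "1:1", "1:2"], hot_spares=["1:3"])
raid.fail("1:1")              # 1:1 -> DHS, spare 1:3 -> RBL
raid.rebuild_complete("1:3")  # 1:3 -> ONL
raid.replace("1:1")           # 1:1 -> HSP
print({d: s.name for d, s in raid.state.items()})
# {'1:0': 'ONL', '1:1': 'HSP', '1:2': 'ONL', '1:3': 'ONL'}
```

Because a replaced DHS drive can only become HSP, there is no path in this model that puts the stale drive back into the array as ONL, which is the protection the text describes.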

If no HSP is defined and multiple drives appear DDD, identifying the 'inconsistent' drive is more difficult. You must read the RAID log generated by IPSMON to determine which drive is 'inconsistent': it is the drive that went DDD first. To see why, consider the sequence of events. When the first drive goes DDD, the operating system remains operational on the remaining drives and writes to every drive in the array except the DDD drive. When the second drive goes DDD, the operating system stops functioning and no longer writes to any drive. Because IPSMON can write to the RAID log only while the operating system is running, the first DDD drive recorded in the log must be the 'inconsistent' drive. To recover, use the Set Device State option to change the 'consistent' drives from DDD to ONL, and make sure that the 'inconsistent' drive is the one you rebuild.

If an HSP drive is defined but its rebuild did not complete, identifying the 'inconsistent' drive is much easier: the 'inconsistent' drive remains in RBL status, and the original DDD drive appears with DHS status.
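
The two identification rules above (an unfinished hot-spare rebuild leaves the 'inconsistent' drive in RBL; otherwise the earliest DDD entry in the log wins) can be expressed as a short decision procedure. This is a minimal sketch: the Event fields, the timestamps, and the drive labels are hypothetical and do not reflect the real IPSMON log format, and recovery_steps simply applies the ordering rule stated for the no-hot-spare case.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Event:
    """One already-parsed RAID log entry. The fields are hypothetical;
    this is not the actual IPSMON log format."""
    time: str       # sortable timestamp, e.g. "1998-03-02 10:15:00"
    drive: str      # hypothetical "channel:SCSI ID" label
    new_state: str  # drive state recorded by the entry, e.g. "DDD"


def find_inconsistent(states: dict[str, str], log: list[Event]) -> str:
    """Apply the two identification rules described in the text."""
    # Rule 1: a hot spare was defined but its rebuild never finished,
    # so the inconsistent drive is the one still in RBL.
    rebuilding = [d for d, s in states.items() if s == "RBL"]
    if len(rebuilding) > 1:
        raise ValueError("more than one RBL drive; identify the drive manually")
    if rebuilding:
        return rebuilding[0]
    # Rule 2: no hot spare. The log is written only while the OS is running,
    # so the earliest DDD entry is the drive that went defunct first.
    ddd = sorted((e for e in log if e.new_state == "DDD"), key=lambda e: e.time)
    if not ddd:
        raise ValueError("no DDD entry in the log; identify the drive manually")
    return ddd[0].drive


def recovery_steps(states: dict[str, str], log: list[Event]) -> list[str]:
    """Set the consistent DDD drives to ONL first; rebuild the inconsistent one last."""
    bad = find_inconsistent(states, log)
    steps = [f"Set Device State {d}: DDD -> ONL"
             for d, s in states.items() if s == "DDD" and d != bad]
    steps.append(f"Rebuild {bad}")
    return steps


# No hot spare: 1:1 and 1:2 are DDD, but only the first failure reached the log.
states = {"1:0": "ONL", "1:1": "DDD", "1:2": "DDD"}
log = [Event("1998-03-02 10:15:00", "1:2", "DDD")]
for step in recovery_steps(states, log):
    print(step)
```

With these hypothetical inputs the sketch prints "Set Device State 1:1: DDD -> ONL" followed by "Rebuild 1:2", matching the order the text requires.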

Install and Use Netfinity Manager

You should install Netfinity Manager 5.0 or later to monitor the RAID array remotely. With Netfinity services installed on the server and Netfinity Manager installed on a workstation, the RAID array can be monitored, and even synchronized, from a remote location. Netfinity Manager can schedule data scrubbing for any time of day, so synchronization of the RAID array can run during off-peak hours without user intervention. The system can also be configured to send alert messages about the RAID subsystem over the network to the workstation, and Netfinity Manager can even page someone, such as the network administrator or a service technician, when a specified alert condition is reached. Netfinity Manager also performs many other functions, such as monitoring processor utilization, monitoring critical files, and detecting installed software across the network. It also captures Predictive Failure Analysis (PFA) alerts from hard disk drives and sends system alerts to the appropriate parties.

To use Netfinity Manager 5.0 to schedule data scrubbing, download NF50RAID.EXE from http://www.us.pc.ibm.com/files.html. This file contains updated Netfinity program files that are required to schedule data scrubbing on controllers whose write policy is set to write-back cache. The update applies to Netfinity Manager installations on OS/2, Windows NT, and Windows 95.
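
The idea of paging someone only when a certain alert condition is reached amounts to a severity threshold. The sketch below is hypothetical throughout: the event names, severities, and the route function are illustrative assumptions and do not reflect how Netfinity Manager alerts are actually configured.

```python
from __future__ import annotations

from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical event names and severities; not Netfinity Manager's alert format.
EVENT_SEVERITY = {
    "rebuild_complete": Severity.INFO,
    "pfa_alert": Severity.WARNING,       # Predictive Failure Analysis warning
    "drive_defunct": Severity.CRITICAL,  # a drive went DDD
}


def route(event: str, page_at: Severity = Severity.CRITICAL) -> list[str]:
    """Every RAID event goes to the management workstation; only events at or
    above `page_at` also page the administrator."""
    # Unknown events default to WARNING so they are never silently dropped.
    severity = EVENT_SEVERITY.get(event, Severity.WARNING)
    actions = ["send alert to management workstation"]
    if severity >= page_at:
        actions.append("page network administrator")
    return actions


print(route("drive_defunct"))
print(route("pfa_alert"))
```

Lowering page_at to Severity.WARNING would also page on PFA alerts, trading more interruptions for earlier warning of a failing drive.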

Data Scrub Drives Weekly

One of the best ways to detect potential disk media problems and correct them before a failure occurs is data scrubbing. (The ServeRAID II adapter with firmware 2.30 or later performs data scrubbing in the background.) Data scrubbing identifies and corrects sector media errors by forcing every data sector in the array to be accessed. It checks all data sectors in the array and should be performed weekly. With the IBM ServeRAID and ServeRAID II adapters, the easiest way to perform data scrubbing is to synchronize the array: synchronization reads every sector of every drive in the array in the background while still allowing concurrent user disk activity.

Netfinity Manager 5.0 can schedule synchronization automatically from either the server or a remote manager, and it is available at no additional charge to customers who purchased an IBM server that ships with ServerGuide. Customers who use another scheduler, such as the AT scheduler built into Windows NT, can schedule data scrubbing without Netfinity Manager by using the IBM ServeRAID adapter's IPSSEND command-line utility, which is available on the ServeRAID Supplemental Diskette.
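
As a sketch of the scheduler-based approach, the helper below builds a weekly Windows NT AT command line for an off-peak slot. It assumes a hypothetical batch file, C:\SCRUB.CMD, that you would write to run IPSSEND's synchronization function; the exact IPSSEND arguments depend on your adapter and utility version, so they are deliberately not shown.

```python
import re

# Day abbreviations accepted by the /every: switch of the Windows NT AT command.
AT_DAYS = ("M", "T", "W", "Th", "F", "S", "Su")


def weekly_at_command(time_24h: str, day: str, command: str) -> str:
    """Return an AT command line that runs `command` every week on `day`."""
    if not re.fullmatch(r"([01]\d|2[0-3]):[0-5]\d", time_24h):
        raise ValueError("time must be HH:MM in 24-hour format")
    if day not in AT_DAYS:
        raise ValueError(f"day must be one of {', '.join(AT_DAYS)}")
    return f'at {time_24h} /every:{day} "{command}"'


# Off-peak slot: 02:00 every Sunday. C:\SCRUB.CMD is a hypothetical batch file.
print(weekly_at_command("02:00", "Su", r"C:\SCRUB.CMD"))
```

This prints at 02:00 /every:Su "C:\SCRUB.CMD". Jobs submitted with AT run only while the Windows NT Schedule service is running, so confirm that service starts automatically on the server.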

Apply All Updates

You should apply all RAID-related updates, including ServeRAID firmware and device driver updates. Check the IBM Server web site at http://www.us.pc.ibm.com/server/server.html or call the HelpCenter for up-to-date information.

