Preventive Measures to Help Obtain High Availability

High Availability of Your RAID Subsystem with IBM SCSI-2 Fast/Wide PCI-Bus RAID Adapter & IBM Fast/Wide Streaming RAID Adapter

IBM recommends the following precautions in order to help obtain high availability of the RAID subsystem:

Define a Hot Spare

Defining a hot spare drive minimizes the time a server operates with degraded performance after a drive becomes defunct. The hot spare also makes the 'inconsistent' drive easy to recognize when multiple drives are reported defunct, so that recovery procedures require much less technical expertise. The section below explains this advantage in greater detail:

Hot Spare Advantages 

When a drive becomes defunct (DDD), data is no longer written to it, but data continues to be written to the other drives in the array. The DDD drive therefore becomes 'inconsistent' with the rest of the drives in the array. When multiple drives appear DDD, the first and most critical task is identifying the 'inconsistent' drive correctly. The 'inconsistent' drive must be the last drive replaced, since it requires rebuilding (and, if truly defective, may need physical replacement). If the 'inconsistent' drive is software replaced (see Software Replace vs. Physical Replace) first when a multiple DDD failure occurs, its 'inconsistent' data will be used to rebuild another drive. This eventually corrupts the other drives (and data) on the system.
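The ordering rule above can be sketched in a few lines of Python. This is only an illustrative model, not an adapter utility: the function name replacement_order and the use of bay numbers to identify drives are assumptions made for the example, and the 'inconsistent' drive must still be identified by the administrator beforehand.

```python
def replacement_order(defunct_bays, inconsistent_bay):
    """Order defunct drives so the 'inconsistent' drive is handled last.

    defunct_bays: bay numbers of all drives currently shown as DDD.
    inconsistent_bay: bay of the drive that went defunct first and
        therefore holds stale data (hypothetical input; the adapter
        does not supply it in this form).
    """
    if inconsistent_bay not in defunct_bays:
        raise ValueError("the 'inconsistent' drive must be one of the DDD drives")
    # Every other DDD drive is replaced first, in its original order.
    others = [bay for bay in defunct_bays if bay != inconsistent_bay]
    # The 'inconsistent' drive goes last because it must be rebuilt.
    return others + [inconsistent_bay]


order = replacement_order([1, 3, 5], inconsistent_bay=3)
```

Here order places bays 1 and 5 ahead of bay 3, so stale data on bay 3 is never used as the source for rebuilding another drive.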

However, when a hot spare (HSP) is defined, you are protected from rebuilding another drive from an 'inconsistent' drive. This is because of the way the RAID adapter marks the states of drives. When a system has a defined HSP, as soon as the HSP takes over for the DDD drive, the RAID adapter marks the DDD drive in its configuration as the HSP drive. The adapter does not visually change the displayed status of the drive to HSP. If you then perform a software replace or physical replace, however, the RAID adapter starts the drive and changes its state from DDD to HSP. The RAID adapter does not allow this drive to be brought back to online (ONL) status.

When the HSP takes over for the DDD drive, the HSP is rebuilt to replace the DDD drive. While the HSP drive is being rebuilt, it appears in the offline (OFL) state. The OFL state changes to ONL once this drive is completely rebuilt and fully operational in place of the DDD drive. The DDD drive remains DDD.
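The displayed state changes described above can be modelled as a small state table. The sketch below is a simplified illustration only: the names DriveState, fail_drive, and finish_rebuild are hypothetical, and the model tracks just the displayed states, not the adapter's internal configuration.

```python
from enum import Enum


class DriveState(Enum):
    ONL = "online"
    OFL = "offline"
    DDD = "defunct"
    HSP = "hot spare"


def fail_drive(states, failed_bay):
    """Mark a drive DDD and start rebuilding onto a hot spare, if one exists.

    Returns the bay of the hot spare that took over, or None.
    """
    states[failed_bay] = DriveState.DDD
    for bay, state in states.items():
        if state is DriveState.HSP:
            # The hot spare is shown as OFL while it is being rebuilt.
            states[bay] = DriveState.OFL
            return bay
    return None


def finish_rebuild(states, spare_bay):
    """Bring the rebuilt hot spare online; the failed drive stays DDD."""
    if states[spare_bay] is not DriveState.OFL:
        raise ValueError("this drive is not being rebuilt")
    states[spare_bay] = DriveState.ONL


# Example: three online drives plus one hot spare in bay 4.
states = {1: DriveState.ONL, 2: DriveState.ONL,
          3: DriveState.ONL, 4: DriveState.HSP}
spare = fail_drive(states, failed_bay=2)   # bay 4 becomes OFL, bay 2 DDD
finish_rebuild(states, spare)              # bay 4 becomes ONL, bay 2 stays DDD
```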

If an HSP is not defined, or if multiple drives appear DDD before the HSP is completely rebuilt, this protection does not apply. You must read the RAID log to determine the 'inconsistent' drive. Then, for the IBM SCSI-2 F/W PCI-Bus RAID Adapter and the IBM F/W Streaming RAID Adapter/A, you must ensure that the software replace option is selected on each drive bay in the correct order, such that the 'inconsistent' drive is brought online last and rebuilt.

If an HSP drive was defined but did not complete the rebuild, it is much easier to identify the 'inconsistent' drive: the 'inconsistent' drive remains in OFL status.

When multiple drives appear defunct, as long as the logical drive is not in the OFL state, the user may select the Replace option to change the state of any of the DDD drives. Order does not matter for logical drives in the critical (CRT) state, because the 'inconsistent' drive will appear as OFL or DDD to the user. If the logical drive is in the OFL state, the user may attempt to recover by identifying the 'inconsistent' drive, software replacing all drives except the 'inconsistent' drive, and then rebuilding the 'inconsistent' drive.
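The decision described above can be summarized as a short planning function. This is a hedged sketch, not an IBM tool: recovery_plan and the action labels "software_replace" and "rebuild" are hypothetical names chosen for the example, and the 'inconsistent' drive still has to be identified by the administrator.

```python
def recovery_plan(logical_state, defunct_bays, inconsistent_bay=None):
    """Return a list of (action, bay) steps for a multiple-DDD failure.

    logical_state: "CRT" or "OFL", the state of the logical drive.
    defunct_bays: bay numbers of the drives shown as DDD.
    inconsistent_bay: required only when the logical drive is OFL.
    """
    if logical_state == "CRT":
        # Order does not matter while the logical drive is only critical.
        return [("software_replace", bay) for bay in defunct_bays]
    if logical_state == "OFL":
        if inconsistent_bay not in defunct_bays:
            raise ValueError("identify the 'inconsistent' drive first")
        # Software replace every drive except the 'inconsistent' one ...
        plan = [("software_replace", bay)
                for bay in defunct_bays if bay != inconsistent_bay]
        # ... then rebuild the 'inconsistent' drive last.
        plan.append(("rebuild", inconsistent_bay))
        return plan
    raise ValueError("unexpected logical drive state: %r" % (logical_state,))


plan = recovery_plan("OFL", [1, 2, 4], inconsistent_bay=2)
```

For an OFL logical drive, the plan always ends with the rebuild of the 'inconsistent' drive, which mirrors the order given in the text.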
