The following section explains the Drive Protection Features in greater detail. You may wish to skip it and proceed to the Procedures for Synchronization and Data Scrubbing in the next section.

Disk drives manufactured today can store over 10 times the data of drives manufactured just 5 years ago. To achieve these capacities, the read/write heads must fly lower, the media data rates must be much higher, and the data tracks must be located much closer together on the platters than in older drives. All these changes reduce the margin for error and make the drives more sensitive to handling damage. This is particularly true of hot-swappable drives, which receive far more handling than drives that are hard mounted inside a system. Handling damage can occur if a drive is dropped onto a surface, even from a height of less than one inch. For these reasons, it is important for drives to recognize and recover from certain types of drive problems.

Remapping Bad Sectors 

When data is first written to the hard drive, the write process checks the drive to ensure the media quality is good enough to store the data safely. Minor damage that shows up over time is commonly called a sector media error. A sector media error usually affects only a single 512-byte block of data on the disk. The sector can be marked as bad and its location reassigned, or 'remapped', to a spare sector of the drive. Most drives reserve one spare sector per track of data and can perform this operation automatically.
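The remapping described above can be pictured with a small sketch. It is a toy model only, not how any particular drive firmware works: the `Track` class, its field names, and the layout of eight data sectors followed by one spare are all assumptions made for illustration.

```python
class Track:
    """Toy model of one track: data sectors followed by spare sectors."""

    def __init__(self, data_sectors=8, spares=1):
        # Logical sector i initially lives at physical sector i.
        self.mapping = {i: i for i in range(data_sectors)}
        # Spare physical sectors sit after the data sectors (assumed layout).
        self.free_spares = list(range(data_sectors, data_sectors + spares))
        # Physical sectors that have been marked bad.
        self.bad = set()

    def remap(self, logical):
        """Mark the sector behind `logical` as bad and move it to a spare.

        Returns the new physical location, or None if no spare is left.
        """
        if not self.free_spares:
            return None
        self.bad.add(self.mapping[logical])
        self.mapping[logical] = self.free_spares.pop(0)
        return self.mapping[logical]


track = Track(data_sectors=8, spares=1)
new_location = track.remap(3)   # logical sector 3 now points at the spare
```

In this model a second `remap` call on the same track returns `None`, because the single spare has already been used.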

Error Correction Code (ECC)

By remapping bad sectors, the drive avoids potential problems by using only 'reliable' sections of the disk. What happens if a media problem develops after the data has been written? When an area of the disk is being read, most drives can correct minor sector media errors automatically by using error correction code (ECC) information stored along with the data; the corrected data is then rewritten to the disk. If the sector is badly damaged and the data cannot be reliably rewritten to the same spot, the drive will remap the data to a spare sector on the disk. If the sector is very badly damaged, the drive may not be able to recreate the data automatically with the ECC. If no other protection (such as RAID) is in place, the system will report a read failure and the data will be lost. These lost data areas are typically reported to the user via operating system messages.
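To make the idea of correcting data from stored ECC information concrete, here is a minimal sketch using a Hamming(7,4) code. This is a stand-in chosen for brevity: the text does not say which code the drives use, and real drives use much stronger codes that protect a whole sector. The function names `encode` and `decode` are illustrative.

```python
def encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Return (data_bits, corrected_position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # syndrome names the flipped position
    if position:
        c[position - 1] ^= 1          # repair the single flipped bit
    return [c[2], c[4], c[5], c[6]], position


stored = encode([1, 0, 1, 1])
damaged = list(stored)
damaged[4] ^= 1                       # simulate a minor media error
recovered, position = decode(damaged)
```

A single flipped bit is repaired and the original data comes back. Two flipped bits exceed what this small code can repair, which loosely mirrors the 'very badly damaged' case in which the ECC cannot recreate the data.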

Predictive Failure Analysis 

As with any electrical/mechanical device, there are two basic failure types:

The first type of failure is the gradual performance degradation of components that can ultimately lead to a catastrophic drive failure. Predictive Failure Analysis (PFA) has been developed to monitor the performance of drives, analyze data from periodic internal measurements, and recommend replacement when specific thresholds are exceeded. The data from periodic internal measurements is collected when actual accesses of the data sectors occur. Data Scrubbing, which forces all data sectors to be read, provides more data to improve the accuracy of PFA. The thresholds have been determined by examining the history logs of drives that have failed in actual customer operation. When PFA detects that a threshold has been exceeded, the system administrator can be notified through Netfinity Manager 5.0. The design goal of PFA is to provide a minimum of 24 hours warning before a drive experiences 'catastrophic' failure.

The second type is the on/off failure. A cable breaking, a component burning out, and a solder connection failing are all examples of unpredictable catastrophic failures. As assembly and component processes have improved, these types of defects have been reduced but not eliminated. PFA cannot always provide warning for on/off unpredictable failures.
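The threshold comparison at the heart of PFA, as described for the first failure type, can be sketched as follows. The measurement names and limits are hypothetical; the actual measurements and thresholds used by PFA are not given in this section.

```python
# Hypothetical measurement names and limits, for illustration only.
THRESHOLDS = {
    "recovered_read_errors": 50,
    "seek_errors": 10,
    "remapped_sectors": 20,
}


def check_pfa(measurements, thresholds=THRESHOLDS):
    """Return the sorted names of measurements that exceed their threshold."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if measurements.get(name, 0) > limit
    )


exceeded = check_pfa({"recovered_read_errors": 12, "seek_errors": 11})
if exceeded:
    # A real system would notify the administrator here.
    print("Replacement recommended; thresholds exceeded:", exceeded)
```

An on/off failure, such as a broken cable, produces no gradual trend in these measurements, which is why a check of this kind cannot give advance warning of it.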
