SDB:Combatting IDE data corruption


Version: 6.1

Symptom:

Data corruption can show up in a number of ways. The best way to identify the problem is through standard elimination and troubleshooting measures. The following are some example indications:

  • The SuSE install initially seems to work, but fails part way along. Repeat attempts fail in different places. *
  • The install completes, but a significant number of programs behave strangely, crash with segmentation faults, etc. *
  • The computer suffers problems, crashes, and errors under heavy I/O, for example, a kernel build. *
  • Data copied from an IDE cdrom to a drive is not consistent across multiple attempts.

* Please note that these can be caused by other hardware problems as well, such as memory, processor, or cache problems.

A simple but decent test involves copying a large amount of data from or to the IDE device, and then using diff to check that the copy is identical to the original. Example: copying a few hundred megabytes from a CDROM to a hard drive

cp -r /cdrom/something /tmp/something
diff -r /cdrom/something /tmp/something

If any differences are encountered, you are definitely seeing data corruption.
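
Because corruption of this kind is often intermittent, it can help to repeat the copy-and-compare test several times. The following is a minimal sketch of that idea, not part of the original article: the function name verify_copy, the scratch directory layout, and the default of three passes are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: copy a source tree to a scratch directory several
# times and compare each copy against the source with diff -r.
# Usage: verify_copy SRC SCRATCH [PASSES]
verify_copy() {
    src=$1            # e.g. /cdrom/something
    scratch=$2        # e.g. /tmp/idetest (illustrative path)
    passes=${3:-3}    # number of copy/compare passes, default 3
    i=1
    failures=0
    mkdir -p "$scratch"
    while [ "$i" -le "$passes" ]; do
        rm -rf "$scratch/pass$i"
        cp -r "$src" "$scratch/pass$i"
        if diff -r "$src" "$scratch/pass$i" >/dev/null; then
            echo "pass $i: identical"
        else
            echo "pass $i: DIFFERENCES FOUND"
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures of $passes passes showed differences"
    # Return success only if every pass compared clean.
    [ "$failures" -eq 0 ]
}
```

It could be invoked as verify_copy /cdrom/something /tmp/idetest 5. Note that if the test data is much smaller than the machine's RAM, the comparison may be answered from the kernel's cache rather than re-read from the device, so a few hundred megabytes or more is a safer test size.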

Cause:

A variety of problems can cause data corruption in the IDE communication stream. These include:

  • A cable of insufficient quality, or one longer than is permitted, for the IDE communication methods you are using. IDE, being an inexpensive and frequently unterminated bus, is very sensitive to cabling.
  • Bugs in IDE controllers. Largely, the Linux IDE drivers work around these, but there are always more.
  • Bugs in the IDE devices. As there are more IDE devices out there than controllers (by far!), and as the time-to-market forces in PC hardware are so strong, these are very common.
  • A data rate or protocol has been selected in your BIOS which is not properly supported by your device or controller.

Solution:

A variety of steps are possible, and more than one may be necessary. When IDE modes are discussed, you may want to refer to the brief mode listing below, or to a more detailed online discussion of IDE modes.

  • Firstly, be certain you are using an appropriate cable. Ultra66 requires specific Ultra-66 cabling (an 80-conductor cable), which differs from traditional 40-conductor cables. Older IDE modes all use the traditional cable, but you must be sure you are using a ribbon cable of appropriate quality. Faster data rates tend to require shorter, higher-quality cables. Details may be added here in the future, but for now contact a reputable computer cabling vendor to acquire cables for the rates and devices you are using.

Ruling out such problems with cabling will increase reliability, and allow for faster transfers. It is worth it.

  • You may wish to check your BIOS configuration screen to see if it has facilities for selecting the IDE setup for various devices. Many modern BIOSes allow you to select IDE transfer modes on a device-by-device basis. The current Linux IDE drivers will abide by the mode set by your BIOS.

Be certain that the documentation for your devices states that they can support the transfer rates you have selected. If you have problems at this level, reduce the rates further until the problem is alleviated. Remember that the data errors may be happening in the controller or the cable rather than in the device.

  • An additional measure that has sometimes helped is to buy a cleaning solution and use it to clean the surface, removing any residue.
  • If your BIOS lacks these features, or does not offer the selection you wish to make for your devices, it is possible to "manually" adjust it with the hdparm utility when running Linux. This can be done during a SuSE Linux install, or afterwards, although if you believe you are experiencing data loss from your CDROM controller or to your hard drive, you should probably think about reinstalling.

To reach a prompt during a YaST-1 installation in 6.3, or in SuSE installations prior to 6.3, press ALT-F2 after starting YaST. You may return to YaST with the keystroke ALT-F1. To reach a prompt during a YaST-2 installation, press CTRL-ALT-F2 after YaST-2 starts. You may return to YaST-2 with the keystroke ALT-F7.

A good first step in reducing IDE problems is to disable DMA. This can be done with the command hdparm -d0 /dev/<device>. For example, if your CDROM is set to master on the secondary controller, use the command

    hdparm -d0 /dev/hdc

If this is not sufficient, you may want to reduce the transfer rate. hdparm -XNN /dev/<device> will set the transfer mode. The valid values for NN are:

    08 - PIO mode 0
    09 - PIO mode 1
    10 - PIO mode 2
    11 - PIO mode 3
    12 - PIO mode 4
    13 - PIO mode 5
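
The values in this list follow a simple pattern: the -X argument for a PIO mode is 8 plus the mode number. As an illustration only, the sketch below computes the argument from a mode number; the function name pio_xfer_arg is a hypothetical helper, and the final line only prints the resulting command rather than running hdparm.

```shell
#!/bin/sh
# Hypothetical helper: print the hdparm -X argument for a PIO mode,
# using the rule "8 plus the mode number" from the list above.
pio_xfer_arg() {
    mode=$1
    case $mode in
        [0-5])
            # Zero-pad to two digits so that mode 0 yields -X08.
            printf '%s%02d\n' "-X" "$((8 + mode))"
            ;;
        *)
            echo "unsupported PIO mode: $mode" >&2
            return 1
            ;;
    esac
}

# Show the command for PIO mode 0 on /dev/hdc without executing it.
echo "hdparm $(pio_xfer_arg 0) /dev/hdc"
# prints: hdparm -X08 /dev/hdc
```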

Sometimes it is best to get a bit paranoid and set the device to the absolute basics, replacing /dev/hdc with the IDE device in question:

    hdparm -d0 /dev/hdc
    hdparm -X08 /dev/hdc

This will slow the device down somewhat, but if it stops the symptoms, you have located the problem.
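
If several devices need the same conservative settings, the two commands can be wrapped in a small function. This is only a sketch under stated assumptions: the function name ide_conservative and the APPLY environment variable are invented for illustration, and by default the function prints the commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical wrapper: apply (or just print) the most conservative
# settings from the text above: DMA off, then PIO mode 0.
# Set APPLY=1 in the environment to actually run hdparm (requires root).
ide_conservative() {
    dev=$1    # e.g. /dev/hdc
    for arg in -d0 -X08; do
        if [ "${APPLY:-0}" = "1" ]; then
            hdparm "$arg" "$dev" || return 1
        else
            echo "hdparm $arg $dev"
        fi
    done
}

# Dry run: prints the two commands for /dev/hdc.
ide_conservative /dev/hdc
```

After applying the settings, repeating the copy-and-diff test from the Symptom section shows whether the corruption has stopped.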

Mode reference:

The following is a listing of modes from most reliable to most rapid:

PIO (programmed I/O) Mode 0 - this may not be available in all interfaces
                        DMA Mode 0
PIO Mode 1
PIO Mode 2
PIO Mode 3
                        DMA Mode 1
PIO Mode 4              DMA Mode 2
PIO Mode 5              DMA Mode 3 (also known as DMA 33, or UDMA)
                        DMA-66

All other things being equal, the PIO modes tend to be better supported than the DMA modes.